Re: OCTET_LENGTH is wrong
От | Hannu Krosing |
---|---|
Тема | Re: OCTET_LENGTH is wrong |
Дата | |
Msg-id | 3BF75391.9090209@sid.tm.ee обсуждение исходный текст |
Ответ на | OCTET_LENGTH is wrong (Peter Eisentraut <peter_e@gmx.net>) |
Список | pgsql-hackers |
Tom Lane wrote: >Peter Eisentraut <peter_e@gmx.net> writes: > >>... Moreover, it eliminates the standard useful behaviour of >>OCTET_LENGTH, which is to show the length in bytes of a multibyte string. >> > >While I don't necessarily dispute this, I do kinda wonder where you >derive the statement. AFAICS, SQL92 defines OCTET_LENGTH in terms >of BIT_LENGTH: > >6.6 General Rule 5: > > a) Let S be the <string value expression>. If the value of S is > not the null value, then the result is the smallest integer > not less than the quotient of the division (BIT_LENGTH(S)/8). > b) Otherwise, the result is the null value. > >and BIT_LENGTH is defined in the next GR: > > a) Let S be the <string value expression>. If the value of S is > not the null value, then the result is the number of bits in > the value of S. > b) Otherwise, the result is the null value. > >While SQL92 is pretty clear about <bit string>, I'm damned if I can see >anywhere that they define how many bits are in a character string value >So who's to say what representation is to be used to count the bits? >If, say, UTF-16 and UTF-8 are equally reasonable choices, then why >shouldn't a compressed representation be reasonable too? > One objection I have to this, is the fact that nobody uses the compressed representation in client libraries whrereas they do use both UTF-16 and UTF-8. At least UTF-8 is available as client encoding. And probably it is possible that the length of the "possibly compressed" representation can change without the underlying data changing (for example when you set a bit somewhere that disables compression and UPDATE some other field in the tuple) making the result of OCTET_LENGTH dependent on other things than the argument string. I also like the propery of _uncompressed_ OCTET_LENGTH that OCTET_LENGTH(s||s) == 2 * OCTET_LENGTH(s) which is almost never true for compressed length ---------------- Hannu
В списке pgsql-hackers по дате отправления: