Re: OCTET_LENGTH is wrong
From | Hannu Krosing |
---|---|
Subject | Re: OCTET_LENGTH is wrong |
Date | |
Msg-id | 3BF96E43.20907@tm.ee |
In reply to | Re: OCTET_LENGTH is wrong (Peter Eisentraut <peter_e@gmx.net>) |
List | pgsql-hackers |
Tom Lane wrote:

>Barry Lind <barry@xythos.com> writes:
>
>>While the text datatypes have additional issues with encodings, that is
>>not true for the bytea type. I think it does make sense that a client
>>be able to get the size in bytes that the bytea type value will return
>>to the client.
>>
>
>bytea does that already. It's only text that has (or had, till a few
>minutes ago) the funny behavior.
>
>I'm not set on the notion that octet_length should return on-disk size;
>that's clearly not what's contemplated by SQL92, so I'm happy to agree
>that if we want that we should add a new function to get it.
>("storage_length", maybe.) What's bothering me right now is the
>difference between client and server encodings. It seems that the only
>plausible use for octet_length is to do memory allocation on the client
>side,
>

Allocating memory seems to me to be the driver's (libpq, JDBC, ODBC, ...) problem and not something to be done by client code beforehand - at least for libpq (AFAIK) we don't have any means of giving it a pre-allocated storage area for one field.

There is enough information in the wire protocol for allocating right-sized chunks at the time the query result is read. An additional call of "SELECT OCTET_LENGTH(someCol)" seems orders of magnitude slower than doing it at the right time in the driver.

>and for that purpose the length ought to be measured in the client
>encoding. People seem to be happy with letting octet_length take the
>easy way out (measure in the server encoding), and I'm trying to get
>someone to explain to me why that's the right behavior. I don't see it.
>

Perhaps we need another function "OCTET_LENGTH(someCol, encoding)" for getting what we want, and also client_encoding() and server_encoding() for supplying it with some universal defaults?
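To make the dependence on encoding concrete, here is a minimal Python sketch of what a two-argument octet length would compute. The function name `octet_length` and the sample string are illustrative assumptions only; this is not PostgreSQL code and says nothing about how the backend would implement such a function.

```python
def octet_length(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies when encoded in `encoding`.

    Hypothetical stand-in for the proposed OCTET_LENGTH(someCol, encoding).
    """
    return len(text.encode(encoding))


# Sample value (an assumption for illustration): 4 characters, one non-ASCII.
sample = "T\u00f5nu"

# The same string needs a different number of bytes per encoding.
latin1_len = octet_length(sample, "latin-1")  # one byte per character
utf8_len = octet_length(sample, "utf-8")      # the non-ASCII character takes two bytes
```

With a Latin-1 server encoding and a UTF-8 client encoding, the two lengths differ, which is exactly the ambiguity being discussed.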
OTOH, from reading on Unicode I've come to the conclusion that there are often several ways of expressing the same string in Unicode, so with a non-Unicode server encoding and a client requesting Unicode (say UTF-8) there can be several different ways to express the same string. Thus there is no absolute OCTET_LENGTH for client_encoding for all cases. Thus giving the actual uncompressed length seems most reasonable.

For Unicode in both backend and frontend we could also make OCTET_LENGTH return not an int but an integer interval of the shortest and longest possible encoding ;)

------------------
Hannu
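The point about several Unicode spellings of one string can be illustrated with the standard normalization forms. The following Python sketch is an assumption-laden illustration: the helper name `utf8_length_range` is hypothetical, and it only compares the NFC (composed) and NFD (decomposed) forms rather than every canonically equivalent sequence.

```python
import unicodedata
from typing import Tuple


def utf8_length_range(text: str) -> Tuple[int, int]:
    """Return (shortest, longest) UTF-8 byte length over the NFC and NFD forms.

    Hypothetical helper: only these two normalization forms are considered.
    """
    lengths = [
        len(unicodedata.normalize(form, text).encode("utf-8"))
        for form in ("NFC", "NFD")
    ]
    return min(lengths), max(lengths)


# U+00F5 can be stored precomposed (NFC) or as "o" plus a combining tilde (NFD).
shortest, longest = utf8_length_range("T\u00f5nu")
```

For plain ASCII text both forms coincide, so the interval collapses to a single value; for text with accented characters the two bounds differ.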