Re: sortsupport for text
От | Martijn van Oosterhout |
---|---|
Тема | Re: sortsupport for text |
Дата | |
Msg-id | 20120319212304.GA15141@svana.org обсуждение исходный текст |
Ответ на | Re: sortsupport for text (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: sortsupport for text
|
Список | pgsql-hackers |
On Mon, Mar 19, 2012 at 12:19:53PM -0400, Robert Haas wrote: > On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark <stark@mit.edu> wrote: > > On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> 12789 28.2686 libc-2.13.so strcoll_l > >> 6802 15.0350 postgres text_cmp > > > > I'm still curious how it would compare to call strxfrm and sort the > > resulting binary blobs. I don't think the sortsupport stuff actually > > makes this any easier though. Since using it requires storing the > > binary blob somewhere I think the support would have to be baked into > > tuplesort (or hacked into the sortkey as an expr that was evaluated > > earlier somehow). > > Well, the real problem here is that the strxfrm'd representations > aren't just bigger - they are huge. On MacBook Pro, if the input > representation is n characters, the strxfrm'd representation is 9x+3 > characters. Ouch. I was holding out hope that you could get a meaningful improvement if we could use the first X bytes of the strxfrm output so you only need to do a strcoll on strings that actually nearly match. But with an information density of 9 bytes for one 1 character it doesn't seem worthwhile. That and this gem in the strxfrm manpage: RETURN VALUE The strxfrm() function returns the number of bytes required to store the transformed string in destexcluding the terminating '\0' character. If the value returned is n or more, the contents of dest are indeterminate. Which means that you have to take the entire transformed string, you can't just ask for the first bit. I think that kind of leaves the whole idea dead in the water. Just for interest I looked at the ICU API for this and they have the same restriction. There is another function which you can use to return partial sort keys (ucol_nextSortKeyPart) but it produces "uncompressed sortkeys", which it seems is what Mac OSX is doing, which seems useless for our purposes. Either this is a hard problem or we're nowhere near a target use case. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
В списке pgsql-hackers по дате отправления: