Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)
От | Palle Girgensohn |
---|---|
Тема | Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00) |
Дата | |
Msg-id | 7D9265F9-2E7B-4136-BCA1-AAE8481561B7@pingpong.net обсуждение исходный текст |
Ответ на | Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00) (Peter Geoghegan <pg@heroku.com>) |
Ответы |
Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)
|
Список | pgsql-hackers |
> 11 aug. 2016 kl. 03:05 skrev Peter Geoghegan <pg@heroku.com>: > > On Wed, Aug 10, 2016 at 1:42 PM, Palle Girgensohn <girgen@pingpong.net> wrote: >> They've been used for the FreeBSD ports since 2005, and have served us well. I have of course updated them regularly.In this latest version, I've removed support for other encodings beside UTF-8, mostly since I don't know how totest them, but also, I see little point in supporting anything else using ICU. > > Looks like you're not using the ICU equivalent of strxfrm(). While 9.5 > is not the release that introduced its use, it did expand it > significantly. I think you need to fix this, even though it isn't > actually used to sort text at present, since presumably FreeBSD builds > of 9.5 don't TRUST_STRXFRM. Since you're using ICU, though, you could > reasonably trust the ICU equivalent of strxfrm(), so that's a missed > opportunity. (See the wiki page on the abbreviated keys issue [1] if > you don't know what I'm talking about.) My plan was to get it working without TRUST_STRXFRM first, and then add that functinality. I've made some preliminary testsusing ICU:s ucol_getSortKey but I will have to test it a bit more. For now, I just expect not to trust strxfrm. It isthe first iteration wrt strxfrm, the plan is to use that code base. > > Shouldn't you really have a strxfrm() wrapper, used across the board, > including for callers outside of varlena.c? convert_string_datum() has > been calling strxfrm() for many releases now. These calls are still > used in FreeBSD builds, I would think, which seems like a bug that is > not dodged by simply not defining TRUST_STRXFRM. Isn't its assumption > that that matching the ordering used elsewhere not really hold on > FreeBSD builds? I was not aware of convert_string_datum, I will check that, thanks! Using a wrapper across the board seems like a good ideafor refactoring. > > [1] https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issue > -- > Peter Geoghegan
В списке pgsql-hackers по дате отправления: