Re: case insensitive match in unicode
From | Martijn van Oosterhout |
---|---|
Subject | Re: case insensitive match in unicode |
Date | |
Msg-id | 20060327114037.GD30791@svana.org |
In reply to | Re: case insensitive match in unicode (SunWuKung <Balazs.Klein@axelero.hu>) |
List | pgsql-general |
On Mon, Mar 27, 2006 at 12:45:05PM +0200, SunWuKung wrote:
> This sounds like a very interesting concept.
> It wouldn't be 'case insensitive' just insensitive.
>
> The way I imagine it now is a special case of the ~ function.
> I create matchgroups in a table and check each character if it is in the
> group. If it is I will replace the character with the group in [éÉE],
> [oóOÓ??] and do a regexp with that.

No need to reinvent the wheel. ICU provides a range of services to deal
with this. For example, the following filter in ICU:

NFD; [:Nonspacing Mark:] Remove; NFC

will remove all accents from characters, and it works for all Unicode
characters. With a bit more thinking you can handle case variations as
well. ICU also has a locale-independent case-mapping module, plus various
locale-specific ones.

http://icu.sourceforge.net/userguide/Transform.html
http://icu.sourceforge.net/userguide/caseMappings.html
http://icu.sourceforge.net/userguide/normalization.html

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
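
For readers who want to see what the filter quoted in the message (NFD; [:Nonspacing Mark:] Remove; NFC) does step by step, here is a minimal sketch. It does not call ICU: it uses Python's standard unicodedata module as a stand-in, assuming that dropping characters of general category Mn matches the [:Nonspacing Mark:] set. The function names strip_accents, insensitive_key and insensitive_equal are hypothetical, chosen only for illustration, and str.casefold() is one possible locale-independent way to cover the case variations the message alludes to.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Approximate the filter 'NFD; [:Nonspacing Mark:] Remove; NFC'."""
    # NFD: split each accented character into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Remove: drop every character whose general category is Mn
    # (Nonspacing Mark), which is where combining accents live.
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # NFC: recompose whatever is left into canonical composed form.
    return unicodedata.normalize("NFC", without_marks)


def insensitive_key(text: str) -> str:
    """Comparison key that ignores both accents and case (an assumption:
    casefold() is used here for locale-independent case folding)."""
    return strip_accents(text).casefold()


def insensitive_equal(a: str, b: str) -> bool:
    """Compare two strings ignoring accents and case."""
    return insensitive_key(a) == insensitive_key(b)


if __name__ == "__main__":
    print(strip_accents("éÉE"))
    print(insensitive_equal("Éva", "eva"))
```

A key like this could be computed on both sides of a comparison instead of building per-character match groups by hand. Letters that have no canonical decomposition, such as ø, are left unchanged by this approach.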