Re: Accent insensitive search?
| From | Stuart Bishop |
|---|---|
| Subject | Re: Accent insensitive search? |
| Date | |
| Msg-id | 6bc73d4c0903300202r7495c46eo97697842d2221f1d@mail.gmail.com |
| In reply to | Re: Accent insensitive search? (Jasen Betts <jasen@xnet.co.nz>) |
| List | pgsql-general |
On Tue, Mar 24, 2009 at 4:53 PM, Jasen Betts <jasen@xnet.co.nz> wrote:
> On 2009-03-18, cifroes <cifroes@netcabo.pt> wrote:
>> Hi,
>>
>> I have a DB in utf-8 and postgres 8.3.x.
>>
>> How can I do an accent insensitive search (like ...) ?
>
> use a posix regular expression that matches the string you want:
>
> select * from foo where colname ~ '^[CcĆćĈĉĊċČč][ĀāĂ㥹Aa][Pp][ŌōŎŏŐő]$';
>
> you could write a function to do the translation.

Unicode normalization is an interesting way to strip accents, I've found:

>>> import unicodedata
>>>
>>> s = u'Björn'
>>> s = unicodedata.normalize("NFKD", s)
>>> s = ''.join(c for c in s if ord(c) < 127)
>>> print s
Bjorn

You can also use the character names to map many more characters to their
ASCII equivalents. A large number of these can be smashed into ASCII using
regular expressions and some manual mappings, such as LETTER THORN -> th,
LETTER LATERAL CLICK -> X, etc. Just mapping CAPITAL LETTER XX -> XX and
SMALL LETTER XX -> xx seems to get you most of Europe if you special-case
SHARP S -> ss and THORN -> th.

>>> s = u'ァ'
>>> print unicodedata.name(s)
KATAKANA LETTER SMALL A

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
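A minimal sketch of the character-name mapping described in the post above, written in the same Python 2 style as the earlier example. The function name `to_ascii`, the special-case table, and the exact regular expression are assumptions added here, not part of the original thread; it only covers Latin letters plus the two special cases mentioned, so treat it as a starting point rather than a complete transliterator.

```python
# -*- coding: utf-8 -*-
import re
import unicodedata

# Hand-picked special cases that NFKD alone won't reduce to ASCII
# (these particular mappings follow the suggestions in the post above).
SPECIAL = {
    u'ß': u'ss',   # LATIN SMALL LETTER SHARP S
    u'Þ': u'Th',   # LATIN CAPITAL LETTER THORN
    u'þ': u'th',   # LATIN SMALL LETTER THORN
}

# e.g. "LATIN SMALL LETTER O WITH STROKE" -> groups ('SMALL', 'O')
_NAME = re.compile(r'^LATIN (CAPITAL|SMALL) LETTER ([A-Z]{1,2})\b')

def to_ascii(text):
    out = []
    for c in text:
        if ord(c) < 128:
            out.append(c)              # already plain ASCII, keep as-is
        elif c in SPECIAL:
            out.append(SPECIAL[c])
        else:
            # First try NFKD: most accented Latin letters decompose into
            # a base letter plus combining marks, which we simply drop.
            stripped = u''.join(x for x in unicodedata.normalize('NFKD', c)
                                if ord(x) < 128)
            if stripped:
                out.append(stripped)
                continue
            # Otherwise fall back on the character name, as described above,
            # mapping CAPITAL LETTER X -> X and SMALL LETTER X -> x.
            m = _NAME.match(unicodedata.name(c, u''))
            if m:
                base = m.group(2)
                out.append(base if m.group(1) == 'CAPITAL' else base.lower())
            # Anything still unmatched (e.g. katakana) is dropped, just like
            # the ord(c) < 127 filter in the normalization example above.
    return u''.join(out)

# Example (Python 2 interactive session):
#   >>> print to_ascii(u'Björn Ørn Þór')
#   Bjorn Orn Thor
```

The NFKD pass handles the common accented letters, while the name-based fallback catches characters such as Ø that have no decomposition; whether dropping everything else is acceptable depends on the data being searched.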