Re: Unicode normalization
From | Andreas Kalsch |
---|---|
Subject | Re: Unicode normalization |
Date | |
Msg-id | 4AB13DE6.3040800@gmx.de |
In response to | Re: Unicode normalization (David Fetter <david@fetter.org>) |
Responses | Re: Unicode normalization, Re: Unicode normalization, Re: Unicode normalization |
List | pgsql-general |
No, I need a solution which is as generic as possible. I use UTF-8 encoded Unicode strings on all levels.

This is what I have done so far:

1) Writing a separate Python command line script for testing. It works as expected:

    #!/usr/bin/python
    import sys
    import unicodedata

    str = sys.argv[1].decode('UTF-8')
    str = unicodedata.normalize('NFKD', str)
    str = ''.join(c for c in str if unicodedata.combining(c) == 0)
    print str

2) Transferring this to PL/Python:

    CREATE OR REPLACE FUNCTION test (str text)
    RETURNS text
    AS $$
    import unicodedata
    return unicodedata.normalize('NFKD', str.decode('UTF-8'))
    $$ LANGUAGE plpythonu;

Problem: plpython throws an error where my command line script works correctly:

    # select test('aÄÖÜ');
    ERROR: plpython: function "test" could not create return value
    DETAIL: <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\u0308' in position 2: ordinal not in range(128)

I use PG 8.3 and Python 2.5.2.

How can I make plpython behave as it does in a normal Python environment?

In the end it should look like this:

    CREATE TABLE t (
        ...
        ts tsvector NOT NULL
    );

    INSERT INTO t (ts) VALUES(to_tsvector(normalize(?)));

Andi

David Fetter wrote:
> On Wed, Sep 16, 2009 at 07:20:21PM +0200, Andreas Kalsch wrote:
>
>> Has somebody integrated Unicode normalization into Postgres? If not, I
>> would have to implement my own function by using this CPAN module:
>> http://search.cpan.org/~sadahiro/Unicode-Normalize-1.03/ .
>>
>> I need a function which removes all diacritics (1) and transforms some
>> characters to a more compatible form (2) to get a better index on
>> strings.
>>
>> Best,
>>
>> Andi
>>
>>
>> 1) à,ä, ... => a
>> 2) ø => o, ƒ => f, ª => a
>>
>
> You mean something like this?
>
> http://wiki.postgresql.org/wiki/Strip_accents_from_strings%2C_and_output_in_lowercase
>
> Cheers,
> David.
>
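
For reference, the two requirements from the original question (strip diacritics, map some characters to a more compatible form) can be sketched in plain Python as below. This is a minimal sketch in Python 3 syntax, not the PL/Python function itself. The names `EXTRA_MAP`, `strip_diacritics` and `normalize_utf8` are hypothetical, and the fallback table is an assumption: it exists because `ø` and `ƒ` have no Unicode decomposition, so NFKD leaves them unchanged, while `ª` does decompose to `a`. The last function decodes and re-encodes UTF-8 explicitly at the boundary. The reported error suggests that the unicode return value is being converted with the ASCII codec, so encoding the result back to UTF-8 before returning it may be worth trying in the Python 2 function as well.

```python
import unicodedata

# Hypothetical fallback table for characters that NFKD does not decompose.
# Extend it with whatever additional mappings the index needs.
EXTRA_MAP = str.maketrans({"ø": "o", "Ø": "O", "ƒ": "f"})


def strip_diacritics(text):
    """Return text with combining marks removed and fallback mappings applied."""
    # NFKD splits 'Ä' into 'A' + U+0308 and maps 'ª' to 'a'.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop every combining mark (combining class != 0).
    without_marks = "".join(c for c in decomposed if unicodedata.combining(c) == 0)
    # Handle characters such as 'ø' that have no decomposition at all.
    return without_marks.translate(EXTRA_MAP)


def normalize_utf8(raw):
    """Byte-string variant: decode UTF-8 on the way in, encode UTF-8 on the way out."""
    return strip_diacritics(raw.decode("UTF-8")).encode("UTF-8")
```

Called as `strip_diacritics('aÄÖÜ')`, the first function covers case (1) from the question. Case (2) depends on how complete the fallback table is.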