full-text indexing, locales, triggers, SPI & more fun
From | Charlie Hornberger |
---|---|
Subject | full-text indexing, locales, triggers, SPI & more fun |
Date | |
Msg-id | 200006010334.WAA11409@SLUTMONKEY.K4AZL.NET |
Replies | Re: full-text indexing, locales, triggers, SPI & more fun |
List | pgsql-hackers |
I've been doing some poking at the full-text indexing code in /contrib/fulltextindex to try to get it to work with non-ASCII locales (among other things), but I'm having a bit of trouble figuring out how to properly parse non-ASCII strings from inside the fti() trigger function (which is written in C).

My problem is this: I want to aggregate text in multiple languages in a single full-text index, much like the structure used by the current fti() function. To parse the strings correctly, however, I need to know which locale they're written in/for (otherwise, isalpha() thinks that characters such as the Hungarian letter ű, a 'u' with a double acute accent, aren't very alphabetic).

My initial thinking (which could certainly be very wrong) is that the easiest way around this would be to let client apps set their LC_ALL environment variable, and then have the new fti() function use that locale while doing string manipulation. But the way I'm doing things, the LC_ALL environment variable doesn't appear to be available. (Maybe it was never meant to be... but I'm not a very skilled C programmer, and I don't know the first thing about the SPI interface, so please forgive me if I'm asking why the sun doesn't rise in the west more often ;-))

Here's what's happening:

    bash# LC_ALL=hu_HU
    bash# export LC_ALL
    bash# psql test
    Welcome to psql, the PostgreSQL interactive terminal.

    Type:  \copyright for distribution terms
           \h for help with SQL commands
           \? for help on internal slash commands
           \g or terminate with semicolon to execute query
           \q to quit

    test=# INSERT INTO ttxt (t1) values ('FELELŐSSÉGŰ');
    INSERT 513377 1
    test=# select * from ttxt_fti;
     string |   id
    --------+--------
     felel  | 513377
     ss     | 513377
    (2 rows)

Which isn't quite what I'm looking for ;-).
Inside the C source of fti(), I added a call to getenv("LC_ALL") to make sure that LC_ALL really isn't set:

    locale = getenv("LC_ALL");
    elog(NOTICE, "Locale is '%s'\n", locale);

And sure enough, it outputs:

    NOTICE: Locale is '(null)'

If, on the other hand, I call

    setlocale(LC_ALL, "hu_HU");

inside fti(), everything works out perfectly:

    test=# INSERT INTO ttxt (t1) values ('FELELŐSSÉGŰ');
    INSERT 513410 1
    test=# select * from ttxt_fti;
       string    |   id
    -------------+--------
     felelősségű | 513410
    (1 row)

Any ideas?

Cheers,
Charlie

P.S. I only subscribe to the hackers digest, so please CC me with your replies... Thanks!