Re: Notes about fixing regexes and UTF-8 (yet again)
From | Robert Haas |
---|---|
Subject | Re: Notes about fixing regexes and UTF-8 (yet again) |
Date | |
Msg-id | CA+TgmoZEDfak7tSUZw8bGjGBGMTjW2tnxc+P-1sv3Ldaq3V=Hw@mail.gmail.com |
In reply to | Re: Notes about fixing regexes and UTF-8 (yet again) (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Replies | Re: Notes about fixing regexes and UTF-8 (yet again) |
List | pgsql-hackers |
On Fri, Feb 17, 2012 at 3:48 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Here's a wild idea: keep the class of each codepoint in a hash table.
> Initialize it with all codepoints up to 0xFFFF. After that, whenever a
> string contains a character that's not in the hash table yet, query the
> class of that character, and add it to the hash table. Then recompile the
> whole regex and restart the matching engine.
>
> Recompiling is expensive, but if you cache the results for the session, it
> would probably be acceptable.

What if you did this ONCE and wrote the results to a file someplace?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
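
For readers following the thread, the two ideas above (a lazily filled table of codepoint classes, and writing that table to a file once) might be sketched roughly as below. This is only an illustration in Python, not PostgreSQL code: `ClassCache`, `classify`, `generation`, and the JSON file format are all hypothetical names and choices, and the Unicode general category stands in for whatever "class" the regex engine would actually query.

```python
import json
import unicodedata

# Assumption: preload codepoints 0..0xFFFF, as in the quoted proposal.
PRELOAD_LIMIT = 0x10000


def classify(cp):
    """Hypothetical stand-in for the expensive class query.

    Uses the Unicode general category; the real engine would ask the locale.
    """
    return unicodedata.category(chr(cp))


class ClassCache:
    """Table of codepoint -> class, filled lazily past the preloaded range."""

    def __init__(self, preload_limit=PRELOAD_LIMIT):
        self.table = {cp: classify(cp) for cp in range(preload_limit)}
        # Bumped whenever a new codepoint is added, so a caller could notice
        # that a compiled regex is stale and needs recompiling.
        self.generation = 0

    def lookup(self, cp):
        if cp not in self.table:
            self.table[cp] = classify(cp)
            self.generation += 1
        return self.table[cp]

    def save(self, path):
        """Write the table out once so later sessions can skip the queries."""
        with open(path, "w", encoding="ascii") as f:
            json.dump({str(cp): c for cp, c in self.table.items()}, f)

    @classmethod
    def load(cls, path):
        cache = cls(preload_limit=0)
        with open(path, encoding="ascii") as f:
            cache.table = {int(cp): c for cp, c in json.load(f).items()}
        return cache
```

In this sketch a change in `generation` is the signal to recompile, while `save` and `load` correspond to doing the work once and keeping the results in a file. Whether a saved table stays valid across locale or library changes is a separate question that the sketch does not address.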