Re: Text-indexing UTF-8 bytea, convert_from() immutability, null bytes...
From | Phil Endecott |
---|---|
Subject | Re: Text-indexing UTF-8 bytea, convert_from() immutability, null bytes... |
Date | |
Msg-id | 1538916307632@dmwebmail.dmwebmail.chezphil.org |
In reply to | Re: Text-indexing UTF-8 bytea, convert_from() immutability, null bytes... (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
List | pgsql-general |
Hi Andrew,

Thanks for your great reply.

Andrew Gierth wrote:
>>>>>> "Phil" == Phil Endecott <spam_from_pgsql_lists@chezphil.org> writes:
>  Phil> As a hack I tried ALTER FUNCTION to make it immutable,
>
> A better approach is to wrap it in a function of your own which is
> declared immutable, rather than hacking the catalogs:
>
> create function from_utf8(bytea) returns text language plpgsql immutable
>   as $$ begin return convert_from($1, 'UTF8'); end; $$;

Thanks. I'm a bit surprised that it will allow me to declare a function
immutable if it calls functions that aren't themselves immutable, but if
it works... great.

>  Phil> Anyway: given the problem of creating a text search index over
>  Phil> bytea data that contains UTF-8 text, which may include oddities
>  Phil> like null bytes, what would you do?
>
> You can search for 0x00 in a bytea using position() or LIKE. What do you
> want to do with values that contain null bytes? or values which you
> think are supposed to be valid utf8 text but are not?

As long as it doesn't crash I don't really care; it would be better if
the text search worked for the valid parts of the text but even that
isn't essential. I think I will probably need to do some more
preprocessing on the data when I load it, at least to remove the null
bytes.

Regards, Phil.
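
For readers following the thread, here is a minimal sketch of how the immutable wrapper could be combined with the "as long as it doesn't crash" requirement. It is not taken from either poster: the function name `from_utf8_safe`, the table `docs`, its column `body`, and the index name are all hypothetical, and the choice to return NULL for undecodable values (rather than salvaging the valid parts) is an assumption. It also assumes that `convert_from()` reports null bytes and invalid UTF-8 with the `character_not_in_repertoire` condition.

```sql
-- Hypothetical table holding the raw bytes (an assumption, not from the thread).
create table docs (id serial primary key, body bytea);

-- Immutable wrapper around convert_from(), as suggested in the thread,
-- extended so that rows which cannot be decoded yield NULL instead of an error.
create function from_utf8_safe(bytea) returns text language plpgsql immutable
as $$
begin
  return convert_from($1, 'UTF8');
exception
  -- Assumed conditions: invalid byte sequences (including 0x00) and
  -- characters with no equivalent in the database encoding.
  when character_not_in_repertoire or untranslatable_character then
    return null;
end;
$$;

-- Expression index for full-text search over the decoded text.
-- Rows whose body cannot be decoded get a NULL tsvector and never match.
create index docs_body_fts on docs
  using gin (to_tsvector('english', from_utf8_safe(body)));

-- A query has to repeat the indexed expression for the index to be usable.
select id
from docs
where to_tsvector('english', from_utf8_safe(body)) @@ to_tsquery('english', 'example');
```

With this sketch, a value containing a null byte is simply left out of the search results, so stripping the null bytes at load time, as Phil proposes, is still needed if the rest of such a value should be searchable.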