Re: Text-indexing UTF-8 bytea, convert_from() immutability, null bytes...

From	Phil Endecott
Subject	Re: Text-indexing UTF-8 bytea, convert_from() immutability, null bytes...
Date	7 October 2018, 15:45:07
Msg-id	1538916307632@dmwebmail.dmwebmail.chezphil.org
In reply to	Re: Text-indexing UTF-8 bytea, convert_from() immutability, null bytes... (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List	pgsql-general

Hi Andrew,

Thanks for your great reply.

Andrew Gierth wrote:
>>>>>> "Phil" == Phil Endecott <spam_from_pgsql_lists@chezphil.org> writes:
>  Phil> As a hack I tried ALTER FUNCTION to make it immutable,
>
> A better approach is to wrap it in a function of your own which is
> declared immutable, rather than hacking the catalogs:
>
> create function from_utf8(bytea) returns text language plpgsql immutable
>   as $$ begin return convert_from($1, 'UTF8'); end; $$;

Thanks.  I'm a bit surprised that PostgreSQL will allow me to declare
a function immutable if it calls functions that aren't themselves
immutable, but if it works, great.

>  Phil> Anyway: given the problem of creating a text search index over
>  Phil> bytea data that contains UTF-8 text, which may include oddities
>  Phil> like null bytes, what would you do?
>
> You can search for 0x00 in a bytea using position() or LIKE. What do you
> want to do with values that contain null bytes? or values which you
> think are supposed to be valid utf8 text but are not?

As long as it doesn't crash I don't really care; it would be better
if the text search worked for the valid parts of the text but even
that isn't essential.

I think I will probably need to do some more preprocessing on the
data when I load it, at least to remove the null bytes.
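
A minimal sketch of what that load-time preprocessing might look like,
assuming the loader is written in Python. The function name
clean_utf8_bytes is hypothetical, and replacing invalid UTF-8 sequences
with U+FFFD is just one possible policy, not something settled in this
thread:

```python
def clean_utf8_bytes(raw: bytes) -> str:
    """Return text that should be safe to store in a PostgreSQL text column.

    Hypothetical helper: drops null bytes and tolerates invalid UTF-8.
    """
    # PostgreSQL text values cannot contain the NUL character, so drop 0x00 bytes.
    without_nul = raw.replace(b"\x00", b"")
    # Assumed policy: substitute U+FFFD for invalid sequences instead of failing,
    # so the valid parts of the text remain searchable.
    return without_nul.decode("utf-8", errors="replace")
```

With something like this applied before insertion, the data could be
stored as text directly and the convert_from() wrapper might not be
needed for the index at all.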

Regards, Phil.
