Re: unexpected result from to_tsvector
From | Shulgin, Oleksandr |
---|---|
Subject | Re: unexpected result from to_tsvector |
Date | |
Msg-id | CACACo5SMkOU3cYhKHiLcOCkKvkeh9MYqQTbA95apZ38iwPL5qQ@mail.gmail.com |
In reply to | Re: unexpected result from to_tsvector (Artur Zakirov <a.zakirov@postgrespro.ru>) |
Responses |
Re: unexpected result from to_tsvector (Artur Zakirov <a.zakirov@postgrespro.ru>)
Re: unexpected result from to_tsvector (Dmitrii Golub <dmitrii.golub@gmail.com>) |
List | pgsql-hackers |
On Mon, Mar 7, 2016 at 10:46 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
Hello,
On 07.03.2016 23:55, Dmitrii Golub wrote:
Hello,
Should we add tests for this case?
I think we should. I have added tests for teodor@123-stack.net and 123@stack.net emails.
123_reg.ro is not a valid domain name because of the "_" symbol; see
https://tools.ietf.org/html/rfc1035, page 8.
Dmitrii Golub
Thank you for the information. Fixed.
Hm... now that doesn't look all that consistent to me (after applying the patch):
=# select ts_debug('simple', 'aaa@123-yyy.zzz');
ts_debug
---------------------------------------------------------------------------
(email,"Email address",aaa@123-yyy.zzz,{simple},simple,{aaa@123-yyy.zzz})
(1 row)
But:
=# select ts_debug('simple', 'aaa@123_yyy.zzz');
ts_debug
---------------------------------------------------------
(asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
(blank,"Space symbols",@,{},,)
(uint,"Unsigned integer",123,{simple},simple,{123})
(blank,"Space symbols",_,{},,)
(host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
(5 rows)
One can also see that if we only keep the domain name, the result is similar:
=# select ts_debug('simple', '123-yyy.zzz');
ts_debug
-------------------------------------------------------
(host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
(1 row)
=# select ts_debug('simple', '123_yyy.zzz');
ts_debug
-----------------------------------------------------
(uint,"Unsigned integer",123,{simple},simple,{123})
(blank,"Space symbols",_,{},,)
(host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
(3 rows)
But this has to do only with 123 being recognized as a number, not with the underscore:
=# select ts_debug('simple', 'abc_yyy.zzz');
ts_debug
-------------------------------------------------------
(host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
(1 row)
=# select ts_debug('simple', '1abc_yyy.zzz');
ts_debug
-------------------------------------------------------
(host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
(1 row)
In fact, the 123-yyy.zzz domain is not valid either according to the RFC (subdomain can't start with a digit), but since we already allow it, should we not allow 123_yyy.zzz to be recognized as a Host? Then why not recognize aaa@123_yyy.zzz as an email address?
Another option is to prohibit underscores in recognized host names, but this has more breakage potential IMO.
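For reference, the RFC 1035 rule mentioned above can be sketched in a few lines of Python. This is only an illustration of the grammar in section 2.3.1 of the RFC (a label starts with a letter, ends with a letter or digit, and has only letters, digits, and hyphens in between); it is not how the text search parser decides what a Host is, and the function names are made up for this example. The 255-octet limit on the whole name is left out.

```python
import re

# RFC 1035, section 2.3.1:
#   <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# i.e. first char is a letter, last char is a letter or digit,
# interior chars are letters, digits, or hyphens.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1035_label(label: str) -> bool:
    """Return True if a single label follows the RFC 1035 grammar (max 63 chars)."""
    return len(label) <= 63 and LABEL_RE.fullmatch(label) is not None


def is_rfc1035_hostname(name: str) -> bool:
    """Return True if every dot-separated label follows the RFC 1035 grammar."""
    return all(is_rfc1035_label(part) for part in name.split("."))


if __name__ == "__main__":
    # The host names from the ts_debug examples, plus one that is valid.
    for host in ("123-yyy.zzz", "123_yyy.zzz", "abc_yyy.zzz",
                 "1abc_yyy.zzz", "abc-yyy.zzz"):
        print(host, is_rfc1035_hostname(host))
```

Under this strict reading, 123-yyy.zzz fails because of the leading digit and abc_yyy.zzz fails because of the underscore, so the parser is already more permissive than the RFC in both directions.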
--
Alex