Re: Text search parser's treatment of URLs and emails
From | Thom Brown |
---|---|
Subject | Re: Text search parser's treatment of URLs and emails |
Date | |
Msg-id | AANLkTinTqNEZG2jwb_R2xB5GSL56Q=VkFOV6U6+qQh1U@mail.gmail.com |
In response to | Text search parser's treatment of URLs and emails (Thom Brown <thom@linux.com>) |
List | pgsql-general |
On 8 September 2010 21:48, Thom Brown <thom@linux.com> wrote:
> Hi,
>
> I noticed that if I run this:
>
> SELECT alias, description, token FROM
> ts_debug('http://www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary');
>
> I get:
>
>   alias   |  description  |                                    token
> ----------+---------------+------------------------------------------------------------------------------
>  protocol | Protocol head | http://
>  url      | URL           | www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary
>  host     | Host          | www.postgresql.org:2345
>  url_path | URL path      | /directory/page.html?version=9.1&build=alpha1#summary
> (4 rows)
>
>
> It could be me being picky, but I don't regard parameters or page
> fragments as part of the URL path. Ideally, I'd sort of expect:
>
>     alias     |  description  |                                    token
> --------------+---------------+------------------------------------------------------------------------------
>  protocol     | Protocol head | http://
>  url          | URL           | www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary
>  host         | Host          | www.postgresql.org
>  port         | Port          | 2345
>  url_path     | URL path      | /directory/page.html
>  query_string | Query string  | version=9.1&build=alpha1
>  fragment     | Page fragment | summary
> (7 rows)
>
> ... of course that's if there was support for query strings and page
> fragments, which there isn't. But if changes were made to support my
> definition of a URL path, they'd have to be considered breaking
> changes.
>
> But my main gripe is with the name "url_path".
>
> Also:
>
> SELECT alias, description, token FROM ts_debug('myname+priority@gmail.com');
>
> Yields:
>
>    alias   |   description   |       token
> -----------+-----------------+--------------------
>  asciiword | Word, all ASCII | myname
>  blank     | Space symbols   | +
>  email     | Email address   | priority@gmail.com
> (3 rows)
>
> The entire string I entered is a valid email address, and isn't
> totally uncommon. Shouldn't such email address styles be taken into
> account? The example above incorrectly identifies the email address,
> since the real destination address would most likely be
> myname@gmail.com.

No opinions?

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
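
For comparison, here is a minimal sketch, in Python rather than SQL, of the breakdown the quoted message asks for. It leans on the standard library's `urllib.parse.urlsplit`, which already separates host, port, path, query string and fragment. The function names `split_url` and `split_plus_address` are hypothetical, chosen only for illustration, and the dictionary keys simply mirror the aliases in the "ideal" table. The plus-address handling assumes the convention the message describes (everything after the first `+` in the local part is a tag); that is a provider-specific convention, not something guaranteed for every mail system.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Split a URL into the pieces listed in the 'ideal' table (hypothetical helper).

    Assumes the URL carries an explicit scheme followed by '://'.
    """
    parts = urlsplit(url)
    # Everything after the scheme separator, matching the 'url' token above.
    _, _, rest = url.partition("://")
    return {
        "protocol": parts.scheme + "://",
        "url": rest,
        "host": parts.hostname,        # host name without the port
        "port": parts.port,            # int, or None when no port is given
        "url_path": parts.path,        # path only, no query or fragment
        "query_string": parts.query,
        "fragment": parts.fragment,
    }


def split_plus_address(address: str) -> tuple:
    """Return (base_address, tag) for a 'name+tag@domain' address (hypothetical helper).

    Assumption: the provider treats text after the first '+' in the local
    part as a tag and delivers to the base address.
    """
    local, _, domain = address.rpartition("@")
    base, _, tag = local.partition("+")
    return f"{base}@{domain}", tag


if __name__ == "__main__":
    example = ("http://www.postgresql.org:2345/directory/page.html"
               "?version=9.1&build=alpha1#summary")
    for alias, token in split_url(example).items():
        print(f"{alias:13}| {token}")
    print(split_plus_address("myname+priority@gmail.com"))
```

Run against the URL from the message, this gives `www.postgresql.org` as host, `2345` as port, `/directory/page.html` as path, `version=9.1&build=alpha1` as query string and `summary` as fragment; the email example splits into `myname@gmail.com` and the tag `priority`.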