Re: How to use full-text search URL parser to filter query results bydomain name?
От | Arthur Zakirov |
---|---|
Тема | Re: How to use full-text search URL parser to filter query results bydomain name? |
Дата | |
Msg-id | 941c4c08-23e3-ac5c-9e2e-8af76cadd3e3@postgrespro.ru обсуждение исходный текст |
Ответ на | How to use full-text search URL parser to filter query results bydomain name? (Jess Wren <jess.wren@interference.cc>) |
Ответы |
Re: How to use full-text search URL parser to filter query results bydomain name?
|
Список | pgsql-general |
On 07.04.2019 07:06, Jess Wren wrote: > However, I can't figure out how I would integrate this into the above > query to filter out duplicate domains from the results. And because this > is the docs for "testing and debugging text search > <https://www.postgresql.org/docs/11/textsearch-debugging.html#TEXTSEARCH-PARSER-TESTING>", > I don't know if this use of `ts_parse()` is even related to how the URL > parser is intended to be used in practice. > > How would I use the "host" parser in my query above to return one row > per domain? Also, how would I appropriately index the "links" table for > "host" and "url" token lookup? I think it is normal to use ts_parse(). And I suppose you might use windows functions. For example, you have table links: =# create table links (score int, link text); =# insert into links values (1, 'http://www.foo.com/bar'), (2, 'http://www.foo.com/foo'), (2, 'http://www.bar.com/foo'), (1, 'http://www.bar.com/bar'); You can use the following query: =# with l as ( select score, token, link, rank() over (partition by token order by score) as rank from links, lateral ts_parse('default', link) where tokid = 6) select score, token, link from l where rank = 1; score | token | link -------+-------------+------------------------ 1 | www.bar.com | http://www.bar.com/bar 1 | www.foo.com | http://www.foo.com/bar It is just the idea, probably the query might be simpler. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
В списке pgsql-general по дате отправления: