Re: Term positions in GIN fulltext index
From | Florian Pflug |
---|---|
Subject | Re: Term positions in GIN fulltext index |
Date | |
Msg-id | 25F8CB23-35D6-481A-8AC6-F8396838D7C8@phlo.org |
In reply to | Re: Term positions in GIN fulltext index (Yoann Moreau <yoann.moreau@univ-avignon.fr>) |
Responses | Re: Term positions in GIN fulltext index |
List | pgsql-hackers |
On Nov 4, 2011, at 11:15, Yoann Moreau wrote:
> On 03/11/11 19:19, Florian Pflug wrote:
>> Postgres doesn't seem to contain such a function currently (don't believe that,
>> though - go and recheck the documentation. I don't know all the thousands of built-in
>> functions by heart). But it's easy to add one. You could either use PL/pgSQL
>> to parse the tsvector's textual representation, or write a C function. If you
>> go the PL/pgSQL route, regexp_split_to_table() might come in handy.
>
> This seems easier to program than what I was thinking about; I'm going to do that.
> But I'm wondering about the size of the database with the GIN index plus the tsvector
> column, and about the performance of parsing the whole tsvector for each document I
> need positions from (as I need them for only a very few terms).

AFAICS, the internal storage layout of tsvector should allow you to extract an
individual lexeme's positions quite efficiently (with time complexity log(N), where
N is the number of lexemes in the tsvector). Doing so will require you to implement
your function in C, though - any solution that works from a tsvector's textual
representation will obviously have time complexity N.

best regards,
Florian Pflug
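To illustrate the complexity difference, here is a minimal standalone C sketch; it is not a PostgreSQL extension and does not use the server's real tsvector structs. `text_positions()` scans a tsvector's textual output (e.g. `'fat':2,4 'rat':3`) from the start, so its cost grows with the number of lexemes (N). `find_entry()` binary-searches an array of entries kept sorted by lexeme, costing log(N). The `Entry` type and both function names are hypothetical. The sketch assumes lexemes contain no embedded quotes and sorts entries with `strcmp`. A real C function would operate on the server's own tsvector layout and comparison rule, which may differ from this toy model.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy stand-in for one tsvector entry: a lexeme plus its
   positions. NOT PostgreSQL's WordEntry layout; it only models the
   property that matters here, namely that entries are kept sorted. */
typedef struct {
    const char *lexeme;
    const int *positions;
    int npositions;
} Entry;

/* O(log N): binary search over entries sorted by strcmp order.
   Returns NULL if the lexeme is absent. */
static const Entry *find_entry(const Entry *entries, int n, const char *lexeme)
{
    assert(entries != NULL || n == 0);
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(entries[mid].lexeme, lexeme);
        if (cmp == 0)
            return &entries[mid];
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return NULL;
}

/* O(N): scan a textual tsvector such as "'fat':2A,4 'rat':3" for one
   lexeme and copy up to `max` positions into `out`. Returns the number
   of positions copied, 0 if the lexeme has no positions, or -1 if it is
   absent. Assumes (hypothetically) that no lexeme contains a quote. */
static int text_positions(const char *tsv, const char *lexeme, int *out, int max)
{
    size_t len = strlen(lexeme);
    const char *p = tsv;
    while ((p = strchr(p, '\'')) != NULL) {
        const char *start = p + 1;
        const char *end = strchr(start, '\'');
        if (end == NULL)
            return -1;                      /* malformed input */
        p = end + 1;                        /* just past the closing quote */
        if ((size_t)(end - start) != len || strncmp(start, lexeme, len) != 0)
            continue;                       /* not our lexeme, keep scanning */
        int n = 0;
        if (*p != ':')
            return 0;                       /* lexeme stored without positions */
        while (n < max && (*p == ':' || *p == ',')) {
            char *next;
            long v = strtol(p + 1, &next, 10);
            if (next == p + 1)
                break;                      /* no digits after separator */
            out[n++] = (int) v;
            p = next;
            while (*p >= 'A' && *p <= 'D')
                p++;                        /* skip an optional weight label */
        }
        return n;
    }
    return -1;
}
```

For example, with two sorted entries for `fat` (positions 2, 4) and `rat` (position 3), `find_entry(entries, 2, "rat")` returns the `rat` entry after at most two comparisons. `text_positions("'fat':2,4 'rat':3", "rat", out, 8)` has to step over `fat` first and returns 1 with `out[0] == 3`.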