Re: Dumb question involving to_tsvector and a view
От | Kevin Grittner |
---|---|
Тема | Re: Dumb question involving to_tsvector and a view |
Дата | |
Msg-id | 1361725014.4245.YahooMailNeo@web162905.mail.bf1.yahoo.com обсуждение исходный текст |
Ответ на | Re: Dumb question involving to_tsvector and a view (Jasen Betts <jasen@xnet.co.nz>) |
Список | pgsql-general |
Jasen Betts <jasen@xnet.co.nz> wrote:=0A>On 2013-02-23, Raymond C. Rodgers = <sinful622@gmail.com> wrote:=0A=0A>> On 02/23/2013 05:26 AM, Tom Lane wrote= :=0A>>> A "virtual" tsvector like that is probably going to be useless for= =0A>>> searching as soon as you get a meaningful amount of data, because th= e=0A>>> only way the DB can implement a search is to compute the tsvector= =0A>>> value for each table row and then examine it for the target word(s).= =0A>>> What you want is a GIST or GIN index on the contents of the tsvector= .=0A>=0A>> I think the only real advantage to using something like this wou= ld be a=0A>> space savings in terms of storing the tsvector data, but I don= 't see=0A>> that being a significant enough reason to go ahead and use this= idea in=0A>> a production situation. As mentioned [by pretty much all of u= s], once=0A>> the table size is sufficiently large there would be a perform= ance=0A>> penalty by to_tsvector being executed on every record in the tabl= e.=0A>=0A>Unless the plan comes out as a table scan the index will be used= =0A>instead ot to_tsvector()=0A>=0A>When there is a table scan to_tsvector = will be used instead of reading=0A>from disk, I don't know how fast to_tsve= ctor is compared to disk, but=0A>usually computing a result is faster than = reading it from disk.=0A>=0A>Storing the tsvector in the table is likely to= be faster only when a=0A>tablescan is done and the table is fully cached i= n ram.=0A=0AI guess I was being dumb in assuming that it was obvious that a= GIN=0Aor GiST index would be needed for decent performance at scale. =0AWi= thout that, a scan of the whole table (or at least all rows=0Amatching othe= r search criteria) is needed, which is going to hurt. =0AThe benchmarks I m= entioned were for a GIN index on the results of=0Athe function which genera= ted the tsvector, versus a GIN index on=0Athe stored tsvector.=A0 In our ca= se, a typical scan for document text=0Aagainst years of accumulated court d= ocuments was about 300 ms=0Aversus about 1.5 seconds.=A0 It may matter that= we weren't just=0Alooking for matches, but the top K matches based on the = ranking=0Afunction.=0A=0A-- =0AKevin Grittner=0AEnterpriseDB: http://www.en= terprisedb.com=0AThe Enterprise PostgreSQL Company
В списке pgsql-general по дате отправления: