Re: tsvector pg_stats seems quite a bit off.
From | Jan Urbański |
---|---|
Subject | Re: tsvector pg_stats seems quite a bit off. |
Date | |
Msg-id | 4BFF7EAB.6040706@wulczer.org |
In response to | Re: tsvector pg_stats seems quite a bit off. (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On 28/05/10 04:47, Tom Lane wrote:
> Jan Urbański <wulczer@wulczer.org> writes:
>> On 19/05/10 21:01, Jesper Krogh wrote:
>>> In practice, just cranking the statistics estimate up high enough seems
>>> to solve the problem, but doesn't
>>> there seem to be something wrong in how the statistics are collected?
>
>> The algorithm to determine most common vals does not do it accurately.
>> That would require keeping all lexemes from the analysed tsvectors in
>> memory, which would be impractical. If you want to learn more about the
>> algorithm being used, try reading
>> http://www.vldb.org/conf/2002/S10P03.pdf and corresponding comments in
>> ts_typanalyze.c
>
> I re-scanned that paper and realized that there is indeed something
> wrong with the way we are doing it.
> So I think we have to fix this.

Hm, I'll try to take another look this evening (CEST).

Cheers,
Jan
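
For readers who have not opened the paper linked in the quoted text: it describes Lossy Counting, an approximate frequency-counting scheme that avoids keeping every element in memory. The sketch below is a minimal Python illustration of the textbook formulation, not the code in ts_typanalyze.c. The names `lossy_count`, `frequent_items`, `epsilon`, and `support` are chosen here for illustration only.

```python
import math


def lossy_count(stream, epsilon):
    """Textbook Lossy Counting: approximate counts with error at most epsilon * n.

    Returns (entries, n), where entries maps item -> [f, delta]:
    f is the count observed since the item was (re)inserted, and delta is
    the maximum number of occurrences that may have been missed before that.
    """
    width = math.ceil(1 / epsilon)  # bucket width w
    entries = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in entries:
            entries[item][0] += 1
        else:
            entries[item] = [1, bucket - 1]
        # At every bucket boundary, prune entries that cannot be frequent.
        if n % width == 0:
            stale = [k for k, (f, d) in entries.items() if f + d <= bucket]
            for k in stale:
                del entries[k]
    return entries, n


def frequent_items(stream, support, epsilon):
    """Return (item, f) pairs whose tracked count f is at least (support - epsilon) * n."""
    entries, n = lossy_count(stream, epsilon)
    threshold = (support - epsilon) * n
    result = [(k, f) for k, (f, _) in entries.items() if f >= threshold]
    return sorted(result, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Hypothetical "lexeme" stream: two common items and many one-off items.
    lexemes = ["a"] * 50 + ["b"] * 30 + [f"rare{i}" for i in range(20)]
    print(frequent_items(lexemes, support=0.2, epsilon=0.1))
```

The trade-off is visible in the pruning step: rare items are dropped at bucket boundaries, so memory stays bounded, but a reported count can undercount the true frequency by up to `epsilon * n`. That is why the resulting most-common-values list is approximate rather than exact.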