Re: Google Summer of Code 2008
From | Jan Urbański |
---|---|
Subject | Re: Google Summer of Code 2008 |
Date | |
Msg-id | 47D2DFDA.5010302@students.mimuw.edu.pl |
In reply to | Re: Google Summer of Code 2008 (Oleg Bartunov <oleg@sai.msu.su>) |
Responses | Re: Google Summer of Code 2008 |
List | pgsql-hackers |
Oleg Bartunov wrote:
> Jan,
>
> the problem is known and well requested. From your promotion it's not
> clear what's an idea ?
>> Tom Lane wrote:
>>> Jan Urbański <j.urbanski@students.mimuw.edu.pl>
>>> writes:
>>>> 2. Implement better selectivity estimates for FTS.

OK, after reading through some of the code, the idea is to write a custom typanalyze function for tsvector columns. It could look inside the tsvectors, compute the most commonly appearing lexemes and store that information in pg_statistic. Then there should be a custom selectivity function for @@ and friends that would look at the lexemes in pg_statistic, see if the tsquery it got matches some/any of them and return a result based on that.

I have a feeling that in many cases identifying the top 50 to 300 lexemes would be enough to talk about text search selectivity with a degree of confidence. At least we wouldn't give overly low estimates for queries looking for very popular words, which I believe is worse than giving an overly high estimate for an obscure query (am I wrong here?).

Regards,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
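
To make the two steps above concrete, here is a rough Python sketch of the idea, not PostgreSQL code. It assumes each sampled tsvector is just a list of lexemes and that a tsquery is either a single lexeme or a nested tuple of 'and'/'or'/'not' nodes. The names analyze_lexemes, estimate_selectivity and DEFAULT_SELECTIVITY are hypothetical, the fallback value is an arbitrary placeholder, and treating lexemes as independent is a simplifying assumption.

```python
from collections import Counter

# Assumed fallback for lexemes that did not make it into the top-N list.
# The value is a placeholder for illustration, not a PostgreSQL constant.
DEFAULT_SELECTIVITY = 0.001


def analyze_lexemes(sample, top_n=100):
    """Return {lexeme: fraction of sampled documents containing it}
    for the top_n most common lexemes in the sample."""
    counts = Counter()
    for doc in sample:
        counts.update(set(doc))  # count each lexeme once per document
    total = len(sample)
    return {lexeme: n / total for lexeme, n in counts.most_common(top_n)}


def estimate_selectivity(query, stats):
    """Estimate the fraction of documents matching a query.

    query is a lexeme string, ('not', q), ('and', q1, q2) or ('or', q1, q2).
    Lexemes are treated as statistically independent (an assumption).
    """
    if isinstance(query, str):
        return stats.get(query, DEFAULT_SELECTIVITY)
    op = query[0]
    if op == "not":
        return 1.0 - estimate_selectivity(query[1], stats)
    left = estimate_selectivity(query[1], stats)
    right = estimate_selectivity(query[2], stats)
    if op == "and":
        return left * right
    if op == "or":
        return left + right - left * right
    raise ValueError(f"unknown operator: {op!r}")


if __name__ == "__main__":
    sample = [["cat", "dog"], ["cat"], ["cat", "fish"], ["dog"]]
    stats = analyze_lexemes(sample, top_n=2)
    print(stats)
    print(estimate_selectivity(("and", "cat", "dog"), stats))
```

With a short top-N list, a popular lexeme gets its observed frequency while anything outside the list falls back to the small default, which is the behaviour argued for above: popular words are no longer underestimated.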