Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
От | Jules Bean |
---|---|
Тема | Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan |
Дата | |
Msg-id | 20000824101113.N17510@grommit.office.vi.net обсуждение исходный текст |
Ответ на | Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
On Wed, Aug 23, 2000 at 10:30:30AM -0400, Tom Lane wrote: > Jules Bean <jules@jellybean.co.uk> writes: > > I have in a table a 'category' column which takes a small number of > > (basically fixed) values. Here by 'small', I mean ~1000, while the > > table itself has ~10 000 000 rows. Some categories have many, many > > more rows than others. In particular, there's one category which hits > > over half the rows. Because of this (AIUI) postgresql assumes > > that the query > > select ... from thistable where category='something' > > is best served by a seqscan, even though there is an index on > > category. > > Yes, we know about that one. We have stats about the most common value > in a column, but no information about how the less-common values are > distributed. We definitely need stats about several top values not just > one, because this phenomenon of a badly skewed distribution is pretty > common. ISTM that that might be enough, in fact. If you have stats telling you that the most popular value is 'xyz', and that it constitutes 50% of the rows (i.e. 5 000 000) then you can conclude that, on average, other entries constitute a mere 5 000 000/999 ~~ 5000 entries, and it would be definitely be enough. (That's assuming you store the number of distinct values somewhere). > BTW, if your highly-popular value is actually a dummy value ('UNKNOWN' > or something like that), a fairly effective workaround is to replace the > dummy entries with NULL. The system does account for NULLs separately > from real values, so you'd then get stats based on the most common > non-dummy value. I can't really do that. Even if I could, the distribution is very skewed -- so the next most common makes up a very high proportion of what's left. I forget the figures exactly. Jules
В списке pgsql-hackers по дате отправления: