Re: estimating # of distinct values
От | Tomas Vondra |
---|---|
Тема | Re: estimating # of distinct values |
Дата | |
Msg-id | 4D192D4F.7090902@fuzzy.cz обсуждение исходный текст |
Ответ на | Re: estimating # of distinct values ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Список | pgsql-hackers |
Dne 28.12.2010 00:04, Kevin Grittner napsal(a): > Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Well, first, those scans occur only once every few hundred million >> transactions, which is not likely a suitable timescale for >> maintaining statistics. > > I was assuming that the pass of the entire table was priming for the > incremental updates described at the start of this thread. I'm not > clear on how often the base needs to be updated for the incremental > updates to keep the numbers "close enough". Well, that really depends on the workload. If you never remove all occurences of a given value (e.g. a ZIP code), you don't need to rescan the table at all. All you need is to build the stats once and then update them incrementally - and I belive this could be handled by autovacuum. >> And second, we keep on having discussions about rejiggering >> the whole tuple-freezing strategy. Even if piggybacking on those >> scans looked useful, it'd be unwise to assume it'll continue to >> work the same way it does now. > > Sure, it might need to trigger its own scan in the face of heavy > deletes anyway, since the original post points out that the > algorithm handles inserts better than deletes, but as long as we > currently have some sequential pass of the data, it seemed sane to > piggyback on it when possible. And maybe we should be considering > things like this when we weigh the pros and cons of rejiggering. > This issue of correlated values comes up pretty often.... Again, there are two types of stats - one of them needs to scan the whole table (estimate of distinct values), the other one does not (multi-dimentional histograms). These two cases are independent - you don't necessarily need both. Better ndistinct estimates would actually solve some of the current issues, it's not just a matter of cross-column stats. regards Tomas
В списке pgsql-hackers по дате отправления: