Re: Merging statistics from children instead of re-sampling everything
From | Tomas Vondra |
---|---|
Subject | Re: Merging statistics from children instead of re-sampling everything |
Date | |
Msg-id | 4e86ae74-4e2c-b40f-4405-035d2f818e5d@enterprisedb.com |
In response to | Re: Merging statistics from children instead of re-sampling everything (Andrey Lepikhov <a.lepikhov@postgrespro.ru>) |
Responses | Re: Merging statistics from children instead of re-sampling everything |
List | pgsql-hackers |
On 2/10/22 12:50, Andrey Lepikhov wrote:
> On 21/1/2022 01:25, Tomas Vondra wrote:
>> But I don't have a very good idea what to do about statistics that we
>> can't really merge. For some types of statistics it's rather tricky to
>> reasonably merge the results - ndistinct is a simple example, although
>> we could work around that by building and merging hyperloglog counters.
>
> I think, as a first step on this way we can reduce a number of pulled
> tuples. We don't really needed to pull all tuples from a remote server.
> To construct a reservoir, we can pull only a tuple sample. Reservoir
> method needs only a few arguments to return a sample like you read
> tuples locally. Also, to get such parts of samples asynchronously, we
> can get size of each partition on a preliminary step of analysis.
> In my opinion, even this solution can reduce heaviness of a problem
> drastically.
>

Oh, wow! I hadn't realized we're fetching all the rows from foreign
(postgres_fdw) partitions. For local partitions we already sample,
because that path uses the usual acquire function, with a reservoir
proportional to partition size. I had assumed we use tablesample to
fetch just a small fraction of rows from FDW partitions, and I agree
doing that would be a pretty huge benefit.

I actually tried hacking that together - there are a couple of problems
with that (e.g. determining what fraction to sample using
bernoulli/system), but in principle it seems quite doable. Some minor
changes to the FDW API may be necessary, not sure.

Not sure about the async execution - that seems way more complicated.
Sampling reduces the total cost, while async just parallelizes it.

That being said, this thread was not really about foreign partitions,
but about re-analyzing inheritance trees in general. And sampling
foreign partitions doesn't really solve that - we'll still do the
sampling over and over.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
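
For illustration only, here is a rough sketch of the "what fraction to sample" question mentioned above. It is not taken from any patch. It assumes we already have a row-count estimate for each partition (for example from reltuples), splits a target sample size across partitions in proportion to those estimates, and turns each share into a TABLESAMPLE BERNOULLI percentage. The function names and the `slack` oversampling factor are hypothetical.

```python
def allocate_sample(partition_rows, target_rows):
    """Split target_rows across partitions in proportion to estimated row counts."""
    total = sum(partition_rows.values())
    if total == 0:
        return {name: 0 for name in partition_rows}
    return {
        name: round(target_rows * rows / total)
        for name, rows in partition_rows.items()
    }


def sample_percent(est_rows, wanted_rows, slack=1.5):
    """Percentage to pass to TABLESAMPLE for roughly wanted_rows rows.

    slack is a hypothetical oversampling factor: BERNOULLI returns a
    random number of rows, and the row estimate may be stale. The
    result is capped at 100 (read the whole partition).
    """
    if est_rows <= 0:
        return 100.0  # no usable estimate, fall back to reading everything
    return min(100.0, 100.0 * slack * wanted_rows / est_rows)


def sample_query(table, est_rows, wanted_rows):
    """Build the remote query text; table is assumed to be a trusted identifier."""
    pct = sample_percent(est_rows, wanted_rows)
    return f"SELECT * FROM {table} TABLESAMPLE BERNOULLI ({pct:.4f})"
```

The rows that come back would still need to go through a local reservoir to trim them to the exact target size. And as noted above, none of this helps with the repeated re-sampling of the inheritance tree.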