Re: Yet another abort-early plan disaster on 9.3
From | Greg Stark |
---|---|
Subject | Re: Yet another abort-early plan disaster on 9.3 |
Date | |
Msg-id | CAM-w4HNfZymQmTu3+TxQQD-e6_410-sDnZBaW8neurmTFh4GbA@mail.gmail.com |
In reply to | Re: Yet another abort-early plan disaster on 9.3 (Josh Berkus <josh@agliodbs.com>) |
Responses | Re: Yet another abort-early plan disaster on 9.3 |
List | pgsql-performance |
On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus <josh@agliodbs.com> wrote:

> Yes, it's only intractable if you're wedded to the idea of a tiny,
> fixed-size sample. If we're allowed to sample, say, 1% of the table, we
> can get a MUCH more accurate n_distinct estimate using multiple
> algorithms, of which HLL is one. While n_distinct will still have some
> variance, it'll be over a much smaller range.

I've gone looking for papers on this topic, but from what I read this isn't so. To get any noticeable improvement you need to read 10-50% of the table, and that's effectively the same as reading the entire table. Even then the results were pretty poor.

All the research I could find went into how to analyze the whole table while using a reasonable amount of scratch space, and how to do it incrementally.

-- greg
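For anyone who wants to try the sample-size question on their own data, here is a minimal sketch, not a measurement. It assumes the Haas-Stokes "Duj1" estimator, `n*d / (n - f1 + f1*n/N)`, which as far as I know is the formula ANALYZE applies to its row sample. The synthetic column in `make_table`, its hot/cold split, and every name in the code are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def duj1_estimate(sample, total_rows):
    """Haas-Stokes Duj1 estimate of the distinct count in the whole table.

    n  = number of sampled rows
    d  = distinct values seen in the sample
    f1 = values seen exactly once in the sample
    N  = total_rows (rows in the table)
    """
    n = len(sample)
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    return n * d / (n - f1 + f1 * n / total_rows)


def make_table(total_rows, rng):
    """Hypothetical skewed column: half the rows come from 50 hot values,
    the other half from a wide range of rare values."""
    rows = []
    for _ in range(total_rows):
        if rng.random() < 0.5:
            rows.append(rng.randrange(50))
        else:
            rows.append(1_000 + rng.randrange(1_000_000))
    return rows


def demo(total_rows=200_000, seed=1):
    rng = random.Random(seed)
    table = make_table(total_rows, rng)
    print("true n_distinct:", len(set(table)))
    # Compare the estimate at several sample fractions, up to a full scan.
    for fraction in (0.01, 0.10, 0.50, 1.00):
        sample = rng.sample(table, int(total_rows * fraction))
        print(f"{fraction:>5.0%} sample -> {duj1_estimate(sample, total_rows):,.0f}")


if __name__ == "__main__":
    demo()
```

The estimator is only exact by construction when the sample is the whole table. How far off it is at 1%, 10% or 50% depends heavily on the value distribution, so swap `make_table` for something shaped like your own column before drawing conclusions.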