Re: Yet another abort-early plan disaster on 9.3
From | Peter Geoghegan |
---|---|
Subject | Re: Yet another abort-early plan disaster on 9.3 |
Date | |
Msg-id | CAEYLb_UTz=dtxYOCYF_D1s7wQJfoZUrey6Fjxq08VFueXWoEKQ@mail.gmail.com |
In reply to | Re: Yet another abort-early plan disaster on 9.3 (Simon Riggs <simon@2ndquadrant.com>) |
List | pgsql-performance |

On Thu, Oct 2, 2014 at 1:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I disagree that (1) is not worth fixing just because we've provided
>> users with an API to override the stats. It would unquestionably be
>> better for us to have a better n_distinct estimate in the first place.
>> Further, this is an easier problem to solve, and fixing n_distinct
>> estimates would fix a large minority of currently pathological queries.
>> It's like saying "hey, we don't need to fix the leak in your radiator,
>> we've given you a funnel in the dashboard you can pour water into."
>
> Having read papers on it, I believe the problem is intractable. Coding
> is not the issue. To anyone: please prove me wrong, in detail, with
> references so it can be coded.

I think it might be close to intractable if you're determined to use a
sampling model. HyperLogLog looks very interesting for n_distinct
estimation, though. My abbreviated key patch estimates the cardinality
of abbreviated keys (and original strings that are to be sorted) with
high precision and fixed overhead. Maybe we can figure out a way to do
opportunistic streaming of HLL.

Believe it or not, the way I use HLL for estimating cardinality is
virtually free. Hashing is really cheap when the CPU is bottlenecked
on memory bandwidth.

If you're interested, download the patch, and enable the debug traces.
You'll see HyperLogLog accurately indicate the cardinality of text
datums as they're copied into local memory before sorting.

--
Regards,
Peter Geoghegan
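
For readers unfamiliar with HyperLogLog, the following is a minimal sketch of the general technique: each value is hashed once as it streams past, and a fixed-size array of small registers is updated. It is not the code from the abbreviated key patch. The class name, the choice of BLAKE2b as the hash, and the precision `p=12` (4096 one-byte registers) are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch; not taken from any PostgreSQL patch."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers; memory use is fixed at m bytes.
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1.0 + 1.079 / self.m)

    def add(self, value: bytes) -> None:
        # Assumption: any well-mixed 64-bit hash works; BLAKE2b is used here.
        h = int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "big")
        idx = h >> (64 - self.p)                # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        # Harmonic mean of 2**register across all registers.
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(100000):
        hll.add(str(i).encode())
    print(round(hll.estimate()))
```

With m registers the expected relative error is roughly 1.04 / sqrt(m), about 1.6% for the 4096 registers assumed here, and the per-value cost is one hash plus one register comparison regardless of how many values have been seen.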