Re: Query Performance / Planner estimate off

Поиск

Список

Период

Сортировка

От	Mats Olsen
Тема	Re: Query Performance / Planner estimate off
Дата	22 октября 2020 г. 06:21:46
Msg-id	0e7bd7e2-84c9-f261-9129-6eb4cf380b48@duneanalytics.com обсуждение исходный текст
Ответ на	Re: Query Performance / Planner estimate off (Michael Lewis <mlewis@entrata.com>)
Список	pgsql-performance

Дерево обсуждения

On 10/21/20 5:29 PM, Michael Lewis wrote:

On Wed, Oct 21, 2020, 8:42 AM Mats Olsen <mats@duneanalytics.com> wrote:

On 10/21/20 2:38 PM, Sebastian Dressler wrote:
Hi Mats,

On 20. Oct 2020, at 11:37, Mats Julian Olsen <mats@duneanalytics.com> wrote:

[...]

1) Vanilla plan (16 min) : https://explain.depesz.com/s/NvDR
2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
3) enable_nestloop=off; enable_seqscan=off (2 min): https://explain.depesz.com/s/0WXx

How can I get Postgres not to loop over 12M rows?

I looked at the plans and your config and there are some thoughts I'm having:

- The row estimate is off, as you possibly noticed. This can be possibly solved by raising `default_statistics_target` to e.g. 2500 (we typically use that) and run ANALYZE
I've `set default_statistics_target=2500` and ran analyze on both tables involved, unfortunately the plan is the same. The columns we use for joining here are hashes and we expect very few duplicates in the tables. Hence I think extended statistics (storing most common values and histogram bounds) aren't useful for this kind of data. Would you say the same thing?

Have you checked if ndistinct is roughly accurate? It can be set manually on a column, or set to some value less than one with the calculation depending on reltuples.

Thank you for your reply!

I included ndistinct-counts in the gist: see https://gist.githubusercontent.com/mewwts/9f11ae5e6a5951593b8999559f5418cf/raw/24ca1f227940b48842a03435b731f82364f3576d/stats%2520Mint and https://gist.githubusercontent.com/mewwts/9f11ae5e6a5951593b8999559f5418cf/raw/24ca1f227940b48842a03435b731f82364f3576d/stats%2520transactions.

The join keys `transactions.hash` (unique) and `"Pair_evt_Mint".evt_tx_hash` (nearly unique) both have ndistinct=-1 which seems to make sense to me. The Mint-table has -0.8375 for evt_block_time whereas this query returns 0.56 `select count(distinct evt_block_time)::numeric/count(*) from uniswap_v2."Pair_evt_Mint";`. Should I adjust that one?

Many of the other ndistinct-values for `transactions` seem strange, as it's a giant (partitioned) table, but I don't know enough about the statistics to draw any conclusions from it. What do you think?

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Query Performance / Planner estimate off