Interesting case of index un-usage

Поиск

Список

Период

Сортировка

От	Claudio Freire
Тема	Interesting case of index un-usage
Дата	9 апреля 2014 г. 21:56:28
Msg-id	CAGTBQpb_jfZbfM3JPYZEwphrqScJwMCasx3D6Lx+Vw46obArcQ@mail.gmail.com обсуждение исходный текст
Список	pgsql-performance

Дерево обсуждения

I have this table that is quite big (several gig).

I was looking for a row manually (because a query would take too long)
- I know there is correlation between id and date, so I was doing
manual binary search for the id range that holds certain date, and I
found an interesting case where the planner makes a significant snafu:

select created from non_bid_logs where id >= 788991892 order by id limit 100;

> Limit  (cost=0.00..185.15 rows=100 width=16)
>   ->  Index Scan using non_bid_logs_pkey on non_bid_logs  (cost=0.00..33973433.99 rows=18349427 width=16)
>         Index Cond: (id >= 788991892)


That uses the pk over id to get the first 100 rows above that. Quite
straightforward and correct - and fast.

Now... I originally tried:

select created from non_bid_logs where id >= 788991892 limit 100;

The same plan should work, and still be fast. But I get:

> Limit  (cost=0.00..12.30 rows=100 width=8)
>   ->  Seq Scan on non_bid_logs  (cost=0.00..2257215.96 rows=18350037 width=8)
>         Filter: (id >= 788991892)

This seems like a snafu of cost estimation. The planner should know
about the spatial correlation of "id", it's not clustered manually,
but quite naturally clustered, and yet it estimates the limit will
find the rows so fast?

If I do:

select correlation from pg_stats where tablename = 'non_bid_logs' and
attname = 'id';

I get:

0.272682

I don't know if that's realistic, I don't really know how to interpret
that number. But, experimentally, the seqscan performs horribly.

If I set enable_seqscan=off, and retry, I get:

> Limit  (cost=0.00..185.16 rows=100 width=8)
>   ->  Index Scan using non_bid_logs_pkey on non_bid_logs  (cost=0.00..33978925.99 rows=18351396 width=8)
>         Index Cond: (id >= 788991892)

So the planner knows about the index, it's just that it believes
(somehow foolishly) that the seqscan will be faster.

В списке pgsql-performance по дате отправления:

Предыдущее

От: Bruce Momjian
Дата: 09 апреля 2014 г., 19:51:31
Сообщение: Re: PGSQL, checkpoints, and file system syncs

Следующее

От: Robert Burgholzer
Дата: 09 апреля 2014 г., 22:49:40
Сообщение: Re: Optimizing Time Series Access

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Interesting case of index un-usage

Предыдущее

Следующее