Interesting case of index un-usage

Поиск
Список
Период
Сортировка
От Claudio Freire
Тема Interesting case of index un-usage
Дата
Msg-id CAGTBQpb_jfZbfM3JPYZEwphrqScJwMCasx3D6Lx+Vw46obArcQ@mail.gmail.com
обсуждение исходный текст
Список pgsql-performance
I have this table that is quite big (several gig).

I was looking for a row manually (because a query would take too long)
- I know there is correlation between id and date, so I was doing
manual binary search for the id range that holds certain date, and I
found an interesting case where the planner makes a significant snafu:

select created from non_bid_logs where id >= 788991892 order by id limit 100;

> Limit  (cost=0.00..185.15 rows=100 width=16)
>   ->  Index Scan using non_bid_logs_pkey on non_bid_logs  (cost=0.00..33973433.99 rows=18349427 width=16)
>         Index Cond: (id >= 788991892)


That uses the pk over id to get the first 100 rows above that. Quite
straightforward and correct - and fast.

Now... I originally tried:

select created from non_bid_logs where id >= 788991892 limit 100;

The same plan should work, and still be fast. But I get:

> Limit  (cost=0.00..12.30 rows=100 width=8)
>   ->  Seq Scan on non_bid_logs  (cost=0.00..2257215.96 rows=18350037 width=8)
>         Filter: (id >= 788991892)

This seems like a snafu of cost estimation. The planner should know
about the spatial correlation of "id", it's not clustered manually,
but quite naturally clustered, and yet it estimates the limit will
find the rows so fast?

If I do:

select correlation from pg_stats where tablename = 'non_bid_logs' and
attname = 'id';

I get:

0.272682

I don't know if that's realistic, I don't really know how to interpret
that number. But, experimentally, the seqscan performs horribly.

If I set enable_seqscan=off, and retry, I get:

> Limit  (cost=0.00..185.16 rows=100 width=8)
>   ->  Index Scan using non_bid_logs_pkey on non_bid_logs  (cost=0.00..33978925.99 rows=18351396 width=8)
>         Index Cond: (id >= 788991892)

So the planner knows about the index, it's just that it believes
(somehow foolishly) that the seqscan will be faster.


В списке pgsql-performance по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: PGSQL, checkpoints, and file system syncs
Следующее
От: Robert Burgholzer
Дата:
Сообщение: Re: Optimizing Time Series Access