Re: Using Btree to Provide Sorting on Suffix Keys with LIMIT

From: Peter Geoghegan
Subject: Re: Using Btree to Provide Sorting on Suffix Keys with LIMIT
Date:
Msg-id: CAH2-WzmAV3297_+c8t6H=N+nUicxNoA5EYgxf17G+UkYWMMQKA@mail.gmail.com
In response to: Re: Using Btree to Provide Sorting on Suffix Keys with LIMIT  (James Coleman <jtc331@gmail.com>)
List: pgsql-hackers
On Fri, Jan 18, 2019 at 12:15 PM James Coleman <jtc331@gmail.com> wrote:
> I'll attempt to describe a more real world scenario: suppose we have a
> schema like:
>
> users(id serial primary key)
> orders(id serial primary key, user_id integer, created_at timestamp)
>
> And wanted to find the most recent N orders for a specific group of
> users (e.g., in a report or search). Your query might look like:
>
> SELECT *
> FROM orders
> WHERE orders.user_id IN (1, 2, 3)
> ORDER BY orders.created_at DESC
> LIMIT 25
>
> Currently an index on orders(user_id, created_at) will be used for
> this query, but only to satisfy the scalar array op qual. Then all
> matching orders (say, years worth) will be fetched, a sort node will
> sort all of those results, and then a limit node will take the top N.
>
> Generalized the problem is something like "find the top N rows across
> a group of foreign keys" (though saying foreign keys probably is too
> specific).
>
> But under the scheme I'm proposing that same index would be able to
> provide both the filter and guarantee ordering as well.
>
> Does that more real-world-ish example help place the usefulness of this?

Yes. I didn't fully follow it back in 2019, but I think I understand
what you meant now. The latest version of my ScalarArrayOpExpr patch
(v2) can execute queries like this efficiently:

https://postgr.es/m/CAH2-WzkEyBU9UQM-5GWPcB=WEShAUKcJdvgFuqVHuPuO-iYW0Q@mail.gmail.com

Note that your example is similar to the test case from the latest
update on the thread -- the test case from Benoit Tigeot, which
appears here:

https://gist.github.com/benoittgt/ab72dc4cfedea2a0c6a5ee809d16e04d?permalink_comment_id=4690491#gistcomment-4690491

You seemed to want to use an index that started with user_id/bar_fk.
But I think you'd need an index on "created_at DESC, user_id" instead.
Your suggested index could work the other way around, for a query
written to match it -- "ORDER BY user_id, created_at DESC".
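To make that concrete, here is a sketch of the pairing I mean (index
and query names are from your example; the index definition itself is
my assumption about what you had in mind):

```sql
-- Your suggested column order, with user_id as the leading column
CREATE INDEX orders_user_id_created_at
    ON orders (user_id, created_at DESC);

-- A query whose ORDER BY matches the index's column order can have
-- the index provide the ordering directly, with no separate sort
SELECT *
FROM orders
WHERE orders.user_id IN (1, 2, 3)
ORDER BY orders.user_id, orders.created_at DESC
LIMIT 25;
```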

With an index on "created_at DESC, user_id", you'd be able to
efficiently execute your limit query. The index scan could only
terminate when it found (say) 25 matching tuples, so you might still
have to scan quite a few index pages. But, you wouldn't have to do
heap access to eliminate non-matches (with or without the VM being
set) -- you could eliminate all of those non-matches using true SAOP
index quals, that don't need to operate on known visible rows.
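A sketch of what I'm describing (the index name is made up, and the
point about in-index filtering assumes the patch is applied):

```sql
-- Hypothetical index whose leading column matches the ORDER BY.
-- user_id is a low-order column, so the IN (...) condition becomes a
-- true SAOP index qual: non-matches are eliminated inside the index,
-- with no heap access needed for filtering.
CREATE INDEX orders_created_at_user_id
    ON orders (created_at DESC, user_id);

-- The scan reads the index in created_at order and terminates as
-- soon as 25 rows satisfying the user_id qual have been returned
SELECT *
FROM orders
WHERE orders.user_id IN (1, 2, 3)
ORDER BY orders.created_at DESC
LIMIT 25;
```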

This is possible with the patch, despite the fact that user_id is a
low-order column (so this isn't one of the cases where it's useful to
"skip"). What really matters here is avoiding heap hits just to
eliminate rows that don't match on user_id -- not skipping. It would
be helpful if you could confirm this understanding, though.

Thanks
--
Peter Geoghegan
