Re: [GENERAL] Performance with high correlation in group by on PK
От | Alban Hertroys |
---|---|
Тема | Re: [GENERAL] Performance with high correlation in group by on PK |
Дата | |
Msg-id | CAF-3MvP56vSbgiuADOwjEFQNUMdAJkvQ0fBTwXUwVRFAAtykCw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [GENERAL] Performance with high correlation in group by on PK (Jeff Janes <jeff.janes@gmail.com>) |
Ответы |
Re: [GENERAL] Performance with high correlation in group by on PK
|
Список | pgsql-general |
On 28 August 2017 at 21:32, Jeff Janes <jeff.janes@gmail.com> wrote: > On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys <haramrae@gmail.com> wrote: >> >> Hi all, >> >> It's been a while since I actually got to use PG for anything serious, >> but we're finally doing some experimentation @work now to see if it is >> suitable for our datawarehouse. So far it's been doing well, but there >> is a particular type of query I run into that I expect we will >> frequently use and that's choosing a sequential scan - and I can't >> fathom why. >> >> This is on: >> >> >> The query in question is: >> select "VBAK_MANDT", max("VBAK_VBELN") >> from staging.etl00001_vbak >> group by "VBAK_MANDT"; >> >> This is the header-table for another detail table, and in this case >> we're already seeing a seqscan. The thing is, there are 15M rows in >> the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT", >> "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact, >> we only have 1 at the moment!). > > > You need an "index skip-scan" or "loose index scan". PostgreSQL doesn't > currently detect and implement them automatically, but you can use a > recursive CTE to get it to work. There are some examples at > https://wiki.postgresql.org/wiki/Loose_indexscan Thanks Jeff, that's an interesting approach. It looks very similar to correlated subqueries. Unfortunately, it doesn't seem to help with my issue. The CTE is indeed fast, but when querying the results from the 2nd level ov the PK with the CTE results, I'm back at a seqscan on pdw00002_vbak again. Just the CTE plan is in skipScan-top.sql. The complete plan is in skipScan-full.sql Note: I cloned the original etl00001_vbak table into a new pdw00002_vbak table that has varchars instead of chars, which reduced the table size to just over half the original's size. Hence the different table names, but the behaviour for this particular issue is the same between them. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Вложения
В списке pgsql-general по дате отправления: