Discussion: The plan changes when the limit gets above ~850,000

The plan changes when the limit gets above ~850,000

From:
"Wm.A.Stafford"
Date:
First of all, thanks to Albe Laurenz for pointing out how to analyze the
situation where there was a change in behavior when the limit was above
about 850,000.

I looked at the plan for both versions of the query and, as Albe
suggested, the plans were different.  I don't know how to analyze the
plans, but my guess is that as the number of records returned grows,
setup time becomes a smaller fraction of the whole process, so the
planner can afford to spend more time on setup.  That would explain the
apparent inactivity of the application running the query when the limit
went from 850,000 to 1,000,000.

I think I would like the query to behave the same regardless of the
number of records returned, but I'm not sure: 1. whether that can be
done, 2. whether it would be substantially slower, or 3. whether there
is any choice in the matter at all.
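
For what it's worth on question 1, PostgreSQL's per-session enable_* switches can
be used to experiment with forcing a particular plan shape (a sketch for testing
only, not something to rely on in production; whether the forced plan is actually
faster is a separate question):

SET enable_sort = off;  -- per session; heavily penalizes, but does not forbid, explicit sorts
EXPLAIN
SELECT A.cache_id, A.validname, B.resource_full_name, B.resource_id, C.worms_id
FROM cache A, resources B, taxa C
WHERE A.resource_id = B.resource_id AND A.taxon_id = C.taxon_id
OFFSET 0 LIMIT 1000000;
RESET enable_sort;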

The two plans are below.  Any help or advice would be appreciated.

-=bill

the query:
SELECT A.cache_id, A.validname, B.resource_full_name, B.resource_id, C.worms_id
FROM cache A, resources B, taxa C
WHERE A.resource_id = B.resource_id AND A.taxon_id = C.taxon_id
OFFSET 0 LIMIT 800000
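
(The quoted plan lines below look like pgAdmin's grid copy of EXPLAIN output; the
same plans should be reproducible from psql by prefixing the query with EXPLAIN,
a sketch using the query above unchanged:)

-- EXPLAIN shows the chosen plan and cost estimates without running the query.
EXPLAIN
SELECT A.cache_id, A.validname, B.resource_full_name, B.resource_id, C.worms_id
FROM cache A, resources B, taxa C
WHERE A.resource_id = B.resource_id AND A.taxon_id = C.taxon_id
OFFSET 0 LIMIT 800000;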

the plan when limit = 800000
"Limit  (cost=0.00..5868507.75 rows=800000 width=614)"
"  ->  Nested Loop  (cost=0.00..104095524.49 rows=14190391 width=614)"
"        ->  Merge Join  (cost=0.00..2615512.71 rows=14190391 width=94)"
"              Merge Cond: (c.taxon_id = a.taxon_id)"
"              ->  Index Scan using ix_taxon_id on taxa c
(cost=0.00..2020343.76 rows=519932 width=12)"
"              ->  Index Scan using ix_taxon_id2 on "cache" a
(cost=0.00..819727.67 rows=14190391 width=94)"
"        ->  Index Scan using obis_resources_pkey on obis_resources b
(cost=0.00..7.12 rows=1 width=524)"
"              Index Cond: (a.resource_id = b.resource_id)"

the plan when limit = 1000000
"Limit  (cost=5941567.20..5974067.35 rows=1000000 width=614)"
"  ->  Merge Join  (cost=5941567.20..6402757.08 rows=14190391 width=614)"
"        Merge Cond: (b.resource_id = a.resource_id)"
"        ->  Sort  (cost=105.54..106.63 rows=437 width=524)"
"              Sort Key: b.resource_id"
"              ->  Seq Scan on obis_resources b  (cost=0.00..86.37
rows=437 width=524)"
"        ->  Sort  (cost=5941461.66..5976937.64 rows=14190391 width=94)"
"              Sort Key: a.resource_id"
"              ->  Merge Join  (cost=97698.57..1151605.96 rows=14190391
width=94)"
"                    Merge Cond: (a.taxon_id = c.taxon_id)"
"                    ->  Index Scan using ix_taxon_id2 on "cache" a
(cost=0.00..819727.67 rows=14190391 width=94)"
"                    ->  Sort  (cost=97698.57..98998.40 rows=519932
width=12)"
"                          Sort Key: c.taxon_id"
"                          ->  Seq Scan on taxa c  (cost=0.00..27007.32
rows=519932 width=12)"


Re: The plan changes when the limit gets above ~850,000

From:
Tom Lane
Date:
"Wm.A.Stafford" <stafford@marine.rutgers.edu> writes:
> I looked at the plan for both versions of the query and, as Albe
> suggested, the plans were different.  I don't know how to analyze the
> plans, but my guess is that as the number of records returned grows,
> setup time becomes a smaller fraction of the whole process, so the
> planner can afford to spend more time on setup.  That would explain the
> apparent inactivity of the application running the query when the limit
> went from 850,000 to 1,000,000.

Yeah, it's supposed to do that.  The question you really need to be
asking is what's the total elapsed time to do the whole query.  If
there's a serious slowdown then the planner is choosing to cut over
too soon, which could probably be improved by fooling with the cost
parameter.
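
A sketch of both suggestions (the specific parameter and value below are only
illustrative assumptions; which knob to adjust, if any, should follow from what
the actual timings show):

-- Measure the real elapsed time of each variant; EXPLAIN ANALYZE executes the query.
EXPLAIN ANALYZE
SELECT A.cache_id, A.validname, B.resource_full_name, B.resource_id, C.worms_id
FROM cache A, resources B, taxa C
WHERE A.resource_id = B.resource_id AND A.taxon_id = C.taxon_id
OFFSET 0 LIMIT 1000000;   -- repeat with LIMIT 800000 and compare the reported total runtime

-- If the sort-based plan really is slower, the crossover can be nudged per session,
-- for example by lowering random_page_cost so index-scan plans look relatively cheaper.
SET random_page_cost = 2.0;
EXPLAIN ANALYZE
SELECT A.cache_id, A.validname, B.resource_full_name, B.resource_id, C.worms_id
FROM cache A, resources B, taxa C
WHERE A.resource_id = B.resource_id AND A.taxon_id = C.taxon_id
OFFSET 0 LIMIT 1000000;
RESET random_page_cost;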

But, do you really care about the behavior with intermediate LIMIT
values?  I would think that you might have a small limit on this query
while doing testing, but when you go to do the real data processing
you won't have a LIMIT at all.

            regards, tom lane

Re: The plan changes when the limit gets above ~850,000

From:
"Wm.A.Stafford"
Date:
Tom,

    Thanks for the reply.  You are correct: I don't really care about
the behavior at less than the maximum number of rows.  I guess I was
just a bit mystified by the difference in behavior.  As of now, I'll
stop worrying about it.

    -=beeky
