Обсуждение: BUG #8410: out of binary heap slots
The following bug has been logged on the website:
Bug reference: 8410
Logged by: Terje Elde
Email address: terje@elde.net
PostgreSQL version: Unsupported/Unknown
Operating system: FreeBSD
Description:
Running:
PostgreSQL 9.3beta1 on amd64-portbld-freebsd9.1, compiled by cc (GCC) 4.2.1
20070831 patched [FreeBSD], 64-bit
I'm getting "out of binary heap slots", which offcourse spoils the fun of
the query.
I'm having trouble reproducing it, as I'm only seeing the issue in about 1
in 20 000 queries.
I can get the error, turn right around and run the same again manually, and
it'll run just fine.
I'd love to see if I can reproduce in on 9.3rc1, but seeing it is rare
enough as it is.
Slightly anonymised from the logs:
ERROR: out of binary heap slots
CONTEXT: PL/pgSQL function foo(integer,bigint,character[],timestamp without
time zone,integer,timestamp without time zone,integer,timestamp without time
zone,integer,inet,character varying) line 233 at FETCH
STATEMENT: SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o
FROM foo( 100, 2221::bigint, ARRAY['m', 'f', '', ' ', NULL]::char[],
'2013-07-05T19:11:41.958154'::timestamp, 30::int, NULL::timestamp, 10::int,
'2013-08-30T19:11:41.958168'::timestamp::timestamp, 100::int,
'123.123.123.123'::inet, 'FOO/1.1 CFNetwork/609.1.4 Darwin/13.0.0'::varchar
)
The FETCH-line is running off of a cursor, with a query going pretty much
like this:
OPEN curs FOR SELECT * FROM (
SELECT
'n'::char AS noo, p.id, p.pub, p.details, i.hash AS foo,
p.bam, p.goo,
e.name AS bar, p.meh, p.startt, p.endt, p.v, ps.s_likes AS
likes,
p.last_change, ARRAY( SELECT tag FROM foo_tag WHERE barzz =
p.id ) AS tags
FROM
p, ps, i, e, s
WHERE
s.tihi = i_cid AND
s.v = True AND
p.pub = s.id AND
p.id = ps.id AND
i.id = p.foo AND
e.id = p.bar AND
i.baz IS NOT NULL AND
p.bam = ANY( i_bam ) AND
endt > NOW() AND
startt < NOW() + INTERVAL '15 minutes' AND
p.startt > i_newerthan
ORDER BY p.startt DESC LIMIT i_maxnew
FOR UPDATE OF ps ) AS newer_than
UNION SELECT * FROM (
SELECT
'u'::char AS noo, p.id, p.pub, p.details, i.hash AS foo,
p.bam, p.goo,
e.name AS bar, p.meh, p.startt, p.endt, p.v, ps.s_likes AS
likes,
p.last_change, ARRAY( SELECT tag FROM foo_tag WHERE bazz =
p.id ) AS tags
FROM
p, ps, i, e, s
WHERE
s.tihi = i_cid AND
s.v = True AND
p.pub = s.id AND
p.id = ps.id AND
i.id = p.foo AND
e.id = p.bar AND
i.baz IS NOT NULL AND
p.bam = ANY( i_bam ) AND
endt > NOW() AND
startt < NOW() + INTERVAL '15 minutes' AND
startt > i_oldest AND
p.last_change > i_newerthan
ORDER BY p.last_change DESC LIMIT i_maxupdates
FOR UPDATE OF ps ) AS updated;
Yeah, I know. Mangled/obfuscated bits in bug-reports are no fun, but the
code isn't mine to put in a public list.
If you'd like to take a look, I can't imagine it'd be much of a problem for
me to mail you the proper query (and whole function) off-list.
This is not a crisis for us, and I've been meaning to clean up or rewrite
the query anyway, but I figured I should fire off an email anyway, just in
case there's a bug in PostgreSQL that hasn't been caught since beta1.
(again, sorry about the old version).
Explain analyze gives this plan (again anonymized a bit, but can send proper
off-list):
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=423.84..424.15 rows=31 width=223) (actual
time=0.668..0.675 rows=30 loops=1)
-> Append (cost=35.50..422.67 rows=31 width=223) (actual
time=0.123..0.608 rows=30 loops=1)
-> Subquery Scan on apples (cost=35.50..412.18 rows=30 width=223)
(actual time=0.123..0.600 rows=30 loops=1)
-> Limit (cost=35.50..411.88 rows=30 width=217) (actual
time=0.122..0.591 rows=30 loops=1)
-> LockRows (cost=35.50..2494.50 rows=196 width=217)
(actual time=0.121..0.584 rows=30 loops=1)
-> Nested Loop (cost=35.50..2492.54 rows=196
width=217) (actual time=0.113..0.544 rows=30 loops=1)
-> Nested Loop (cost=35.23..410.51
rows=196 width=207) (actual time=0.085..0.343 rows=30 loops=1)
-> Nested Loop (cost=34.95..319.23
rows=202 width=164) (actual time=0.078..0.264 rows=32 loops=1)
-> Nested Loop
(cost=34.81..263.23 rows=211 width=147) (actual time=0.073..0.203 rows=32
loops=1)
-> Merge Append
(cost=34.66..172.40 rows=354 width=141) (actual time=0.067..0.126 rows=32
loops=1)
Sort Key: p.startt
-> Sort
(cost=0.01..0.02 rows=1 width=669) (actual time=0.016..0.016 rows=0
loops=1)
Sort Key:
p.startt
Sort Method:
quicksort Memory: 25kB
-> Seq Scan
on cars p (cost=0.00..0.00 rows=1 width=669) (actual time=0.001..0.001
rows=0 loops=1)
Filter:
((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND
(endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND
(startt < (now() + '00:15:00'::interval)))
-> Index Scan
Backward using cars_startt on cars_2013 p_1 (cost=0.28..129.49 rows=351
width=136) (actual time=0.043..0.096 rows=32 loops=1)
Index Cond:
((startt < (now() + '00:15:00'::interval)) AND (startt > '2013-07-05
19:11:41.958154'::timestamp without time zone))
Filter:
((endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])))
-> Sort
(cost=13.45..13.45 rows=1 width=669) (actual time=0.004..0.004 rows=0
loops=1)
Sort Key:
p_2.startt
Sort Method:
quicksort Memory: 25kB
-> Seq Scan
on cars_2014 p_2 (cost=0.00..13.44 rows=1 width=669) (actual
time=0.000..0.000 rows=0 loops=1)
Filter:
((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND
(endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND
(startt < (now() + '00:15:00'::interval)))
-> Sort
(cost=13.45..13.45 rows=1 width=669) (actual time=0.004..0.004 rows=0
loops=1)
Sort Key:
p_3.startt
Sort Method:
quicksort Memory: 25kB
-> Seq Scan
on cars_2015 p_3 (cost=0.00..13.44 rows=1 width=669) (actual
time=0.000..0.000 rows=0 loops=1)
Filter:
((startt > '2013-07-05 19:11:41.958154'::timestamp without time zone) AND
(endt > now()) AND (bus = ANY ('{m,f," "," ",NULL}'::character(1)[])) AND
(startt < (now() + '00:15:00'::interval)))
-> Index Scan using
oranges_pkey on oranges s (cost=0.14..0.25 rows=1 width=10) (actual
time=0.002..0.002 rows=1 loops=32)
Index Cond: (id =
p.agent)
Filter: (visible
AND (carpool = 100))
-> Index Scan using
roadworkers_pkey on roadworkers e (cost=0.14..0.26 rows=1 width=25) (actual
time=0.001..0.001 rows=1 loops=32)
Index Cond: (id =
p.employee)
-> Index Scan using bikes_pkey on
bikes i (cost=0.28..0.44 rows=1 width=55) (actual time=0.002..0.002 rows=1
loops=32)
Index Cond: (id = p.bike)
Filter: (source IS NOT NULL)
Rows Removed by Filter: 0
-> Index Scan using cars_stats_pkey on
cars_stats ps (cost=0.28..0.48 rows=1 width=18) (actual time=0.003..0.003
rows=1 loops=30)
Index Cond: (id = p.id)
SubPlan 2
-> Index Only Scan using fruit_tags_pkey
on fruit_tags fruit_tags_1 (cost=0.28..10.14 rows=3 width=4) (actual
time=0.001..0.002 rows=2 loops=30)
Index Cond: (fruit = p.id)
Heap Fetches: 60
-> Subquery Scan on updated (cost=10.16..10.18 rows=1 width=221)
(actual time=0.005..0.005 rows=0 loops=1)
-> Limit (cost=10.16..10.17 rows=1 width=0) (actual
time=0.005..0.005 rows=0 loops=1)
-> LockRows (cost=10.16..10.17 rows=1 width=0)
(actual time=0.005..0.005 rows=0 loops=1)
-> Sort (cost=10.16..10.16 rows=1 width=0)
(actual time=0.005..0.005 rows=0 loops=1)
Sort Key: last_change
Sort Method: quicksort Memory: 25kB
-> Result (cost=0.00..10.15 rows=1
width=0) (actual time=0.000..0.000 rows=0 loops=1)
One-Time Filter: false
SubPlan 1
-> Index Only Scan using
fruit_tags_pkey on fruit_tags (cost=0.28..10.14 rows=3 width=4) (never
executed)
Index Cond: (fruit = id)
Heap Fetches: 0
Total runtime: 0.885 ms
(57 rows)
Terje
On Aug 30, 2013, at 10:46 PM, terje@elde.net wrote:
> The following bug has been logged on the website:
That didn't work out too well, rendering-wize.
Sending the explain analyze by email as well, hoping it'll be more =
readable:
=
QUERY PLAN =
=
=20
=
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
-------------------------------------------------
HashAggregate (cost=3D423.84..424.15 rows=3D31 width=3D223) (actual =
time=3D0.668..0.675 rows=3D30 loops=3D1)
-> Append (cost=3D35.50..422.67 rows=3D31 width=3D223) (actual =
time=3D0.123..0.608 rows=3D30 loops=3D1)
-> Subquery Scan on apples (cost=3D35.50..412.18 rows=3D30 =
width=3D223) (actual time=3D0.123..0.600 rows=3D30 loops=3D1)
-> Limit (cost=3D35.50..411.88 rows=3D30 width=3D217) =
(actual time=3D0.122..0.591 rows=3D30 loops=3D1)
-> LockRows (cost=3D35.50..2494.50 rows=3D196 =
width=3D217) (actual time=3D0.121..0.584 rows=3D30 loops=3D1)
-> Nested Loop (cost=3D35.50..2492.54 =
rows=3D196 width=3D217) (actual time=3D0.113..0.544 rows=3D30 loops=3D1)
-> Nested Loop (cost=3D35.23..410.51 =
rows=3D196 width=3D207) (actual time=3D0.085..0.343 rows=3D30 loops=3D1)
-> Nested Loop =
(cost=3D34.95..319.23 rows=3D202 width=3D164) (actual time=3D0.078..0.264 =
rows=3D32 loops=3D1)
-> Nested Loop =
(cost=3D34.81..263.23 rows=3D211 width=3D147) (actual time=3D0.073..0.203 =
rows=3D32 loops=3D1)
-> Merge Append =
(cost=3D34.66..172.40 rows=3D354 width=3D141) (actual time=3D0.067..0.126 =
rows=3D32 loops=3D1)
Sort Key: =
p.startt
-> Sort =
(cost=3D0.01..0.02 rows=3D1 width=3D669) (actual time=3D0.016..0.016 =
rows=3D0 loops=3D1)
Sort Key: =
p.startt
Sort =
Method: quicksort Memory: 25kB
-> Seq =
Scan on cars p (cost=3D0.00..0.00 rows=3D1 width=3D669) (actual =
time=3D0.001..0.001 rows=3D0 loops=3D1)
=
Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time =
zone) AND (endt > now()) AND (bus =3D ANY ('{m,f," "," =
",NULL}'::character(1)[])) AND (startt < (now() + =
'00:15:00'::interval)))
-> Index Scan =
Backward using cars_startt on cars_2013 p_1 (cost=3D0.28..129.49 =
rows=3D351 width=3D136) (actual time=3D0.043..0.096 rows=3D32 loops=3D1)
Index =
Cond: ((startt < (now() + '00:15:00'::interval)) AND (startt > =
'2013-07-05 19:11:41.958154'::timestamp without time zone))
Filter: =
((endt > now()) AND (bus =3D ANY ('{m,f," "," =
",NULL}'::character(1)[])))
-> Sort =
(cost=3D13.45..13.45 rows=3D1 width=3D669) (actual time=3D0.004..0.004 =
rows=3D0 loops=3D1)
Sort Key: =
p_2.startt
Sort =
Method: quicksort Memory: 25kB
-> Seq =
Scan on cars_2014 p_2 (cost=3D0.00..13.44 rows=3D1 width=3D669) (actual =
time=3D0.000..0.000 rows=3D0 loops=3D1)
=
Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time =
zone) AND (endt > now()) AND (bus =3D ANY ('{m,f," "," =
",NULL}'::character(1)[])) AND (startt < (now() + =
'00:15:00'::interval)))
-> Sort =
(cost=3D13.45..13.45 rows=3D1 width=3D669) (actual time=3D0.004..0.004 =
rows=3D0 loops=3D1)
Sort Key: =
p_3.startt
Sort =
Method: quicksort Memory: 25kB
-> Seq =
Scan on cars_2015 p_3 (cost=3D0.00..13.44 rows=3D1 width=3D669) (actual =
time=3D0.000..0.000 rows=3D0 loops=3D1)
=
Filter: ((startt > '2013-07-05 19:11:41.958154'::timestamp without time =
zone) AND (endt > now()) AND (bus =3D ANY ('{m,f," "," =
",NULL}'::character(1)[])) AND (startt < (now() + =
'00:15:00'::interval)))
-> Index Scan using =
oranges_pkey on oranges s (cost=3D0.14..0.25 rows=3D1 width=3D10) =
(actual time=3D0.002..0.002 rows=3D1 loops=3D32)
Index Cond: (id =
=3D p.agent)
Filter: =
(visible AND (carpool =3D 100))
-> Index Scan using =
roadworkers_pkey on roadworkers e (cost=3D0.14..0.26 rows=3D1 width=3D25)=
(actual time=3D0.001..0.001 rows=3D1 loops=3D32)
Index Cond: (id =3D =
p.employee)
-> Index Scan using bikes_pkey =
on bikes i (cost=3D0.28..0.44 rows=3D1 width=3D55) (actual =
time=3D0.002..0.002 rows=3D1 loops=3D32)
Index Cond: (id =3D p.bike)
Filter: (source IS NOT =
NULL)
Rows Removed by Filter: 0
-> Index Scan using cars_stats_pkey on =
cars_stats ps (cost=3D0.28..0.48 rows=3D1 width=3D18) (actual =
time=3D0.003..0.003 rows=3D1 loops=3D30)
Index Cond: (id =3D p.id)
SubPlan 2
-> Index Only Scan using =
fruit_tags_pkey on fruit_tags fruit_tags_1 (cost=3D0.28..10.14 rows=3D3 =
width=3D4) (actual time=3D0.001..0.002 rows=3D2 loops=3D30)
Index Cond: (fruit =3D p.id)
Heap Fetches: 60
-> Subquery Scan on updated (cost=3D10.16..10.18 rows=3D1 =
width=3D221) (actual time=3D0.005..0.005 rows=3D0 loops=3D1)
-> Limit (cost=3D10.16..10.17 rows=3D1 width=3D0) =
(actual time=3D0.005..0.005 rows=3D0 loops=3D1)
-> LockRows (cost=3D10.16..10.17 rows=3D1 =
width=3D0) (actual time=3D0.005..0.005 rows=3D0 loops=3D1)
-> Sort (cost=3D10.16..10.16 rows=3D1 =
width=3D0) (actual time=3D0.005..0.005 rows=3D0 loops=3D1)
Sort Key: last_change
Sort Method: quicksort Memory: 25kB
-> Result (cost=3D0.00..10.15 rows=3D1 =
width=3D0) (actual time=3D0.000..0.000 rows=3D0 loops=3D1)
One-Time Filter: false
SubPlan 1
-> Index Only Scan using =
fruit_tags_pkey on fruit_tags (cost=3D0.28..10.14 rows=3D3 width=3D4) =
(never executed)
Index Cond: (fruit =3D =
id)
Heap Fetches: 0
Total runtime: 0.885 ms
(57 rows)
Hi, On 2013-08-30 20:46:27 +0000, terje@elde.net wrote: > I'm getting "out of binary heap slots", which offcourse spoils the fun of > the query. > Explain analyze gives this plan (again anonymized a bit, but can send proper > off-list): Since I reviewed the patch that introduced that message, I'd be interested in getting that. Ideally in a state where I can reproduce the issue in a new cluster. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2013-08-30 23:00:15 +0200, Andres Freund wrote: > Hi, > > > On 2013-08-30 20:46:27 +0000, terje@elde.net wrote: > > I'm getting "out of binary heap slots", which offcourse spoils the fun of > > the query. > > > Explain analyze gives this plan (again anonymized a bit, but can send proper > > off-list): > > Since I reviewed the patch that introduced that message, I'd be > interested in getting that. Ideally in a state where I can reproduce the > issue in a new cluster. No need, found the bug. And I think can build a testcase myself. ExecReScanMergeAppend resets ms_initialized, but doesn't clear the binaryheap. Thus no new elements fit. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 30. aug. 2013, at 23:05, Andres Freund <andres@2ndquadrant.com> wrote: > No need, found the bug. And I think can build a testcase myself. Sounds good. :) Please do let me know if there's anything I can do. Terje
Andres Freund wrote: > No need, found the bug. And I think can build a testcase myself. > > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the > binaryheap. Thus no new elements fit. Um. Are we missing a binaryheap_clear() method? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> No need, found the bug. And I think can build a testcase myself.
> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> binaryheap. Thus no new elements fit.
Sounds like a bug all right, but I'm not convinced it explains Terje's
report. The thing this theory doesn't explain is why would Terje be
having trouble reproducing the failure? Seems like re-running the same
query ought to produce the same failure.
regards, tom lane
On 2013-08-30 17:15:32 -0400, Alvaro Herrera wrote: > Andres Freund wrote: > > > No need, found the bug. And I think can build a testcase myself. > > > > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the > > binaryheap. Thus no new elements fit. > > Um. Are we missing a binaryheap_clear() method? In the patch I am patch-to-be it's binaryheap_reset(), but yes ;). Are you already patching it, or do you want me to finish it? I have a not so nice testcase and I can confirm that this is the issue and that a binaryheap_reset() fixes it: SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET 0))f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i); Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2013-08-30 17:23:51 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > No need, found the bug. And I think can build a testcase myself. > > > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the > > binaryheap. Thus no new elements fit. > > Sounds like a bug all right, but I'm not convinced it explains Terje's > report. The thing this theory doesn't explain is why would Terje be > having trouble reproducing the failure? Seems like re-running the same > query ought to produce the same failure. The number of rescans can be rather data-dependant, so I'd guess that's the reason. If many of the subplans don't return a tuple, it can take several resets to actually ever reach the heap's limit as we don't add the subplan to the heap in that case. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Aug 30, 2013, at 11:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sounds like a bug all right, but I'm not convinced it explains Terje's > report. The thing this theory doesn't explain is why would Terje be > having trouble reproducing the failure? Seems like re-running the = same > query ought to produce the same failure. Same query, but possibly different context. The query originally runs as part of a larger stored procedure. (mailing the thing out in a separate mail, omitting the list for that = part) Other than the context, the explain analyze was run with the same = parameters as the failed query (copypasted from the logged failure). = It's possibly that execution might be different as I ran it perhaps half = an hour later. Possible, but not likely. Terje
On 2013-08-30 23:05:25 +0200, Andres Freund wrote: > On 2013-08-30 23:00:15 +0200, Andres Freund wrote: > > Hi, > > > > > > On 2013-08-30 20:46:27 +0000, terje@elde.net wrote: > > > I'm getting "out of binary heap slots", which offcourse spoils the fun of > > > the query. > > > > > Explain analyze gives this plan (again anonymized a bit, but can send proper > > > off-list): > > > > Since I reviewed the patch that introduced that message, I'd be > > interested in getting that. Ideally in a state where I can reproduce the > > issue in a new cluster. > > No need, found the bug. And I think can build a testcase myself. > > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the > binaryheap. Thus no new elements fit. Ok, patch for that attached. Should we add SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET 0))f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i); as a regression test? I slightly on the "no" side... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On 2013-08-30 17:23:51 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > No need, found the bug. And I think can build a testcase myself. > > > ExecReScanMergeAppend resets ms_initialized, but doesn't clear the > > binaryheap. Thus no new elements fit. > > Sounds like a bug all right, but I'm not convinced it explains Terje's > report. The thing this theory doesn't explain is why would Terje be > having trouble reproducing the failure? Seems like re-running the same > query ought to produce the same failure. Even better explanation: The merge append is some steps below a LockRows node, so the number of rescans might depend on the concurrency because we'll do the EvalPlanQual dance on a concurrent row update. Terje, do you use read committed or repeatable read/serializable? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Aug 30, 2013, at 11:49 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Even better explanation: The merge append is some steps below a LockRows > node, so the number of rescans might depend on the concurrency because > we'll do the EvalPlanQual dance on a concurrent row update. > > Terje, do you use read committed or repeatable read/serializable? Using Read Committed. Terje
Andres Freund <andres@2ndquadrant.com> writes:
> Terje, do you use read committed or repeatable read/serializable?
Or even more to the point, can you apply the just-posted patch and see
if the problem goes away for you?
regards, tom lane
On Aug 31, 2013, at 12:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Or even more to the point, can you apply the just-posted patch and see > if the problem goes away for you? Will do. At our current organic load though, we likely wouldn't get any good = confirmation either way this week. I'll see if I can set up some synthetic load to provoke the error more = rapidly, then patch and re-check. It's getting quite late, and has been a bit of a long day, so it'll = unfortunately have to wait until tomorrow I think, sorry. :( Terje
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-08-30 23:05:25 +0200, Andres Freund wrote:
>> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
>> binaryheap. Thus no new elements fit.
> Ok, patch for that attached.
I think the comments need a bit of copy-editing, but looks good otherwise.
Will fix and commit.
> Should we add
> SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1 OFFSET
0))f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i);
> as a regression test? I slightly on the "no" side...
Not sure. It's pretty disturbing that this wasn't caught earlier;
it seems to me that means there's no regression coverage that hits
ExecReScanMergeAppend. However, I don't much like this specific test case
because it seems like hitting the bug could depend on what series of
random values you get.
regards, tom lane
On 2013-08-30 18:55:53 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-08-30 23:05:25 +0200, Andres Freund wrote:
> >> ExecReScanMergeAppend resets ms_initialized, but doesn't clear the
> >> binaryheap. Thus no new elements fit.
>
> > Ok, patch for that attached.
>
> I think the comments need a bit of copy-editing, but looks good otherwise.
> Will fix and commit.
Thanks.
> > Should we add
> > SELECT (SELECT g.i FROM ((SELECT random()::int ORDER BY 1 OFFSET 0) UNION ALL (SELECT random()::int ORDER BY 1
OFFSET0)) f(i) ORDER BY f.i LIMIT 1) FROM generate_series(1, 10) g(i);
> > as a regression test? I slightly on the "no" side...
>
> Not sure. It's pretty disturbing that this wasn't caught earlier;
> it seems to me that means there's no regression coverage that hits
> ExecReScanMergeAppend. However, I don't much like this specific test case
> because it seems like hitting the bug could depend on what series of
> random values you get.
Hm, that should be fixable. How about:
SELECT
-- correlated subquery, with dependency on outer query, to force rescans
-- which will be planned as a merge-append.
(SELECT g.i FROM (
(SELECT * FROM generate_series(1, 2) ORDER BY 1)
UNION ALL
(SELECT * FROM generate_series(1, 2) ORDER BY 1)
) f(i)
ORDER BY f.i LIMIT 1)
FROM generate_series(1, 3) g(i);
I couldn't find a simpler testcase within some minutes...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-08-30 18:55:53 -0400, Tom Lane wrote:
>> Not sure. It's pretty disturbing that this wasn't caught earlier;
>> it seems to me that means there's no regression coverage that hits
>> ExecReScanMergeAppend. However, I don't much like this specific test case
>> because it seems like hitting the bug could depend on what series of
>> random values you get.
> Hm, that should be fixable. How about:
Looks good, applied.
regards, tom lane
On 2013-08-30 19:28:39 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-08-30 18:55:53 -0400, Tom Lane wrote: > >> Not sure. It's pretty disturbing that this wasn't caught earlier; > >> it seems to me that means there's no regression coverage that hits > >> ExecReScanMergeAppend. However, I don't much like this specific test case > >> because it seems like hitting the bug could depend on what series of > >> random values you get. > > > Hm, that should be fixable. How about: > > Looks good, applied. On second thought, it might not be so good looking - the queries results are independent of the data from merge-append. So we only check that we don't crash and not that the results make any sense. How about the attached patch? I verified that it fails without the binaryheap_reset(). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Andres Freund <andres@2ndquadrant.com> writes:
> On second thought, it might not be so good looking - the queries results
> are independent of the data from merge-append. So we only check that we
> don't crash and not that the results make any sense. How about the
> attached patch?
Good point --- pushed.
regards, tom lane