Обсуждение: dsa_allocate could not find 4 free pages

Поиск
Список
Период
Сортировка

dsa_allocate could not find 4 free pages

От
Mark Dilger
Дата:
Hackers,

I'm getting the error in $subject and am only posting here because (a) the comments
in src/backend/utils/mmgr/dsa.c circa line 720 suggest that this is a bug, and (b) the
DSA code appears to be fairly new to the postgresql project, and perhaps not fully
debugged.

I am running against a build from 291a31c42c893574e9676e00121e6c6915a59de5
REL_10_STABLE, on centos linux.

I'm not asking for help configuring my system.  If this is a run of the mill out of memory
type error, I'll deal with it myself.  But if this indicates a bug, I'll be happy to try to distill
what I'm doing to a test case that I can share.  (I've only hit this once, and have little
idea whether it would be reproducible.)  The gist of the query was an array_agg over
the results of a parallel query over a partitioned table, executed from within a plpgsql
function, like:

SELECT ARRAY_AGG((a, latest, timelogged, b, cnt)::custom_composite_type) AS trans FROM
    (SELECT a, b, timelogged, MAX(timelogged) OVER (PARTITION BY a) AS latest, SUM(cnt) AS cnt FROM
        (SELECT a, b, timelogged, COUNT(*) AS cnt
            FROM mytable
            WHERE a= ANY('{65537,65538,65539,65540,65541,65542,65543,65544}'::OID[])
            AND timelogged >= '2017-12-04 16:12:50-08'::TIMESTAMP
            AND timelogged < '2017-12-04 17:12:50-08'::TIMESTAMP
            GROUP BY a, b, timelogged
        ) AS ss
        GROUP BY a, b, timelogged
) AS s

If this is not a bug, but rather something that users can hit through normal resource
exhaustion, then perhaps the comments in dsa.c can be changed to make that clear.

Thanks!

mark



Re: dsa_allocate could not find 4 free pages

От
Thomas Munro
Дата:
On Wed, Dec 6, 2017 at 6:17 AM, Mark Dilger <hornschnorter@gmail.com> wrote:
> I'm not asking for help configuring my system.  If this is a run of the mill out of memory
> type error, I'll deal with it myself.  But if this indicates a bug, I'll be happy to try to distill
> what I'm doing to a test case that I can share.  (I've only hit this once, and have little
> idea whether it would be reproducible.)  The gist of the query was an array_agg over
> the results of a parallel query over a partitioned table, executed from within a plpgsql
> function, like:
>
> SELECT ARRAY_AGG((a, latest, timelogged, b, cnt)::custom_composite_type) AS trans FROM
>     (SELECT a, b, timelogged, MAX(timelogged) OVER (PARTITION BY a) AS latest, SUM(cnt) AS cnt FROM
>         (SELECT a, b, timelogged, COUNT(*) AS cnt
>             FROM mytable
>             WHERE a= ANY('{65537,65538,65539,65540,65541,65542,65543,65544}'::OID[])
>             AND timelogged >= '2017-12-04 16:12:50-08'::TIMESTAMP
>             AND timelogged < '2017-12-04 17:12:50-08'::TIMESTAMP
>             GROUP BY a, b, timelogged
>         ) AS ss
>         GROUP BY a, b, timelogged
> ) AS s

Hi Mark,

Does the plan have multiple Gather nodes with Parallel Bitmap Heap Scan?

https://www.postgresql.org/message-id/flat/CAEepm%3D0Mv9BigJPpribGQhnHqVGYo2%2BkmzekGUVJJc9Y_ZVaYA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate could not find 4 free pages

От
Mark Dilger
Дата:
> On Dec 5, 2017, at 11:25 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Wed, Dec 6, 2017 at 6:17 AM, Mark Dilger <hornschnorter@gmail.com> wrote:
>> I'm not asking for help configuring my system.  If this is a run of the mill out of memory
>> type error, I'll deal with it myself.  But if this indicates a bug, I'll be happy to try to distill
>> what I'm doing to a test case that I can share.  (I've only hit this once, and have little
>> idea whether it would be reproducible.)  The gist of the query was an array_agg over
>> the results of a parallel query over a partitioned table, executed from within a plpgsql
>> function, like:
>>
>> SELECT ARRAY_AGG((a, latest, timelogged, b, cnt)::custom_composite_type) AS trans FROM
>>    (SELECT a, b, timelogged, MAX(timelogged) OVER (PARTITION BY a) AS latest, SUM(cnt) AS cnt FROM
>>        (SELECT a, b, timelogged, COUNT(*) AS cnt
>>            FROM mytable
>>            WHERE a= ANY('{65537,65538,65539,65540,65541,65542,65543,65544}'::OID[])
>>            AND timelogged >= '2017-12-04 16:12:50-08'::TIMESTAMP
>>            AND timelogged < '2017-12-04 17:12:50-08'::TIMESTAMP
>>            GROUP BY a, b, timelogged
>>        ) AS ss
>>        GROUP BY a, b, timelogged
>> ) AS s
>
> Hi Mark,
>
> Does the plan have multiple Gather nodes with Parallel Bitmap Heap Scan?

This was encountered and logged by a java client.  The only data I got was:

org.postgresql.util.PSQLException: ERROR: dsa_allocate could not find 4 free pages
  Where: parallel worker

I know what the query was, because it is the only query this client issues, but there is
no record of what the plan might have been.  The java client runs queries spanning a long
time period followed by multiple queries spanning short time periods.  The sql query is
the same in both cases; only the bind parameters for the time frame differ.  The short time
period queries do not generate parallel plans (in my experience), but the long time period
ones do, with plans that look something like this:

 Aggregate  (cost=31552146.47..31552146.48 rows=1 width=32)
   ->  WindowAgg  (cost=14319575.96..31538146.47 rows=800000 width=43)
         ->  GroupAggregate  (cost=14319575.96..31526146.47 rows=800000 width=39)
               Group Key: acct_r.mta, acct_r.dtype, (moment(acct_r.timelogged))
               ->  Finalize GroupAggregate  (cost=14319575.96..31356146.47 rows=8000000 width=15)
                     Group Key: acct_r.mta, acct_r.dtype, (moment(acct_r.timelogged))
                     ->  Gather Merge  (cost=14319575.96..30056146.47 rows=120000000 width=15)
                           Workers Planned: 15
                           ->  Partial GroupAggregate  (cost=14318575.64..14755146.15 rows=8000000 width=15)
                                 Group Key: acct_r.mta, acct_r.dtype, (moment(acct_r.timelogged))
                                 ->  Sort  (cost=14318575.64..14385889.74 rows=26925641 width=7)
                                       Sort Key: acct_r.mta, acct_r.dtype, (moment(acct_r.timelogged))
                                       ->  Result  (cost=0.00..10627491.52 rows=26925641 width=7)
                                             ->  Append  (cost=0.00..10290921.01 rows=26925641 width=7)
                                                   ->  Parallel Seq Scan on acct_r  (cost=0.00..784882.50 rows=1805625
width=7)
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_d  (cost=0.00..542032.80 rows=1757640
width=7)
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_b  (cost=0.00..111584.91 rows=206096
width=7)
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_x  (cost=0.00..20661.29 rows=48165
width=7)
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_r acct_r_1  (cost=0.00..795917.43
rows=1831322width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_d acct_d_1  (cost=0.00..549630.82
rows=1782141width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_b acct_b_1  (cost=0.00..113211.40
rows=208970width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_x acct_x_1  (cost=0.00..20667.64
rows=46732width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_r acct_r_2  (cost=0.00..790341.78
rows=1818139width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_d acct_d_2  (cost=0.00..545702.84
rows=1769442width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_b acct_b_2  (cost=0.00..112481.26
rows=207663width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_x acct_x_2  (cost=0.00..20834.24
rows=48462width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_r acct_r_3  (cost=0.00..791592.41
rows=1822370width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_d acct_d_3  (cost=0.00..546542.70
rows=1773285width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_b acct_b_3  (cost=0.00..112643.48
rows=208174width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_x acct_x_3  (cost=0.00..20488.79
rows=46540width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_r acct_r_4  (cost=0.00..790958.39
rows=1820569width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_d acct_d_4  (cost=0.00..546166.22
rows=1772061width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_b acct_b_4  (cost=0.00..112614.67
rows=207933width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_x acct_x_4  (cost=0.00..20505.07
rows=46503width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_r acct_r_5  (cost=0.00..796322.24
rows=1832912width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_d acct_d_5  (cost=0.00..549874.51
rows=1784126width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_b acct_b_5  (cost=0.00..113290.21
rows=209260width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_x acct_x_5  (cost=0.00..20703.26
rows=47013width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_r acct_r_6  (cost=0.00..785534.98
rows=1807499width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_d acct_d_6  (cost=0.00..542409.68
rows=1759134width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_b acct_b_6  (cost=0.00..111713.68
rows=206334width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_x acct_x_6  (cost=0.00..20667.40
rows=48120width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_r acct_r_7  (cost=0.00..11.29 rows=5
width=7)
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_d acct_d_7  (cost=0.00..782.52
rows=2576width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_b acct_b_7  (cost=0.00..130.95
rows=698width=7) 
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                                   ->  Parallel Seq Scan on acct_x acct_x_7  (cost=0.00..19.65 rows=132
width=7)
                                                         Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
(78 rows)

The short time period ones tend to look like this:

Aggregate  (cost=9499062.32..9499062.33 rows=1 width=32)
   ->  WindowAgg  (cost=9487746.22..9497897.43 rows=66565 width=43)
         ->  GroupAggregate  (cost=9487746.22..9496898.95 rows=66565 width=39)
               Group Key: acct_r.mta, acct_r.dtype, (moment(acct_r.timelogged))
               ->  Sort  (cost=9487746.22..9489410.35 rows=665654 width=15)
                     Sort Key: acct_r.mta, acct_r.dtype, (moment(acct_r.timelogged))
                     ->  HashAggregate  (cost=9408385.57..9416706.25 rows=665654 width=15)
                           Group Key: acct_r.mta, acct_r.dtype, moment(acct_r.timelogged)
                           ->  Result  (cost=12668.25..9341820.14 rows=6656543 width=7)
                                 ->  Append  (cost=12668.25..9258613.35 rows=6656543 width=7)
                                       ->  Bitmap Heap Scan on acct_r  (cost=12668.25..739601.94 rows=525895 width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_r_mta_moment  (cost=0.00..12536.77
rows=525895width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Bitmap Heap Scan on acct_d  (cost=12119.70..546172.53 rows=498469 width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_d_mta_moment  (cost=0.00..11995.08
rows=498469width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Index Scan using acct_b_mta_moment on acct_b  (cost=0.43..63926.98
rows=36865width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Index Scan using acct_x_mta_moment on acct_x  (cost=0.42..16974.14 rows=9732
width=7)
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Bitmap Heap Scan on acct_r acct_r_1  (cost=13188.15..756606.11 rows=548076
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_r_mta_moment  (cost=0.00..13051.13
rows=548076width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Bitmap Heap Scan on acct_d acct_d_1  (cost=9583.46..521988.16 rows=394697
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_d_mta_moment  (cost=0.00..9484.79
rows=394697width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Index Scan using acct_b_mta_moment on acct_b acct_b_1  (cost=0.43..63528.17
rows=36631width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Index Scan using acct_x_mta_moment on acct_x acct_x_1  (cost=0.42..12437.69
rows=7174width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Index Scan using acct_r_mta_moment on acct_r acct_r_2  (cost=0.56..577183.17
rows=335206width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Bitmap Heap Scan on acct_d acct_d_2  (cost=9955.04..524788.79 rows=409610
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_d_mta_moment  (cost=0.00..9852.64
rows=409610width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Index Scan using acct_b_mta_moment on acct_b acct_b_2  (cost=0.43..91630.01
rows=52910width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Bitmap Heap Scan on acct_x acct_x_2  (cost=271.31..17389.00 rows=10179
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_x_mta_moment  (cost=0.00..268.76 rows=10179
width=0)
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Bitmap Heap Scan on acct_r acct_r_3  (cost=9617.11..669375.23 rows=399555
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_r_mta_moment  (cost=0.00..9517.23
rows=399555width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Bitmap Heap Scan on acct_d acct_d_3  (cost=7954.15..489086.26 rows=327431
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_d_mta_moment  (cost=0.00..7872.30
rows=327431width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Index Scan using acct_b_mta_moment on acct_b acct_b_3  (cost=0.43..57803.57
rows=33318width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Bitmap Heap Scan on acct_x acct_x_3  (cost=266.63..17170.64 rows=10091
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_x_mta_moment  (cost=0.00..264.11 rows=10091
width=0)
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Bitmap Heap Scan on acct_r acct_r_4  (cost=14364.05..772579.42 rows=596579
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_r_mta_moment  (cost=0.00..14214.90
rows=596579width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Bitmap Heap Scan on acct_d acct_d_4  (cost=8948.13..508920.98 rows=368460
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_d_mta_moment  (cost=0.00..8856.01
rows=368460width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Index Scan using acct_b_mta_moment on acct_b acct_b_4  (cost=0.43..66568.72
rows=38394width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Index Scan using acct_x_mta_moment on acct_x acct_x_4  (cost=0.42..16054.04
rows=9268width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Bitmap Heap Scan on acct_r acct_r_5  (cost=12191.39..737360.94 rows=506687
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_r_mta_moment  (cost=0.00..12064.72
rows=506687width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Bitmap Heap Scan on acct_d acct_d_5  (cost=10295.39..532639.53 rows=424107
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_d_mta_moment  (cost=0.00..10189.36
rows=424107width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Index Scan using acct_b_mta_moment on acct_b acct_b_5  (cost=0.43..71723.09
rows=41379width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Bitmap Heap Scan on acct_x acct_x_5  (cost=277.93..17587.45 rows=10517
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_x_mta_moment  (cost=0.00..275.30 rows=10517
width=0)
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Bitmap Heap Scan on acct_r acct_r_6  (cost=14015.51..763250.58 rows=581541
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_r_mta_moment  (cost=0.00..13870.12
rows=581541width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Bitmap Heap Scan on acct_d acct_d_6  (cost=9523.94..515952.84 rows=391661
width=7)
                                             Recheck Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                             ->  Bitmap Index Scan on acct_d_mta_moment  (cost=0.00..9426.02
rows=391661width=0) 
                                                   Index Cond: (moment(timelogged) >= '2017-12-05
11:20:00-08'::secondstamp)
                                       ->  Index Scan using acct_b_mta_moment on acct_b acct_b_6  (cost=0.43..75659.10
rows=43660width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Index Scan using acct_x_mta_moment on acct_x acct_x_6  (cost=0.42..14623.44
rows=8445width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Seq Scan on acct_r acct_r_7  (cost=0.00..12.75 rows=3 width=7)
                                             Filter: ((moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp) AND
(mta= ANY ('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))) 
                                       ->  Index Scan using acct_d_mta_moment on acct_d acct_d_7  (cost=0.29..6.06
rows=1width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Index Scan using acct_b_mta_moment on acct_b acct_b_7  (cost=0.28..6.06
rows=1width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
                                       ->  Index Scan using acct_x_mta_moment on acct_x acct_x_7  (cost=0.14..5.95
rows=1width=7) 
                                             Index Cond: (moment(timelogged) >= '2017-12-05 11:20:00-08'::secondstamp)
                                             Filter: (mta = ANY
('{65537,65538,65539,65540,65541,65542,65543,65544}'::smallid[]))
(137 rows)




Re: dsa_allocate could not find 4 free pages

От
Thomas Munro
Дата:
On Wed, Dec 6, 2017 at 9:35 AM, Mark Dilger <hornschnorter@gmail.com> wrote:
>> On Dec 5, 2017, at 11:25 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> Does the plan have multiple Gather nodes with Parallel Bitmap Heap Scan?
>
> This was encountered and logged by a java client.  The only data I got was:
>
> org.postgresql.util.PSQLException: ERROR: dsa_allocate could not find 4 free pages
>   Where: parallel worker

This means that the DSA area is corrupted.  Presumably
get_best_segment(area, 4) returned a segment that wasn't actually good
for 4 pages, either because it was incorrectly binned or because its
free space btree was corrupted.  Another path would be that
make_new_segment(area, 4) returned a segment that couldn't find 4
pages, but that seems unlikely.

> [query plan with one Gather and no Parallel Bitmap Heap Scan]

I'm not sure why this plan would ever call dsa_allocate().

> [query plan with no Gather but plenty of Btimap Heap Scans]

And this one certainly can't.  I guess you must sometimes get a
different variation that has Gather nodes and uses Parallel Bitmap
Heap Scan.  Then the question is whether the es_query_dsa multiple
Gather bug can explain this: for example, if dsa_free(wrong_dsa_area,
p) was called, perhaps it could produce this type of corruption.
Otherwise we have a different bug.  Any clues on how to reproduce the
problem would be very welcome.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate could not find 4 free pages

От
Mark Dilger
Дата:
> On Dec 5, 2017, at 4:07 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Wed, Dec 6, 2017 at 9:35 AM, Mark Dilger <hornschnorter@gmail.com> wrote:
>>> On Dec 5, 2017, at 11:25 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>>> Does the plan have multiple Gather nodes with Parallel Bitmap Heap Scan?
>>
>> This was encountered and logged by a java client.  The only data I got was:
>>
>> org.postgresql.util.PSQLException: ERROR: dsa_allocate could not find 4 free pages
>>  Where: parallel worker
>
> This means that the DSA area is corrupted.  Presumably
> get_best_segment(area, 4) returned a segment that wasn't actually good
> for 4 pages, either because it was incorrectly binned or because its
> free space btree was corrupted.  Another path would be that
> make_new_segment(area, 4) returned a segment that couldn't find 4
> pages, but that seems unlikely.
>
>> [query plan with one Gather and no Parallel Bitmap Heap Scan]
>
> I'm not sure why this plan would ever call dsa_allocate().
>
>> [query plan with no Gather but plenty of Btimap Heap Scans]
>
> And this one certainly can't.  I guess you must sometimes get a
> different variation that has Gather nodes and uses Parallel Bitmap
> Heap Scan.

Yes, I can believe that the plan is sometimes different.  This error has
occurred several times now, but it is still rather infrequent, so either the
plan that triggers it is rare, or the bug is intermittent even with the same
plan being chosen, or perhaps both.

>  Then the question is whether the es_query_dsa multiple
> Gather bug can explain this: for example, if dsa_free(wrong_dsa_area,
> p) was called, perhaps it could produce this type of corruption.
> Otherwise we have a different bug.  Any clues on how to reproduce the
> problem would be very welcome.

I have written (and rewritten, and rewritten) a tap test in the hopes of
getting a test case that reproduces this reliably (or even once), but
without luck so far.  I will keep trying.

mark