Обсуждение: pg15b3: crash in paralell vacuum
Immediately after upgrading an internal instance, a loop around "vacuum" did
this:
TRAP: FailedAssertion("indstats->status == PARALLEL_INDVAC_STATUS_INITIAL", File: "vacuumparallel.c", Line: 611, PID:
27635)
postgres: postgres pryzbyj [local] VACUUM(ExceptionalCondition+0x8d)[0x99d9fd]
postgres: postgres pryzbyj [local] VACUUM[0x6915db]
postgres: postgres pryzbyj [local] VACUUM(heap_vacuum_rel+0x12b6)[0x5083e6]
postgres: postgres pryzbyj [local] VACUUM[0x68e97a]
postgres: postgres pryzbyj [local] VACUUM(vacuum+0x48e)[0x68fe9e]
postgres: postgres pryzbyj [local] VACUUM(ExecVacuum+0x2ae)[0x69065e]
postgres: postgres pryzbyj [local] VACUUM(standard_ProcessUtility+0x530)[0x8567b0]
/usr/pgsql-15/lib/pg_stat_statements.so(+0x5450)[0x7f52b891c450]
postgres: postgres pryzbyj [local] VACUUM[0x85490a]
postgres: postgres pryzbyj [local] VACUUM[0x854a53]
postgres: postgres pryzbyj [local] VACUUM(PortalRun+0x179)[0x855029]
postgres: postgres pryzbyj [local] VACUUM[0x85099b]
postgres: postgres pryzbyj [local] VACUUM(PostgresMain+0x199a)[0x85268a]
postgres: postgres pryzbyj [local] VACUUM[0x496a21]
postgres: postgres pryzbyj [local] VACUUM(PostmasterMain+0x11c0)[0x7b3980]
postgres: postgres pryzbyj [local] VACUUM(main+0x1c6)[0x4986a6]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f52c4b893d5]
postgres: postgres pryzbyj [local] VACUUM[0x498c59]
< 2022-08-18 07:56:51.963 CDT >LOG: server process (PID 27635) was terminated by signal 6: Aborted
< 2022-08-18 07:56:51.963 CDT >DETAIL: Failed process was running: VACUUM ANALYZE alarms
Unfortunately, it looks like the RPM packages are compiled with -O2, so this is
of limited use.
Core was generated by `postgres: postgres pryzbyj [local] VACUUM '.
Program terminated with signal 6, Aborted.
#0 0x00007f52c4b9d207 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.4-4.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64
cyrus-sasl-lib-2.1.26-23.el7.x86_64elfutils-libelf-0.176-5.el7.x86_64 elfutils-libs-0.176-5.el7.x86_64
glibc-2.17-260.el7_6.3.x86_64keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64
libattr-2.4.46-13.el7.x86_64libcap-2.22-9.el7.x86_64 libcap-ng-0.7.5-4.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64
libgcc-4.8.5-39.el7.x86_64libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libicu-50.1.2-17.el7.x86_64
libselinux-2.5-15.el7.x86_64libstdc++-4.8.5-39.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 libzstd-1.5.2-1.el7.x86_64
lz4-1.7.5-2.el7.x86_64nspr-4.19.0-1.el7_5.x86_64 nss-3.36.0-7.1.el7_6.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64
nss-util-3.36.0-1.1.el7_6.x86_64openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-22.el7_9.x86_64
pam-1.1.8-22.el7.x86_64pcre-8.32-17.el7.x86_64 systemd-libs-219-62.el7_6.5.x86_64 xz-libs-5.2.2-1.el7.x86_64
zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f52c4b9d207 in raise () from /lib64/libc.so.6
#1 0x00007f52c4b9e8f8 in abort () from /lib64/libc.so.6
#2 0x000000000099da1e in ExceptionalCondition (conditionName=conditionName@entry=0xafae40 "indstats->status ==
PARALLEL_INDVAC_STATUS_INITIAL",errorType=errorType@entry=0x9fb4b7 "FailedAssertion",
fileName=fileName@entry=0xafb0c0 "vacuumparallel.c", lineNumber=lineNumber@entry=611) at assert.c:69
#3 0x00000000006915db in parallel_vacuum_process_all_indexes (pvs=0x2e85f80, num_index_scans=<optimized out>,
vacuum=<optimizedout>) at vacuumparallel.c:611
#4 0x00000000005083e6 in heap_vacuum_rel (rel=<optimized out>, params=<optimized out>, bstrategy=<optimized out>) at
vacuumlazy.c:2679
#5 0x000000000068e97a in table_relation_vacuum (bstrategy=<optimized out>, params=0x7fff46de9a80, rel=0x7f52c7bc2c10)
at../../../src/include/access/tableam.h:1680
#6 vacuum_rel (relid=52187497, relation=<optimized out>, params=0x7fff46de9a80) at vacuum.c:2092
#7 0x000000000068fe9e in vacuum (relations=0x2dbeee8, params=params@entry=0x7fff46de9a80, bstrategy=<optimized out>,
bstrategy@entry=0x0,isTopLevel=isTopLevel@entry=true) at vacuum.c:475
#8 0x000000000069065e in ExecVacuum (pstate=pstate@entry=0x2dc38d0, vacstmt=vacstmt@entry=0x2d9f3a0,
isTopLevel=isTopLevel@entry=true)at vacuum.c:275
#9 0x00000000008567b0 in standard_ProcessUtility (pstmt=pstmt@entry=0x2d9f7a0, queryString=queryString@entry=0x2d9e8a0
"VACUUMANALYZE alarms", readOnlyTree=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL,
params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=dest@entry=0x2d9f890, qc=qc@entry=0x7fff46dea0c0) at
utility.c:866
#10 0x00007f52b891c450 in pgss_ProcessUtility (pstmt=0x2d9f7a0, queryString=0x2d9e8a0 "VACUUM ANALYZE alarms",
readOnlyTree=<optimizedout>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2d9f890,
qc=0x7fff46dea0c0) at pg_stat_statements.c:1143
#11 0x000000000085490a in PortalRunUtility (portal=portal@entry=0x2e20fc0, pstmt=0x2d9f7a0,
isTopLevel=isTopLevel@entry=true,setHoldSnapshot=setHoldSnapshot@entry=false, dest=0x2d9f890, qc=0x7fff46dea0c0) at
pquery.c:1158
#12 0x0000000000854a53 in PortalRunMulti (portal=portal@entry=0x2e20fc0, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false,dest=dest@entry=0x2d9f890, altdest=altdest@entry=0x2d9f890,
qc=qc@entry=0x7fff46dea0c0) at pquery.c:1322
#13 0x0000000000855029 in PortalRun (portal=0x2e20fc0, count=9223372036854775807, isTopLevel=<optimized out>,
run_once=<optimizedout>, dest=0x2d9f890, altdest=0x2d9f890, qc=0x7fff46dea0c0) at pquery.c:791
#14 0x000000000085099b in exec_simple_query (query_string=0x2d9e8a0 "VACUUM ANALYZE alarms") at postgres.c:1250
#15 0x000000000085268a in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4581
#16 0x0000000000496a21 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4504
#17 BackendStartup (port=0x2dbe9c0) at postmaster.c:4232
#18 ServerLoop () at postmaster.c:1806
#19 0x00000000007b3980 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2d99280) at postmaster.c:1478
#20 0x00000000004986a6 in main (argc=3, argv=0x2d99280) at main.c:202
(gdb) p *pvs
$2 = {pcxt = 0x2e84490, indrels = 0x2e84220, nindexes = 8, shared = 0x2aaaaf142380, indstats = 0x2aaaaf1423c0,
dead_items= 0x2aaaab142380, buffer_usage = 0x2aaaab142260, wal_usage = 0x2aaaab142220,
will_parallel_vacuum = 0x2e8f750, nindexes_parallel_bulkdel = 5, nindexes_parallel_cleanup = 0,
nindexes_parallel_condcleanup= 5, bstrategy = 0x2dbed40, relnamespace = 0x0, relname = 0x0, indname = 0x0,
status = PARALLEL_INDVAC_STATUS_INITIAL}
(gdb) info locals
indstats = <optimized out>
i = <optimized out>
nworkers = 2
(gdb) p *pvs
$4 = {pcxt = 0x2e84490, indrels = 0x2e84220, nindexes = 8, shared = 0x2aaaaf142380, indstats = 0x2aaaaf1423c0,
dead_items= 0x2aaaab142380, buffer_usage = 0x2aaaab142260, wal_usage = 0x2aaaab142220,
will_parallel_vacuum = 0x2e8f750, nindexes_parallel_bulkdel = 5, nindexes_parallel_cleanup = 0,
nindexes_parallel_condcleanup= 5, bstrategy = 0x2dbed40, relnamespace = 0x0, relname = 0x0, indname = 0x0,
status = PARALLEL_INDVAC_STATUS_INITIAL}
I reproduced it like this:
pryzbyj=# VACUUM (PARALLEL 2,VERBOSE,INDEX_CLEANUP on) alarms; -- DISABLE_PAGE_SKIPPING true
INFO: vacuuming "pryzbyj.public.alarms"
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
So I'll be back shortly with more...
--
Justin
On Thu, Aug 18, 2022 at 08:34:06AM -0500, Justin Pryzby wrote:
> Unfortunately, it looks like the RPM packages are compiled with -O2, so this is
> of limited use. So I'll be back shortly with more...
#3 0x00000000006874f1 in parallel_vacuum_process_all_indexes (pvs=0x25bdce0, num_index_scans=0,
vacuum=vacuum@entry=false)at vacuumparallel.c:611
611 Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
(gdb) p *pvs
$1 = {pcxt = 0x25bc1e0, indrels = 0x25bbf70, nindexes = 8, shared = 0x7fc5184393a0, indstats = 0x7fc5184393e0,
dead_items= 0x7fc5144393a0, buffer_usage = 0x7fc514439280, wal_usage = 0x7fc514439240,
will_parallel_vacuum = 0x266d818, nindexes_parallel_bulkdel = 5, nindexes_parallel_cleanup = 0,
nindexes_parallel_condcleanup= 5, bstrategy = 0x264f120, relnamespace = 0x0, relname = 0x0, indname = 0x0,
status = PARALLEL_INDVAC_STATUS_INITIAL}
(gdb) p *indstats
$2 = {status = 11, parallel_workers_can_process = false, istat_updated = false, istat = {num_pages = 0, estimated_count
=false, num_index_tuples = 0, tuples_removed = 0, pages_newly_deleted = 0, pages_deleted = 1,
pages_free = 0}}
(gdb) bt f
...
#3 0x00000000006874f1 in parallel_vacuum_process_all_indexes (pvs=0x25bdce0, num_index_scans=0,
vacuum=vacuum@entry=false)at vacuumparallel.c:611
indstats = 0x7fc5184393e0
i = 0
nworkers = 2
new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP
__func__ = "parallel_vacuum_process_all_indexes"
#4 0x0000000000687ef0 in parallel_vacuum_cleanup_all_indexes (pvs=<optimized out>,
num_table_tuples=num_table_tuples@entry=409149,num_index_scans=<optimized out>,
estimated_count=estimated_count@entry=true)
at vacuumparallel.c:486
No locals.
#5 0x00000000004f80b8 in lazy_cleanup_all_indexes (vacrel=vacrel@entry=0x25bc510) at vacuumlazy.c:2679
reltuples = 409149
estimated_count = true
#6 0x00000000004f884a in lazy_scan_heap (vacrel=vacrel@entry=0x25bc510) at vacuumlazy.c:1278
rel_pages = 67334
blkno = 67334
next_unskippable_block = 67334
next_failsafe_block = 0
next_fsm_block_to_vacuum = 0
dead_items = 0x7fc5144393a0
vmbuffer = 1300
next_unskippable_allvis = true
skipping_current_range = false
initprog_index = {0, 1, 5}
initprog_val = {1, 67334, 11184809}
__func__ = "lazy_scan_heap"
#7 0x00000000004f925f in heap_vacuum_rel (rel=0x7fc52df6b820, params=0x7ffd74f74620, bstrategy=0x264f120) at
vacuumlazy.c:534
vacrel = 0x25bc510
verbose = true
instrument = <optimized out>
aggressive = false
skipwithvm = true
frozenxid_updated = false
minmulti_updated = false
OldestXmin = 32759288
FreezeLimit = 4277726584
OldestMxact = 157411
MultiXactCutoff = 4290124707
orig_rel_pages = 67334
new_rel_pages = <optimized out>
new_rel_allvisible = 4
ru0 = {tv = {tv_sec = 1660830451, tv_usec = 473980}, ru = {ru_utime = {tv_sec = 0, tv_usec = 317891}, ru_stime
={tv_sec = 1, tv_usec = 212372}, {ru_maxrss = 74524, __ru_maxrss_word = 74524}, {ru_ixrss = 0,
__ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0},
{ru_minflt= 18870, __ru_minflt_word = 18870}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0,
__ru_nswap_word = 0}, {ru_inblock = 1124750, __ru_inblock_word = 1124750}, {ru_oublock = 0,
__ru_oublock_word= 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0,
__ru_nsignals_word = 0}, {ru_nvcsw = 42, __ru_nvcsw_word = 42}, {ru_nivcsw = 35, __ru_nivcsw_word =
35}}}
starttime = 714145651473980
startreadtime = 0
startwritetime = 0
startwalusage = {wal_records = 2, wal_fpi = 0, wal_bytes = 421}
StartPageHit = 50
StartPageMiss = 0
StartPageDirty = 0
errcallback = {previous = 0x0, callback = 0x4f5f41 <vacuum_error_callback>, arg = 0x25bc510}
indnames = 0x266d838
__func__ = "heap_vacuum_rel"
This is a qemu VM which (full disclosure) has crashed a few times recently due
to OOM. This is probably a postgres bug, but conceivably it's being tickled by
bad data (although the vm crashing shouldn't cause that, either, following
recovery). This is also an instance that was pg_upgraded from v14 (and earlier
versions) to v15b1 and then b2, so it's conceivably possible there's weird data
pages that wouldn't be written by beta3. But that doesn't seem to be the issue
here anyway.
--
Justin
Hi,
On Thu, Aug 18, 2022 at 10:34 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> Immediately after upgrading an internal instance, a loop around "vacuum" did
> this:
Thank you for the report!
>
> TRAP: FailedAssertion("indstats->status == PARALLEL_INDVAC_STATUS_INITIAL", File: "vacuumparallel.c", Line: 611, PID:
27635)
> postgres: postgres pryzbyj [local] VACUUM(ExceptionalCondition+0x8d)[0x99d9fd]
> postgres: postgres pryzbyj [local] VACUUM[0x6915db]
> postgres: postgres pryzbyj [local] VACUUM(heap_vacuum_rel+0x12b6)[0x5083e6]
> postgres: postgres pryzbyj [local] VACUUM[0x68e97a]
> postgres: postgres pryzbyj [local] VACUUM(vacuum+0x48e)[0x68fe9e]
> postgres: postgres pryzbyj [local] VACUUM(ExecVacuum+0x2ae)[0x69065e]
> postgres: postgres pryzbyj [local] VACUUM(standard_ProcessUtility+0x530)[0x8567b0]
> /usr/pgsql-15/lib/pg_stat_statements.so(+0x5450)[0x7f52b891c450]
> postgres: postgres pryzbyj [local] VACUUM[0x85490a]
> postgres: postgres pryzbyj [local] VACUUM[0x854a53]
> postgres: postgres pryzbyj [local] VACUUM(PortalRun+0x179)[0x855029]
> postgres: postgres pryzbyj [local] VACUUM[0x85099b]
> postgres: postgres pryzbyj [local] VACUUM(PostgresMain+0x199a)[0x85268a]
> postgres: postgres pryzbyj [local] VACUUM[0x496a21]
> postgres: postgres pryzbyj [local] VACUUM(PostmasterMain+0x11c0)[0x7b3980]
> postgres: postgres pryzbyj [local] VACUUM(main+0x1c6)[0x4986a6]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f52c4b893d5]
> postgres: postgres pryzbyj [local] VACUUM[0x498c59]
> < 2022-08-18 07:56:51.963 CDT >LOG: server process (PID 27635) was terminated by signal 6: Aborted
> < 2022-08-18 07:56:51.963 CDT >DETAIL: Failed process was running: VACUUM ANALYZE alarms
>
> Unfortunately, it looks like the RPM packages are compiled with -O2, so this is
> of limited use.
>
> Core was generated by `postgres: postgres pryzbyj [local] VACUUM '.
> Program terminated with signal 6, Aborted.
> #0 0x00007f52c4b9d207 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.4-4.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64
cyrus-sasl-lib-2.1.26-23.el7.x86_64elfutils-libelf-0.176-5.el7.x86_64 elfutils-libs-0.176-5.el7.x86_64
glibc-2.17-260.el7_6.3.x86_64keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64
libattr-2.4.46-13.el7.x86_64libcap-2.22-9.el7.x86_64 libcap-ng-0.7.5-4.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64
libgcc-4.8.5-39.el7.x86_64libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libicu-50.1.2-17.el7.x86_64
libselinux-2.5-15.el7.x86_64libstdc++-4.8.5-39.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 libzstd-1.5.2-1.el7.x86_64
lz4-1.7.5-2.el7.x86_64nspr-4.19.0-1.el7_5.x86_64 nss-3.36.0-7.1.el7_6.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64
nss-util-3.36.0-1.1.el7_6.x86_64openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-22.el7_9.x86_64
pam-1.1.8-22.el7.x86_64pcre-8.32-17.el7.x86_64 systemd-libs-219-62.el7_6.5.x86_64 xz-libs-5.2.2-1.el7.x86_64
zlib-1.2.7-18.el7.x86_64
> (gdb) bt
> #0 0x00007f52c4b9d207 in raise () from /lib64/libc.so.6
> #1 0x00007f52c4b9e8f8 in abort () from /lib64/libc.so.6
> #2 0x000000000099da1e in ExceptionalCondition (conditionName=conditionName@entry=0xafae40 "indstats->status ==
PARALLEL_INDVAC_STATUS_INITIAL",errorType=errorType@entry=0x9fb4b7 "FailedAssertion",
> fileName=fileName@entry=0xafb0c0 "vacuumparallel.c", lineNumber=lineNumber@entry=611) at assert.c:69
> #3 0x00000000006915db in parallel_vacuum_process_all_indexes (pvs=0x2e85f80, num_index_scans=<optimized out>,
vacuum=<optimizedout>) at vacuumparallel.c:611
> #4 0x00000000005083e6 in heap_vacuum_rel (rel=<optimized out>, params=<optimized out>, bstrategy=<optimized out>) at
vacuumlazy.c:2679
It seems that parallel_vacuum_cleanup_all_indexes() got called[1],
which means this was the first time to perform parallel vacuum (i.e.,
index cleanup).
I'm not convinced yet but it could be a culprit that we missed doing
memset(0) for the shared array of PVIndStats in
parallel_vacuum_init(). This shared array was introduced in PG15.
[1] https://github.com/postgres/postgres/blob/REL_15_STABLE/src/backend/access/heap/vacuumlazy.c#L2679
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Aug 18, 2022 at 11:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> On Thu, Aug 18, 2022 at 10:34 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > Immediately after upgrading an internal instance, a loop around "vacuum" did
> > this:
>
> Thank you for the report!
>
> >
> > TRAP: FailedAssertion("indstats->status == PARALLEL_INDVAC_STATUS_INITIAL", File: "vacuumparallel.c", Line: 611,
PID:27635)
> > postgres: postgres pryzbyj [local] VACUUM(ExceptionalCondition+0x8d)[0x99d9fd]
> > postgres: postgres pryzbyj [local] VACUUM[0x6915db]
> > postgres: postgres pryzbyj [local] VACUUM(heap_vacuum_rel+0x12b6)[0x5083e6]
> > postgres: postgres pryzbyj [local] VACUUM[0x68e97a]
> > postgres: postgres pryzbyj [local] VACUUM(vacuum+0x48e)[0x68fe9e]
> > postgres: postgres pryzbyj [local] VACUUM(ExecVacuum+0x2ae)[0x69065e]
> > postgres: postgres pryzbyj [local] VACUUM(standard_ProcessUtility+0x530)[0x8567b0]
> > /usr/pgsql-15/lib/pg_stat_statements.so(+0x5450)[0x7f52b891c450]
> > postgres: postgres pryzbyj [local] VACUUM[0x85490a]
> > postgres: postgres pryzbyj [local] VACUUM[0x854a53]
> > postgres: postgres pryzbyj [local] VACUUM(PortalRun+0x179)[0x855029]
> > postgres: postgres pryzbyj [local] VACUUM[0x85099b]
> > postgres: postgres pryzbyj [local] VACUUM(PostgresMain+0x199a)[0x85268a]
> > postgres: postgres pryzbyj [local] VACUUM[0x496a21]
> > postgres: postgres pryzbyj [local] VACUUM(PostmasterMain+0x11c0)[0x7b3980]
> > postgres: postgres pryzbyj [local] VACUUM(main+0x1c6)[0x4986a6]
> > /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f52c4b893d5]
> > postgres: postgres pryzbyj [local] VACUUM[0x498c59]
> > < 2022-08-18 07:56:51.963 CDT >LOG: server process (PID 27635) was terminated by signal 6: Aborted
> > < 2022-08-18 07:56:51.963 CDT >DETAIL: Failed process was running: VACUUM ANALYZE alarms
> >
> > Unfortunately, it looks like the RPM packages are compiled with -O2, so this is
> > of limited use.
> >
> > Core was generated by `postgres: postgres pryzbyj [local] VACUUM '.
> > Program terminated with signal 6, Aborted.
> > #0 0x00007f52c4b9d207 in raise () from /lib64/libc.so.6
> > Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.4-4.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64
cyrus-sasl-lib-2.1.26-23.el7.x86_64elfutils-libelf-0.176-5.el7.x86_64 elfutils-libs-0.176-5.el7.x86_64
glibc-2.17-260.el7_6.3.x86_64keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64
libattr-2.4.46-13.el7.x86_64libcap-2.22-9.el7.x86_64 libcap-ng-0.7.5-4.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64
libgcc-4.8.5-39.el7.x86_64libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libicu-50.1.2-17.el7.x86_64
libselinux-2.5-15.el7.x86_64libstdc++-4.8.5-39.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 libzstd-1.5.2-1.el7.x86_64
lz4-1.7.5-2.el7.x86_64nspr-4.19.0-1.el7_5.x86_64 nss-3.36.0-7.1.el7_6.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64
nss-util-3.36.0-1.1.el7_6.x86_64openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-22.el7_9.x86_64
pam-1.1.8-22.el7.x86_64pcre-8.32-17.el7.x86_64 systemd-libs-219-62.el7_6.5.x86_64 xz-libs-5.2.2-1.el7.x86_64
zlib-1.2.7-18.el7.x86_64
> > (gdb) bt
> > #0 0x00007f52c4b9d207 in raise () from /lib64/libc.so.6
> > #1 0x00007f52c4b9e8f8 in abort () from /lib64/libc.so.6
> > #2 0x000000000099da1e in ExceptionalCondition (conditionName=conditionName@entry=0xafae40 "indstats->status ==
PARALLEL_INDVAC_STATUS_INITIAL",errorType=errorType@entry=0x9fb4b7 "FailedAssertion",
> > fileName=fileName@entry=0xafb0c0 "vacuumparallel.c", lineNumber=lineNumber@entry=611) at assert.c:69
> > #3 0x00000000006915db in parallel_vacuum_process_all_indexes (pvs=0x2e85f80, num_index_scans=<optimized out>,
vacuum=<optimizedout>) at vacuumparallel.c:611
> > #4 0x00000000005083e6 in heap_vacuum_rel (rel=<optimized out>, params=<optimized out>, bstrategy=<optimized out>)
atvacuumlazy.c:2679
>
> It seems that parallel_vacuum_cleanup_all_indexes() got called[1],
> which means this was the first time to perform parallel vacuum (i.e.,
> index cleanup).
Sorry, this explanation is wrong. But according to the recent
information from Justin it was the first time to perform parallel
vacuum:
#3 0x00000000006874f1 in parallel_vacuum_process_all_indexes
(pvs=0x25bdce0, num_index_scans=0, vacuum=vacuum@entry=false) at
vacuumparallel.c:611
611 Assert(indstats->status ==
PARALLEL_INDVAC_STATUS_INITIAL);
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Aug 18, 2022 at 11:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Aug 18, 2022 at 08:34:06AM -0500, Justin Pryzby wrote:
> > Unfortunately, it looks like the RPM packages are compiled with -O2, so this is
> > of limited use. So I'll be back shortly with more...
>
> #3 0x00000000006874f1 in parallel_vacuum_process_all_indexes (pvs=0x25bdce0, num_index_scans=0,
vacuum=vacuum@entry=false)at vacuumparallel.c:611
> 611 Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
>
> (gdb) p *pvs
> $1 = {pcxt = 0x25bc1e0, indrels = 0x25bbf70, nindexes = 8, shared = 0x7fc5184393a0, indstats = 0x7fc5184393e0,
dead_items= 0x7fc5144393a0, buffer_usage = 0x7fc514439280, wal_usage = 0x7fc514439240,
> will_parallel_vacuum = 0x266d818, nindexes_parallel_bulkdel = 5, nindexes_parallel_cleanup = 0,
nindexes_parallel_condcleanup= 5, bstrategy = 0x264f120, relnamespace = 0x0, relname = 0x0, indname = 0x0,
> status = PARALLEL_INDVAC_STATUS_INITIAL}
>
> (gdb) p *indstats
> $2 = {status = 11, parallel_workers_can_process = false, istat_updated = false, istat = {num_pages = 0,
estimated_count= false, num_index_tuples = 0, tuples_removed = 0, pages_newly_deleted = 0, pages_deleted = 1,
> pages_free = 0}}
The status = 11 is invalid value. Probably because indstats was not
initialized to 0 as I mentioned.
Justin, if it's reproducible in your environment, could you please try
it again with the attached patch?
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Вложения
On Thu, Aug 18, 2022 at 11:24:22PM +0900, Masahiko Sawada wrote: > The status = 11 is invalid value. Probably because indstats was not > initialized to 0 as I mentioned. > > Justin, if it's reproducible in your environment, could you please try > it again with the attached patch? Yes, this seems to resolve the problem. Thanks, -- Justin
On Thu, Aug 18, 2022 at 09:52:36AM -0500, Justin Pryzby wrote: > On Thu, Aug 18, 2022 at 11:24:22PM +0900, Masahiko Sawada wrote: > > The status = 11 is invalid value. Probably because indstats was not > > initialized to 0 as I mentioned. > > > > Justin, if it's reproducible in your environment, could you please try > > it again with the attached patch? > > Yes, this seems to resolve the problem. It seems a bit crazy that this escaped detection until now. Are these allocations especially vulnerable to uninitialized data ? -- Justin
On Thu, Aug 18, 2022 at 7:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Justin, if it's reproducible in your environment, could you please try > it again with the attached patch? Pushed, thanks. I wonder how this issue could have been caught earlier, or even avoided in the first place. Would the bug have been caught if Valgrind had known to mark dynamic shared memory VALGRIND_MAKE_MEM_UNDEFINED() when it is first allocated? ISTM that we should do something that is analogous to aset.c's Valgrind handling for palloc() requests. Similar work on buffers in shared memory led to us catching a tricky bug involving unsafe access to a buffer, a little while ago -- see bugfix commit 7b7ed046. The bug in question would probably have taken much longer to catch without the instrumentation. In fact, it seems like a good idea to use Valgrind for *anything* where it *might* catch bugs, just in case. Valgrind can work well for shared memory without any extra work. The backend's own idea of the memory (the memory mapping used by the process) is all that Valgrind cares about. You don't have to worry about Valgrind instrumentation in one backend causing confusion in another backend. It's very practical, and very general purpose. I think that most of the protection comes from a basic understanding of "this memory is unsafe to access, this memory contains uninitialized data that cannot be assumed to have any particular value, this memory is initialized and safe". -- Peter Geoghegan