Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
From | Alexander Lakhin |
---|---|
Subject | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
Date | |
Msg-id | 5cbe0b03-d6f3-501d-3849-534568b0e776@gmail.com |
In reply to | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
List | pgsql-bugs |
Hi Robert,

05.04.2024 23:20, Robert Haas wrote:
> On Fri, Oct 29, 2021 at 9:30 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> I can propose the debugging patch to reproduce the issue that replaces
>> the hang with the assert and modifies a pair of crash-causing test
>> scripts to simplify the reproducing. (Sorry, I have no time now to prune
>> down the scripts further as I have to leave for a week.)
> Just FYI, I tried to reproduce this today on v16, using this formula,
> with some hacking around to try to get it working on my MacBook, and I
> couldn't get it to crash.

I've refreshed the script and simplified it a bit so that it no longer depends on Linux specifics. This works for me (on REL_14_0 with the patch applied, built with CPPFLAGS="-O0" ./configure --enable-debug --enable-cassert ...):

    echo "
    autovacuum=off
    fsync=off
    " >> "$PGDATA/postgresql.conf"
    pg_ctl -w -l server.log start

    export PGDATABASE=regression
    createdb regression

    echo "
    vacuum (verbose, skip_locked, index_cleanup off) pg_catalog.pg_class;
    select pg_sleep(random()/50);
    " >/tmp/17257/pseudo-autovacuum.sql

    pgbench -n -f /tmp/17257/inherit.sql -C -T 1200 >pgbench-1.log 2>&1 &
    pgbench -n -f /tmp/17257/vacuum.sql -C -T 1200 >pgbench-2.log 2>&1 &
    pgbench -n -f /tmp/17257/pseudo-autovacuum.sql -C -c 10 -T 1200 >pgbench-3.log 2>&1 &
    wait

    grep -E "(TRAP|terminated)" server.log

(Please use the attached inherit.sql and vacuum.sql, which are excerpts from src/test/regress/sql/{inherit,vacuum}.sql, placed in /tmp/17257.)
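The final grep above only inspects server.log once all three pgbench jobs have run for the full 1200 seconds. As a minimal sketch, assuming bash (for its built-in SECONDS counter) and the same "(TRAP|terminated)" markers, the log can instead be polled once per second to get a rough time to failure. The function name wait_for_failure and its timeout argument are hypothetical, not part of PostgreSQL or pgbench:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PostgreSQL): poll a server log until a
# failure marker appears. Prints the approximate elapsed time and returns 0
# if "TRAP" or "terminated" shows up within the timeout; returns 1 otherwise.
# Resolution is one second, because bash's SECONDS counter is an integer.
wait_for_failure() {
    local log="$1" timeout="${2:-1200}"
    local start=$SECONDS
    while (( SECONDS - start < timeout )); do
        # Same markers as the grep in the reproduction script; errors are
        # silenced so a log file that does not exist yet just keeps polling.
        if grep -qE '(TRAP|terminated)' "$log" 2>/dev/null; then
            echo "failure after $(( SECONDS - start ))s"
            return 0
        fi
        sleep 1
    done
    return 1
}
```

For example, running wait_for_failure server.log 1200 from a second terminal while the pgbench jobs are running reports roughly when the first assertion failure or backend crash appeared; it does not stop the pgbench jobs itself.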
With PGDATA placed on tmpfs, the reproduction script failed for me after 1m31s, 2m35s, and 4m12s with:

    TRAP: FailedAssertion("numretries < 100", File: "vacuumlazy.c", Line: 1726, PID: 951498)

Another possible outcome:

    TRAP: FailedAssertion("relid == targetRelId", File: "relcache.c", Line: 1062, PID: 1257766)

And also:

    2024-04-07 05:03:21.656 UTC [2905313] LOG: server process (PID 2984687) was terminated by signal 6: Aborted
    2024-04-07 05:03:21.656 UTC [2905313] DETAIL: Failed process was running: create table matest0 (id serial primary key, name text);

with the stack trace:

    ...
    #4 0x00007fc30b4007f3 in __GI_abort () at ./stdlib/abort.c:79
    #5 0x0000559f50220719 in index_delete_sort_cmp (deltid1=0x559f523a9f40, deltid2=0x7ffd2f9623f8) at heapam.c:7582
    #6 0x0000559f50220847 in index_delete_sort (delstate=0x7ffd2f9636f0) at heapam.c:7623
    ...

(as in [1])

But on dad1539ae I got no failures in 3 runs (and the same holds for REL_16_STABLE with a slightly modified lazy_scan_prune patch).

[1] https://www.postgresql.org/message-id/17255-14c0ac58d0f9b583%40postgresql.org

Best regards,
Alexander