Thread: BRIN autosummarization lacking a snapshot
If you have a BRIN index with autosummarize=on, and it's based on a
function that requires a snapshot, autovacuum will simply error out when
trying to summarize a range.  Here's a reproducer:

create table journal (d timestamp);

create function packdate(d timestamp) returns text as $$
begin
  return to_char(d, 'yyyymm');
end; $$ language plpgsql returns null on null input immutable;

create index on journal using brin (packdate(d))
  with (autosummarize = on, pages_per_range = 1);

Now insert some data,

insert into journal
  select generate_series(timestamp '2025-01-01', '2025-12-31', '1 day');

and wait for autovacuum to fire.  You'll get an error like

2025-11-03 12:17:42.263 CET [536755] ERROR: cannot execute SQL without an outer snapshot or portal

This patch fixes it.  I haven't given much thought to adding a good way
to test this yet ...

--
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude."                              (Brian Kernighan)
On Mon, Nov 03, 2025 at 12:21:50PM +0100, Alvaro Herrera wrote:
> This patch fixes it.  I haven't given much thought to adding a good way
> to test this yet ...

Spawning an autovacuum worker can feel artistic as we try to make the
tests run fast, but it's not that bad.  The trick is to use
"autovacuum_naptime = 1".  Then you could either scan the server logs
for some 'autovacuum: processing database "blah"', or just a polling
query based on pg_stat_all_tables.autovacuum_count.  See for example
006_signal_autovacuum.pl.
--
Michael
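
A minimal sketch of the polling approach described above, assuming the
reproducer's journal table (the relation name and the retry loop around the
query are up to the actual test):

-- Poll until autovacuum has processed the table at least once; a TAP test
-- would rerun this query until it returns true, or give up after a timeout.
SELECT autovacuum_count > 0
  FROM pg_stat_all_tables
 WHERE relname = 'journal';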
On 2025-Nov-04, Michael Paquier wrote:

> Spawning an autovacuum worker can feel artistic as we try to make the
> tests run fast, but it's not that bad.  The trick is to use
> "autovacuum_naptime = 1".  Then you could either scan the server logs
> for some 'autovacuum: processing database "blah"', or just a polling
> query based on pg_stat_all_tables.autovacuum_count.  See for example
> 006_signal_autovacuum.pl.

Ah yes ... and, actually, we already have a file doing a closely related
thing, so I added to it.

Here's the patch for master.  Backbranches are essentially identical,
modulo these changes for 13 and 14:

-use Test::More tests => 2;
+use Test::More tests => 4;

I'm glad we got rid of that :-)

With my initial try of this test, just counting the number of BRIN
tuples, I was _really_ surprised that the index did indeed contain the
expected number of tuples, even when the error was being thrown.  This
turned out to be expected, because the way BRIN summarization works is
that we insert a placeholder tuple first, then update it to the correct
value, and the error only aborts the second part.  That's why I needed
to add a WHERE clause to only count non-placeholder tuples.

I also added a 'sleep(1)', to avoid looping on the query when we know
autovacuum can't possibly have had a chance to run yet.

I unleashed CI on branches 15 and master, and will push soon if they
both turn green.

--
Álvaro Herrera        48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La virtud es el justo medio entre dos defectos" (Aristóteles)
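
For reference, the non-placeholder count described above boils down to a
pageinspect query of this shape (the same statement appears in the skink
log quoted further down; the index name and data page number 2 are the ones
used there):

-- Count only fully-summarized BRIN tuples, ignoring placeholders left over
-- from an aborted summarization.
SELECT count(*)
  FROM brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)
 WHERE NOT placeholder;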
> With my initial try of this test, just counting the number of BRIN
> tuples, I was _really_ surprised that the index did indeed contain the
> expected number of tuples, even when the error was being thrown.  This
> turned out to be expected, because the way BRIN summarization works is
> that we insert a placeholder tuple first, then update it to the correct
> value, and the error only aborts the second part.

One thing that's not fully clear to me, but will test later, is that if
this has happened to you, then the placeholder tuple remains in place
and doesn't ever become non-placeholder.  If vacuum (incl. autovacuum)
sees such a tuple, it will gladly ignore the page range, as if it were
already summarized.  This makes your index scans potentially
inefficient, because for placeholder tuples bringetbitmap will always
include the affected page range.

I think we should make vacuum clean it up somehow, but I'm not yet sure
how safe it is.

--
Álvaro Herrera        48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)
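
A rough way to check whether an existing index is carrying such leftovers is
to walk its data pages with pageinspect.  This is only a sketch: it assumes
the default 8 kB block size and a small index whose data pages start at block
2 (right after the metapage and a single revmap page); the index name is the
one from the TAP test:

-- Report any page range that still carries a placeholder tuple, i.e. the
-- remnant of an aborted summarization.
SELECT i.blknum AS range_start_block
  FROM generate_series(2, pg_relation_size('brin_packdate_idx') / 8192 - 1) AS p(blkno),
       LATERAL brin_page_items(get_raw_page('brin_packdate_idx', p.blkno::int),
                               'brin_packdate_idx'::regclass) AS i
 WHERE i.placeholder;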
Hi,

On Tue, Nov 4, 2025 at 5:22 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> > With my initial try of this test, just counting the number of BRIN
> > tuples, I was _really_ surprised that the index did indeed contain the
> > expected number of tuples, even when the error was being thrown.  This
> > turned out to be expected, because the way BRIN summarization works is
> > that we insert a placeholder tuple first, then update it to the correct
> > value, and the error only aborts the second part.
>
> One thing that's not fully clear to me, but will test later, is that if
> this has happened to you, then the placeholder tuple remains in place
> and doesn't ever become non-placeholder.  If vacuum (incl. autovacuum)
> sees such a tuple, it will gladly ignore the page range, as if it were
> already summarized.  This makes your index scans potentially
> inefficient, because for placeholder tuples bringetbitmap will always
> include the affected page range.
>
> I think we should make vacuum clean it up somehow, but I'm not yet sure
> how safe it is.

+1. I noticed the same behavior while working on amcheck support for
BRIN.

I think it should be safe, because we hold ShareUpdateExclusiveLock
during any summarization, desummarization, or vacuum, so we can't have
concurrent summarization during the vacuum.  That means that if we
encounter a placeholder during vacuum, it is always the result of an
"aborted" summarization.  At least brinRevmapDesummarizeRange always
treats placeholders this way.  Maybe we can use
brinRevmapDesummarizeRange for this?

FWIW there is a deadlock issue with brinRevmapDesummarizeRange [0];
maybe it is worth addressing if we want to start using it during
vacuum.

[0] https://www.postgresql.org/message-id/flat/261e68bc-f5f5-5234-fb2c-af4f583513c0%40enterprisedb.com

Best regards,
Arseniy Mukhin
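
As an aside, that desummarization code is already reachable from SQL via
brin_desummarize_range(), so an affected range can in principle be cleaned up
by hand today; the block number and index name below are just illustrative:

-- Drop the summary for the range starting at heap block 0 of this index;
-- the range can be re-summarized afterwards with brin_summarize_range().
SELECT brin_desummarize_range('brin_packdate_idx', 0);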
On 2025-Nov-04, Álvaro Herrera wrote:
> With my initial try of this test, just counting the number of BRIN
> tuples, I was _really_ surprised that the index did indeed contain the
> expected number of tuples, even when the error was being thrown. This
> turned out to be expected, because the way BRIN summarization works is
> that we insert a placeholder tuple first, then update it to the correct
> value, and the error only aborts the second part. That's why I needed
> to add a WHERE clause to only count non-placeholder tuples.
I see that skink (buildfarm animal that runs under valgrind) has failed.
Gotta vacate the premises, will study later.
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] LOG: process 2118443 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1016.528 ms
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] DETAIL: Process holding the lock: 2118078. Wait queue: 2118443.
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:46.298 CET [2118078][autovacuum worker][103/9:766] ERROR: canceling autovacuum task
2025-11-04 16:44:46.298 CET [2118078][autovacuum worker][103/9:766] CONTEXT: automatic analyze of table "postgres.public.journal"
2025-11-04 16:44:46.382 CET [2118443][autovacuum worker][108/6:0] LOG: process 2118443 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1188.860 ms
2025-11-04 16:44:46.382 CET [2118443][autovacuum worker][108/6:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:46.975 CET [2118946][autovacuum worker][110/7:0] LOG: skipping analyze of "journal" --- lock not available
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] LOG: process 2118078 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1017.402 ms
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] DETAIL: Process holding the lock: 2118443. Wait queue: 2118078, 2118946.
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] ERROR: canceling autovacuum task
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] CONTEXT: processing work entry for relation "postgres.public.brin_packdate_idx"
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] LOG: process 2118078 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1414.103 ms
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
==2118443== VALGRINDERROR-BEGIN
==2118443== Invalid read of size 8
==2118443== at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443== by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443== by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)
==2118443== by 0x43FA9C9: postmaster_child_launch (launch_backend.c:268)
==2118443== by 0x43FDD9E: StartChildProcess (postmaster.c:3991)
==2118443== by 0x43FE008: StartAutovacuumWorker (postmaster.c:4055)
==2118443== by 0x43FF078: process_pm_pmsignal (postmaster.c:3812)
==2118443== by 0x43FF93C: ServerLoop (postmaster.c:1706)
==2118443== by 0x4401080: PostmasterMain (postmaster.c:1403)
==2118443== by 0x432A55F: main (main.c:231)
==2118443== Address 0x10 is not stack'd, malloc'd or (recently) free'd
==2118443==
==2118443== VALGRINDERROR-END
{
<insert_a_suppression_name_here>
Memcheck:Addr8
fun:PopActiveSnapshot
fun:do_autovacuum
fun:AutoVacWorkerMain
fun:postmaster_child_launch
fun:StartChildProcess
fun:StartAutovacuumWorker
fun:process_pm_pmsignal
fun:ServerLoop
fun:PostmasterMain
fun:main
}
==2118443==
==2118443== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2118443== Access not within mapped region at address 0x10
==2118443== at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443== by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443== by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)
==2118443== by 0x43FA9C9: postmaster_child_launch (launch_backend.c:268)
==2118443== by 0x43FDD9E: StartChildProcess (postmaster.c:3991)
==2118443== by 0x43FE008: StartAutovacuumWorker (postmaster.c:4055)
==2118443== by 0x43FF078: process_pm_pmsignal (postmaster.c:3812)
==2118443== by 0x43FF93C: ServerLoop (postmaster.c:1706)
==2118443== by 0x4401080: PostmasterMain (postmaster.c:1403)
==2118443== by 0x432A55F: main (main.c:231)
==2118443== If you believe this happened as a result of a stack
==2118443== overflow in your program's main thread (unlikely but
==2118443== possible), you can try to increase the size of the
==2118443== main thread stack using the --main-stacksize= flag.
==2118443== The main thread stack size used in this run was 8388608.
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] LOG: process 2118946 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1013.910 ms
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] DETAIL: Process holding the lock: 2118078. Wait queue: 2118946.
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:48.098 CET [2118946][autovacuum worker][110/8:0] LOG: process 2118946 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1111.825 ms
2025-11-04 16:44:48.098 CET [2118946][autovacuum worker][110/8:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:48.482 CET [2119181][client backend][8/2:0] LOG: statement: select count(*) from brin_page_items(get_raw_page('brin_wi_idx',2), 'brin_wi_idx'::regclass) where not placeholder;
2025-11-04 16:44:48.799 CET [2109989][postmaster][:0] LOG: autovacuum worker (PID 2118443) was terminated by signal 11: Segmentation fault
2025-11-04 16:44:48.799 CET [2109989][postmaster][:0] DETAIL: Failed process was running: autovacuum: BRIN summarize public.brin_packdate_idx1
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On 2025-Nov-04, Álvaro Herrera wrote:

> 2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] ERROR: canceling autovacuum task
> 2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] CONTEXT: processing work entry for relation "postgres.public.brin_packdate_idx"
> 2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] LOG: process 2118078 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1414.103 ms

> ==2118443== VALGRINDERROR-BEGIN
> ==2118443== Invalid read of size 8
> ==2118443==    at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
> ==2118443==    by 0x43F693F: do_autovacuum (autovacuum.c:2561)
> ==2118443==    by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)

Ah, the problem is that perform_work_item() pushes an active snapshot
before calling the BRIN function, but because of the error, we
terminate the transaction before returning, so when we try to Pop that
snapshot, it doesn't exist anymore.  Other ways for this to happen would
be an autovacuum worker trying to run an item for a dropped table (the
"goto deleted2" case).

Hmm ...

--
Álvaro Herrera        PostgreSQL Developer  —  https://www.EnterpriseDB.com/
On 2025-Nov-04, Álvaro Herrera wrote:

> Ah, the problem is that perform_work_item() pushes an active snapshot
> before calling the BRIN function, but because of the error, we
> terminate the transaction before returning, so when we try to Pop that
> snapshot, it doesn't exist anymore.  Other ways for this to happen would
> be an autovacuum worker trying to run an item for a dropped table (the
> "goto deleted2" case).

Of course, the answer is just to inquire ActiveSnapshotSet() beforehand.

I tested this by adding an elog(ERROR) in perform_work_item() -- without
this fix, I see a segfault, which disappears with it.

--
Álvaro Herrera        PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Debido a que la velocidad de la luz es mucho mayor que la del sonido,
algunas personas nos parecen brillantes un minuto antes de escuchar las
pelotudeces que dicen."                         (Roberto Fontanarrosa)