BRIN autosummarization lacking a snapshot

From: Álvaro Herrera
Date:
If you have a BRIN index with autosummarize=on, and it's based on a
function that requires a snapshot, autovacuum will simply error out when
trying to summarize a range.  Here's a reproducer:

create table journal (d timestamp);
create function packdate(d timestamp) returns text as $$
   begin return to_char(d, 'yyyymm'); end; $$
   language plpgsql returns null on null input immutable;
create index on journal using brin (packdate(d))
  with (autosummarize = on, pages_per_range = 1);

Now insert some data,
  insert into journal select generate_series(timestamp '2025-01-01', '2025-12-31', '1 day');

and wait for autovacuum to fire.  You'll get an error like

  2025-11-03 12:17:42.263 CET [536755] ERROR:  cannot execute SQL without an outer snapshot or portal

This patch fixes it.  I haven't given much thought to adding a good way
to test this yet ...

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude."                              (Brian Kernighan)

Attachments

Re: BRIN autosummarization lacking a snapshot

From: Michael Paquier
Date:
On Mon, Nov 03, 2025 at 12:21:50PM +0100, Alvaro Herrera wrote:
> This patch fixes it.  I haven't given much thought to adding a good way
> to test this yet ...

Spawning an autovacuum worker can feel artistic as we try to make the
tests run fast, but it's not that bad.  The trick is to use an
"autovacuum_naptime = 1".  Then you could either scan the server logs
for some 'autovacuum: processing database "blah"', or just a polling
query based on pg_stat_all_tables.autovacuum_count.  See for example
006_signal_autovacuum.pl.
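
For instance, the polling side could look like this (a sketch only: autovacuum_naptime and pg_stat_all_tables.autovacuum_count are the knobs mentioned above, and the table name is taken from the reproducer upthread):

```sql
-- In the node's configuration, make autovacuum fire quickly:
--   autovacuum_naptime = 1

-- Then poll this until it returns true, i.e. until autovacuum has
-- processed the table at least once:
SELECT autovacuum_count > 0
  FROM pg_stat_all_tables
 WHERE relname = 'journal';
```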
--
Michael

Attachments

Re: BRIN autosummarization lacking a snapshot

From: Álvaro Herrera
Date:
On 2025-Nov-04, Michael Paquier wrote:

> Spawning an autovacuum worker can feel artistic as we try to make the
> tests run fast, but it's not that bad.  The trick is to use an
> "autovacuum_naptime = 1".  Then you could either scan the server logs
> for some 'autovacuum: processing database "blah"', or just a polling
> query based on pg_stat_all_tables.autovacuum_count.  See for example
> 006_signal_autovacuum.pl.

Ah yes ... and, actually, we already have a file doing a closely related
thing, so I added to it.  Here's the patch for master.  Backbranches are
essentially identical, modulo these changes for 13 and 14:

-use Test::More tests => 2;
+use Test::More tests => 4;

I'm glad we got rid of that :-)


With my initial try of this test, just counting the number of BRIN
tuples, I was _really_ surprised that the index did indeed contain the
expected number of tuples, even when the error was being thrown.  This
turned out to be expected, because the way BRIN summarization works is
that we insert a placeholder tuple first, then update it to the correct
value, and the error only aborts the second part.  That's why I needed
to add a WHERE clause to only count non-placeholder tuples.
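
Roughly, the counting query ends up like this (a sketch using the pageinspect extension; the index name and page number here match the ones visible in the test logs, and "placeholder" is the boolean column brin_page_items exposes for such tuples):

```sql
-- Count only fully-summarized (non-placeholder) BRIN index tuples
-- on a given index page, using the pageinspect extension:
CREATE EXTENSION IF NOT EXISTS pageinspect;

SELECT count(*)
  FROM brin_page_items(get_raw_page('brin_wi_idx', 2),
                       'brin_wi_idx'::regclass)
 WHERE NOT placeholder;
```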

I also added a 'sleep(1)', to avoid looping on the query when we know
autovacuum can't possibly have had a chance to run yet.

I unleashed CI on branches 15 and master, and will push soon if they
both turn green.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La virtud es el justo medio entre dos defectos" (Aristóteles)

Attachments

Re: BRIN autosummarization lacking a snapshot

From: Álvaro Herrera
Date:
> With my initial try of this test, just counting the number of BRIN
> tuples, I was _really_ surprised that the index did indeed contain the
> expected number of tuples, even when the error was being thrown.  This
> turned out to be expected, because the way BRIN summarization works is
> that we insert a placeholder tuple first, then update it to the correct
> value, and the error only aborts the second part.

One thing that's not fully clear to me, but will test later, is that if
this has happened to you, then the placeholder tuple remains in place
and doesn't ever become non-placeholder.  If vacuum (incl. autovacuum)
sees such a tuple, it will gladly ignore the page range, as if it were
already summarized.  This makes your index scans potentially
inefficient, because for placeholder tuples bringetbitmap will always
include the affected page range.

I think we should make vacuum clean it up somehow, but I'm not yet sure
how safe it is.
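
(As a stopgap, a stuck placeholder could presumably be cleared by hand from SQL, assuming you know which page range is affected. A sketch, using the default index name the reproducer would generate and block 0 as an illustrative range start:

```sql
-- Desummarize the stuck range, then summarize it again:
SELECT brin_desummarize_range('journal_packdate_idx', 0);
SELECT brin_summarize_range('journal_packdate_idx', 0);
```

brin_desummarize_range() is the SQL-callable wrapper over the same code path discussed below.)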

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)



Re: BRIN autosummarization lacking a snapshot

From: Arseniy Mukhin
Date:
Hi,

On Tue, Nov 4, 2025 at 5:22 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
>
> > With my initial try of this test, just counting the number of BRIN
> > tuples, I was _really_ surprised that the index did indeed contain the
> > expected number of tuples, even when the error was being thrown.  This
> > turned out to be expected, because the way BRIN summarization works is
> > that we insert a placeholder tuple first, then update it to the correct
> > value, and the error only aborts the second part.
>
> One thing that's not fully clear to me, but will test later, is that if
> this has happened to you, then the placeholder tuple remains in place
> and doesn't ever become non-placeholder.  If vacuum (incl. autovacuum)
> sees such a tuple, it will gladly ignore the page range, as if it were
> already summarized.  This makes your index scans potentially
> inefficient, because for placeholder tuples bringetbitmap will always
> include the affected page range.
>
> I think we should make vacuum clean it up somehow, but I'm not yet sure
> how safe it is.
>

+1. I noticed the same behavior while working on amcheck support for BRIN.

I think it should be safe, because we hold ShareUpdateExclusiveLock
during any summarization, desummarization, or vacuum, so there can be
no concurrent summarization during the vacuum.  Thus, if we encounter
a placeholder during vacuum, it is always the result of an "aborted"
summarization; at least brinRevmapDesummarizeRange always treats
placeholders this way.  Maybe we can use brinRevmapDesummarizeRange
for this?  FWIW there is a deadlock issue with
brinRevmapDesummarizeRange [0]; maybe it is worth addressing that if
we want to start using it during vacuum.


[0] https://www.postgresql.org/message-id/flat/261e68bc-f5f5-5234-fb2c-af4f583513c0%40enterprisedb.com


Best regards,
Arseniy Mukhin



Re: BRIN autosummarization lacking a snapshot

From: Álvaro Herrera
Date:
On 2025-Nov-04, Álvaro Herrera wrote:

> With my initial try of this test, just counting the number of BRIN
> tuples, I was _really_ surprised that the index did indeed contain the
> expected number of tuples, even when the error was being thrown.  This
> turned out to be expected, because the way BRIN summarization works is
> that we insert a placeholder tuple first, then update it to the correct
> value, and the error only aborts the second part.  That's why I needed
> to add a WHERE clause to only count non-placeholder tuples.

I see that skink (buildfarm animal that runs under valgrind) has failed.
Gotta vacate the premises, will study later.

2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] LOG:  process 2118443 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1016.528 ms
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] DETAIL:  Process holding the lock: 2118078. Wait queue: 2118443.
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] CONTEXT:  waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:46.298 CET [2118078][autovacuum worker][103/9:766] ERROR:  canceling autovacuum task
2025-11-04 16:44:46.298 CET [2118078][autovacuum worker][103/9:766] CONTEXT:  automatic analyze of table "postgres.public.journal"
2025-11-04 16:44:46.382 CET [2118443][autovacuum worker][108/6:0] LOG:  process 2118443 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1188.860 ms
2025-11-04 16:44:46.382 CET [2118443][autovacuum worker][108/6:0] CONTEXT:  waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:46.975 CET [2118946][autovacuum worker][110/7:0] LOG:  skipping analyze of "journal" --- lock not available
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] LOG:  process 2118078 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1017.402 ms
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] DETAIL:  Process holding the lock: 2118443. Wait queue: 2118078, 2118946.
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] CONTEXT:  waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] ERROR:  canceling autovacuum task
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] CONTEXT:  processing work entry for relation "postgres.public.brin_packdate_idx"
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] LOG:  process 2118078 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1414.103 ms
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] CONTEXT:  waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
==2118443== VALGRINDERROR-BEGIN
==2118443== Invalid read of size 8
==2118443==    at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443==    by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443==    by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)
==2118443==    by 0x43FA9C9: postmaster_child_launch (launch_backend.c:268)
==2118443==    by 0x43FDD9E: StartChildProcess (postmaster.c:3991)
==2118443==    by 0x43FE008: StartAutovacuumWorker (postmaster.c:4055)
==2118443==    by 0x43FF078: process_pm_pmsignal (postmaster.c:3812)
==2118443==    by 0x43FF93C: ServerLoop (postmaster.c:1706)
==2118443==    by 0x4401080: PostmasterMain (postmaster.c:1403)
==2118443==    by 0x432A55F: main (main.c:231)
==2118443==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
==2118443== 
==2118443== VALGRINDERROR-END
{
   <insert_a_suppression_name_here>
   Memcheck:Addr8
   fun:PopActiveSnapshot
   fun:do_autovacuum
   fun:AutoVacWorkerMain
   fun:postmaster_child_launch
   fun:StartChildProcess
   fun:StartAutovacuumWorker
   fun:process_pm_pmsignal
   fun:ServerLoop
   fun:PostmasterMain
   fun:main
}
==2118443== 
==2118443== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2118443==  Access not within mapped region at address 0x10
==2118443==    at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443==    by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443==    by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)
==2118443==    by 0x43FA9C9: postmaster_child_launch (launch_backend.c:268)
==2118443==    by 0x43FDD9E: StartChildProcess (postmaster.c:3991)
==2118443==    by 0x43FE008: StartAutovacuumWorker (postmaster.c:4055)
==2118443==    by 0x43FF078: process_pm_pmsignal (postmaster.c:3812)
==2118443==    by 0x43FF93C: ServerLoop (postmaster.c:1706)
==2118443==    by 0x4401080: PostmasterMain (postmaster.c:1403)
==2118443==    by 0x432A55F: main (main.c:231)
==2118443==  If you believe this happened as a result of a stack
==2118443==  overflow in your program's main thread (unlikely but
==2118443==  possible), you can try to increase the size of the
==2118443==  main thread stack using the --main-stacksize= flag.
==2118443==  The main thread stack size used in this run was 8388608.
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] LOG:  process 2118946 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1013.910 ms
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] DETAIL:  Process holding the lock: 2118078. Wait queue: 2118946.
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] CONTEXT:  waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:48.098 CET [2118946][autovacuum worker][110/8:0] LOG:  process 2118946 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1111.825 ms
2025-11-04 16:44:48.098 CET [2118946][autovacuum worker][110/8:0] CONTEXT:  waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:48.482 CET [2119181][client backend][8/2:0] LOG:  statement: select count(*) from brin_page_items(get_raw_page('brin_wi_idx',2), 'brin_wi_idx'::regclass) where not placeholder;
2025-11-04 16:44:48.799 CET [2109989][postmaster][:0] LOG:  autovacuum worker (PID 2118443) was terminated by signal 11: Segmentation fault
2025-11-04 16:44:48.799 CET [2109989][postmaster][:0] DETAIL:  Failed process was running: autovacuum: BRIN summarize public.brin_packdate_idx1


-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: BRIN autosummarization lacking a snapshot

From: Álvaro Herrera
Date:
On 2025-Nov-04, Álvaro Herrera wrote:

> 2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] ERROR:  canceling autovacuum task
> 2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] CONTEXT:  processing work entry for relation "postgres.public.brin_packdate_idx"
> 2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] LOG:  process 2118078 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1414.103 ms

> ==2118443== VALGRINDERROR-BEGIN
> ==2118443== Invalid read of size 8
> ==2118443==    at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
> ==2118443==    by 0x43F693F: do_autovacuum (autovacuum.c:2561)
> ==2118443==    by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)

Ah, the problem is that perform_work_item() pushes an active snapshot
before calling the BRIN function, but because of the error, we
terminate the transaction before returning, so when we try to Pop that
snapshot, it doesn't exist anymore.  Other ways for this to happen would
be an autovacuum worker trying to run an item for a dropped table (the
"goto deleted2" case).  Hmm ...

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: BRIN autosummarization lacking a snapshot

From: Álvaro Herrera
Date:
On 2025-Nov-04, Álvaro Herrera wrote:

> Ah, the problem is that perform_work_item() pushes an active snapshot
> before calling the BRIN function, but because of the error, we
> terminate the transaction before returning, so when we try to Pop that
> snapshot, it doesn't exist anymore.  Other ways for this to happen would
> be an autovacuum worker trying to run an item for a dropped table (the
> "goto deleted2" case).

Of course, the answer is just to check ActiveSnapshotSet() beforehand.
I tested this by adding an elog(ERROR) in perform_work_item() -- without
this fix I see a segfault, which disappears with it.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Debido a que la velocidad de la luz es mucho mayor que la del sonido,
 algunas personas nos parecen brillantes un minuto antes
 de escuchar las pelotudeces que dicen." (Roberto Fontanarrosa)

Attachments