Обсуждение: ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

Поиск
Список
Период
Сортировка

ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

От
Tushar Takate
Дата:
Hi Team,

I was reviewing a couple of community threads in pgsql-bugs and pgsql-general, however, I was unable to determine whether this is a bug or actual corruption.

Details as below 

PostgreSQL version: 15.12
Disk type: RAID5
OS: RHEL 8.10

Error/Issue :

vacuumdb: error: processing of database "live_order_us_db" failed: ERROR:  found xmin 4133102167 from before relfrozenxid 4151440783

2025-09-11 02:29:58.888 UTC,,,2362287,,68c233e1.240hbf,1,,2025-09-11 02:28:49 UTC,122/46371006,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""
2025-09-11 02:40:50.361 UTC,"prod_user_ap","live_order_us_db",2375672,"127.0.0.1:59344",68c2342b.243ff8,4,"VACUUM",2025-09-11 02:30:03 UTC,169/38875732,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""","VACUUM (VERBOSE, ANALYZE) public.order;",,,"vacuumdb","client backend",,-5528190995457849841

One more thing/observation we saw in the PostgreSQL logs :

The following message consistently appeared once a day during the past week

2025-09-10 23:33:14.469 UTC,,,157915,,68c21a46.268fb,3,,2025-09-10 23:31:18 UTC,45/49119328,0,WARNING,01000,"page is not marked all-visible but visibility map bit is set in relation ""order"" page 5815453",,,,,"while scanning block 5815453 of relation ""public.order""",,,,"","autovacuum worker",,0

What specific condition or scenario is triggering this PostgreSQL error? Can it be classified as a bug? If not, what’s a safe and efficient way to resolve it without relying on a dump and restore, particularly for large, mission-critical tables over 200GB?

-
Thanks & Regards,

Tushar.
LinkedIn : Tushar
My-Blogs : Tushar Blogspot 

Re: ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

От
Laurenz Albe
Дата:
On Fri, 2025-09-12 at 22:09 +0530, Tushar Takate wrote:
> PostgreSQL version: 15.12
> Disk type: RAID5
> OS: RHEL 8.10
>
> Error/Issue :
>
> vacuumdb: error: processing of database "live_order_us_db" failed: ERROR:  found xmin 4133102167 from before
relfrozenxid4151440783 
>
> 2025-09-11 02:29:58.888 UTC,,,2362287,,68c233e1.240hbf,1,,2025-09-11 02:28:49 UTC,122/46371006,0,ERROR,XX001,"found
xmin4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation
""public.order""
> 2025-09-11 02:40:50.361
UTC,"prod_user_ap","live_order_us_db",2375672,"127.0.0.1:59344",68c2342b.243ff8,4,"VACUUM",2025-09-1102:30:03
UTC,169/38875732,0,ERROR,XX001,"foundxmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block
5821149offset 5 of relation ""public.order""","VACUUM (VERBOSE, ANALYZE) public.order;",,,"vacuumdb","client
backend",,-5528190995457849841

That is probably caused by a PostgreSQL bug; you can get rid of the message
by creating the "pg_surgery" extension and running

  SELECT heap_force_freeze('public.order'::regclass, '{(5821149,5)}'::tid[]);

> One more thing/observation we saw in the PostgreSQL logs :
>
> The following message consistently appeared once a day during the past week
>
> 2025-09-10 23:33:14.469 UTC,,,157915,,68c21a46.268fb,3,,2025-09-10 23:31:18 UTC,45/49119328,0,WARNING,01000,"page is
notmarked all-visible but visibility map bit is set in relation ""order"" page 5815453",,,,,"while scanning block
5815453of relation ""public.order""",,,,"","autovacuum worker",,0 
>
> What specific condition or scenario is triggering this PostgreSQL error? Can it be classified
> as a bug? If not, what’s a safe and efficient way to resolve it without relying on a dump
> and restore, particularly for large, mission-critical tables over 200GB?

That is some kind of data corruption, perhaps caused by a bug, perhaps by
something else.  The autovacuum run should fix that problem.

Yours,
Laurenz Albe



Re: ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

От
Tushar Takate
Дата:
On Sat, Sep 13, 2025 at 2:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2025-09-12 at 22:09 +0530, Tushar Takate wrote:
> PostgreSQL version: 15.12
> Disk type: RAID5
> OS: RHEL 8.10
>
> Error/Issue :
>
> vacuumdb: error: processing of database "live_order_us_db" failed: ERROR:  found xmin 4133102167 from before relfrozenxid 4151440783
>
> 2025-09-11 02:29:58.888 UTC,,,2362287,,68c233e1.240hbf,1,,2025-09-11 02:28:49 UTC,122/46371006,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""
> 2025-09-11 02:40:50.361 UTC,"prod_user_ap","live_order_us_db",2375672,"127.0.0.1:59344",68c2342b.243ff8,4,"VACUUM",2025-09-11 02:30:03 UTC,169/38875732,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""","VACUUM (VERBOSE, ANALYZE) public.order;",,,"vacuumdb","client backend",,-5528190995457849841

That is probably caused by a PostgreSQL bug; you can get rid of the message

In which version can we expect the fix for it? Also, can you please help to understand which specific condition or scenario is triggering this PostgreSQL error and skipping to freeze xmin?
 
by creating the "pg_surgery" extension and running

  SELECT heap_force_freeze('public.order'::regclass, '{(5821149,5)}'::tid[]);


I agree we can run pg_surgery , but the question is how safe it is to run for large and mission-critical tables over 200GB.
From pg_surgery docThese functions are unsafe by design and using them may corrupt (or further corrupt) your database
 

> One more thing/observation we saw in the PostgreSQL logs :
>
> The following message consistently appeared once a day during the past week
>
> 2025-09-10 23:33:14.469 UTC,,,157915,,68c21a46.268fb,3,,2025-09-10 23:31:18 UTC,45/49119328,0,WARNING,01000,"page is not marked all-visible but visibility map bit is set in relation ""order"" page 5815453",,,,,"while scanning block 5815453 of relation ""public.order""",,,,"","autovacuum worker",,0
>
> What specific condition or scenario is triggering this PostgreSQL error? Can it be classified
> as a bug? If not, what’s a safe and efficient way to resolve it without relying on a dump
> and restore, particularly for large, mission-critical tables over 200GB?

That is some kind of data corruption, perhaps caused by a bug, perhaps by
something else.  The autovacuum run should fix that problem.

This is something supporting data I have provided, before the issue, the above WARNING was seen in db-logs for the same table. 

 
Yours,
Laurenz Albe

Re: ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

От
Laurenz Albe
Дата:
On Sat, 2025-09-13 at 06:40 +0530, Tushar Takate wrote:
> On Sat, Sep 13, 2025 at 2:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Fri, 2025-09-12 at 22:09 +0530, Tushar Takate wrote:
> > > PostgreSQL version: 15.12
> > > Disk type: RAID5
> > > OS: RHEL 8.10
> > >
> > > Error/Issue :
> > >
> > > vacuumdb: error: processing of database "live_order_us_db" failed: ERROR:  found xmin 4133102167 from before
relfrozenxid4151440783 
> > >
> > > 2025-09-11 02:29:58.888 UTC,,,2362287,,68c233e1.240hbf,1,,2025-09-11 02:28:49
UTC,122/46371006,0,ERROR,XX001,"foundxmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block
5821149offset 5 of relation ""public.order"" 
> > > 2025-09-11 02:40:50.361
UTC,"prod_user_ap","live_order_us_db",2375672,"127.0.0.1:59344",68c2342b.243ff8,4,"VACUUM",2025-09-1102:30:03
UTC,169/38875732,0,ERROR,XX001,"foundxmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block
5821149offset 5 of relation ""public.order""","VACUUM (VERBOSE, ANALYZE) public.order;",,,"vacuumdb","client
backend",,-5528190995457849841
> >
> > That is probably caused by a PostgreSQL bug; you can get rid of the message
>
> In which version can we expect the fix for it? Also, can you please help to understand
> which specific condition or scenario is triggering this PostgreSQL error and skipping
> to freeze xmin?

I *believe* there must be a bug that causes that problem, because I have seen that error
reported often enough that I don't think it can be attributed to hardware errors.
Unfortunately, I think that nobody knows how it happens, so we cannot fix it.

> > by creating the "pg_surgery" extension and running
> >
> >   SELECT heap_force_freeze('public.order'::regclass, '{(5821149,5)}'::tid[]);
>
> I agree we can run pg_surgery , but the question is how safe it is to run for large and mission-critical tables over
200GB.
> From pg_surgery doc: These functions are unsafe by design and using them may corrupt (or further corrupt) your
database

It is dangerous, and that has nothing to do with the size of the table.
If you do the wrong thing with that knife, you can cause more problems
than you fix.

> > > One more thing/observation we saw in the PostgreSQL logs :
> > >
> > > The following message consistently appeared once a day during the past week
> > >
> > > 2025-09-10 23:33:14.469 UTC,,,157915,,68c21a46.268fb,3,,2025-09-10 23:31:18 UTC,45/49119328,0,WARNING,01000,"page
isnot marked all-visible but visibility map bit is set in relation ""order"" page 5815453",,,,,"while scanning block
5815453of relation ""public.order""",,,,"","autovacuum worker",,0 
> > >
> > > What specific condition or scenario is triggering this PostgreSQL error? Can it be classified
> > > as a bug? If not, what’s a safe and efficient way to resolve it without relying on a dump
> > > and restore, particularly for large, mission-critical tables over 200GB?
> >
> > That is some kind of data corruption, perhaps caused by a bug, perhaps by
> > something else.  The autovacuum run should fix that problem.
>
> This is something supporting data I have provided, before the issue, the above WARNING was seen in db-logs for the
sametable.  

Sorry, I cannot parse that sentence.

Yours,
Laurenz Albe



Re: ERROR: found xmin 4133102167 from before relfrozenxid 4151440783

От
Pavel Stehule
Дата:
Hi

so 13. 9. 2025 v 3:24 odesílatel Laurenz Albe <laurenz.albe@cybertec.at> napsal:
On Sat, 2025-09-13 at 06:40 +0530, Tushar Takate wrote:
> On Sat, Sep 13, 2025 at 2:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Fri, 2025-09-12 at 22:09 +0530, Tushar Takate wrote:
> > > PostgreSQL version: 15.12
> > > Disk type: RAID5
> > > OS: RHEL 8.10
> > >
> > > Error/Issue :
> > >
> > > vacuumdb: error: processing of database "live_order_us_db" failed: ERROR:  found xmin 4133102167 from before relfrozenxid 4151440783
> > >
> > > 2025-09-11 02:29:58.888 UTC,,,2362287,,68c233e1.240hbf,1,,2025-09-11 02:28:49 UTC,122/46371006,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""
> > > 2025-09-11 02:40:50.361 UTC,"prod_user_ap","live_order_us_db",2375672,"127.0.0.1:59344",68c2342b.243ff8,4,"VACUUM",2025-09-11 02:30:03 UTC,169/38875732,0,ERROR,XX001,"found xmin 4133102167 from before relfrozenxid 4151440783",,,,,"while scanning block 5821149 offset 5 of relation ""public.order""","VACUUM (VERBOSE, ANALYZE) public.order;",,,"vacuumdb","client backend",,-5528190995457849841
> >
> > That is probably caused by a PostgreSQL bug; you can get rid of the message
>
> In which version can we expect the fix for it? Also, can you please help to understand
> which specific condition or scenario is triggering this PostgreSQL error and skipping
> to freeze xmin?

I *believe* there must be a bug that causes that problem, because I have seen that error
reported often enough that I don't think it can be attributed to hardware errors.
Unfortunately, I think that nobody knows how it happens, so we cannot fix it.

> > by creating the "pg_surgery" extension and running
> >
> >   SELECT heap_force_freeze('public.order'::regclass, '{(5821149,5)}'::tid[]);
>
> I agree we can run pg_surgery , but the question is how safe it is to run for large and mission-critical tables over 200GB.
> From pg_surgery doc: These functions are unsafe by design and using them may corrupt (or further corrupt) your database

It is dangerous, and that has nothing to do with the size of the table.
If you do the wrong thing with that knife, you can cause more problems
than you fix.


I got this error after using pg_repack and following the upgrade. So maybe there can be more factors.

Regards

Pavel