Обсуждение: PD_ALL_VISIBLE flag warning
Hello
I came in this morning and noticed this warning sitting in my inbox quite a few times...
WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1" page 471118
WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1" page 471119
WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1" page 471120
....
and I'm wondering how worked up I should be getting.
I'm trying to do due diligence on my side and am digging through archives, but just in case it's a cause for greater alarm than I think it is I'm going to throw this out there and see what those more experienced that I say.
Postgres version 8.4.1, redhat 5.3.
The table in question is about 6 million rows, a single partition in a larger set. A relatively heavy load (mass insert followed by a mass update) takes place each morning, during the time period where I was seeing this error. Auto-vacuum apparently kicked off at this time as well (auto-analyze is showing last-run at 4:27 AM, auto-vacuum at 8:48). The first warning kicked off at 4:55.
My assumption here is that the load from the inserts/updates was interfering with auto-vacuum in some way, and resulted in the above warnings. Is this a reasonable assumption at all? I'm under the impression that auto-vacuum will kill itself if it notices that it's hurting performance, but I wouldn't think it should throw all of these warnings...does it indicate poor auto-vacuum configuration? I admit that said configuration is an area that I've been meaning to look into more (upgraded from 8.1 where we didn't use auto-vacuum), but hasn't ended up on the top of the priorities list yet.
Let me know if I'm not providing enough info.
I came in this morning and noticed this warning sitting in my inbox quite a few times...
WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1" page 471118
WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1" page 471119
WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1" page 471120
....
and I'm wondering how worked up I should be getting.
I'm trying to do due diligence on my side and am digging through archives, but just in case it's a cause for greater alarm than I think it is I'm going to throw this out there and see what those more experienced that I say.
Postgres version 8.4.1, redhat 5.3.
The table in question is about 6 million rows, a single partition in a larger set. A relatively heavy load (mass insert followed by a mass update) takes place each morning, during the time period where I was seeing this error. Auto-vacuum apparently kicked off at this time as well (auto-analyze is showing last-run at 4:27 AM, auto-vacuum at 8:48). The first warning kicked off at 4:55.
My assumption here is that the load from the inserts/updates was interfering with auto-vacuum in some way, and resulted in the above warnings. Is this a reasonable assumption at all? I'm under the impression that auto-vacuum will kill itself if it notices that it's hurting performance, but I wouldn't think it should throw all of these warnings...does it indicate poor auto-vacuum configuration? I admit that said configuration is an area that I've been meaning to look into more (upgraded from 8.1 where we didn't use auto-vacuum), but hasn't ended up on the top of the priorities list yet.
Let me know if I'm not providing enough info.
On Thu, Apr 1, 2010 at 2:10 PM, Jonathan Foy <thefoy@gmail.com> wrote: > Hello > > I came in this morning and noticed this warning sitting in my inbox quite a > few times... > > WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1" > page 471118 > WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1" > page 471119 > WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1" > page 471120 > .... > > and I'm wondering how worked up I should be getting. This is a warning about a fairly serious data corruption problem. In the worst case prior queries returned incorrect results -- which if those results were used to calculate subsequent updates could have caused your data to be inconsistent. Specifically, having that flag set would cause rows to be returned which should not have been returned, either because they had been deleted, the transaction inserting them aborted, or they had been updated (causing duplicate copies of the same row to be returned). There was a bug which could cause this in 8.4.0 but it was fixed in 8.4.1. You say you're running 8.4.1 (you should be running 8.4.3 though none of the subsequent bug-fixes are as serious if the SSL security vulnerability isn't relevant) so this is quite worrisome. Nobody else has reported anything similar though. Is it possible these pages have never been vacuumed since you updated from 8.4.0 to 8.4.1? I'm not sure how likely that is. In 8.4 Postgres tries to avoid vacuuming pages that don't need it so it's possible this situation could live on for longer than in the past. I think a regular "VACUUM" of every table should guarantee that any remnants of this bug from 8.4.0 are cleaned up. If you do this and possibly get some more warnings from it but never see it again afterwards then I would assume they're leftovers from that old bug. -- greg
Thanks for the reply...
We have an unhealthy habit of not updating very often around here, one that I'm trying to fix. We run both 8.1 and 8.4, and we're testing the update to 8.1.20 this weekend....from 8.1.3. Across the board 8.4 update will take much more testing due to the typing issue. Other 8.4 updates will take place soon.
Anyway, my guilt about outdated versions aside...
We never actually ran 8.4.0. We went straight from 8.3.3 to 8.4.1. I did a full dump/restore from the 8.3.3 database into the 8.4.1 database (using the 8.4 pg_dump), partitioning as I went. The table in question was not partitioned in 8.3, the 8.4 dump/restore seemed as good a place to do the partitioning as any. I don't specifically remember doing it, but I almost always do a manual vacuum/analyze after a restore. Autovacuum is/was also turned on. According to the stats, everything seems to have been at least vacuumed on 2/15 (the date of the update I believe), so it is doubtful to me that any of the tables, especially the currently active partition, would have gone un-vacuumed as of 4/1.
The logs for that day have already rotated into less-easily available storage, but I'm considering retrieving them to look for any aborts/rollbacks.
Also, for what it's worth, we never delete rows from this table, we only insert and update.
I have not seen the warning since.
Sound like trouble?
We have an unhealthy habit of not updating very often around here, one that I'm trying to fix. We run both 8.1 and 8.4, and we're testing the update to 8.1.20 this weekend....from 8.1.3. Across the board 8.4 update will take much more testing due to the typing issue. Other 8.4 updates will take place soon.
Anyway, my guilt about outdated versions aside...
We never actually ran 8.4.0. We went straight from 8.3.3 to 8.4.1. I did a full dump/restore from the 8.3.3 database into the 8.4.1 database (using the 8.4 pg_dump), partitioning as I went. The table in question was not partitioned in 8.3, the 8.4 dump/restore seemed as good a place to do the partitioning as any. I don't specifically remember doing it, but I almost always do a manual vacuum/analyze after a restore. Autovacuum is/was also turned on. According to the stats, everything seems to have been at least vacuumed on 2/15 (the date of the update I believe), so it is doubtful to me that any of the tables, especially the currently active partition, would have gone un-vacuumed as of 4/1.
The logs for that day have already rotated into less-easily available storage, but I'm considering retrieving them to look for any aborts/rollbacks.
Also, for what it's worth, we never delete rows from this table, we only insert and update.
I have not seen the warning since.
Sound like trouble?
On Thu, Apr 8, 2010 at 12:59 AM, Greg Stark <gsstark@mit.edu> wrote:
On Thu, Apr 1, 2010 at 2:10 PM, Jonathan Foy <thefoy@gmail.com> wrote:This is a warning about a fairly serious data corruption problem. In
> Hello
>
> I came in this morning and noticed this warning sitting in my inbox quite a
> few times...
>
> WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1"
> page 471118
> WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1"
> page 471119
> WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1"
> page 471120
> ....
>
> and I'm wondering how worked up I should be getting.
the worst case prior queries returned incorrect results -- which if
those results were used to calculate subsequent updates could have
caused your data to be inconsistent.
Specifically, having that flag set would cause rows to be returned
which should not have been returned, either because they had been
deleted, the transaction inserting them aborted, or they had been
updated (causing duplicate copies of the same row to be returned).
There was a bug which could cause this in 8.4.0 but it was fixed in
8.4.1. You say you're running 8.4.1 (you should be running 8.4.3
though none of the subsequent bug-fixes are as serious if the SSL
security vulnerability isn't relevant) so this is quite worrisome.
Nobody else has reported anything similar though.
Is it possible these pages have never been vacuumed since you updated
from 8.4.0 to 8.4.1? I'm not sure how likely that is. In 8.4 Postgres
tries to avoid vacuuming pages that don't need it so it's possible
this situation could live on for longer than in the past.
I think a regular "VACUUM" of every table should guarantee that any
remnants of this bug from 8.4.0 are cleaned up. If you do this and
possibly get some more warnings from it but never see it again
afterwards then I would assume they're leftovers from that old bug.
--
greg
On Thu, Apr 8, 2010 at 1:57 PM, Jonathan Foy <thefoy@gmail.com> wrote: > Thanks for the reply... > > We have an unhealthy habit of not updating very often around here, one that > I'm trying to fix. We run both 8.1 and 8.4, and we're testing the update to > 8.1.20 this weekend....from 8.1.3. Across the board 8.4 update will take > much more testing due to the typing issue. Other 8.4 updates will take place > soon. If you're on 8.1 it's important to be on the latest 8.1 -- there are at least 17 significant bugs that warranted bug-fix releases between 8.1.3 and 8.1.20 including security bugs and data corruption bugs. > We never actually ran 8.4.0. We went straight from 8.3.3 to 8.4.1. I did a > full dump/restore from the 8.3.3 database into the 8.4.1 database (using the > 8.4 pg_dump), partitioning as I went. Ugh, then it sounds like there's a real as-yet unidentified problem. The other thing to check before panicking is, has this server ever run with fsync off and crashed (either kernel panic or power failure)? What does the storage system on this server look like? Are they IDE or SCSI drives? Is there an LVM layer involved? etc. -- greg
On Thu, Apr 8, 2010 at 10:16 AM, Greg Stark <gsstark@mit.edu> wrote:
Understood. It's my current top priority.
I have never turned off fsync. Also, the server is showing an uptime of 281 days, and the postgres instance has been up since 2/21.
If you're on 8.1 it's important to be on the latest 8.1 -- there are
at least 17 significant bugs that warranted bug-fix releases between
8.1.3 and 8.1.20 including security bugs and data corruption bugs.
Understood. It's my current top priority.
Ugh, then it sounds like there's a real as-yet unidentified problem.
> We never actually ran 8.4.0. We went straight from 8.3.3 to 8.4.1. I did a
> full dump/restore from the 8.3.3 database into the 8.4.1 database (using the
> 8.4 pg_dump), partitioning as I went.
The other thing to check before panicking is, has this server ever run
with fsync off and crashed (either kernel panic or power failure)?
I have never turned off fsync. Also, the server is showing an uptime of 281 days, and the postgres instance has been up since 2/21.
What does the storage system on this server look like? Are they IDE or
SCSI drives? Is there an LVM layer involved? etc.
--
greg
Storage is on an NFS mounted Fiber Netapp volume.
Let me know if you need more...