Обсуждение: Index corruption after pg_basebackup
I have been struggling with this particular scenario now for a few days and very frustrated with the consistent results. maybe someone can help.
We are migrating a service to new docker container from a bare metal box. This is one of about 150 which we have done already successfully. basically setup a replica, promote it, and point the f5 to the new master. However with this particular migration we always get index corruption on the new server once we are up and running.
In a nut shell, here is our flow.
- Use pg_basebackup initiated from new server against the old one to establish a replica of the old server.
- Recovery.conf file has a restore command to point to the archive directory because we cycle through wal files very frequently.
- Start new replica, let it do it's magic and catch up.
At this point all seems normal. However certain queries against the replica return drastically different result sets when compared to the primary. During our investigation, we promote the replica and do a few things including a vacuumdb -za. This is where we see error messages regarding indexes such as ...
ERROR: failed to re-find parent key in index "myindex" for deletion target page 851753
We could just reindex and be done. But, the db is very large and hot, we cannot have inconsistent data while we reindex.
When we run the same vacuum on the source server, all is good so the source data is good.
We can repeat this scenario in both staging and production so I am thinking it is something that is incompatible between the two.
So a little background just for your reference.
- Source database and target database are the same versions of postgres ( 9.5 ) However, the minor version is different. 9.5.5 vs 9.5.7
- fsync = on
- full_page_write = on
Any tips or help would be appreciated.
Don't have the luxury of doing a pg_dump :(
Thanks
JT
Re: Jorge Torralba 2018-08-28 <CACut7uRx72_R-nGOdOdnTSYCGGdKZ_TfsSghCb35+deF3s-4nw@mail.gmail.com> > ERROR: failed to re-find parent key in index "myindex" for deletion target > page 851753 Which data type is that, and which LC_COLLATE setting is used? If it's text (or similar), there's a known incompatibility with de_DE.UTF-8 in different glibc versions. Possibly other locales and libcs are affected as well. Christoph
Same on both systems.
lc_collate
-------------
en_US.UTF-8
(1 row)
One of the failed index is
btree (account_id, lower(t1_type), t_id, "timestamp" DESC)
datatypes of: UUID, text, text, timestamp
On Tue, Aug 28, 2018 at 9:42 AM Christoph Berg <myon@debian.org> wrote:
Re: Jorge Torralba 2018-08-28 <CACut7uRx72_R-nGOdOdnTSYCGGdKZ_TfsSghCb35+deF3s-4nw@mail.gmail.com>
> ERROR: failed to re-find parent key in index "myindex" for deletion target
> page 851753
Which data type is that, and which LC_COLLATE setting is used?
If it's text (or similar), there's a known incompatibility with
de_DE.UTF-8 in different glibc versions. Possibly other locales and
libcs are affected as well.
Christoph
Thanks,
Jorge Torralba
----------------------------
Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.
Jorge Torralba
----------------------------
Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.
Re: Jorge Torralba 2018-08-28 <CACut7uTwWhXKHg+Fd3sy6hFnxWn58sLZPOr4frgrg23+NEgq-A@mail.gmail.com> > Same on both systems. > > en_US.UTF-8 The problem is that you are moving from one OS (full installation, one libc vendor, one libc version) to another OS (docker, possibly different libc vendor, likely different libc version). If any combination of strings sort differently in these locale incarnations, you need to REINDEX. I'm not aware of any specific problem with en_US.UTF-8, but I wouldn't be surprised if there's differences. Christoph