Re: Standby corruption after master is restarted
From | Tomas Vondra |
---|---|
Subject | Re: Standby corruption after master is restarted |
Date | |
Msg-id | 1da55c73-4bd1-f13e-2d4b-c4049ffd73f5@2ndquadrant.com |
In reply to | Re: Standby corruption after master is restarted (Emre Hasegeli <emre@hasegeli.com>) |
Responses | Re: Standby corruption after master is restarted |
List | pgsql-bugs |
On 04/17/2018 10:55 AM, Emre Hasegeli wrote:
>> Can you check if the "incorrect" part of the WAL segment matches some
>> previous segment? Verifying that shouldn't be very difficult (just cut a
>> bunch of bytes using hexdump, compare to the incorrect data). Assuming
>> you still have the WAL archive, of course. That would tell us that the
>> corrupted part comes from an old recycled segment.
>
> I had found and saved the recycled WAL file from the archive after the
> incident. Here is the hexdump of it at the same position:
>
> 0bddfc0 3253 4830 616f 5034 5243 4d79 664f 6164
> 0bddfd0 3967 592d 7963 7967 5541 4a59 3066 4f50
> 0bddfe0 2d55 346e 4254 3559 6a4e 726b 4e30 6f52
> 0bddff0 3876 4751 4a38 5956 5f32 7234 4b55 7045
> 0bde000 d087 0005 0005 0000 e000 66bd 1dfb 0000
> 0bde010 1931 0000 0000 0000 5a43 7746 7166 6e34
> 0bde020 304e 764e 9c32 0158 5400 e709 0900 6f66
> 0bde030 0765 7375 6111 646e 6f72 6469 370d 312e
>
> If you compare it with the other 2 I have posted, you would notice
> that the corrupted file on the standby is a combination of the two. The
> data on it starts with the data on the master, and continues with the
> data of the recycled file. The switch is at position 0bddff8,
> which is the position printed as "Minimum recovery ending location" by
> pg_controldata.
>

OK, this seems to confirm the theory that there's a race condition between
segment recycling and replication. It's likely limited to a short period
after a crash, otherwise we'd probably see many more reports.

But it's still just a hunch - someone needs to read through the code and
check how it behaves in these situations. Not sure when I'll have time
for that.

>> Hmmm, I see you're using SSL. I don't think that could affect
>> anything, but maybe I should try mimicking this aspect too.
>
> This is the connection information. Note that the master shows SSL
> compression as disabled, despite it being explicitly requested.
>
>> primary_conninfo = 'host=MASTER_NODE port=5432 dbname=repmgr user=repmgr connect_timeout=10 sslcompression=1'

Hmmm, that seems like a separate issue. When you say 'master shows SSL
compression is disabled', where do you see that?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
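A minimal sketch (not from the thread) of the byte comparison discussed above: given the same byte range cut from the standby's corrupted segment, the master's segment, and the recycled segment from the archive, find where the standby copy stops matching the master and whether everything after that point matches the recycled file. The helper names `read_window` and `find_switch_offset` are hypothetical, and the sketch assumes all three windows start at the same segment offset.

```python
from typing import Optional


def read_window(path: str, start: int, length: int) -> bytes:
    """Read `length` bytes of a WAL segment file starting at byte offset `start`."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)


def find_switch_offset(standby: bytes, master: bytes, recycled: bytes) -> Optional[int]:
    """Return the first offset where `standby` stops matching `master`, provided
    every byte from there on matches `recycled`; otherwise return None.

    If master and recycled happen to hold identical bytes right at the switch,
    the reported offset can be slightly later than the true switch point.
    """
    n = min(len(standby), len(master), len(recycled))
    # Walk the common prefix shared by the standby copy and the master copy.
    i = 0
    while i < n and standby[i] == master[i]:
        i += 1
    if i == n:
        return None  # standby matches master over the whole compared range
    # The remainder must come from the old, recycled segment.
    if standby[i:n] == recycled[i:n]:
        return i
    return None


if __name__ == "__main__":
    # Synthetic data: 8 bytes from the "master", then 8 bytes of "recycled" data.
    master = b"M" * 16
    recycled = b"R" * 16
    standby = master[:8] + recycled[8:]
    print(find_switch_offset(standby, master, recycled))  # prints 8
```

With real files, one would read the same window (for example starting at 0xbddfc0, the first hexdump offset shown) from each of the three segment copies and add the returned offset to that start. If the data is as described, the result should land at or just after 0xbddff8, i.e. a relative offset of about 0x38.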