Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
From | Peter Smith |
---|---|
Subject | Re: 001_rep_changes.pl fails due to publisher stuck on shutdown |
Date | |
Msg-id | CAHut+PtZk8Q3k_gymTqkiBueB=BLAXBuhRfvvbc3wstXg7bzUA@mail.gmail.com |
In reply to | 001_rep_changes.pl fails due to publisher stuck on shutdown (Alexander Lakhin <exclusion@gmail.com>) |
Responses | Re: 001_rep_changes.pl fails due to publisher stuck on shutdown |
List | pgsql-hackers |
Hi,

I have reproduced this multiple times now. I confirmed the initial post/steps from Alexander, i.e., the test script provided [1] gets itself into a state where the function ReadPageInternal() (called by XLogDecodeNextRecord() and commented "Wait for the next page to become available") constantly returns XLREAD_FAIL. Ultimately the test times out because WalSndLoop() loops forever: it never calls WalSndDone() to exit the walsender process.

~~~

I've made a patch to inject lots of logging, and when the test script fails, a cycle of function failures can be seen. I don't know how to fix it yet, so I'm attaching my log results, hoping the information may be useful for anyone familiar with this area of the code.

~~~

Attachment #1 "v1-0001-DEBUG-LOGGING.patch" -- Patch to inject some logging. Be careful if you apply this, because the resulting log files can be huge (e.g. 3G).

Attachment #2 "bad8_logs_last500lines.txt" -- The last 500 lines of a 3G logfile from a failing test run.

Attachment #3 "bad8_logs_last500lines-simple.txt" -- The same log file as above, but a simplified extract in which the CYCLES of failure are shown more clearly.

Attachment #4 "bad8_digram" -- The same execution path information as in the log files, but in diagram form (just to help me visualise the logic more easily).

~~~

Just so you know, the test script does not always cause the problem. Sometimes it happens after just 20 script iterations; other times it takes a very long time and multiple runs (e.g. 400-500 script iterations). Either way, when the problem eventually occurs, the CYCLES of ReadPageInternal() failures always follow the same pattern shown in the attached logs.

======

[1] OP - https://www.postgresql.org/message-id/f15d665f-4cd1-4894-037c-afdbe369287e%40gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
Attachments