BUG #9420: WAL-Sender repeatedly killed
От | proels@web.de |
---|---|
Тема | BUG #9420: WAL-Sender repeatedly killed |
Дата | |
Msg-id | 20140303125342.2159.79464@wrigleys.postgresql.org обсуждение исходный текст |
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 9420 Logged by: Stefan Pröls Email address: proels@web.de PostgreSQL version: 9.2.7 Operating system: opensuse 11.4 Description: I'm unsure whether this is really a bug, but at least the log messages don't look like expected behavior. We are running async. streaming replication between 2 PostgreSQL 9.2.7 servers. Last night the slave failed to reboot nicely because of a filesystem inconsistency which required remote hands. So there was a delay of about 30 hours until the slave was up an running again. Thus, it had to catch up quite some replication delay. It did succeed doing so -- however, according to the log messages the WAL sender process was killed several times during this period. From the master's log: 2014-03-03 13:02:30 CET [unbekannt] standby LOG: breche WAL-Sender-Prozess ab wegen Zeitüberschreitung bei der Replikation 2014-03-03 13:05:08 CET [unbekannt] standby LOG: breche WAL-Sender-Prozess ab wegen Zeitüberschreitung bei der Replikation [several more times] >From the slave's log: 2014-03-03 12:47:18 CET FATAL: konnte keine Daten an den WAL-Stream senden: Server beendete die Verbindung unerwartet Das heiÃt wahrscheinlich, dass der Server abnormal beendete bevor oder während die Anweisung bearbeitet wurde. 2014-03-03 13:00:34 CET LOG: ungültige magische Zahl 0000 in Logdatei 17, Segment 111, Offset 3538944 2014-03-03 13:00:35 CET LOG: Streaming-Replikation hat erfolgreich mit primärem Server verbunden 2014-03-03 13:03:07 CET FATAL: konnte keine Daten an den WAL-Stream senden: Server beendete die Verbindung unerwartet Das heiÃt wahrscheinlich, dass der Server abnormal beendete bevor oder während die Anweisung bearbeitet wurde. 2014-03-03 13:04:04 CET LOG: ungültige magische Zahl 0000 in Logdatei 17, Segment 185, Offset 5111808 2014-03-03 13:04:04 CET LOG: Streaming-Replikation hat erfolgreich mit primärem Server verbunden [also repeated serverali more times] It appears the slave had trouble keeping up with the amount of data sent by the master and, therefore, the wal-sender was repeatedly killed until the slave finally catched up. I don't know where the "invalid magic number 0000" comes from, though. Note that the master has much faster disks (SSDs) than the slave (SATA-HDDs), so some congestion on the master is likely to arise when the master streams out more data than the slave can keep up with.
В списке pgsql-bugs по дате отправления: