Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
From | Alvaro Herrera |
---|---|
Subject | Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running |
Date | |
Msg-id | 20150226145354.GA2384@alvh.no-ip.org |
In response to | Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running ("MauMau" <maumau307@gmail.com>) |
List | pgsql-hackers |
FWIW a fix for this has been posted to all active branches:

Author: Andres Freund <andres@anarazel.de>
Branch: master [fd6a3f3ad] 2015-02-26 12:50:07 +0100
Branch: REL9_4_STABLE [d72115112] 2015-02-26 12:50:07 +0100
Branch: REL9_3_STABLE [abce8dc7d] 2015-02-26 12:50:07 +0100
Branch: REL9_2_STABLE [d67076529] 2015-02-26 12:50:07 +0100
Branch: REL9_1_STABLE [5c8dabecd] 2015-02-26 12:50:08 +0100
Branch: REL9_0_STABLE [82e0d6eb5] 2015-02-26 12:50:08 +0100

Reconsider when to wait for WAL flushes/syncrep during commit.

Up to now RecordTransactionCommit() waited for WAL to be flushed (if synchronous_commit != off) and to be synchronously replicated (if enabled), even if a transaction did not have a xid assigned. The primary reason for that is that sequence's nextval() did not assign a xid, but is worthwhile to wait for on commit.

This can be problematic because sometimes read only transactions do write WAL, e.g. HOT page prune records. That then could lead to read only transactions having to wait during commit. Not something people expect in a read only transaction.

This led to such strange symptoms as backends being seemingly stuck during connection establishment when all synchronous replicas are down. Especially annoying when said stuck connection is the standby trying to reconnect to allow syncrep again...

This behavior also is involved in a rather complicated <= 9.4 bug where the transaction started by catchup interrupt processing waited for syncrep using latches, but didn't get the wakeup because it was already running inside the same overloaded signal handler. Fixing the issue here doesn't properly solve that issue, merely papers over the problems. In 9.5 catchup interrupts aren't processed out of signal handlers anymore.

To fix all this, make nextval() acquire a top level xid, and only wait for transaction commit if a transaction both acquired a xid and emitted WAL records. If only a xid has been assigned we don't uselessly want to wait just because of writes to temporary/unlogged tables; if only WAL has been written we don't want to wait just because of HOT prunes.

The xid assignment in nextval() is unlikely to cause overhead in real-world workloads. For one it only happens SEQ_LOG_VALS/32 values anyway, for another only usage of nextval() without using the result in an insert or similar is affected.

Discussion: 20150223165359.GF30784@awork2.anarazel.de,
369698E947874884A77849D8FE3680C2@maumau,
5CF4ABBA67674088B3941894E22A0D25@maumau

Per complaint from maumau and Thom Brown

Backpatch all the way back; 9.0 doesn't have syncrep, but it seems better to have consistent behavior across all maintained branches.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
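
For readers who want the new commit-time rule in compact form, here is a minimal sketch in C. It is not the PostgreSQL source: the XactSummary struct and the commit_should_wait() function are hypothetical names invented for illustration, and the sketch assumes synchronous_commit and syncrep are enabled so that only the two conditions named in the commit message matter.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the rule described in the commit
 * message. These names are illustrative only and do not appear in the
 * PostgreSQL source tree.
 */
typedef struct XactSummary
{
    bool xid_assigned;  /* transaction acquired a top-level xid */
    bool wrote_wal;     /* transaction emitted at least one WAL record */
} XactSummary;

/*
 * Wait for the WAL flush / synchronous replication at commit only when
 * the transaction both acquired a xid and emitted WAL records.
 */
bool
commit_should_wait(const XactSummary *x)
{
    return x->xid_assigned && x->wrote_wal;
}
```

Under this model, a read only transaction that only wrote a HOT prune record (WAL but no xid) does not wait, a transaction that only wrote to temporary or unlogged tables (xid but no WAL) does not wait, and a nextval() call that logged a sequence record now waits because it has both.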