Sharing more infrastructure between walsenders and regular backends (was Re: Switching timeline over streaming replication)
От | Heikki Linnakangas |
---|---|
Тема | Sharing more infrastructure between walsenders and regular backends (was Re: Switching timeline over streaming replication) |
Дата | |
Msg-id | 506DA6BD.9020809@vmware.com обсуждение исходный текст |
Ответ на | Re: Switching timeline over streaming replication (Amit Kapila <amit.kapila@huawei.com>) |
Ответы |
Re: Sharing more infrastructure between walsenders and regular backends (was Re: Switching timeline over streaming replication)
Re: Sharing more infrastructure between walsenders and regular backends (was Re: Switching timeline over streaming replication) |
Список | pgsql-hackers |
On 03.10.2012 18:15, Amit Kapila wrote: > 35.WalSenderMain(void) > { > .. > + if (walsender_shutdown_requested) > + ereport(FATAL, > + (errcode(ERRCODE_ADMIN_SHUTDOWN), > + errmsg("terminating replication > connection due to administrator command"))); > + > + /* Tell the client that we are ready to receive commands */ > > + ReadyForQuery(DestRemote); > + > .. > > + if (walsender_shutdown_requested) > + ereport(FATAL, > + (errcode(ERRCODE_ADMIN_SHUTDOWN), > + errmsg("terminating replication > connection due to administrator command"))); > + > > is it necessary to check walsender_shutdown_requested 2 times in a > loop, if yes, then can we write comment why it is important to check > it again. The idea was to check for shutdown request before and after the pq_getbyte() call, because that can block for a long time. Looking closer, we don't currently (ie. without this patch) make any effort to react to SIGTERM in a walsender, while it's waiting for a command from the client. After starting replication, it does check walsender_shutdown_requested in the loop, and it's also checked during a base backup (although only when switching to send next file, which seems too seldom). This issue is orthogonal to handling timeline changes over streaming replication, although that patch will make it more important to handle SIGTERM quickly while waiting for a command, because you stay in that mode for longer and more often. I think walsender needs to share more infrastructure with regular backends to handle this better. When we first implemented streaming replication in 9.0, it made sense to implement just the bare minimum needed to accept the handshake commands before entering the Copy state, but now that the replication command set has grown to cover base backups, and fetching timelines with the patch being discussed, we should bite the bullet and make the command loop more feature-complete and robust. In a regular backend, the command loop reacts to SIGTERM immediately, setting ImmediateInterruptOK at the right places, and calling CHECK_FOR_INTERRUPTS() at strategic places. I propose that we let PostgresMain handle the main command loop for walsender processes too, like it does for regular backends, and use ProcDiePending and the regular die() signal handler for SIGTERM in walsender as well. So I propose the attached patch. I made small changes to postgres.c to make it call exec_replication_command() instead of exec_simple_query(), and reject extend query protocol, in a WAL sender process. A lot of code related to handling the main command loop and signals is removed from walsender.c. - Heikki
Вложения
В списке pgsql-hackers по дате отправления: