Thread: "no-slave yet" early CREATE TABLE transaction gets blocked when synchronous replication

"no-slave yet" early CREATE TABLE transaction gets blocked when synchronous replication

From
Sékine Coulibaly
Date:
Hi there !

I have a master/slave setup. I use Corosync/Pacemaker for the clustering layer, and repmgr for the PostgreSQL 9.2.4 synchronous replication. When a transaction is received by the master before the slave is up and running, the transaction seems blocked forever on the backend.

The setup is as follows :
- The master node runs a PostgreSQL master instance and an in-house application. The clustering layer first starts PostgreSQL, then the application. The application then connects to the database, and starts creating tables.
- The slave node only runs a PostgreSQL synchronous streamed replica. The in-house application doesn't run on the slave node.

My understanding of synchronous replication is that unless the slave is up, the master cannot commit any transaction (unless the configuration forces it, which is nonsense).

This is the test scenario :

- The master node is started, so repmgr, through the clustering layer, starts PostgreSQL as a master.
- Once the master is started, the application is started. The latter connects to the master and issues a CREATE TABLE query.
- On the slave, repmgr then starts the PostgreSQL slave instance : it first performs a clone from the running master, and then starts streaming.

At this point, a ps auxw | grep "post" command shows the following :

- On the MASTER :
postgres 14495  0.0  0.0 220344  8616 ?        S    11:46   0:00 /usr/pgsql-9.2/bin/postgres -D /opt/analytics/pgdata -p 5432
postgres 14496  0.0  0.0 177900  1172 ?        Ss   11:46   0:00 postgres: logger process
postgres 14498  0.0  0.0 220488  2420 ?        Ss   11:46   0:00 postgres: checkpointer process
postgres 14499  0.0  0.0 220344  1636 ?        Ss   11:46   0:00 postgres: writer process
postgres 14500  0.0  0.0 220344  1404 ?        Ss   11:46   0:00 postgres: wal writer process
postgres 14501  0.0  0.0 221168  2672 ?        Ss   11:46   0:00 postgres: autovacuum launcher process
postgres 14502  0.0  0.0 180000  1260 ?        Ss   11:46   0:00 postgres: archiver process   last was 000000010000000000000040
postgres 14503  0.0  0.0 180136  1412 ?        Ss   11:46   0:00 postgres: stats collector process
postgres 14637  0.0  0.0 222576  7812 ?        Ss   11:46   0:00 postgres: postgres logs 127.0.0.1(40436) CREATE TABLE waiting for 0/3F01E730
postgres 15978  0.0  0.0 221316  3132 ?        Ss   11:48   0:00 postgres: wal sender process repmgr 10.15.35.5(50844) streaming 0/41002048


- On the SLAVE :
postgres  8004  0.0  0.0 220340  8612 ?        S    11:48   0:00 /usr/pgsql-9.2/bin/postgres -D /opt/analytics/pgdata -p 5432
postgres  8005  0.0  0.0 177896  1168 ?        Ss   11:48   0:00 postgres: logger process
postgres  8006  0.0  0.0 220420  1968 ?        Ss   11:48   0:00 postgres: startup process   recovering 000000010000000000000041
postgres  8007  0.0  0.0 227736  3132 ?        Ss   11:48   0:02 postgres: wal receiver process   streaming 0/41002048
postgres  8008  0.0  0.0 220340  1808 ?        Ss   11:48   0:00 postgres: checkpointer process
postgres  8009  0.0  0.0 220340  1632 ?        Ss   11:48   0:00 postgres: writer process
postgres  8010  0.0  0.0 180132  1408 ?        Ss   11:48   0:00 postgres: stats collector process


My observation here is that :
- the master has a WAL sender up-and-running, streaming WAL 0/41002048.
- on the master node, the in-house application is waiting for the CREATE TABLE transaction to be committed (CREATE TABLE waiting for 0/3F01E730). The PostgreSQL master seems to wait for a WAL that was probably never received (since the data of the WAL was transferred during the cloning by rsync'ing the master's filesystem).
- the slave has a WAL receiver up-and-running, streaming WAL 0/41002048.

I may be wrong here, but wouldn't the expected behaviour in such a case be to unblock the CREATE TABLE transaction ? As it stands, it looks like this transaction will never be able to complete.
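
As a side note, the two positions in the ps output above can be compared numerically. The sketch below assumes the usual textual form of a WAL location, two hexadecimal numbers separated by a slash (high 32 bits, then low 32 bits); the helper name `parse_lsn` is made up for this illustration.

```python
def parse_lsn(text):
    """Convert a WAL location such as '0/3F01E730' into a single integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# Position the CREATE TABLE backend is waiting for (from the master's ps output).
waited_for = parse_lsn("0/3F01E730")
# Position reported by both the WAL sender and the WAL receiver.
streaming = parse_lsn("0/41002048")

# True: the waited-for position lies behind the position being streamed.
print(waited_for < streaming)
```

So the position the backend waits for is older than the position the slave is already streaming from, which is why the wait looks like it should have ended.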

I'm running RHEL 6.5, and PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit.

I've not tried to reproduce on 9.4.1+ yet. Is it worth trying or is this a known and solved issue ?

Any hint will be highly appreciated !

Regards,

SC

Re: "no-slave yet" early CREATE TABLE transaction gets blocked when synchronous replication

From
Kevin Grittner
Date:
Sékine Coulibaly <scoulibaly@gmail.com> wrote:

> synchronous replication. When a transaction is received by the
> master before the slave is up and running, the transaction seems
> blocked forever on the backend.

This is not a bug.  The promise made for synchronous replication is
that when a commit returns an indication of success, the
transaction has been persisted on at least two clusters.  If you
don't want that promise yet, don't turn on synchronous replication
yet.  If you want that guarantee but you want the primary to be
able to continue to commit transactions when there is a failure of
a synchronous replica, then provide more than one synchronous
replica.

There was discussion of supporting a "don't actually provide that
guarantee, but kinda try when it's responding fast enough", but
that was rejected as being so close to asynchronous replication as
to not really add any value.  All it would do is stall the
successful return of a commit request without actually giving you
any stronger guarantee than asynchronous replication.  Effectively,
any product that behaves that way is just giving you a false sense
of security.  If you don't need the guarantee of a second copy of
the transaction having been persisted to a second cluster, use
asynchronous replication.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: "no-slave yet" early CREATE TABLE transaction gets blocked when synchronous replication

From
Sékine Coulibaly
Date:
Kevin,

I see your point, and totally agree. The documentation is pretty clear about this.

I indeed want the security brought by synchronous replication. Having a commit not return as long as the replica is not up-and-streaming is what I expect and perfectly fits my needs. It is perfectly right in my use case for the master to wait for the replica as long as necessary. Asynchronous replication is definitely not what I want.

My concern here is that, although the slave is back, the pending commit is not performed on the master side. I'd expect all ongoing and blocking commits to be unblocked as soon as the slave pops in. Since the master and slave are synchronous after the slave is back, what's the point in holding a transaction forever in the master's backend ?

Regards,

Sekine




2015-03-25 15:31 GMT+01:00 Kevin Grittner <kgrittn@ymail.com>:
> This is not a bug. [...]

Re: "no-slave yet" early CREATE TABLE transaction gets blocked when synchronous replication

From
Kevin Grittner
Date:
Sékine Coulibaly <scoulibaly@gmail.com> wrote:

> I indeed want the security brought by synchronous replication.
> Having a commit not return as long as the replica is not
> up-and-streaming is what I expect and perfectly fits my needs. It
> is perfectly right in my use case for the master to wait for the
> replica as long as necessary. Asynchronous replication is
> definitely not what I want.
>
> My concern here is that, although the slave is back, the pending
> commit is not performed on the master side.

I apologize for misunderstanding what you were experiencing.

> I'd expect all ongoing and blocking commits to be unblocked as
> soon as the slave pops in.

Indeed they should.
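
To make the expected behavior concrete, here is a toy model of it. This is only a sketch and not PostgreSQL's actual implementation: the class and method names are invented, and WAL positions are plain integers. A commit stays in the waiting list until a synchronous standby reports a flushed position at or beyond it, and a standby that connects late releases every commit it already covers.

```python
class SyncCommitModel:
    """Toy model: a commit returns only after a standby confirms its position."""

    def __init__(self):
        self.standby_flushed = None  # None means no synchronous standby is connected
        self.waiting = []            # commit positions not yet confirmed
        self.returned = []           # commit positions reported back to clients

    def commit(self, position):
        self.waiting.append(position)
        self._release()

    def standby_reports(self, flushed_position):
        self.standby_flushed = flushed_position
        self._release()

    def _release(self):
        if self.standby_flushed is None:
            return  # nobody can confirm anything yet, so every commit keeps waiting
        still_waiting = []
        for position in self.waiting:
            if position <= self.standby_flushed:
                self.returned.append(position)
            else:
                still_waiting.append(position)
        self.waiting = still_waiting


model = SyncCommitModel()
model.commit(100)           # issued before any standby exists
print(model.waiting)        # [100]
model.standby_reports(150)  # standby appears, already past position 100
print(model.returned)       # [100]
```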

> Since the master and slave are synchronous after the slave is
> back, what's the point in holding a transaction forever in the
> master's backend ?

I know that a number of bugs for this sort of edge condition have
been fixed since 9.2.4 was released (on 2013-04-04).  I strongly
recommend that you apply the latest bug-fix roll-up for the 9.2
branch, which is currently 9.2.10.

http://www.postgresql.org/support/versioning/

If you still see such behavior with the most recently released bug
fixes, please post again, attaching the configuration files and
showing log entries (from both clusters) from around the time the
slave is brought up.
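
Alongside the configuration files and logs, the contents of the pg_stat_replication view on the master may be worth including. The following is a minimal sketch under some assumptions: the rows are fetched with whatever client is at hand and passed in as dictionaries, the function name is invented, and the sample row is made up for illustration rather than taken from this thread. The column names are those of the 9.2 view.

```python
# Query to run on the master; column names as in PostgreSQL 9.2.
REPLICATION_QUERY = """
SELECT application_name, state, sync_state, sent_location, flush_location
FROM pg_stat_replication
"""


def synchronous_standbys(rows):
    """Return the names of standbys the master currently treats as synchronous."""
    return [row["application_name"] for row in rows if row["sync_state"] == "sync"]


# Made-up sample row for illustration only.
sample = [
    {
        "application_name": "standby_a",
        "state": "streaming",
        "sync_state": "async",
        "sent_location": "0/41002048",
        "flush_location": "0/41002048",
    },
]

print(synchronous_standbys(sample))  # []
```

An empty result while a standby is streaming would mean that standby is attached as asynchronous, in which case it cannot confirm waiting commits. Whether that applies here can only be told from the real view output.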

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company