Re: Speedup twophase transactions

From: Jesper Pedersen
Subject: Re: Speedup twophase transactions
Msg-id: 56E2F51F.2030005@redhat.com
In response to: Re: Speedup twophase transactions  (Stas Kelvich <s.kelvich@postgrespro.ru>)
Responses: Re: Speedup twophase transactions  (Stas Kelvich <s.kelvich@postgrespro.ru>)
List: pgsql-hackers

On 01/26/2016 07:43 AM, Stas Kelvich wrote:
> Thanks for reviews and commit!
>
>    As Simon and Andres already mentioned in this thread, replay of twophase
> transactions is significantly slower than the same operations in normal
> mode. The major reason is that each state file is fsynced during replay, and
> while that is not a problem for recovery, it is a problem for replication.
> Under high 2pc update load, the lag between master and async replica is
> constantly increasing (see graph below).
>
>    One way to improve things is to move fsyncs to restartpoints, but as we
> saw previously it is a half-measure, and just the frequent calls to fopen
> can cause a bottleneck.
>
>    The other option is to use the same scenario for replay that is already
> used in non-recovery mode: read state files into memory during replay of
> prepare, and if a checkpoint/restartpoint occurs between prepare and commit,
> move the data to files. On commit we can read xlog or files. So here is the
> patch that implements this scenario for replay.
>
>    Patch is quite straightforward. During replay of prepare records
> RecoverPreparedFromXLOG() is called to create memory state in GXACT, PROC,
> PGPROC; on commit XlogRedoFinishPrepared() is called to clean up that state.
> Also there are several functions (PrescanPreparedTransactions,
> StandbyTransactionIdIsPrepared) that were assuming that during replay all
> prepared xacts have files in pg_twophase, so I have extended them to check
> GXACT too.
>
>    Side effect of that behaviour is that we can see prepared xacts in pg_prepared_xacts view on slave.
>
> As this patch touches quite a sensitive part of postgres replay and there
> are some rarely used code paths, I wrote a shell script to set up
> master/slave replication and test different failure scenarios that can
> happen with instances. Attaching this file to show the test scenarios that I
> have tested and, more importantly, to show what I didn't test. In
> particular, I failed to reproduce a situation where
> StandbyTransactionIdIsPrepared() is called; maybe somebody can suggest a way
> to force its usage. Also I'm not too sure about the necessity of calling
> cache invalidation callbacks during XlogRedoFinishPrepared(); I've marked
> this place in the patch with a 2REVIEWER comment.
>
> Tests show that this patch increases the speed of 2pc replay to the level
> where the replica can keep pace with the master.
>
> Graph: replica lag under a pgbench run for 200 seconds with 2pc update
> transactions (80 connections, one update per 2pc tx, two servers with 12
> cores each, 10GbE interconnect) on current master and with the suggested
> patch. Replica lag measured with "select sent_location-replay_location as
> delay from pg_stat_replication;" each second.
>
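
For readers following the thread, the replay scenario described in the quoted message (keep prepared state in memory, move it to files only if a restartpoint intervenes, then read from xlog or files at commit) can be modeled roughly as below. This is a toy Python sketch, not the patch's C code: the class name `TwoPhaseReplayModel`, its methods, and the `fsyncs` counter are all hypothetical stand-ins, and the in-memory `state` field stands in for a pointer to the prepare record in xlog.

```python
class TwoPhaseReplayModel:
    """Toy model of replaying PREPARE / COMMIT PREPARED records (hypothetical names)."""

    def __init__(self):
        self.prepared = {}  # gid -> {"state": bytes or None, "ondisk": bool}
        self.files = {}     # gid -> bytes, stand-in for state files in pg_twophase
        self.fsyncs = 0     # number of state files written and synced

    def replay_prepare(self, gid, state):
        # Replay of a prepare record: create in-memory state only, no file yet.
        self.prepared[gid] = {"state": state, "ondisk": False}

    def restartpoint(self):
        # Transactions still prepared at a restartpoint get their data moved to files.
        for gid, entry in self.prepared.items():
            if not entry["ondisk"]:
                self.files[gid] = entry["state"]
                entry["state"] = None
                entry["ondisk"] = True
                self.fsyncs += 1

    def replay_commit(self, gid):
        # Replay of commit: read from the file if one exists, else from "xlog",
        # and clean up the in-memory state either way.
        entry = self.prepared.pop(gid)
        if entry["ondisk"]:
            return "file", self.files.pop(gid)
        return "xlog", entry["state"]

    def prepared_gids(self):
        # What a pg_prepared_xacts-like view would show on the replica.
        return sorted(self.prepared)
```

In this model a transaction that is prepared and committed between two restartpoints never costs an fsync, which is the property the quoted message relies on to let the replica keep up.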

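The lag query in the quoted message subtracts two WAL positions. As a small illustration of what that difference means, the sketch below converts the textual `X/Y` form of a WAL location (two hexadecimal numbers, high and low 32 bits) into a byte offset and subtracts. The helper names `lsn_to_int` and `replay_lag_bytes` are hypothetical, and the sample positions are made up for the example, not taken from the benchmark.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual WAL location such as '16/B374D848' to a byte offset."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replay_lag_bytes(sent_location: str, replay_location: str) -> int:
    """Bytes of WAL sent to the replica but not yet replayed there."""
    return lsn_to_int(sent_location) - lsn_to_int(replay_location)


# Made-up sample values: the replica is 0x848 bytes behind.
lag = replay_lag_bytes("16/B374D848", "16/B374D000")
```
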
Some comments:

* The patch needs a rebase against the latest TwoPhaseFileHeader change
* Rework the check.sh script into a TAP test case (src/test/recovery), 
as suggested by Alvaro and Michael down thread
* Add documentation for RecoverPreparedFromXLOG

+     * that xlog record. We need just to clen up memmory state.

'clean' + 'memory'

+     * This is usually called after end-of-recovery checkpoint, so all 2pc
+     * files moved xlog to files. But if we restart slave when master is
+     * switched off this function will be called before checkpoint ans we need
+     * to check PGXACT array as it can contain prepared transactions that
+     * didn't created any state files yet.

=>

"We need to check the PGXACT array for prepared transactions that
don't have any state file in case of a slave restart with the master
being off."

+         * prepare xlog resords in shared memory in the same way as it happens

'records'

+         * We need such behaviour because speed of 2PC replay on replica should
+         * be at least not slower than 2PC tx speed on master.

=>

"We need this behaviour because the speed of the 2PC replay on the 
replica should be at least the same as the 2PC transaction speed of the 
master."

I'll leave the 2REVIEWER section to Simon.

Best regards, Jesper



