Re: [HACKERS] snapbuild woes

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [HACKERS] snapbuild woes
Дата
Msg-id 20170413002959.apytyi5lkknauauj@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: [HACKERS] snapbuild woes  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Ответы Re: [HACKERS] snapbuild woes  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Список pgsql-hackers
Hi,
On 2017-03-03 01:30:11 +0100, Petr Jelinek wrote:

> From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmodos@pjmodos.net>
> Date: Fri, 24 Feb 2017 21:39:03 +0100
> Subject: [PATCH 1/5] Reserve global xmin for create slot snasphot export
> 
> Otherwise the VACUUM or pruning might remove tuples still needed by the
> exported snapshot.
> ---
>  src/backend/replication/logical/logical.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
> index 5529ac8..57c392c 100644
> --- a/src/backend/replication/logical/logical.c
> +++ b/src/backend/replication/logical/logical.c
> @@ -267,12 +267,18 @@ CreateInitDecodingContext(char *plugin,
>       * the slot machinery about the new limit. Once that's done the
>       * ProcArrayLock can be released as the slot machinery now is
>       * protecting against vacuum.
> +     *
> +     * Note that we only store the global xmin temporarily in the in-memory
> +     * state so that the initial snapshot can be exported. After initial
> +     * snapshot is done global xmin should be reset and not tracked anymore
> +     * so we are fine with losing the global xmin after crash.
>       * ----
>       */
>      LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>  
>      slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId();
>      slot->data.catalog_xmin = slot->effective_catalog_xmin;
> +    slot->effective_xmin = slot->effective_catalog_xmin;


>  void
>  FreeDecodingContext(LogicalDecodingContext *ctx)
>  {
> +    ReplicationSlot *slot = MyReplicationSlot;
> +
>      if (ctx->callbacks.shutdown_cb != NULL)
>          shutdown_cb_wrapper(ctx);
>  
> +    /*
> +     * Cleanup global xmin for the slot that we may have set in
> +     * CreateInitDecodingContext().

Hm.  Is that actually a meaningful point to do so? For one, it gets
called by pg_logical_slot_get_changes_guts(), but more importantly, the
snapshot is exported till SnapBuildClearExportedSnapshot(), which is the
next command?  If we rely on the snapshot magic done by ExportSnapshot()
it'd be worthwhile to mention that...


> We do not take ProcArrayLock or similar
> +     * since we only reset xmin here and there's not much harm done by a
> +     * concurrent computation missing that.
> +     */

Hum.  I was prepared to complain about this, but ISTM, that there's
absolutely no risk because the following
ReplicationSlotsComputeRequiredXmin(false); actually does all the
necessary locking?  But still, I don't see much point in the
optimization.



> This patch changes the code so that stored snapshots are only used for
> logical decoding restart but not for initial slot snapshot.

Yea, that's a very good point...

> @@ -1284,13 +1286,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
>  
>          return false;
>      }
> -    /* c) valid on disk state */
> -    else if (SnapBuildRestore(builder, lsn))
> +    /* c) valid on disk state and not exported snapshot */
> +    else if (!TransactionIdIsNormal(builder->initial_xmin_horizon) &&
> +             SnapBuildRestore(builder, lsn))

Hm. Is this a good signaling mechanism? It'll also trigger for the SQL
interface, where it'd strictly speaking not be required, right?


> From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmodos@pjmodos.net>
> Date: Sun, 26 Feb 2017 01:07:33 +0100
> Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards

A bit more commentary would be good. What does that protect us against?



> From 53193b40f26dd19c712f3b9b77af55f81eb31cc4 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmodos@pjmodos.net>
> Date: Wed, 22 Feb 2017 00:57:33 +0100
> Subject: [PATCH 4/5] Fix xl_running_xacts usage in snapshot builder
> 
> Due to race condition, the xl_running_xacts might contain no longer
> running transactions. Previous coding tried to get around this by
> additional locking but that did not work correctly for committs. Instead
> try combining decoded commits and multiple xl_running_xacts to get the
> consistent snapshot.

Needs more explanation about approach.


> @@ -1221,7 +1221,12 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
>       *      simply track the number of in-progress toplevel transactions and
>       *      lower it whenever one commits or aborts. When that number
>       *      (builder->running.xcnt) reaches zero, we can go from FULL_SNAPSHOT
> -     *      to CONSISTENT.
> +     *      to CONSISTENT. Sometimes we might get xl_running_xacts which has
> +     *      all tracked transactions as finished. We'll need to restart tracking
> +     *      in that case and use previously collected committed transactions to
> +     *      purge transactions mistakenly marked as running in the
> +     *      xl_running_xacts which exist as a result of race condition in
> +     *      LogStandbySnapshot().

I'm not following this yet.


> @@ -1298,11 +1303,17 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
>       * b) first encounter of a useable xl_running_xacts record. If we had
>       * found one earlier we would either track running transactions (i.e.
>       * builder->running.xcnt != 0) or be consistent (this function wouldn't
> -     * get called).
> +     * get called). However it's possible that we could not see all
> +     * transactions that were marked as running in xl_running_xacts, so if
> +     * we get new one that says all were closed but we are not consistent
> +     * yet, we need to restart the tracking while taking previously seen
> +     * transactions into account.

This needs to revise the preceding comment more heavily.  "This is the
first!!! Or maybe not!" isn't easy to understand.


>       */
> -    else if (!builder->running.xcnt)
> +    else if (!builder->running.xcnt ||
> +             running->oldestRunningXid > builder->running.xmax)

Isn't that wrong under wraparound?


>      {
>          int            off;
> +        bool        first = builder->running.xcnt == 0;
>  
>          /*
>           * We only care about toplevel xids as those are the ones we
> @@ -1338,20 +1349,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
>          builder->running.xmin = builder->running.xip[0];
>          builder->running.xmax = builder->running.xip[running->xcnt - 1];
>  
> +
>          /* makes comparisons cheaper later */
>          TransactionIdRetreat(builder->running.xmin);
>          TransactionIdAdvance(builder->running.xmax);
>  
>          builder->state = SNAPBUILD_FULL_SNAPSHOT;
>  
> -        ereport(LOG,
> -            (errmsg("logical decoding found initial starting point at %X/%X",
> -                    (uint32) (lsn >> 32), (uint32) lsn),
> -             errdetail_plural("%u transaction needs to finish.",
> -                              "%u transactions need to finish.",
> -                              builder->running.xcnt,
> -                              (uint32) builder->running.xcnt)));
> -
> +        /*
> +         * If this is the first time we've seen xl_running_xacts, we are done.
> +         */
> +        if (first)
> +        {
> +            ereport(LOG,
> +                (errmsg("logical decoding found initial starting point at %X/%X",
> +                        (uint32) (lsn >> 32), (uint32) lsn),
> +                 errdetail_plural("%u transaction needs to finish.",
> +                                  "%u transactions need to finish.",
> +                                  builder->running.xcnt,
> +                                  (uint32) builder->running.xcnt)));
> +        }
> +        else
> +        {
> +            /*
> +             * Because of the race condition in LogStandbySnapshot() the
> +             * transactions recorded in xl_running_xacts as running might have
> +             * already committed by the time the xl_running_xacts was written
> +             * to WAL. Use the information about decoded transactions that we
> +             * gathered so far to update our idea about what's still running.
> +             *
> +             * We can use SnapBuildEndTxn directly as it only does the
> +             * transaction running check and handling without any additional
> +             * side effects.
> +             */
> +            for (off = 0; off < builder->committed.xcnt; off++)
> +                SnapBuildEndTxn(builder, lsn, builder->committed.xip[off]);
> +            if (builder->state == SNAPBUILD_CONSISTENT)
> +                return false;
> +
> +            ereport(LOG,
> +                (errmsg("logical decoding moved initial starting point to %X/%X",
> +                        (uint32) (lsn >> 32), (uint32) lsn),
> +                 errdetail_plural("%u transaction needs to finish.",
> +                                  "%u transactions need to finish.",
> +                                  builder->running.xcnt,
> +                                  (uint32) builder->running.xcnt)));
> +        }

Hm, this is not pretty.




> From 4217da872e9aa48750c020542d8bc22c863a3d75 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmodos@pjmodos.net>
> Date: Tue, 21 Feb 2017 19:58:18 +0100
> Subject: [PATCH 5/5] Skip unnecessary snapshot builds
> 
> When doing initial snapshot build during logical decoding
> initialization, don't build snapshots for transactions where we know the
> transaction didn't do any catalog changes. Otherwise we might end up
> with thousands of useless snapshots on busy server which can be quite
> expensive.
> ---
>  src/backend/replication/logical/snapbuild.c | 82 +++++++++++++++++++----------
>  1 file changed, 53 insertions(+), 29 deletions(-)
> 
> diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
> index 1a1c9ba..c800aa5 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -954,6 +954,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
>      bool        forced_timetravel = false;
>      bool        sub_needs_timetravel = false;
>      bool        top_needs_timetravel = false;
> +    bool        skip_forced_snapshot = false;
>  
>      TransactionId xmax = xid;
>  
> @@ -975,10 +976,19 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
>          /*
>           * We could avoid treating !SnapBuildTxnIsRunning transactions as
>           * timetravel ones, but we want to be able to export a snapshot when
> -         * we reached consistency.
> +         * we reached consistency so we need to keep track of them.
>           */
>          forced_timetravel = true;
>          elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
> +
> +        /*
> +         * It is however desirable to skip building new snapshot for
> +         * !SnapBuildTxnIsRunning transactions as otherwise we might end up
> +         * building thousands of unused snapshots on busy servers which can
> +         * be very expensive.
> +         */
> +        if (!SnapBuildTxnIsRunning(builder, xid))
> +            skip_forced_snapshot = true;
>      }

That's pretty crudely bolted on the existing logic, isn't there a
simpler way?


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] Function to control physical replication slot
Следующее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] Partitioned tables and relfilenode