Thread: pgsql: snapshot scalability: cache snapshots using a xact completion co

pgsql: snapshot scalability: cache snapshots using a xact completion co

From: Andres Freund
Date:
snapshot scalability: cache snapshots using a xact completion counter.

Previous commits made it faster/more scalable to compute snapshots. But not
building a snapshot is still faster. Now that GetSnapshotData() does not
maintain RecentGlobal* anymore, that is actually not too hard:

This commit introduces xactCompletionCount, which tracks the number of
top-level transactions with xids (i.e. which may have modified the database)
that completed in some form since the start of the server.

We can avoid rebuilding the snapshot's contents whenever the current
xactCompletionCount is the same as it was when the snapshot was
originally built.  Currently this check happens while holding
ProcArrayLock. While it's likely possible to perform the check without
acquiring ProcArrayLock, it seems better to do that separately /
later, some careful analysis is required. Even with the lock this is a
significant win on its own.
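
The reuse check described above can be illustrated with a minimal, single-threaded sketch. This is not the code from the commit: `SharedState`, `Snapshot`, `get_snapshot()`, `complete_xact()` and `demo_rebuild_count()` are hypothetical names invented for illustration, the array of running xids is a toy stand-in for the proc array, and there is no locking at all (the commit performs the comparison while holding ProcArrayLock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_XACTS 8

/* Hypothetical stand-in for shared state; not PostgreSQL's real layout. */
typedef struct SharedState
{
    uint64_t xact_completion_count;   /* bumped when an xid-bearing xact completes */
    uint32_t running_xids[MAX_XACTS]; /* toy list of in-progress xids */
    int      n_running;
} SharedState;

/* Hypothetical snapshot that remembers the counter value it was built at. */
typedef struct Snapshot
{
    bool     valid;
    uint64_t built_at_count;
    uint32_t xip[MAX_XACTS];
    int      xcnt;
} Snapshot;

/* Returns true if the snapshot had to be rebuilt, false if it was reused. */
static bool
get_snapshot(const SharedState *shared, Snapshot *snap)
{
    /* No completion since the last build: the cached contents still apply. */
    if (snap->valid && snap->built_at_count == shared->xact_completion_count)
        return false;

    memcpy(snap->xip, shared->running_xids,
           sizeof(uint32_t) * (size_t) shared->n_running);
    snap->xcnt = shared->n_running;
    snap->built_at_count = shared->xact_completion_count;
    snap->valid = true;
    return true;
}

/* Remove xid from the running list and advance the completion counter. */
static void
complete_xact(SharedState *shared, uint32_t xid)
{
    for (int i = 0; i < shared->n_running; i++)
    {
        if (shared->running_xids[i] == xid)
        {
            shared->running_xids[i] = shared->running_xids[--shared->n_running];
            shared->xact_completion_count++;
            return;
        }
    }
}

/* Walk through build, reuse, completion, rebuild; returns the rebuild count. */
static int
demo_rebuild_count(void)
{
    SharedState shared = {.running_xids = {100, 101}, .n_running = 2};
    Snapshot    snap = {0};
    int         rebuilds = 0;

    rebuilds += get_snapshot(&shared, &snap);   /* first call builds */
    rebuilds += get_snapshot(&shared, &snap);   /* counter unchanged: reused */
    complete_xact(&shared, 100);
    rebuilds += get_snapshot(&shared, &snap);   /* counter moved: rebuilt */

    assert(snap.xcnt == 1 && snap.xip[0] == 101);
    return rebuilds;
}
```

Calling `demo_rebuild_count()` from a `main` exercises one reuse and two builds. The sketch only shows the counter comparison; it says nothing about the lock-free variant that the message defers to later work.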

On a smaller two socket machine this gains another ~1.03x, on a larger
machine the effect is roughly double (earlier patch version tested
though).  If we were able to safely avoid the lock there'd be another
significant gain on top of that.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/623a9ba79bbdd11c5eccb30b8bd5c446130e521c

Modified Files
--------------
src/backend/replication/logical/snapbuild.c |   1 +
src/backend/storage/ipc/procarray.c         | 125 +++++++++++++++++++++++-----
src/backend/utils/time/snapmgr.c            |   4 +
src/include/access/transam.h                |   9 ++
src/include/utils/snapshot.h                |   7 ++
5 files changed, 126 insertions(+), 20 deletions(-)


Re: pgsql: snapshot scalability: cache snapshots using a xact completion co

From: Michael Paquier
Date:
On Tue, Aug 18, 2020 at 04:30:21AM +0000, Andres Freund wrote:
> snapshot scalability: cache snapshots using a xact completion counter.
>
> Previous commits made it faster/more scalable to compute snapshots. But not
> building a snapshot is still faster. Now that GetSnapshotData() does not
> maintain RecentGlobal* anymore, that is actually not too hard:
>
> This commit introduces xactCompletionCount, which tracks the number of
> top-level transactions with xids (i.e. which may have modified the database)
> that completed in some form since the start of the server.
>
> We can avoid rebuilding the snapshot's contents whenever the current
> xactCompletionCount is the same as it was when the snapshot was
> originally built.  Currently this check happens while holding
> ProcArrayLock. While it's likely possible to perform the check without
> acquiring ProcArrayLock, it seems better to do that separately /
> later, some careful analysis is required. Even with the lock this is a
> significant win on its own.
>
> On a smaller two socket machine this gains another ~1.03x, on a larger
> machine the effect is roughly double (earlier patch version tested
> though).  If we were able to safely avoid the lock there'd be another
> significant gain on top of that.

spurfowl and more animals are telling us that this commit has broken
2PC:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2020-08-18%2004%3A31%3A11
--
Michael

Re: pgsql: snapshot scalability: cache snapshots using a xact completion co

From: Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> snapshot scalability: cache snapshots using a xact completion counter.

buildfarm doesn't like this a bit ...

            regards, tom lane



Re: pgsql: snapshot scalability: cache snapshots using a xact completion co

From: Andres Freund
Date:
Hi,

On 2020-08-18 00:55:22 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > snapshot scalability: cache snapshots using a xact completion counter.
> 
> buildfarm doesn't like this a bit ...

Yea, looking already. Unless that turns out to be incredibly bad luck
and only the first three animals failed (there's a few passes after), or
unless I find the issue in the next 30min or so, I'll revert.

Greetings,

Andres Freund



Re: pgsql: snapshot scalability: cache snapshots using a xact completion co

From: Andres Freund
Date:
On 2020-08-18 13:52:46 +0900, Michael Paquier wrote:
> On Tue, Aug 18, 2020 at 04:30:21AM +0000, Andres Freund wrote:
> spurfowl and more animals are telling us that this commit has broken
> 2PC:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2020-08-18%2004%3A31%3A11

It looks like it's a bit more subtle than outright breaking 2PC. We're
now at 3 out of 18 BF members having failed. I locally ran also quite a
few loops of the normal regression tests without finding an issue.

I'd written to Tom that I was planning to revert unless the number of
failures were lower than initially indicated. But that actually seems to
have come to pass (the failures are quicker to report because they don't
run the subsequent tests, of course).  I'd like to let the failures
accumulate a bit longer, say until tomorrow Midday if I haven't figured
it out by then. With the hope of finding some detail to help pinpoint
the issue.

Greetings,

Andres Freund



Re: pgsql: snapshot scalability: cache snapshots using a xact completion co

From: Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I'd written to Tom that I was planning to revert unless the number of
> failures were lower than initially indicated. But that actually seems to
> have come to pass (the failures are quicker to report because they don't
> run the subsequent tests, of course).  I'd like to let the failures
> accumulate a bit longer, say until tomorrow Midday if I haven't figured
> it out by then. With the hope of finding some detail to help pinpoint
> the issue.

There's certainly no obvious pattern here, so I agree with waiting for
more data.

            regards, tom lane