Обсуждение: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?

Поиск
Список
Период
Сортировка

Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?

От
Andres Freund
Дата:
Hi Craig, Robert,


The 011_crash_recovery.pl test test starts a transaction, creates a
table, fetches the transaction's xid. Then shuts down the server in
immediate mode. It then asserts that after crash recovery the previously
assigned xid is shown as aborted, and that new xids are assigned after
the xid.

But as far as I can tell there's no guarantee that that is the case.

It only happens to work because the test - for undocumented reasons -
creates the install with $node->init(allows_streaming => 1), which in
turn restricts shared_buffers to 1MB. Which forces the test to flush WAL
to disk during the CREATE TABLE.

I see failures in the test both when I increase the 1MB or when I change
the buffer replacement logic sufficiently.

E.g.
not ok 2 - new xid after restart is greater

#   Failed test 'new xid after restart is greater'
#   at t/011_crash_recovery.pl line 61.
#     '511'
#         >
#     '511'
not ok 3 - xid is aborted after crash



Craig, it kind of looks to me like you assumed it'd be guaranteed that
the xid at this point would show in-progress?

I don't think the use of txid_status() described in the docs added in
the commit is actually ever safe?

commit 857ee8e391ff6654ef9dcc5dd8b658d7709d0a3c
Author: Robert Haas <rhaas@postgresql.org>
Date:   2017-03-24 12:00:53 -0400

    Add a txid_status function.

    If your connection to the database server is lost while a COMMIT is
    in progress, it may be difficult to figure out whether the COMMIT was
    successful or not.  This function will tell you, provided that you
    don't wait too long to ask.  It may be useful in other situations,
    too.

    Craig Ringer, reviewed by Simon Riggs and by me

    Discussion: http://postgr.es/m/CAMsr+YHQiWNEi0daCTboS40T+V5s_+dst3PYv_8v2wNVH+Xx4g@mail.gmail.com


+   <para>
+    <function>txid_status(bigint)</> reports the commit status of a recent
+    transaction.  Applications may use it to determine whether a transaction
+    committed or aborted when the application and database server become
+    disconnected while a <literal>COMMIT</literal> is in progress.
+    The status of a transaction will be reported as either
+    <literal>in progress</>,
+    <literal>committed</>, or <literal>aborted</>, provided that the
+    transaction is recent enough that the system retains the commit status
+    of that transaction.  If is old enough that no references to that
+    transaction survive in the system and the commit status information has
+    been discarded, this function will return NULL.  Note that prepared
+    transactions are reported as <literal>in progress</>; applications must
+    check <link
+    linkend="view-pg-prepared-xacts"><literal>pg_prepared_xacts</></> if they
+    need to determine whether the txid is a prepared transaction.
+   </para>

Until the commit *has completed*, nothing guarantees that anything
bearing the transaction's xid has made it to disk. And we surely don't
want to force a WAL flush when assigning a transaction id, right?

Greetings,

Andres Freund




Re: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?

От
Robert Haas
Дата:
On Mon, Feb 8, 2021 at 4:52 PM Andres Freund <andres@anarazel.de> wrote:
> The 011_crash_recovery.pl test test starts a transaction, creates a
> table, fetches the transaction's xid. Then shuts down the server in
> immediate mode. It then asserts that after crash recovery the previously
> assigned xid is shown as aborted, and that new xids are assigned after
> the xid.
>
> But as far as I can tell there's no guarantee that that is the case.

I think you are right.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?

От
Craig Ringer
Дата:
On Wed, 10 Feb 2021 at 04:28, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Feb 8, 2021 at 4:52 PM Andres Freund <andres@anarazel.de> wrote:
> The 011_crash_recovery.pl test test starts a transaction, creates a
> table, fetches the transaction's xid. Then shuts down the server in
> immediate mode. It then asserts that after crash recovery the previously
> assigned xid is shown as aborted, and that new xids are assigned after
> the xid.
>
> But as far as I can tell there's no guarantee that that is the case.

I think you are right.


Andres, I missed this mail initially. I'll look into it shortly.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?

От
Craig Ringer
Дата:
On Tue, 9 Feb 2021 at 05:52, Andres Freund <andres@anarazel.de> wrote:

Craig, it kind of looks to me like you assumed it'd be guaranteed that
the xid at this point would show in-progress?

At the time I wrote that code, I don't think I understood that xid assignment wasn't necessarily durable until either (a) the next checkpoint; or (b) commit of some txn with a greater xid.

IIRC I expected that after crash and recovery the tx would always be treated as aborted, because the xid had been assigned but no corresponding commit was found before end-of-recovery. No explicit abort records are written to WAL for such txns since we crashed, but the server's oldest in-progress txn threshold is used to determine that they must be aborted rather than in-progress even though their clog entries aren't set to aborted.

Which was fine as far as it went, but I failed to account for the xid assignment not necessarily being durable when the client calls txid_status().


I don't think the use of txid_status() described in the docs added in
the commit is actually ever safe?

I agree. The client can query for its xid with txid_current() but as you note there's no guarantee that the assigned xid is durable.

The client would have to ensure that an xid was assigned, then ensure that the WAL was durably flushed past the point of the xid assignment before relying on the xid.

If we do a txn that performs a small write, calls txid_current(), and sends a commit that the server crashes before completing, we can't know for sure that the xid we recorded client-side before the server crash is the same txn we check the status of after crash recovery. Some other txn could've re-used the xid after crash so long as no other txn with a greater xid durably committed before the crash.

That scenario isn't hugely likely, but it's definitely possible on systems that don't do a lot of concurrent txns or do mostly long, heavyweight txns.

The txid_status() function was originally intended to be paired with a way to report topxid assignment to the client automatically, NOTIFY or GUC_REPORT-style. But that would not make this usage safe either, unless we delayed the report until WAL was flushed past the LSN of the xid assignment *or* some other txn with a greater xid committed.

This could be made safe with a variant of txid_current() that forced the xid assignment to be logged immediately if it was not already, and did not return until WAL flushed past the point of the assignment. If the client did most of the txn's work before requesting a guaranteed-durable xid, it would in practice not land up having to wait for a flush. But we'd have to keep track of when we assigned the xid in every single topxact in order to be able to promise we'd flushed it without having to immediately force a flush. That's pointless overhead all the rest of the time, just in case someone wants to get an xid for later use with txid_status().

The simplest option with no overhead on anything that doesn't care about txid_status() is to expose a function to force flush of WAL up to the current insert LSN. Then update the docs to say you have to call it after txid_current(), and before sending your commit. But at that point you might as well use 2PC, since you're paying the same double flush and double round-trip costs. The main point of txid_status() was to avoid the cost of that double-flush.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?

От
Craig Ringer
Дата:
On Wed, 5 May 2021 at 23:15, Craig Ringer <craig@2ndquadrant.com> wrote:

> Which was fine as far as it went, but I failed to account for the xid
> assignment not necessarily being durable when the client calls
> txid_status().

Ahem, I meant "when the client calls txid_current()"

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise