Обсуждение: SIGQUIT and lost sequence WAL records

Поиск
Список
Период
Сортировка

SIGQUIT and lost sequence WAL records

От
Michael Paquier
Дата:
Hi all,

Lately a couple of colleagues have pinged me regarding the fact that a
server signaled with SIGQUIT, or "pg_ctl stop -m immediate", may
ignore sequence WAL records. See for example the script attached
seqtest.py (simplified test case provided from my colleague Nikhil
Deshpande), that will fail while for example the following command is
run in parallel:
for i in {1..1000}; do pg_ctl restart -m immediate -w; sleep 1; done
The failure happening being for example:
AssertionError: DB returned reused sequence value prev=1118 curr=958
Here it means that the last return value of nextval() was 958, even if
the highest value ever fetched was 1118. Note that this test script
uses psycopg2, but that it does *not* commit transactions on purpose.
Also note that those WAL records do not disappear all the time.
Sometimes they are here and the sequence keeps increasing monotically
after server is restarted.

Trying to find out if there were records ignored I added some logs in
the server thanks to the patch attached. In the case of this last
failure there have been a couple of records after the last one found
at recovery:
WARNING:  Logged new sequence value at 0/1505570 for nextval = 957
WARNING:  Logged new sequence value at 0/15066D8 for nextval = 990
WARNING:  Logged new sequence value at 0/1507828 for nextval = 1023
WARNING:  Logged new sequence value at 0/1508990 for nextval = 1056
WARNING:  Logged new sequence value at 0/1509AE0 for nextval = 1089
WARNING:  Logged new sequence value at 0/150AC48 for nextval = 1122
And if I looked at the WAL records those were not of course not
present. That's not really surprising as recovery didn't use them to
restore the sequence up to where it should have.

func.sgml mentions that a nextval() operation is *never* rolled back
even if a transaction is aborted, though in this case this is exactly
what is happening, even if when the server is stopped the existing
transactions running are forcibly aborted.

I am lacking of fuel to look more at that today, but at this point I
wanted to get some input to see if I am missing something obvious. So,
code bug or documentation bug?
--
Michael

Вложения

Re: SIGQUIT and lost sequence WAL records

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> Lately a couple of colleagues have pinged me regarding the fact that a
> server signaled with SIGQUIT, or "pg_ctl stop -m immediate", may
> ignore sequence WAL records.

I've not gone through this in detail, but I think what is happening
is that since the test case never commits any transactions, the
sequence-advance WAL records may not get flushed to disk, and thus
of course are not available for replay.  This is not a bug IMO, as
by the exact same token, no uses of those nextval() values can appear
in any committed tuples, so no database inconsistency is possible.

If an application is using nextval() results *externally to the
database*, it's unsafe for it to rely on those values being unique
unless/until it commits the nextval() calls.

I seem to recall some past discussions about whether a transaction
that commits after writing only a sequence advance (that is, it
did nextval() and nothing else) needs to flush WAL.  I think at one
point it did not but we changed it because of this consideration.

[ digs around... ]  That was a long time ago: see 01747692f.

            regards, tom lane

Re: SIGQUIT and lost sequence WAL records

От
Michael Paquier
Дата:
On Wed, Feb 17, 2016 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Lately a couple of colleagues have pinged me regarding the fact that a
>> server signaled with SIGQUIT, or "pg_ctl stop -m immediate", may
>> ignore sequence WAL records.
>
> I've not gone through this in detail, but I think what is happening
> is that since the test case never commits any transactions, the
> sequence-advance WAL records may not get flushed to disk, and thus
> of course are not available for replay.  This is not a bug IMO, as
> by the exact same token, no uses of those nextval() values can appear
> in any committed tuples, so no database inconsistency is possible.
>
> If an application is using nextval() results *externally to the
> database*, it's unsafe for it to rely on those values being unique
> unless/until it commits the nextval() calls.

That's for the confirmation. Yeah I already suggested those folks to
issue a commit to ensure that nextval() remains consistent.

> I seem to recall some past discussions about whether a transaction
> that commits after writing only a sequence advance (that is, it
> did nextval() and nothing else) needs to flush WAL.  I think at one
> point it did not but we changed it because of this consideration.
>
> [ digs around... ]  That was a long time ago: see 01747692f.

That's a part of history...

The documentation is stating the following regarding this behavior:
       <important>
        <para>
         To avoid blocking concurrent transactions that obtain numbers from the
         same sequence, a <function>nextval</function> operation is never
         rolled back; that is, once a value has been fetched it is considered
         used, even if the transaction that did the
         <function>nextval</function> later aborts.  This means that aborted
         transactions might leave unused <quote>holes</quote> in the sequence
         of assigned values.
        </para>
       </important>
Do you think we should add a mention regarding SIGQUIT/stop-immediate
regarding the fact that the application needs to commit the
transaction that used nextval() with synchronous_commit = on to keep
it consistent at recovery?
--
Michael

Re: SIGQUIT and lost sequence WAL records

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> Do you think we should add a mention regarding SIGQUIT/stop-immediate
> regarding the fact that the application needs to commit the
> transaction that used nextval() with synchronous_commit = on to keep
> it consistent at recovery?

Meh.  In the normal case where you just do "SELECT nextval()" and use the
result externally, there's going to be an implicit commit before you get
the data back.  So I think you really have to be taking deliberate aim at
your foot in order to get burnt like this.

            regards, tom lane