Обсуждение: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

Поиск
Список
Период
Сортировка

pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Pavan Deolasee
Дата:
Hello,

While investigating some issue, I found that pg_xlogdump fails to dump contents from a WAL file if the file has continuation data from previous WAL record and the data spans more than one page. In such cases, XLogFindNextRecord() fails to take into account that there will be more than one xlog page headers (long and short) and thus tries to read from an offset where no valid record exists. That results in pg_xlogdump throwing error such as:

pg_xlogdump: FATAL:  could not find a valid record after 0/46000000
 
Attached WAL file from master branch demonstrates the issue, generated using synthetic data. Also, attached patch fixes it for me.

While we could have deduced the number of short and long headers and skipped directly to the offset, I found reading one page at a time and using XLogPageHeaderSize() to find header size of each page separately, a much cleaner way. Also, the continuation data is not going to span many pages. So I don't see any harm in doing it that way. 

I encountered this on 9.3, but the patch applies to both 9.3 and master. I haven't tested it on other branches, but I have no reason to believe it won't apply or work. I believe we should back patch it all supported branches.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Michael Paquier
Дата:
On Wed, Mar 23, 2016 at 3:43 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> While investigating some issue, I found that pg_xlogdump fails to dump
> contents from a WAL file if the file has continuation data from previous WAL
> record and the data spans more than one page. In such cases,
> XLogFindNextRecord() fails to take into account that there will be more than
> one xlog page headers (long and short) and thus tries to read from an offset
> where no valid record exists. That results in pg_xlogdump throwing error
> such as:

OK. That makes sense, that is indeed a possible scenario.

> Attached WAL file from master branch demonstrates the issue, generated using
> synthetic data. Also, attached patch fixes it for me.

So does it for me.

> While we could have deduced the number of short and long headers and skipped
> directly to the offset, I found reading one page at a time and using
> XLogPageHeaderSize() to find header size of each page separately, a much
> cleaner way. Also, the continuation data is not going to span many pages. So
> I don't see any harm in doing it that way.

I have to agree, the new code is pretty clean this way by calculating
the next page LSN directly in the loop should the record be longer
than it.

> I encountered this on 9.3, but the patch applies to both 9.3 and master. I
> haven't tested it on other branches, but I have no reason to believe it
> won't apply or work. I believe we should back patch it all supported
> branches.

Agreed, that's a bug, and the logic of your patch looks good to me.

+       /*
+        * Compute targetRecOff. It should typically be greater than short
+        * page-header since a valid record can't , but can also be zero when
+        * caller has supplied a page-aligned address or when we are skipping
+        * multi-page continuation record. It doesn't matter though because
+        * ReadPageInternal() will read at least short page-header worth of
+        * data
+        */
This needs some polishing, there is an unfinished sentence here.

+       targetRecOff = tmpRecPtr % XLOG_BLCKSZ;
targetRecOff, pageHeaderSize and targetPagePtr could be declared
inside directly the new while loop.
-- 
Michael



Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Pavan Deolasee
Дата:


On Wed, Mar 23, 2016 at 1:13 PM, Michael Paquier <michael.paquier@gmail.com> wrote:

+       /*
+        * Compute targetRecOff. It should typically be greater than short
+        * page-header since a valid record can't , but can also be zero when
+        * caller has supplied a page-aligned address or when we are skipping
+        * multi-page continuation record. It doesn't matter though because
+        * ReadPageInternal() will read at least short page-header worth of
+        * data
+        */
This needs some polishing, there is an unfinished sentence here.

+       targetRecOff = tmpRecPtr % XLOG_BLCKSZ;
targetRecOff, pageHeaderSize and targetPagePtr could be declared
inside directly the new while loop.

Thanks Michael for reviewing the patch. I've fixed these issues and new version is attached.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Michael Paquier
Дата:
On Wed, Mar 23, 2016 at 7:04 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>
>
> On Wed, Mar 23, 2016 at 1:13 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>>
>> +       /*
>> +        * Compute targetRecOff. It should typically be greater than short
>> +        * page-header since a valid record can't , but can also be zero
>> when
>> +        * caller has supplied a page-aligned address or when we are
>> skipping
>> +        * multi-page continuation record. It doesn't matter though
>> because
>> +        * ReadPageInternal() will read at least short page-header worth
>> of
>> +        * data
>> +        */
>> This needs some polishing, there is an unfinished sentence here.
>>
>> +       targetRecOff = tmpRecPtr % XLOG_BLCKSZ;
>> targetRecOff, pageHeaderSize and targetPagePtr could be declared
>> inside directly the new while loop.
>
>
> Thanks Michael for reviewing the patch. I've fixed these issues and new
> version is attached.

I'd just add dots at the end of the sentences in the comment blocks
because that's project style, but I'm being picky, except that the
logic looks quite good.
-- 
Michael



Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Pavan Deolasee
Дата:


On Wed, Mar 23, 2016 at 6:16 PM, Michael Paquier <michael.paquier@gmail.com> wrote:


I'd just add dots at the end of the sentences in the comment blocks
because that's project style, but I'm being picky, except that the
logic looks quite good.

Since this is a bug affecting all stable branches, IMHO it will be a good idea to fix this before the upcoming minor releases.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Andres Freund
Дата:
On 2016-03-28 13:21:41 +0530, Pavan Deolasee wrote:
> On Wed, Mar 23, 2016 at 6:16 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
> 
> >
> >
> > I'd just add dots at the end of the sentences in the comment blocks
> > because that's project style, but I'm being picky, except that the
> > logic looks quite good.
> >
> 
> Since this is a bug affecting all stable branches, IMHO it will be a good
> idea to fix this before the upcoming minor releases.

It's definitely too late for that; they're going to be wrapped in a
couple hours.

Andres



Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Michael Paquier
Дата:
On Mon, Mar 28, 2016 at 10:15 PM, Andres Freund <andres@anarazel.de> wrote:
> It's definitely too late for that; they're going to be wrapped in a
> couple hours.

I have added this patch to the next CF so as we do not lose track of this bug:
https://commitfest.postgresql.org/10/593/
-- 
Michael



Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Craig Ringer
Дата:
On 23 March 2016 at 18:04, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:


On Wed, Mar 23, 2016 at 1:13 PM, Michael Paquier <michael.paquier@gmail.com> wrote:

+       /*
+        * Compute targetRecOff. It should typically be greater than short
+        * page-header since a valid record can't , but can also be zero when
+        * caller has supplied a page-aligned address or when we are skipping
+        * multi-page continuation record. It doesn't matter though because
+        * ReadPageInternal() will read at least short page-header worth of
+        * data
+        */
This needs some polishing, there is an unfinished sentence here.

+       targetRecOff = tmpRecPtr % XLOG_BLCKSZ;
targetRecOff, pageHeaderSize and targetPagePtr could be declared
inside directly the new while loop.

Thanks Michael for reviewing the patch. I've fixed these issues and new version is attached.


Can you describe the process used to generate the sample WAL segment?

I'd like to turn it into a TAP test to go along with the patch. 



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Craig Ringer
Дата:
On 23 March 2016 at 18:04, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:


On Wed, Mar 23, 2016 at 1:13 PM, Michael Paquier <michael.paquier@gmail.com> wrote:

+       /*
+        * Compute targetRecOff. It should typically be greater than short
+        * page-header since a valid record can't , but can also be zero when
+        * caller has supplied a page-aligned address or when we are skipping
+        * multi-page continuation record. It doesn't matter though because
+        * ReadPageInternal() will read at least short page-header worth of
+        * data
+        */
This needs some polishing, there is an unfinished sentence here.

+       targetRecOff = tmpRecPtr % XLOG_BLCKSZ;
targetRecOff, pageHeaderSize and targetPagePtr could be declared
inside directly the new while loop.

Thanks Michael for reviewing the patch. I've fixed these issues and new version is attached.

Looks sensible to me based on a reading of "git diff -w" of the applied patch. It passes make check and make -C src/test/recovery check . Marked ready for committer; I'd like to add a TAP test for it, but it's ready to go without that.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Pavan Deolasee
Дата:
On Thu, Mar 31, 2016 at 6:27 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Can you describe the process used to generate the sample WAL segment?

 
Shame that I can't find the sql file used to create the problematic WAL segment. But this is what I did.

I wrote a plpgsql function which inserts rows in a loop, every time checking if the remaining space in the WAL segment  has fallen to less than couple of hundred bytes. Of course, I used pg_settings to get the WAL segment size, WAL page size and pg_current_xlog_insert_location() to correctly compute remaining bytes in the WAL segment. At this point, do a non-HOT update, preferably to table which is already fully vacuumed and CHECKPOINTed to avoid getting any other WAL records in between. Assuming FPW is turned ON, the UPDATE should generate enough WAL to cross over the first page in the new WAL segment.

Let me know if you would like to me to put together a sql based on this description.

Thanks,
Pavan
 
--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Andres Freund
Дата:
On 2016-03-31 09:41:46 +0530, Pavan Deolasee wrote:
> On Thu, Mar 31, 2016 at 6:27 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> >
> >
> > Can you describe the process used to generate the sample WAL segment?
> >
> >
> Shame that I can't find the sql file used to create the problematic WAL
> segment. But this is what I did.

It's probably easier to just generate a humongous commit record. You can
do so by having a *lot* of subtransactions. Relatively easy to do with
plpgsql by creating them in a loop (SELECT txid_current() in EXCEPTION
bearing block ought to suffice).

Greetings,

Andres Freund



Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Craig Ringer
Дата:
On 31 March 2016 at 16:13, Andres Freund <andres@anarazel.de> wrote:
 
It's probably easier to just generate a humongous commit record. You can
do so by having a *lot* of subtransactions. Relatively easy to do with
plpgsql by creating them in a loop (SELECT txid_current() in EXCEPTION
bearing block ought to suffice).


This does the trick and does it quickly on 9.4:

CREATE TABLE trivial(padding text);
ALTER TABLE trivial ALTER COLUMN padding SET STORAGE plain;

DO
LANGUAGE plpgsql
$$
DECLARE
  wal_seg_size integer;
  remaining_size integer;
BEGIN
  wal_seg_size := (select setting from pg_settings where name = 'wal_segment_size')::integer
                * (select setting from pg_settings where name = 'wal_block_size')::integer;
  LOOP
    SELECT wal_seg_size - file_offset FROM pg_xlogfile_name_offset(pg_current_xlog_insert_location()) INTO remaining_size;
    IF remaining_size < 8192
    THEN
        -- Make a nice big commit record right at the end of the segment
        EXIT;
    ELSE
        BEGIN
          -- About 4k
          INSERT INTO trivial(padding) VALUES (repeat('0123456789abcdef', 256));
        EXCEPTION
          WHEN division_by_zero THEN
            CONTINUE;
        END;
    END IF;
  END LOOP;
END;
$$;


I had no issue reproducing the bug on 9.4, but I don't see it in 9.6. At least, not this one, not yet. 

Still, we might want to backpatch the fix; it seems clear enough even if I can't reproduce the issue yet. It doesn't look like it'll affect logical decoding since the snapshot builder has its own unrelated logic for finding the startpoint for decoding, and it certainly doesn't affect WAL replay otherwise we would've seen the fireworks much earlier. The upside is that also makes the backpatch much safer.

I was surprised to see that there are no tests for pg_xlogdump to add this on to, so I've added a trivial test. I've got some more complete xlogdump-helper code in the failover slots tests that I should extract and add to PostgresNode.pm but this'll do in the mean time and is much simpler.

In the process I noticed that pg_xlogdump doesn't follow the rules of the rest of the codebase when it comes to command line handling with --version, returning nonzero on bad options, etc. It might be good to fix that; there's a small BC impact, but I doubt anyone cares when it comes to pg_xlogdump.

I'll attach the new testcase once I either get it to reproduce this bug or give up and leave the basic xlogdump testcase alone.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Craig Ringer
Дата:
On 1 April 2016 at 21:30, Craig Ringer <craig@2ndquadrant.com> wrote:
 
I'll attach the new testcase once I either get it to reproduce this bug or give up and leave the basic xlogdump testcase alone.

I had another bash at this and I still can't reproduce it on master using the giant commit record approach Andres suggested. In fact I generated a commit record larger than an entire xlog segment and it was still fine.

The DO procedure I posted upthread, when run on 9.4, reliably produces segments that the xlogreader cannot decode with the symptoms Pavan reported. It's fine on 9.6.

So I can't reproduce this on 9.6, but there might be a separate bug on 9.4.

I've attached a patch with the simple tests I added for pg_xlogdump as part of this. I doubt it'd be desirable to commit the ridiculous commit record part, but that's trivially removed, and I left it in place in case someone else wanted to fiddle with other ways to reproduce this.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Fujii Masao
Дата:
On Wed, Mar 23, 2016 at 7:04 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>
>
> On Wed, Mar 23, 2016 at 1:13 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>>
>> +       /*
>> +        * Compute targetRecOff. It should typically be greater than short
>> +        * page-header since a valid record can't , but can also be zero
>> when
>> +        * caller has supplied a page-aligned address or when we are
>> skipping
>> +        * multi-page continuation record. It doesn't matter though
>> because
>> +        * ReadPageInternal() will read at least short page-header worth
>> of
>> +        * data
>> +        */
>> This needs some polishing, there is an unfinished sentence here.
>>
>> +       targetRecOff = tmpRecPtr % XLOG_BLCKSZ;
>> targetRecOff, pageHeaderSize and targetPagePtr could be declared
>> inside directly the new while loop.
>
>
> Thanks Michael for reviewing the patch. I've fixed these issues and new
> version is attached.

The patch looks good to me. Barring any objections,
I'll push this and back-patch to 9.3 where pg_xlogdump was added.

Regards,

-- 
Fujii Masao



Re: pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

От
Fujii Masao
Дата:
On Fri, Aug 26, 2016 at 10:03 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Mar 23, 2016 at 7:04 PM, Pavan Deolasee
> <pavan.deolasee@gmail.com> wrote:
>>
>>
>> On Wed, Mar 23, 2016 at 1:13 PM, Michael Paquier <michael.paquier@gmail.com>
>> wrote:
>>>
>>>
>>> +       /*
>>> +        * Compute targetRecOff. It should typically be greater than short
>>> +        * page-header since a valid record can't , but can also be zero
>>> when
>>> +        * caller has supplied a page-aligned address or when we are
>>> skipping
>>> +        * multi-page continuation record. It doesn't matter though
>>> because
>>> +        * ReadPageInternal() will read at least short page-header worth
>>> of
>>> +        * data
>>> +        */
>>> This needs some polishing, there is an unfinished sentence here.
>>>
>>> +       targetRecOff = tmpRecPtr % XLOG_BLCKSZ;
>>> targetRecOff, pageHeaderSize and targetPagePtr could be declared
>>> inside directly the new while loop.
>>
>>
>> Thanks Michael for reviewing the patch. I've fixed these issues and new
>> version is attached.
>
> The patch looks good to me. Barring any objections,
> I'll push this and back-patch to 9.3 where pg_xlogdump was added.

Pushed. Thanks!

Regards,

-- 
Fujii Masao