Thread: XLog changes for 9.3

XLog changes for 9.3

From
Heikki Linnakangas
Date:
When I worked on the XLogInsert scaling patch, it became apparent that 
some changes to the WAL format would make it a lot easier. So for 9.3, 
I'd like to do some refactoring:

1. Use a 64-bit integer instead of the two-variable log/seg 
representation, for identifying a WAL segment. This has no user-visible 
effect, but makes the code a bit simpler.

2. Don't waste the last WAL segment in each logical 4GB file. Currently, 
we skip the WAL segment ending with "FF". The comments claim that 
wasting the last segment "ensures that we don't have problems 
representing last-byte-position-plus-1", but in my experience, it just 
makes things more complicated. You have two ways to represent the 
segment boundary, and some functions are picky on which one is used. For 
example, XLogWrite() assumes that when you want to flush to the end of a 
logical log file, you use the "5/FF000000" representation, not 
"6/00000000". Other functions, like XLogPageRead(), expect the latter.

This is a backwards-incompatible change for external utilities that know 
how the WAL segment numbering works. Hopefully there aren't too many of 
those around.

3. Move the only field, xl_rem_len, from the continuation record header 
straight to the xlog page header, eliminating XLogContRecord altogether. 
This makes it easier to calculate in advance how much space a WAL record 
requires, as it no longer depends on how many pages it has to be split 
across. This wastes 4-8 bytes on every xlog page, but that's not much.

4. Allow WAL record header to be split across page boundaries. 
Currently, if there are less than SizeOfXLogRecord bytes left on the 
current WAL page, it is wasted, and the next record is inserted at the 
beginning of the next page. The problem with that is again that it makes 
it impossible to know in advance exactly how much space a WAL record 
requires, because it depends on how many bytes need to be wasted at the 
end of current page.

These changes will help the XLogInsert scaling patch, by making the 
space calculations simpler. In essence, to reserve space for a WAL 
record of size X, you just need to do "bytepos += X".  There's a lot 
more details with that, like mapping from the contiguous byte position 
to an XLogRecPtr that takes page headers into account, and noticing 
RedoRecPtr changes safely, but it's a start.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: XLog changes for 9.3

From
Andres Freund
Date:
On Thursday, June 07, 2012 03:50:35 PM Heikki Linnakangas wrote:
> When I worked on the XLogInsert scaling patch, it became apparent that
> some changes to the WAL format would make it a lot easier. So for 9.3,
> I'd like to do some refactoring:

> 1. Use a 64-bit integer instead of the two-variable log/seg
> representation, for identifying a WAL segment. This has no user-visible
> effect, but makes the code a bit simpler.
+1

We can define a sensible InvalidXLogRecPtr instead of doing that locally in 
loads of places! Yippee.

> 2. Don't waste the last WAL segment in each logical 4GB file. Currently,
> we skip the WAL segment ending with "FF". The comments claim that
> wasting the last segment "ensures that we don't have problems
> representing last-byte-position-plus-1", but in my experience, it just
> makes things more complicated. You have two ways to represent the
> segment boundary, and some functions are picky on which one is used. For
> example, XLogWrite() assumes that when you want to flush to the end of a
> logical log file, you use the "5/FF000000" representation, not
> "6/00000000". Other functions, like XLogPageRead(), expect the latter.
> 
> This is a backwards-incompatible change for external utilities that know
> how the WAL segment numbering works. Hopefully there aren't too many of
> those around.
+1

> 3. Move the only field, xl_rem_len, from the continuation record header
> straight to the xlog page header, eliminating XLogContRecord altogether.
> This makes it easier to calculate in advance how much space a WAL record
> requires, as it no longer depends on how many pages it has to be split
> across. This wastes 4-8 bytes on every xlog page, but that's not much.
+1. I don't think this will waste a measurable amount in real-world 
scenarios. A very big percentage of pages have continuation records.

> 4. Allow WAL record header to be split across page boundaries.
> Currently, if there are less than SizeOfXLogRecord bytes left on the
> current WAL page, it is wasted, and the next record is inserted at the
> beginning of the next page. The problem with that is again that it makes
> it impossible to know in advance exactly how much space a WAL record
> requires, because it depends on how many bytes need to be wasted at the
> end of current page.
+0.5. It's somewhat convenient to be able to look at a record before you have 
reassembled it over multiple pages. But it's probably not worth the 
implementation complexity.
If we do that we can remove all the alignment padding as well. Which would be a 
problem for you anyway, wouldn't it?

> These changes will help the XLogInsert scaling patch, by making the
> space calculations simpler. In essence, to reserve space for a WAL
> record of size X, you just need to do "bytepos += X".  There's a lot
> more details with that, like mapping from the contiguous byte position
> to an XLogRecPtr that takes page headers into account, and noticing
> RedoRecPtr changes safely, but it's a start.
Hm. Wouldn't you need to remove short/long page headers for that as well? 


Andres

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: XLog changes for 9.3

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> When I worked on the XLogInsert scaling patch, it became apparent that 
> some changes to the WAL format would make it a lot easier. So for 9.3, 
> I'd like to do some refactoring:

> 1. Use a 64-bit integer instead of the two-variable log/seg 
> representation, for identifying a WAL segment. This has no user-visible 
> effect, but makes the code a bit simpler.

> 2. Don't waste the last WAL segment in each logical 4GB file. Currently, 
> we skip the WAL segment ending with "FF". The comments claim that 
> wasting the last segment "ensures that we don't have problems 
> representing last-byte-position-plus-1", but in my experience, it just 
> makes things more complicated.

I think that's actually an indivisible part of point #1.  The issue in
the 32+32 representation is that you'd overflow the low-order half when
trying to represent last-byte-of-file-plus-1, and have to do something
with propagating that to the high half.  In a 64-bit continuous
addressing scheme the problem goes away, and it would just get more
complicated not less to preserve the "hole".
        regards, tom lane
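
To make the overflow concrete, here is a small sketch (not PostgreSQL code) that computes last-byte-of-segment-plus-1 both ways. The 16MB segment size and the helper names `segment_end_64` and `segment_end_32` are assumptions chosen for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define SEG_SIZE       UINT64_C(0x1000000)  /* assumed 16MB WAL segment */
#define SEGS_PER_LOGID UINT64_C(0x100)      /* 4GB / 16MB segments per logical file */

/* 64-bit scheme: one past the last byte of a segment is a plain multiply. */
static uint64_t
segment_end_64(uint64_t segno)
{
    return (segno + 1) * SEG_SIZE;
}

/*
 * 32+32 scheme: the same computation has to notice that the low half
 * wrapped around and carry into the high half by hand.
 */
static void
segment_end_32(uint32_t logid, uint32_t seg,
               uint32_t *end_logid, uint32_t *end_off)
{
    assert(seg < SEGS_PER_LOGID);
    *end_logid = logid;
    *end_off = (uint32_t) ((seg + 1) * SEG_SIZE);   /* wraps to 0 for seg 0xFF */
    if (*end_off == 0)
        (*end_logid)++;                             /* manual carry */
}
```

With these definitions, the end of segment FF in logical file 5 comes out as 6/00000000 in both schemes, but only the 32+32 version needs the special case.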


Re: XLog changes for 9.3

From
Heikki Linnakangas
Date:
On 07.06.2012 17:18, Andres Freund wrote:
> On Thursday, June 07, 2012 03:50:35 PM Heikki Linnakangas wrote:
>> 3. Move the only field, xl_rem_len, from the continuation record header
>> straight to the xlog page header, eliminating XLogContRecord altogether.
>> This makes it easier to calculate in advance how much space a WAL record
>> requires, as it no longer depends on how many pages it has to be split
>> across. This wastes 4-8 bytes on every xlog page, but that's not much.
> +1. I don't think this will waste a measurable amount in real-world
> scenarios. A very big percentage of pages have continuation records.

Yeah, although the way I'm planning to do it, you'll waste 4 bytes (on 
64-bit architectures) even when there is a continuation record, because 
of alignment:

typedef struct XLogPageHeaderData
{
    uint16      xlp_magic;     /* magic value for correctness checks */
    uint16      xlp_info;      /* flag bits, see below */
    TimeLineID  xlp_tli;       /* TimeLineID of first record on page */
    XLogRecPtr  xlp_pageaddr;  /* XLOG address of this page */

+   uint32      xlp_rem_len;   /* bytes remaining of continued record */
} XLogPageHeaderData;

The page header is currently 16 bytes in length, so adding a 4-byte 
field to it bumps the aligned size to 24 bytes. Nevertheless, I think we 
can well live with that.
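
The jump from 16 to 24 bytes can be checked with a small stand-alone sketch. The typedefs below are simplified stand-ins (XLogRecPtr as two 32-bit halves, TimeLineID as a 32-bit integer), and a MAXIMUM_ALIGNOF of 8 is an assumption for a typical 64-bit platform. The names `OldPageHeader`, `NewPageHeader` and `wasted_bytes` are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL typedefs (assumed layouts). */
typedef uint32_t TimeLineID;
typedef struct { uint32_t xlogid; uint32_t xrecoff; } XLogRecPtr;

#define MAXIMUM_ALIGNOF 8   /* assumed: typical 64-bit platform */
#define MAXALIGN(len) \
    (((size_t) (len) + (MAXIMUM_ALIGNOF - 1)) & ~((size_t) (MAXIMUM_ALIGNOF - 1)))

typedef struct
{
    uint16_t    xlp_magic;
    uint16_t    xlp_info;
    TimeLineID  xlp_tli;
    XLogRecPtr  xlp_pageaddr;
} OldPageHeader;

typedef struct
{
    uint16_t    xlp_magic;
    uint16_t    xlp_info;
    TimeLineID  xlp_tli;
    XLogRecPtr  xlp_pageaddr;
    uint32_t    xlp_rem_len;    /* the proposed new field */
} NewPageHeader;

/* Header size as reserved on the page, rounded up for alignment. */
static size_t old_header_size(void) { return MAXALIGN(sizeof(OldPageHeader)); }
static size_t new_header_size(void) { return MAXALIGN(sizeof(NewPageHeader)); }

/* Padding bytes lost after the new field. */
static size_t
wasted_bytes(void)
{
    assert(new_header_size() >= sizeof(NewPageHeader));
    return new_header_size() - sizeof(NewPageHeader);
}
```

Under these assumptions the raw struct grows from 16 to 20 bytes, and rounding up to the next multiple of 8 gives the 24 bytes mentioned above, with 4 bytes of padding.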

>> 4. Allow WAL record header to be split across page boundaries.
>> Currently, if there are less than SizeOfXLogRecord bytes left on the
>> current WAL page, it is wasted, and the next record is inserted at the
>> beginning of the next page. The problem with that is again that it makes
>> it impossible to know in advance exactly how much space a WAL record
>> requires, because it depends on how many bytes need to be wasted at the
>> end of current page.
> +0.5. It's somewhat convenient to be able to look at a record before you have
> reassembled it over multiple pages. But it's probably not worth the
> implementation complexity.

Looking at the code, I think it'll be about the same complexity for 
XLogInsert in its current form (it will help the patch I'm working on), 
and makes ReadRecord() a bit more complicated. But not much.

> If we do that we can remove all the alignment padding as well. Which would be a
> problem for you anyway, wouldn't it?

It's not a problem. You just MAXALIGN the size of the record when you 
calculate how much space it needs, and then all records become naturally 
MAXALIGNed. We could quite easily remove the alignment on-disk if we 
wanted to, ReadRecord() already always copies the record to an aligned 
buffer, but I wasn't planning to do that.
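
As a rough illustration of the "MAXALIGN the size" idea, the sketch below reserves space by bumping a single counter. It is not the scaling patch itself: `InsertPos` and `reserve_record` are hypothetical names, locking is left out entirely, and MAXIMUM_ALIGNOF is assumed to be 8:

```c
#include <assert.h>
#include <stdint.h>

#define MAXIMUM_ALIGNOF 8   /* assumed */
#define MAXALIGN64(len) \
    (((uint64_t) (len) + (MAXIMUM_ALIGNOF - 1)) & ~((uint64_t) (MAXIMUM_ALIGNOF - 1)))

/* Hypothetical insert-position counter: usable WAL bytes handed out so far. */
typedef struct { uint64_t bytepos; } InsertPos;

/* Reserve room for a record of rec_len bytes; returns where it starts. */
static uint64_t
reserve_record(InsertPos *pos, uint32_t rec_len)
{
    uint64_t start = pos->bytepos;

    assert(rec_len > 0);
    pos->bytepos += MAXALIGN64(rec_len);    /* "bytepos += X", X rounded up */
    return start;
}
```

Because every reservation is a multiple of the alignment, every start position is aligned too, which is the sense in which the records "become naturally MAXALIGNed".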

>> These changes will help the XLogInsert scaling patch, by making the
>> space calculations simpler. In essence, to reserve space for a WAL
>> record of size X, you just need to do "bytepos += X".  There's a lot
>> more details with that, like mapping from the contiguous byte position
>> to an XLogRecPtr that takes page headers into account, and noticing
>> RedoRecPtr changes safely, but it's a start.
> Hm. Wouldn't you need to remove short/long page headers for that as well?

No, those are ok because they're predictable. Although it would make the 
mapping simpler. To convert from a contiguous xlog byte position that 
excludes all headers, to XLogRecPtr, you need to do something like this 
(I just made this up, probably has bugs, but it's about this complex):

#define UsableBytesInPage (XLOG_BLCKSZ - SizeOfXLogShortPHD)
#define UsableBytesInSegment ((XLOG_SEG_SIZE / XLOG_BLCKSZ) * \
    UsableBytesInPage - (SizeOfXLogLongPHD - SizeOfXLogShortPHD))

uint64 xlogrecptr;
uint64 full_segments = bytepos / UsableBytesInSegment;
int offset_in_segment = bytepos % UsableBytesInSegment;

xlogrecptr = full_segments * XLOG_SEG_SIZE;
/* is it on the first page? */
if (offset_in_segment < XLOG_BLCKSZ - SizeOfXLogLongPHD)
    xlogrecptr += SizeOfXLogLongPHD + offset_in_segment;
else
{
    /* first page is fully used */
    xlogrecptr += XLOG_BLCKSZ;
    /* add other full pages */
    offset_in_segment -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
    xlogrecptr += (offset_in_segment / UsableBytesInPage) * XLOG_BLCKSZ;
    /* and finally offset within the last page, after its short header */
    xlogrecptr += SizeOfXLogShortPHD + offset_in_segment % UsableBytesInPage;
}
/* finally convert the 64-bit xlogrecptr to a XLogRecPtr struct */
XLogRecPtr.xlogid = xlogrecptr >> 32;
XLogRecPtr.xrecoff = xlogrecptr & 0xffffffff;

Encapsulated in a function, that's not too bad. But if we want to make 
that simpler, one idea would be to allocate the whole 1st page in each 
WAL segment for metadata. That way all the actual xlog pages would hold 
the same amount of xlog data.
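
For comparison, here is a compilable sketch of both mappings: one with a long header on the first page of each segment, and one where the whole first page is set aside for metadata. The page size, segment size and header sizes are assumed values picked for illustration, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* All sizes below are assumptions for illustration. */
#define XLOG_BLCKSZ    UINT64_C(8192)
#define XLOG_SEG_SIZE  UINT64_C(16777216)   /* 16MB */
#define SHORT_PHD      UINT64_C(24)         /* short page header */
#define LONG_PHD       UINT64_C(40)         /* long header, first page of a segment */

#define PAGES_PER_SEG   (XLOG_SEG_SIZE / XLOG_BLCKSZ)
#define USABLE_PER_PAGE (XLOG_BLCKSZ - SHORT_PHD)
#define USABLE_PER_SEG  (PAGES_PER_SEG * USABLE_PER_PAGE - (LONG_PHD - SHORT_PHD))

/* Map a header-free byte position to a 64-bit WAL position. */
static uint64_t
bytepos_to_recptr(uint64_t bytepos)
{
    uint64_t off = bytepos % USABLE_PER_SEG;
    uint64_t recptr = (bytepos / USABLE_PER_SEG) * XLOG_SEG_SIZE;

    if (off < XLOG_BLCKSZ - LONG_PHD)
        return recptr + LONG_PHD + off;         /* still on the first page */

    off -= XLOG_BLCKSZ - LONG_PHD;              /* first page is full */
    recptr += XLOG_BLCKSZ;
    recptr += (off / USABLE_PER_PAGE) * XLOG_BLCKSZ;
    recptr += SHORT_PHD + off % USABLE_PER_PAGE;
    assert(recptr % XLOG_BLCKSZ >= SHORT_PHD);  /* never inside a header */
    return recptr;
}

/*
 * Variant: the whole first page of each segment is metadata, so every
 * data page holds the same amount and the first-page branch disappears.
 */
static uint64_t
bytepos_to_recptr_metapage(uint64_t bytepos)
{
    const uint64_t usable_per_seg = (PAGES_PER_SEG - 1) * USABLE_PER_PAGE;
    uint64_t off = bytepos % usable_per_seg;
    uint64_t recptr = (bytepos / usable_per_seg) * XLOG_SEG_SIZE + XLOG_BLCKSZ;

    recptr += (off / USABLE_PER_PAGE) * XLOG_BLCKSZ;
    return recptr + SHORT_PHD + off % USABLE_PER_PAGE;
}
```

The second function trades one page per segment for a mapping with no special case, which is the simplification the metadata-page idea is after.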

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: XLog changes for 9.3

From
Simon Riggs
Date:
On 7 June 2012 14:50, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> These changes will help the XLogInsert scaling patch

...and as I'm sure you're aware will junk much of the replication code
and almost certainly set back the other work that we have brewing for
9.3. So this is a very large curve ball you're throwing there.

Personally, I don't think we should do this until we have a better
regression test suite around replication and recovery because the
impact will be huge but I welcome the suggested changes themselves.

If you are going to do this in 9.3, then it has to be early in the
first Commit Fest and you'll need to be around to quickly follow
through on all of the other subsequent breakages it will cause,
otherwise every other piece of work in this area will be halted or
delayed.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: XLog changes for 9.3

From
Andres Freund
Date:
Hi,

On Thursday, June 07, 2012 05:51:07 PM Simon Riggs wrote:
> On 7 June 2012 14:50, Heikki Linnakangas
> 
> <heikki.linnakangas@enterprisedb.com> wrote:
> > These changes will help the XLogInsert scaling patch
> 
> ...and as I'm sure you're aware will junk much of the replication code
> and almost certainly set back the other work that we have brewing for
> 9.3. So this is a very large curve ball you're throwing there.
It's not that bad. Most of that code is pretty abstracted, the changes to 
adapt to that should be less than 20 lines. And it would remove some of the 
complexity.

> Personally, I don't think we should do this until we have a better
> regression test suite around replication and recovery because the
> impact will be huge but I welcome the suggested changes themselves.
Hm. One could regard the logical rep stuff as a testsuite ;)

> If you are going to do this in 9.3, then it has to be early in the
> first Commit Fest and you'll need to be around to quickly follow
> through on all of the other subsequent breakages it will cause,
> otherwise every other piece of work in this area will be halted or
> delayed.
Yea, I would definitely welcome an early patch.

Greetings,

Andres

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: XLog changes for 9.3

From
Magnus Hagander
Date:
On Thu, Jun 7, 2012 at 5:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> On Thursday, June 07, 2012 05:51:07 PM Simon Riggs wrote:
>> On 7 June 2012 14:50, Heikki Linnakangas
>>
>> <heikki.linnakangas@enterprisedb.com> wrote:
>> > These changes will help the XLogInsert scaling patch
>>
>> ...and as I'm sure you're aware will junk much of the replication code
>> and almost certainly set back the other work that we have brewing for
>> 9.3. So this is a very large curve ball you're throwing there.
> It's not that bad. Most of that code is pretty abstracted, the changes to
> adapt to that should be less than 20 lines. And it would remove some of the
> complexity.
>
>> Personally, I don't think we should do this until we have a better
>> regression test suite around replication and recovery because the
>> impact will be huge but I welcome the suggested changes themselves.
> Hm. One could regard the logical rep stuff as a testsuite ;)
>
>> If you are going to do this in 9.3, then it has to be early in the
>> first Commit Fest and you'll need to be around to quickly follow
>> through on all of the other subsequent breakages it will cause,
>> otherwise every other piece of work in this area will be halted or
>> delayed.
> Yea, I would definitely welcome an early patch.

Just as I'm sure everybody else would welcome *your* patches landing
in the first commitfest and that you all guarantee to be around
to quickly follow through on all potential breakages *that* can cause.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: XLog changes for 9.3

From
Andres Freund
Date:
On Thursday, June 07, 2012 06:02:12 PM Magnus Hagander wrote:
> On Thu, Jun 7, 2012 at 5:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Hi,
> > 
> > On Thursday, June 07, 2012 05:51:07 PM Simon Riggs wrote:
> >> On 7 June 2012 14:50, Heikki Linnakangas
> >> 
> >> <heikki.linnakangas@enterprisedb.com> wrote:
> >> > These changes will help the XLogInsert scaling patch
> >> 
> >> ...and as I'm sure you're aware will junk much of the replication code
> >> and almost certainly set back the other work that we have brewing for
> >> 9.3. So this is a very large curve ball you're throwing there.
> > 
> > It's not that bad. Most of that code is pretty abstracted, the changes to
> > adapt to that should be less than 20 lines. And it would remove some of
> > the complexity.
> > 
> >> Personally, I don't think we should do this until we have a better
> >> regression test suite around replication and recovery because the
> >> impact will be huge but I welcome the suggested changes themselves.
> > 
> > Hm. One could regard the logical rep stuff as a testsuite ;)
> > 
> >> If you are going to do this in 9.3, then it has to be early in the
> >> first Commit Fest and you'll need to be around to quickly follow
> >> through on all of the other subsequent breakages it will cause,
> >> otherwise every other piece of work in this area will be halted or
> >> delayed.
> > 
> > Yea, I would definitely welcome an early patch.
> 
> Just as I'm sure everybody else would welcome *your* patches landing
> in the first commitfest and that you all guarantee to be around
> to quickly follow through on all potential breakages *that* can cause.
Agreed.

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: XLog changes for 9.3

From
Heikki Linnakangas
Date:
On 07.06.2012 18:51, Simon Riggs wrote:
> On 7 June 2012 14:50, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>
>> These changes will help the XLogInsert scaling patch
>
> ...and as I'm sure you're aware will junk much of the replication code
> and almost certainly set back the other work that we have brewing for
> 9.3. So this is a very large curve ball you're throwing there.

I don't think this has much impact on what you're doing (although it's a 
bit hard to tell without more details). The way WAL records work is the 
same, it's just the code that lays them out on a page, and reads back 
from a page, that's changed. And that's fairly isolated in xlog.c.

> If you are going to do this in 9.3, then it has to be early in the
> first Commit Fest and you'll need to be around to quickly follow
> through on all of the other subsequent breakages it will cause,
> otherwise every other piece of work in this area will be halted or
> delayed.

Yeah, the plan is to get this in early, in the first commit fest. Not 
only because of possible breakage, but also because my ultimate goal is 
the XLogInsert refactoring, and I want to do that early in the release 
cycle, too.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: XLog changes for 9.3

From
Robert Haas
Date:
On Thu, Jun 7, 2012 at 11:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> So this is a very large curve ball you're throwing there.

This is not exactly unexpected.  At least the first two of these items
were previously discussed in the context of the XLOG scaling patch, many
months ago.  It shouldn't come as a surprise to anyone that Heikki is
planning to continue to work on that patch even though it didn't make
9.2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: XLog changes for 9.3

From
Andres Freund
Date:
On Thursday, June 07, 2012 05:35:11 PM Heikki Linnakangas wrote:
> On 07.06.2012 17:18, Andres Freund wrote:
> > On Thursday, June 07, 2012 03:50:35 PM Heikki Linnakangas wrote:
> >> 3. Move the only field, xl_rem_len, from the continuation record header
> >> straight to the xlog page header, eliminating XLogContRecord altogether.
> >> This makes it easier to calculate in advance how much space a WAL record
> >> requires, as it no longer depends on how many pages it has to be split
> >> across. This wastes 4-8 bytes on every xlog page, but that's not much.
> > 
> > +1. I don't think this will waste a measurable amount in real-world
> > scenarios. A very big percentage of pages have continuation records.
> 
> Yeah, although the way I'm planning to do it, you'll waste 4 bytes (on
> 64-bit architectures) even when there is a continuation record, because
> of alignment:
> 
> typedef struct XLogPageHeaderData
> {
>      uint16      xlp_magic;     /* magic value for correctness checks */
>      uint16      xlp_info;      /* flag bits, see below */
>      TimeLineID  xlp_tli;       /* TimeLineID of first record on page */
>      XLogRecPtr  xlp_pageaddr;  /* XLOG address of this page */
> 
> +   uint32      xlp_rem_len;   /* bytes remaining of continued record */
>   } XLogPageHeaderData;
> 
> The page header is currently 16 bytes in length, so adding a 4-byte
> field to it bumps the aligned size to 24 bytes. Nevertheless, I think we
> can well live with that.
At that point we can just do the
#define SizeofXLogPageHeaderData (offsetof(XLogPageHeaderData, xlp_pageaddr) + \
    sizeof(uint32))
dance. If the record can be smeared over two pages there is no point in 
storing it aligned. Then we don't waste any additional space in comparison to 
the current state.

> > If we do that we can remove all the alignment padding as well. Which would
> > be a problem for you anyway, wouldn't it?
> It's not a problem. You just MAXALIGN the size of the record when you
> calculate how much space it needs, and then all records become naturally
> MAXALIGNed. We could quite easily remove the alignment on-disk if we
> wanted to, ReadRecord() already always copies the record to an aligned
> buffer, but I wasn't planning to do that.
What's the reasoning for having alignment on disk if the records aren't stored 
contiguously?

> >> These changes will help the XLogInsert scaling patch, by making the
> >> space calculations simpler. In essence, to reserve space for a WAL
> >> record of size X, you just need to do "bytepos += X".  There's a lot
> >> more details with that, like mapping from the contiguous byte position
> >> to an XLogRecPtr that takes page headers into account, and noticing
> >> RedoRecPtr changes safely, but it's a start.
> > 
> > Hm. Wouldn't you need to remove short/long page headers for that as well?
> 
> No, those are ok because they're predictable.
I haven't read your scalability patch, so I am not really sure what you 
need...
The "bytepos += X" from above isn't as easy that way. But yes, it's not that 
complicated.

> Although it would make the
> mapping simpler. To convert from a contiguous xlog byte position that
> excludes all headers, to XLogRecPtr, you need to do something like this
> (I just made this up, probably has bugs, but it's about this complex):
> 
> #define UsableBytesInPage (XLOG_BLCKSZ - SizeOfXLogShortPHD)
> #define UsableBytesInSegment ((XLOG_SEG_SIZE / XLOG_BLCKSZ) * \
>     UsableBytesInPage - (SizeOfXLogLongPHD - SizeOfXLogShortPHD))
> 
> uint64 xlogrecptr;
> uint64 full_segments = bytepos / UsableBytesInSegment;
> int offset_in_segment = bytepos % UsableBytesInSegment;
> 
> xlogrecptr = full_segments * XLOG_SEG_SIZE;
> /* is it on the first page? */
> if (offset_in_segment < XLOG_BLCKSZ - SizeOfXLogLongPHD)
>     xlogrecptr += SizeOfXLogLongPHD + offset_in_segment;
> else
> {
>     /* first page is fully used */
>     xlogrecptr += XLOG_BLCKSZ;
>     /* add other full pages */
>     offset_in_segment -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
>     xlogrecptr += (offset_in_segment / UsableBytesInPage) * XLOG_BLCKSZ;
>     /* and finally offset within the last page, after its short header */
>     xlogrecptr += SizeOfXLogShortPHD + offset_in_segment % UsableBytesInPage;
> }
> /* finally convert the 64-bit xlogrecptr to a XLogRecPtr struct */
> XLogRecPtr.xlogid = xlogrecptr >> 32;
> XLogRecPtr.xrecoff = xlogrecptr & 0xffffffff;
It's a bit more complicated than that, records can span a good bit more than 
just two pages (even more than two segments) and you need to decide for each 
of those whether it has a long or a short header.

> Encapsulated in a function, that's not too bad. But if we want to make
> that simpler, one idea would be to allocate the whole 1st page in each
> WAL segment for metadata. That way all the actual xlog pages would hold
> the same amount of xlog data.
It's a bit easier then, but you probably still need to loop over the size and 
subtract until you reach the final point. It's no problem to produce a 100MB 
WAL record. But then that's probably nothing to design for.

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: XLog changes for 9.3

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> dance. If the record can be smeared over two pages there is no point in 
> storing it aligned.

I think this is not true.  The value of requiring alignment is that you
can read the record-length field without first having to copy it somewhere.
In particular, it will get really ugly if the record length field itself
could cross a page boundary.  I think we want to be able to determine
the record length before we do any data copying, so that we can malloc
the record buffer and then just do one copy step.

The real reason for the current behavior of not letting the record
header get split across multiple pages is so that the length field is
guaranteed to be in the first page.  We can still guarantee that if
we (1) put the length field first and (2) require at least int32
alignment.  I think losing that property will be pretty bad though.
        regards, tom lane
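
The alignment argument can be checked mechanically. The sketch below assumes an 8192-byte page and a 4-byte length field at the very start of each record, and ignores page headers for simplicity; `length_field_on_one_page` and `always_fits` are hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192    /* assumed page size */

/* Does a 4-byte length field starting at this WAL position fit on its page? */
static bool
length_field_on_one_page(uint64_t recptr)
{
    uint32_t off = (uint32_t) (recptr % XLOG_BLCKSZ);

    return off + sizeof(uint32_t) <= XLOG_BLCKSZ;
}

/* Check every possible record start on a page for the given alignment. */
static bool
always_fits(uint32_t alignment)
{
    assert(alignment > 0);
    for (uint32_t off = 0; off < XLOG_BLCKSZ; off += alignment)
    {
        if (!length_field_on_one_page(off))
            return false;
    }
    return true;
}
```

With 4-byte or 8-byte alignment the length field never straddles a page boundary, while with 1-byte or 2-byte alignment a record starting near the end of a page can split it.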


Re: XLog changes for 9.3

From
Andres Freund
Date:
On Thursday, June 07, 2012 06:53:58 PM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > dance. If the record can be smeared over two pages there is no point in
> > storing it aligned.
> 
> I think this is not true.  The value of requiring alignment is that you
> can read the record-length field without first having to copy it somewhere.
> In particular, it will get really ugly if the record length field itself
> could cross a page boundary.  I think we want to be able to determine
> the record length before we do any data copying, so that we can malloc
> the record buffer and then just do one copy step.
Hm, I had assumed the record would get copied into a temp/static buffer first 
and only get reassembled together with the data afterwards.
But if that's not the way to go, sure, storing it aligned so that the length 
can always be read aligned within a page is sensible.

Andres


Re: XLog changes for 9.3

From
Simon Riggs
Date:
On 7 June 2012 17:12, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 07.06.2012 18:51, Simon Riggs wrote:
>>
>> On 7 June 2012 14:50, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com>  wrote:
>>
>>> These changes will help the XLogInsert scaling patch
>>
>>
>> ...and as I'm sure you're aware will junk much of the replication code
>> and almost certainly set back the other work that we have brewing for
>> 9.3. So this is a very large curve ball you're throwing there.
>
>
> I don't think this has much impact on what you're doing (although it's a bit
> hard to tell without more details). The way WAL records work is the same,
> it's just the code that lays them out on a page, and reads back from a page,
> that's changed. And that's fairly isolated in xlog.c.

I wasn't worried about the code overlap, but the subsidiary breakage
looks pretty enormous to me.

Anything changing filenames will break every HA config anybody has
anywhere. So you can pretty much kiss goodbye to the idea of
pg_upgrade. For me, this one thing alone is sufficient to force next
release to be 10.0.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: XLog changes for 9.3

From
Andres Freund
Date:
On Thursday, June 07, 2012 07:03:32 PM Simon Riggs wrote:
> On 7 June 2012 17:12, Heikki Linnakangas
> 
> <heikki.linnakangas@enterprisedb.com> wrote:
> > On 07.06.2012 18:51, Simon Riggs wrote:
> >> On 7 June 2012 14:50, Heikki Linnakangas
> >> 
> >> <heikki.linnakangas@enterprisedb.com>  wrote:
> >>> These changes will help the XLogInsert scaling patch
> >> 
> >> ...and as I'm sure you're aware will junk much of the replication code
> >> and almost certainly set back the other work that we have brewing for
> >> 9.3. So this is a very large curve ball you're throwing there.
> > 
> > I don't think this has much impact on what you're doing (although it's a
> > bit hard to tell without more details). The way WAL records work is the
> > same, it's just the code that lays them out on a page, and reads back
> > from a page, that's changed. And that's fairly isolated in xlog.c.
> I wasn't worried about the code overlap, but the subsidiary breakage
> looks pretty enormous to me.
The xlog arithmetic will still be encapsulated, so not much difference there. 
Removing reading of XLogContRecord isn't complicated and would result in less 
code. Shouldn't be much more than that.

> Anything changing filenames will break every HA config anybody has
> anywhere. So you can pretty much kiss goodbye to the idea of
> pg_upgrade. For me, this one thing alone is sufficient to force next
> release to be 10.0.
Hm? WAL isn't relevant for pg_upgrade. And the HA setups should rely on 
archive_command and such and not do computation of the next/last name. I would 
guess removing that corner-case actually fixes more tools than it breaks.

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: XLog changes for 9.3

From
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:
> Anything changing filenames will break every HA config anybody has
> anywhere.
It will impact our scripts related to backup and archiving, but I
think we're talking about two or three staff days to cover it in our
shop.
We should definitely make sure that this change is conspicuously
noted.  The scariest part is that there will now be files that
matter with names that previously didn't exist, so lack of action
will cause failure to capture a usable backup.  I don't know that it
merits a bump to 10.0, though.  We test every backup for usability,
as I believe any shop should; failure to cover this should cause
pretty obvious errors pretty quickly if you are testing your
backups.
-Kevin


Re: XLog changes for 9.3

From
Robert Haas
Date:
On Thu, Jun 7, 2012 at 1:15 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>
>> Anything changing filenames will break every HA config anybody has
>> anywhere.
>
> It will impact our scripts related to backup and archiving, but I
> think we're talking about two or three staff days to cover it in our
> shop.
>
> We should definitely make sure that this change is conspicuously
> noted.  The scariest part is that there will now be files that
> matter with names that previously didn't exist, so lack of action
> will cause failure to capture a usable backup.

But if you're just using regexp matching against pathnames, your tool
will be just fine.  Do your tools actually rely on the occasional
absence of a file in what would otherwise be the usual sequence of
files?

...Robert


Re: XLog changes for 9.3

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> But if you're just using regexp matching against pathnames, your
> tool will be just fine.  Do your tools actually rely on the
> occasional absence of a file in what would otherwise be the usual
> sequence of files?
To save "snapshot" backups for the long term, we generate a list of
the specific WAL files needed to reach a consistent recovery point
from a given base backup.  We keep monthly snapshot backups for a
year.  We currently determine the first and last file needed, and
then create a list of all the WAL files to save.  We error out if
any are missing, so we do skip the FF file.
-Kevin
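
A tool of the kind described here might enumerate the expected files roughly as in the sketch below. This is only an illustration, not the actual scripts: the function names are invented, and the `last_seg` parameter is an assumption used to switch between the old rule (0xFE, the FF segment is skipped) and the proposed one (0xFF):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Step a (log, seg) pair to the next segment. last_seg is 0xFE when the
 * FF segment is skipped and 0xFF when it is used.
 */
static void
next_segment(uint32_t *log, uint32_t *seg, uint32_t last_seg)
{
    assert(*seg <= last_seg);
    if (*seg == last_seg)
    {
        (*log)++;
        *seg = 0;
    }
    else
        (*seg)++;
}

/*
 * Count the files from (log1, seg1) through (log2, seg2), inclusive,
 * printing each expected file name when out is not NULL.
 */
static int
list_wal_files(FILE *out, uint32_t tli,
               uint32_t log1, uint32_t seg1,
               uint32_t log2, uint32_t seg2, uint32_t last_seg)
{
    uint32_t log = log1;
    uint32_t seg = seg1;
    int      count = 0;

    assert(seg2 <= last_seg);
    assert(log1 < log2 || (log1 == log2 && seg1 <= seg2));
    for (;;)
    {
        if (out != NULL)
            fprintf(out, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32 "\n",
                    tli, log, seg);
        count++;
        if (log == log2 && seg == seg2)
            break;
        next_segment(&log, &seg, last_seg);
    }
    return count;
}
```

A script hard-wired to the old rule would build a list that is one file short for every logical-file boundary in the range, which is the silent failure to watch for.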


Re: XLog changes for 9.3

From
Robert Haas
Date:
On Thu, Jun 7, 2012 at 1:40 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> But if you're just using regexp matching against pathnames, your
>> tool will be just fine.  Do your tools actually rely on the
>> occasional absence of a file in what would otherwise be the usual
>> sequence of files?
>
> To save "snapshot" backups for the long term, we generate a list of
> the specific WAL files needed to reach a consistent recovery point
> from a given base backup.  We keep monthly snapshot backups for a
> year.  We currently determine the first and last file needed, and
> then create a list of all the WAL files to save.  We error out if
> any are missing, so we do skip the FF file.

OK, I see.  Still, I think there are a lot of people who don't do
anything that complex, and won't be affected.  But I agree we had
better clearly release-note it as an incompatibility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: XLog changes for 9.3

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> Anything changing filenames will break every HA config anybody has
> anywhere.

This seems like nonsense to me.  How many external scripts are likely to
know that we skip the FF page?  There might be some, but not many.

> So you can pretty much kiss goodbye to the idea of pg_upgrade.

And that is certainly nonsense.  I don't think pg_upgrade even knows
about this, and if it does we can surely fix it.

> For me, this one thing alone is sufficient to force next release to be
> 10.0.

Huh?  We make incompatible changes in major versions all the time.
This one does not appear to me to be worse than many others.
        regards, tom lane


Re: XLog changes for 9.3

From
Simon Riggs
Date:
On 7 June 2012 19:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Anything changing filenames will break every HA config anybody has
>> anywhere.
>
> This seems like nonsense to me.  How many external scripts are likely to
> know that we skip the FF page?  There might be some, but not many.

If that is the only change in filenames, then all is forgiven.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: XLog changes for 9.3

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 7 June 2012 19:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This seems like nonsense to me.  How many external scripts are likely to
>> know that we skip the FF page?  There might be some, but not many.

> If that is the only change in filenames, then all is forgiven.

Oh, now I see what you're on about.  Yes, I agree that we should
maintain the same formatting of WAL segment file names, even though
it will be rather artificial in the 64-bit-arithmetic world.  The
only externally visible change should be the creation of FF-numbered
files where formerly those were skipped.
        regards, tom lane
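
One way to keep the existing name format on top of a 64-bit segment number is to split the number back into two 32-bit fields only when formatting. The sketch below assumes 16MB segments; `wal_file_name` and the macro names are made up for this example:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SEG_SIZE       UINT64_C(0x1000000)                  /* assumed 16MB */
#define SEGS_PER_LOGID (UINT64_C(0x100000000) / SEG_SIZE)   /* 256 */

/* Format a segment file name as timeline, "log" and "seg", 8 hex digits each. */
static void
wal_file_name(char *buf, size_t buflen, uint32_t tli, uint64_t segno)
{
    assert(buflen >= 25);
    snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli,
             (uint32_t) (segno / SEGS_PER_LOGID),
             (uint32_t) (segno % SEGS_PER_LOGID));
    assert(strlen(buf) == 24);
}
```

Segment number 0x5FF then formats with a trailing FF field, and 0x600 rolls over to the next "log" value with a zero segment field, so the names look as before apart from the FF files now existing.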


Re: XLog changes for 9.3

From
Bruce Momjian
Date:
On Thu, Jun 07, 2012 at 02:52:04PM -0400, Tom Lane wrote:
> > So you can pretty much kiss goodbye to the idea of pg_upgrade.
> 
> And that is certainly nonsense.  I don't think pg_upgrade even knows
> about this, and if it does we can surely fix it.

pg_upgrade doesn't know anything about xlog files --- all its interaction
in that area is through pg_resetxlog and it doesn't look at the xlog
details.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +