Обсуждение: Adding SMGR discriminator to buffer tags

Поиск
Список
Период
Сортировка

Adding SMGR discriminator to buffer tags

От
Thomas Munro
Дата:
Hello hackers,

On another thread, lots of undo log-related patches have been traded.
Buried deep in the stack is one that I'd like to highlight and discuss
in a separate thread, because it relates to a parallel thread of
development and it'd be good to get feedback on it.

In commit 3eb77eba, Shawn Debnath and I extended the checkpointer
fsync machinery to support more kinds of files.  Next, we'd like to
teach the buffer pool to deal with more kinds of buffers.  The context
for this collaboration is that he's working on putting things like
CLOG into shared buffers, and my EDB colleagues and I are working on
putting undo logs into shared buffers.  We want a simple way to put
any block-structured stuff into shared buffers, not just plain
"relations".

The questions are: how should buffer tags distinguish different kinds
of buffers, and how should SMGR direct IO traffic to the right place
when it needs to schlepp pages in and out?

In earlier prototype code, I'd been using a special database number
for undo logs.  In a recent thread[1], Tom and others didn't like that
idea much, and Shawn mentioned his colleague's idea of stealing unused
bits from the fork number so that there is no net change in tag size,
but we have entirely separate namespaces for each kind of buffered
data.

Here's a patch that does that, and then makes changes in the main
places I have found so far that need to be aware of the new SMGR ID
field.

Thoughts?

[1]
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BDE0mmiBZMtZyvwWtgv1sZCniSVhXYsXkvJ_Wo%2B83vvw%40mail.gmail.com

-- 
Thomas Munro
https://enterprisedb.com

Вложения

Re: Adding SMGR discriminator to buffer tags

От
Shawn Debnath
Дата:
On Wed, May 08, 2019 at 06:31:04PM +1200, Thomas Munro wrote:

> The questions are: how should buffer tags distinguish different kinds
> of buffers, and how should SMGR direct IO traffic to the right place
> when it needs to schlepp pages in and out?
> 
> In earlier prototype code, I'd been using a special database number
> for undo logs.  In a recent thread[1], Tom and others didn't like that
> idea much, and Shawn mentioned his colleague's idea of stealing unused
> bits from the fork number so that there is no net change in tag size,
> but we have entirely separate namespaces for each kind of buffered
> data.
> 
> Here's a patch that does that, and then makes changes in the main
> places I have found so far that need to be aware of the new SMGR ID
> field.

Looks good to me. Minor nit: update the comment for XLogRecGetBlockTag:

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 9196aa3aae..9ee086f00b 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1349,12 +1353,13 @@ err:
 /*
  * Returns information about the block that a block reference refers to.
  *
- * If the WAL record contains a block reference with the given ID, *rnode,
+ * If the WAL record contains a block reference with the given ID, *smgrid, *rnode,
  * *forknum, and *blknum are filled in (if not NULL), and returns true.
  * Otherwise returns false.
  */
 bool
 XLogRecGetBlockTag(XLogReaderState *record, uint8 block_id,
+                                  SmgrId *smgrid,
                                   RelFileNode *rnode, ForkNumber *forknum, BlockNumber *blknum)
 {
        DecodedBkpBlock *bkpb;

-- 
Shawn Debnath
Amazon Web Services (AWS)



Re: Adding SMGR discriminator to buffer tags

От
Thomas Munro
Дата:
On Fri, May 10, 2019 at 8:54 AM Shawn Debnath <sdn@amazon.com> wrote:
> On Wed, May 08, 2019 at 06:31:04PM +1200, Thomas Munro wrote:
> Looks good to me. Minor nit: update the comment for XLogRecGetBlockTag:

Fixed.  Also fixed broken upgrade scripts for pg_buffercache
extension, as pointed out by Robert[1] on the main thread where undo
stuff is being discussed. Attempts to keep subtopics separated have so
far failed, so the thread ostensibly about orphaned file cleanup is
now about undo work allocation, but I figured it'd be useful to
highlight this patch separately as it'll be the first to go in, and
it's needed by your work Shawn.  So I hope we're still on the same
page with this refactoring patch.

One thing I'm not sure about is the TODO message in parsexlog.c's
extractPageInfo() function.

[1] https://www.postgresql.org/message-id/CA%2BTgmob4htT-9Tq7eHG3wS%3DdpKFbQZOyqgSr1iWmV_65Duz6Pw%40mail.gmail.com

-- 
Thomas Munro
https://enterprisedb.com

Вложения

Re: Adding SMGR discriminator to buffer tags

От
Shawn Debnath
Дата:
On Fri, Jul 12, 2019 at 10:16:21AM +1200, Thomas Munro wrote:
> Attempts to keep subtopics separated have so
> far failed, so the thread ostensibly about orphaned file cleanup is
> now about undo work allocation, but I figured it'd be useful to
> highlight this patch separately as it'll be the first to go in, and
> it's needed by your work Shawn.  So I hope we're still on the same
> page with this refactoring patch.

Thanks for reminding me about this thread - I will revisit this again, 
had some more feedback after doing my PoC for the pgCon. Need to find 
that too...

> One thing I'm not sure about is the TODO message in parsexlog.c's
> extractPageInfo() function.
> 
> [1] https://www.postgresql.org/message-id/CA%2BTgmob4htT-9Tq7eHG3wS%3DdpKFbQZOyqgSr1iWmV_65Duz6Pw%40mail.gmail.com

+
+               /* TODO: How should we handle other smgr IDs? */
+               if (smgrid != SMGR_MD)
                        continue;

All files are copied verbatim from source to target except for relation 
files. So this would include slru data and undo data. From what I read 
in the docs, I do not believe we need any special handling for either 
new SMGRs and your current code should suffice.

process_block_change() is very relation specific so if different 
handling is required by different SMGRs, it would make sense to call on 
smgr specific functions instead.

Can't wait for the SMGR_MD to SMGR_REL change :-) It will make 
understanding this code a tad bit easier.

-- 
Shawn Debnath
Amazon Web Services (AWS)



Re: Adding SMGR discriminator to buffer tags

От
Thomas Munro
Дата:
On Fri, Jul 12, 2019 at 11:19 AM Shawn Debnath <sdn@amazon.com> wrote:
> On Fri, Jul 12, 2019 at 10:16:21AM +1200, Thomas Munro wrote:
> +
> +               /* TODO: How should we handle other smgr IDs? */
> +               if (smgrid != SMGR_MD)
>                         continue;
>
> All files are copied verbatim from source to target except for relation
> files. So this would include slru data and undo data. From what I read
> in the docs, I do not believe we need any special handling for either
> new SMGRs and your current code should suffice.
>
> process_block_change() is very relation specific so if different
> handling is required by different SMGRs, it would make sense to call on
> smgr specific functions instead.

Right.  And since undo and slru etc data will be WAL-logged with block
references, it's entirely possible to teach it to scan them properly,
though it's not clear whether it's worth doing that.  Ok, good, TODO
removed.

> Can't wait for the SMGR_MD to SMGR_REL change :-) It will make
> understanding this code a tad bit easier.

Or could we retrofit different words that start with M and D?

Here's a new version of the patch set (ie the first 3 patches in the
undo patch set, and the part that I think you need for slru work),
this time with the pg_buffercache changes as a separate commit since
it's somewhat independent and has a different (partial) reviewer.

I was starting to think about whether I might be able to commit these,
but now I see that this increase in WAL size is probably not
acceptable:

@@ -727,6 +734,8 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
                }
                if (!samerel)
                {
+                       memcpy(scratch, ®buf->smgrid, sizeof(SmgrId));
+                       scratch += sizeof(SmgrId);
                        memcpy(scratch, ®buf->rnode, sizeof(RelFileNode));
                        scratch += sizeof(RelFileNode);
                }

@@ -1220,8 +1221,10 @@ DecodeXLogRecord(XLogReaderState *state,
XLogRecord *record, char **errormsg)
                        }
                        if (!(fork_flags & BKPBLOCK_SAME_REL))
                        {
+                               COPY_HEADER_FIELD(&blk->smgrid, sizeof(SmgrId));
                                COPY_HEADER_FIELD(&blk->rnode,
sizeof(RelFileNode));
                                rnode = &blk->rnode;
+                               smgrid = blk->smgrid;
                        }

That's an enum, so it works out to a word per record.  The obvious way
to avoid increasing the size is shove the SMGR ID into the same space
that holds the forknum.  Unlike BufferTag, where forknum currently
swims in 32 bits which this patch chops in half, XLogRecorBlockHeader
is already crammed into a uint8 fork_flags of which it has only the
lower nibble, and the upper nibble is used for eg BKP_BLOCK_xxx flag
bits, and there isn't even a spare bit to say 'has non-zero SMGR ID'.
Rats.  I suppose I could change it to a byte.  I wonder if one extra
byte per WAL record is acceptable.  Anyone?

-- 
Thomas Munro
https://enterprisedb.com

Вложения

Re: Adding SMGR discriminator to buffer tags

От
Robert Haas
Дата:
On Mon, Jul 15, 2019 at 6:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> That's an enum, so it works out to a word per record.  The obvious way
> to avoid increasing the size is shove the SMGR ID into the same space
> that holds the forknum.  Unlike BufferTag, where forknum currently
> swims in 32 bits which this patch chops in half, XLogRecorBlockHeader
> is already crammed into a uint8 fork_flags of which it has only the
> lower nibble, and the upper nibble is used for eg BKP_BLOCK_xxx flag
> bits, and there isn't even a spare bit to say 'has non-zero SMGR ID'.
> Rats.  I suppose I could change it to a byte.  I wonder if one extra
> byte per WAL record is acceptable.  Anyone?

OK, I'll bite: I don't like it. I think this patch is more about how
people feel about things than it is about a technically necessary
change, and I'm absolutely OK with that up to the point where it
starts to inflict measurable costs on our users. Making WAL records
bigger in common cases, even by 1 byte, is a measurable cost.  And
there are a few other minor costs too: we whack around a bunch of
internal APIs, and we force a pg_buffercache version bump.  And I am
of the opinion that none of those costs, big or small, are buying us
anything technically.  I am OK with being convinced otherwise, but
right now I am not convinced.

To set forth my argument: I think magic database OIDs are just fine.
The contrary arguments as I understand them are (1) stuff might break
if there's no matching entry in pg_database, or if there is, and (2)
some hypothetical smgr might need the database OID as a discriminator.
My counter-arguments are (1) we can fix that by writing the
appropriate code and it doesn't even seem very hard and (2) tough
noogies.  To expand on (2) slightly, the proposals on the table do not
need that, the existing smgr does not need that, and there's no reason
to suppose that future proposals would require that either, because
2^32 relfilenodes of up to 2^32 blocks each is a lot, and you
shouldn't need another 2^32 bits.  If someone does come up with a
proposal that needs those bits, perhaps because it lives within a
database rather than being a global object like SLRU or undo data,
maybe it should be a new kind of AM rather than a new smgr.  And if
not, then maybe we should leave it to that hypothetical patch to solve
that hypothetical problem, because right now we're just speculating
that another 32 bits will fix it, which we can't really know, because
if we're hypothesizing the existence of a patch that needs more bits,
we could also hypothesize that it needs more than 32 of them.

If we absolutely have to keep driving down this course, you could
probably steal a bit from the fork number nibble to indicate a
non-default smgr.  Even if there are only 2 bits there, you could use
1 for non-default smgr and 1 for non-default fork number, and then in
the common case of references to the default block of the default
smgr, you wouldn't be spending anything additional ... assuming you
don't count the CPU cycles to encode and decode a more complex WAL
record format.

But how about just using a magic database OID?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Adding SMGR discriminator to buffer tags

От
Thomas Munro
Дата:
On Tue, Jul 16, 2019 at 1:49 AM Robert Haas <robertmhaas@gmail.com> wrote:
> [long form -1]
>
> But how about just using a magic database OID?

This patch was just an experiment based on discussion here:

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BDE0mmiBZMtZyvwWtgv1sZCniSVhXYsXkvJ_Wo%2B83vvw%40mail.gmail.com

I learned some things.  The main one is that you don't just need space
the buffer tag (which has plenty of spare bits) but also in WAL block
references, and that does seem to be a strike against the idea.  I
don't want lack of agreement here to hold up other work.  So here's
what I propose:

I'll go and commit the simple refactoring bits of this work, which
just move some stuff belonging to md.c out of smgr.c (see attached).
I'll go back to using a magic database OID for the undo log patch set
for now.  We could always reconsider the SMGR discriminator later.
For now I'm not going to consider this question a blocker for the
later undo code when it's eventually ready for commit.

-- 
Thomas Munro
https://enterprisedb.com

Вложения

Re: Adding SMGR discriminator to buffer tags

От
Thomas Munro
Дата:
On Tue, Jul 16, 2019 at 10:49 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> I'll go and commit the simple refactoring bits of this work, which
> just move some stuff belonging to md.c out of smgr.c (see attached).

Pushed.  The rest of that earlier patch set is hereby abandoned (at
least for now).  I'll be posting a new-and-improved undo log patch set
soon, now a couple of patches smaller but back to magic database 9.  I
think I'll probably do that with a new catalog header file that
defines pseudo-database OIDs.

-- 
Thomas Munro
https://enterprisedb.com



Re: Adding SMGR discriminator to buffer tags

От
Shawn Debnath
Дата:
On Tue, Jul 16, 2019 at 10:49:39AM +1200, Thomas Munro wrote:
> On Tue, Jul 16, 2019 at 1:49 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > [long form -1]
> >
> > But how about just using a magic database OID?
> 
> This patch was just an experiment based on discussion here:
> 
>
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BDE0mmiBZMtZyvwWtgv1sZCniSVhXYsXkvJ_Wo%2B83vvw%40mail.gmail.com
> 
> I learned some things.  The main one is that you don't just need space
> the buffer tag (which has plenty of spare bits) but also in WAL block
> references, and that does seem to be a strike against the idea.  I
> don't want lack of agreement here to hold up other work.  So here's
> what I propose:
> 
> I'll go and commit the simple refactoring bits of this work, which
> just move some stuff belonging to md.c out of smgr.c (see attached).
> I'll go back to using a magic database OID for the undo log patch set
> for now.  We could always reconsider the SMGR discriminator later.
> For now I'm not going to consider this question a blocker for the
> later undo code when it's eventually ready for commit.

Agree that we should move on at this point.  The magic OIDs do not block 
us from moving to this model later if needed.


-- 
Shawn Debnath
Amazon Web Services (AWS)



Re: Adding SMGR discriminator to buffer tags

От
Shawn Debnath
Дата:
On Wed, Jul 17, 2019 at 03:01:47PM +1200, Thomas Munro wrote:
> On Tue, Jul 16, 2019 at 10:49 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > I'll go and commit the simple refactoring bits of this work, which
> > just move some stuff belonging to md.c out of smgr.c (see attached).
> 
> Pushed.  The rest of that earlier patch set is hereby abandoned (at
> least for now).  I'll be posting a new-and-improved undo log patch set
> soon, now a couple of patches smaller but back to magic database 9.  I
> think I'll probably do that with a new catalog header file that
> defines pseudo-database OIDs.

One suggestion, let's expose the magic oids via a dedicated catalog 
pg_smgr so that they can be reserved and accounted for via the scripts 
as discussed in [1]. There were suggestions in the thread to use pg_am, 
but with the revised pg_am [2], it seems we will be stretching the 
meaning of access methods quite a bit, in my opinion, incorrectly.
The benefit of having a dedicated catalog is that we can expose data 
particular to smgrs that do not fit in the access methods scope.


[1] 
https://www.postgresql.org/message-id/20180821184835.GA1032%4060f81dc409fc.ant.amazon.com
[2] https://www.postgresql.org/docs/devel/catalog-pg-am.html

-- 
Shawn Debnath
Amazon Web Services (AWS)