Обсуждение: Proposal to add page headers to SLRU pages

Поиск
Список
Период
Сортировка

Proposal to add page headers to SLRU pages

От
"Li, Yong"
Дата:

Hi all,

 

PostgreSQL currently maintains several data structures in the SLRU cache.  The current SLRU pages do not have any header, so it is impossible to checksum a page and verify its integrity.  It is very difficult to debug issues caused by corrupted SLRU pages.  Also, without a page header, page LSN is tracked in an ad-hoc fashion using LSN groups, which requires additional data structure in the shared memory.  At eBay, we are building on the patch shared by Rishu Bagga in [1], which adds the standard PageHeaderData to each SLRU page.  We believe that adding the standard page header to each SLRU page is the correct approach for the long run.  It adds a checksum to each SLRU page, tracks page LSN as if it is a standard page and eases future page enhancements.

 

The enclosed patch changes the address calculation logic for all 7 SLRUs in the following 6 files:

src/backend/access/transam/clog.c

src/backend/access/transam/commit_ts.c

src/backend/access/transam/multixact.c

src/backend/access/transam/subtrans.c

src/backend/commands/async.c

src/backend/storage/lmgr/predicate.c

 

The patch enables page checksum with changes to the following 2 files:

src/backend/access/transam/slru.c

src/bin/pg_checksums/pg_checksums.c

 

The patch removes the group LSNs defined for each SLRU cache. See changes to:

src/include/access/slru.h

 

The patch adds a few helper macros in the following files:

src/backend/storage/page/bufpage.c

src/include/storage/bufpage.h

 

The patch updates some test cases:

src/bin/pg_resetwal/t/001_basic.pl

src/test/modules/test_slru/test_slru.c

 

I am still working on patching the pg_upgrade.  Just love to hear your thoughts on the idea and the current patch.

 

 

Discussed with: Anton Shyrabokau and Shawn Debnath

 

[1] https://www.postgresql.org/message-id/flat/EFAAC0BE-27E9-4186-B925-79B7C696D5AC%40amazon.com

 

 

Regards,

Yong

 

 

Вложения

Re: Proposal to add page headers to SLRU pages

От
"Andrey M. Borodin"
Дата:
Hi Yong!

+1 to the idea to protect SLRUs from corruption. I'm slightly leaning towards the idea of separating checksums from data pages, but anyway this checksums are better than no checksums.

On 7 Dec 2023, at 10:06, Li, Yong <yoli@ebay.com> wrote:

I am still working on patching the pg_upgrade.  Just love to hear your thoughts on the idea and the current patch.

FWIW you can take upgrade code from this patch [0] doing all the same stuff :)


Best regards, Andrey Borodin.


Re: Proposal to add page headers to SLRU pages

От
Aleksander Alekseev
Дата:
Hi,

> +1 to the idea to protect SLRUs from corruption. I'm slightly leaning towards the idea of separating checksums from
datapages, but anyway this checksums are better than no checksums.
 
>
> On 7 Dec 2023, at 10:06, Li, Yong <yoli@ebay.com> wrote:
>
> I am still working on patching the pg_upgrade.  Just love to hear your thoughts on the idea and the current patch.
>
> FWIW you can take upgrade code from this patch [0] doing all the same stuff :)

Sounds like a half-measure to me. If we really want to go down this
rabbit hole IMO SLRU should be moved to shared buffers as proposed
elsewhere [1].

[1]: https://www.postgresql.org/message-id/EFAAC0BE-27E9-4186-B925-79B7C696D5AC@amazon.com

--
Best regards,
Aleksander Alekseev



Re: Proposal to add page headers to SLRU pages

От
"Debnath, Shawn"
Дата:
>> Sounds like a half-measure to me. If we really want to go down this
>> rabbit hole IMO SLRU should be moved to shared buffers as proposed
>> elsewhere [1].

> Thread that I cited stopped in 2018 for this exact reason. 5 years ago. Is this argument still valid? 
Meanwhile checksums of buffer pages also reside on a page :)

I would love to have seen more progress on the set of threads that proposed
the page header and integration of SLRU into buffer cache. The changes were
large, and unfortunately as a result, it didn't get the detailed review
that it needed. The complex nature of the feature allowed for more branches
to be split from the main thread with alternative approaches. Athough this is
great to see, it did result in the set of core requirements around LSN and
checksum tracking via page headers to not get into PG 16.

What is being proposed now is the simple and core functionality of introducing
page headers to SLRU pages while continuing to be in the SLRU cache. This
allows the whole project to be iterative and reviewers to better reason about
the smaller set of changes being introduced into the codebase.

Once the set of on-disk changes are in, we can follow up on optimizations.
It may be moving to buffer cache or reviewing Dilip's approach in [1], we
will have the option to be flexible in our approach.

[1] https://www.postgresql.org/message-id/flat/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com

Shawn


Re: Proposal to add page headers to SLRU pages

От
Robert Haas
Дата:
On Thu, Dec 7, 2023 at 1:28 PM Debnath, Shawn <sdn@ebay.com> wrote:
> What is being proposed now is the simple and core functionality of introducing
> page headers to SLRU pages while continuing to be in the SLRU cache. This
> allows the whole project to be iterative and reviewers to better reason about
> the smaller set of changes being introduced into the codebase.
>
> Once the set of on-disk changes are in, we can follow up on optimizations.
> It may be moving to buffer cache or reviewing Dilip's approach in [1], we
> will have the option to be flexible in our approach.

I basically agree with this. I don't think we should let the perfect
be the enemy of the good. Shooting down this patch because it doesn't
do everything that we want is a recipe for getting nothing done at
all.

That said, I don't think that the original post on this thread
provides a sufficiently clear and detailed motivation for making this
change. For this to eventually be committed, it's going to need (among
other things) a commit message that articulates a convincing rationale
for whatever changes it makes. Here's what the original email said:

> It adds a checksum to each SLRU page, tracks page LSN as if it is a standard page and eases future page enhancements.

Of those three things, in my opinion, the first is good and the other
two are too vague. I assume that most people who would be likely to
read a commit message would understand the value of pages having
checksums. But I can't immediately think of what the value of tracking
the page LSN as if it were a standard page might be, so that probably
needs more explanation. Similarly, at least one or two of the future
page enhancements that might be eased should be spelled out, and/or
the ways in which they would be made easier should be articulated.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Proposal to add page headers to SLRU pages

От
"Li, Yong"
Дата:

Given so many different approaches were discussed, I have started a wiki to record and collaborate all efforts towards SLRU improvements.  The wiki provides a concise overview of all the ideas discussed and can serve as a portal for all historical discussions.  Currently, the wiki summarizes four recent threads ranging from identifier format change to page header change, to moving SLRU into the main buffer pool, to reduce lock contention on SLRU latches.  We can keep the patch related discussions in this thread and use the wiki as a live document for larger scale collaborations.

 

The wiki page is here:  https://wiki.postgresql.org/wiki/SLRU_improvements

 

Regarding the benefits of this patch, here is a detailed explanation:

  1. Checksum is added to each page, allowing us to verify if a page has been corrupted when read from the disk.
  2. The ad-hoc LSN group structure is removed from the SLRU cache control data and is replaced by the page LSN in the page header. This allows us to use the same WAL protocol as used by pages in the main buffer pool: flush all redo logs up to the page LSN before flushing the page itself. If we move SLRU caches into the main buffer pool, this change fits naturally.
  3. It leaves further optimizations open. We can continue to pursue the goal of moving SLRU into the main buffer pool, or we can follow the lock partition idea. This change by itself does not conflict with either proposal.

 

Also, the patch is now complete and is ready for review.  All check-world tests including tap tests passed with this patch.

 

 

Regards,

Yong

 

From: Robert Haas <robertmhaas@gmail.com>
Date: Friday, December 8, 2023 at 03:51
To: Debnath, Shawn <sdn@ebay.com>
Cc: Andrey Borodin <x4mmm@yandex-team.ru>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Aleksander Alekseev <aleksander@timescale.com>, Li, Yong <yoli@ebay.com>, Shyrabokau, Anton <antons@ebay.com>, Bagga, Rishu <bagrishu@amazon.com>
Subject: Re: Proposal to add page headers to SLRU pages

External Email

On Thu, Dec 7, 2023 at 1:28 PM Debnath, Shawn <sdn@ebay.com> wrote:
> What is being proposed now is the simple and core functionality of introducing
> page headers to SLRU pages while continuing to be in the SLRU cache. This
> allows the whole project to be iterative and reviewers to better reason about
> the smaller set of changes being introduced into the codebase.
>
> Once the set of on-disk changes are in, we can follow up on optimizations.
> It may be moving to buffer cache or reviewing Dilip's approach in [1], we
> will have the option to be flexible in our approach.

I basically agree with this. I don't think we should let the perfect
be the enemy of the good. Shooting down this patch because it doesn't
do everything that we want is a recipe for getting nothing done at
all.

That said, I don't think that the original post on this thread
provides a sufficiently clear and detailed motivation for making this
change. For this to eventually be committed, it's going to need (among
other things) a commit message that articulates a convincing rationale
for whatever changes it makes. Here's what the original email said:

> It adds a checksum to each SLRU page, tracks page LSN as if it is a standard page and eases future page enhancements.

Of those three things, in my opinion, the first is good and the other
two are too vague. I assume that most people who would be likely to
read a commit message would understand the value of pages having
checksums. But I can't immediately think of what the value of tracking
the page LSN as if it were a standard page might be, so that probably
needs more explanation. Similarly, at least one or two of the future
page enhancements that might be eased should be spelled out, and/or
the ways in which they would be made easier should be articulated.

--
Robert Haas
EDB: https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.enterprisedb.com%2F&data=05%7C01%7Cyoli%40ebay.com%7C2cad2fe1de8a40f3167608dbf75de73c%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C638375754901646398%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=hkccuGfpt1%2BKxuhk%2BJt%2F3HyYuJqQHYfizib76%2F9HtUU%3D&reserved=0

Вложения

Re: Proposal to add page headers to SLRU pages

От
"Bagga, Rishu"
Дата:
On Thu, Dec 8, 2023 at 1:36 AM Li, Yong <yoli@ebay.com> wrote:

>Given so many different approaches were discussed, I have started a  
>wiki to record and collaborate all efforts towards SLRU 
>improvements.  The wiki provides a concise overview of all the ideas 
>discussed and can serve as a portal for all historical 
>discussions.  Currently, the wiki summarizes four recent threads 
>ranging from identifier format change to page header change, to moving 
>SLRU into the main buffer pool, to reduce lock contention on SLRU 
>latches.  We can keep the patch related discussions in this thread and 
>use the wiki as a live document for larger scale collaborations.

>The wiki page is 
>here:  https://wiki.postgresql.org/wiki/SLRU_improvements

>Regarding the benefits of this patch, here is a detailed explanation:

1.    Checksum is added to each page, allowing us to verify if a page has
    been corrupted when read from the disk.
2.    The ad-hoc LSN group structure is removed from the SLRU cache 
    control data and is replaced by the page LSN in the page header. 
    This allows us to use the same WAL protocol as used by pages in the 
    main buffer pool: flush all redo logs up to the page LSN before 
    flushing the page itself. If we move SLRU caches into the main 
    buffer pool, this change fits naturally.
3.    It leaves further optimizations open. We can continue to pursue the 
    goal of moving SLRU into the main buffer pool, or we can follow the 
    lock partition idea. This change by itself does not conflict with 
    either proposal.

>Also, the patch is now complete and is ready for review.  All check-
>world tests including tap tests passed with this patch. 




Hi Yong, 

I agree we should break the effort for the SLRU optimization into 
smaller chunks after having worked on some of the bigger patches and 
facing difficulty in making progress that way.

The patch looks mostly good to me; though one thing that I thought about 
differently with the upgrade portion is where we should keep the logic 
of re-writing the CLOG files.

There is a precedent introduced back in Postgres v9.6 in making on disk 
page format changes across different in visibility map: [1]

code comment: 
 * In versions of PostgreSQL prior to catversion 201603011, PostgreSQL's
 * visibility map included one bit per heap page; it now includes two.
 * When upgrading a cluster from before that time to a current PostgreSQL
 * version, we could refuse to copy visibility maps from the old cluster
 * to the new cluster; the next VACUUM would recreate them, but at the
 * price of scanning the entire table.  So, instead, we rewrite the old
 * visibility maps in the new format.  



This work is being done in file.c – it seems to me the proper way to 
proceed would be to continue writing on-disk upgrade logic here.


Besides that this looks good to me, would like to hear what others have to say.


Thanks, 

Rishu Bagga 

Amazon Web Services (AWS)

[1] https://github.com/postgres/postgres/commit/7087166a88fe0c04fc6636d0d6d6bea1737fc1fb


Re: Proposal to add page headers to SLRU pages

От
"Li, Yong"
Дата:
> This work is being done in file.c – it seems to me the proper way to
> proceed would be to continue writing on-disk upgrade logic here.

> Besides that this looks good to me, would like to hear what others have to say.

Thank you, Rishu for taking time to review the code.  I've updated the patch
and moved the on-disk upgrade logic to pg_upgrade/file.c.

I have also added this thread to the current Commitfest and hope this patch
will be  part of the 17 release.

The commitfest link:
https://commitfest.postgresql.org/46/4709/


Regards,
Yong,



Вложения

Re: Proposal to add page headers to SLRU pages

От
Aleksander Alekseev
Дата:
Hi,

> I have also added this thread to the current Commitfest and hope this patch
> will be  part of the 17 release.
>
> The commitfest link:
> https://commitfest.postgresql.org/46/4709/

Thanks for the updated patch.

cfbot seems to have some complaints regarding compiler warnings and
also building the patch on Windows:

http://cfbot.cputube.org/


-- 
Best regards,
Aleksander Alekseev



Re: Proposal to add page headers to SLRU pages

От
"Li, Yong"
Дата:

> On Jan 2, 2024, at 19:35, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Thanks for the updated patch.
>
> cfbot seems to have some complaints regarding compiler warnings and
> also building the patch on Windows:
>
> http://cfbot.cputube.org/

Thanks for the information.  Here is the updated patch.

Regards,
Yong


Вложения

Re: Proposal to add page headers to SLRU pages

От
"Li, Yong"
Дата:
Rebase the patch against the latest HEAD.

Regards,
Yong

Вложения

Re: Proposal to add page headers to SLRU pages

От
"Li, Yong"
Дата:
Rebase the patch against the latest HEAD.

Regards,
Yong



Вложения

Re: Proposal to add page headers to SLRU pages

От
Alvaro Herrera
Дата:
Hello,

I suppose this is important to do if we ever want to move SLRUs into
shared buffers.  However, I wonder about the extra time this adds to
pg_upgrade.  Is this something we should be concerned about?  Is there
any measurement/estimates to tell us how long this would be?  Right now,
if you use a cloning strategy for the data files, the upgrade should be
pretty quick ... but the amount of data in pg_xact and pg_multixact
could be massive, and the rewrite is likely to take considerable time.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)



Re: Proposal to add page headers to SLRU pages

От
Stephen Frost
Дата:
Greetings,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> I suppose this is important to do if we ever want to move SLRUs into
> shared buffers.  However, I wonder about the extra time this adds to
> pg_upgrade.  Is this something we should be concerned about?  Is there
> any measurement/estimates to tell us how long this would be?  Right now,
> if you use a cloning strategy for the data files, the upgrade should be
> pretty quick ... but the amount of data in pg_xact and pg_multixact
> could be massive, and the rewrite is likely to take considerable time.

While I definitely agree that there should be some consideration of
this concern, it feels on-par with the visibility-map rewrite which was
done previously.  Larger systems will likely have more to deal with than
smaller systems, but it's still a relatively small portion of the data
overall.

The benefit of this change, beyond just the possibility of moving them
into shared buffers some day in the future, is that this would mean that
SLRUs will have checksums (if the cluster has them enabled).  That
benefit strikes me as well worth the cost of the rewrite taking some
time and the minor loss of space due to the page header.

Would it be useful to consider parallelizing this work?  There's already
parts of pg_upgrade which can be parallelized and so this isn't,
hopefully, a big lift to add, but I'm not sure if there's enough work
being done here CPU-wise, compared to the amount of IO being done, to
have it make sense to run it in parallel.  Might be worth looking into
though, at least, as disks have gotten to be quite fast.

Thanks!

Stephen

Вложения

Re: Proposal to add page headers to SLRU pages

От
"Li, Yong"
Дата:
> On Mar 7, 2024, at 03:09, Stephen Frost <sfrost@snowman.net> wrote:
> 
> External Email
> 
> From: Stephen Frost <sfrost@snowman.net>
> Subject: Re: Proposal to add page headers to SLRU pages
> Date: March 7, 2024 at 03:09:59 GMT+8
> To: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Cc: "Li, Yong" <yoli@ebay.com>, Aleksander Alekseev <aleksander@timescale.com>, PostgreSQL Hackers
<pgsql-hackers@lists.postgresql.org>,"Bagga, Rishu" <bagrishu@amazon.com>, Robert Haas <robertmhaas@gmail.com>,
"Debnath,Shawn" <sdn@ebay.com>, Andrey Borodin <x4mmm@yandex-team.ru>, "Shyrabokau, Anton" <antons@ebay.com>
 
> 
> 
> Greetings,
> 
> * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
>> I suppose this is important to do if we ever want to move SLRUs into
>> shared buffers.  However, I wonder about the extra time this adds to
>> pg_upgrade.  Is this something we should be concerned about?  Is there
>> any measurement/estimates to tell us how long this would be?  Right now,
>> if you use a cloning strategy for the data files, the upgrade should be
>> pretty quick ... but the amount of data in pg_xact and pg_multixact
>> could be massive, and the rewrite is likely to take considerable time.
> 
> While I definitely agree that there should be some consideration of
> this concern, it feels on-par with the visibility-map rewrite which was
> done previously.  Larger systems will likely have more to deal with than
> smaller systems, but it's still a relatively small portion of the data
> overall.
> 
> The benefit of this change, beyond just the possibility of moving them
> into shared buffers some day in the future, is that this would mean that
> SLRUs will have checksums (if the cluster has them enabled).  That
> benefit strikes me as well worth the cost of the rewrite taking some
> time and the minor loss of space due to the page header.
> 
> Would it be useful to consider parallelizing this work?  There's already
> parts of pg_upgrade which can be parallelized and so this isn't,
> hopefully, a big lift to add, but I'm not sure if there's enough work
> being done here CPU-wise, compared to the amount of IO being done, to
> have it make sense to run it in parallel.  Might be worth looking into
> though, at least, as disks have gotten to be quite fast.
> 
> Thanks!
> 
> Stephen
> 


Thank Alvaro and Stephen for your thoughtful comments.

I did a quick benchmark regarding pg_upgrade time, and here are the results.

Hardware spec:
MacBook Pro M1 Max - 10 cores, 64GB memory, 1TB Apple SSD

Operating system:
macOS 14.3.1

Complier:
Apple clang 15.0.0

Compiler optimization level: -O2

====
PG setups:
Old cluster:  PG 16.2 release (source build)
New cluster:  PG Git HEAD plus the patch (source build)

====
Benchmark steps:

1. Initdb for PG 16.2.
2. Initdb for PG HEAD.
3. Run pg_upgrade on the above empty database, and time the overall wall clock time.
4. In the old cluster, write 512MB all-zero dummy segment files (2048 segments) under pg_xact.
5. In the old cluster, write 512MB all-zero dummy segment files under pg_multixact/members.
6. In the old cluster, write 512MB all-zero dummy segment files under pg_multixact/offsets.
7. Purge the OS page cache.
7. Run pg_upgrade again, and time the overall wall clock time.

====
Test result:

On the empty database, pg_upgrade took 4.8 seconds to complete.

With 1.5GB combined SLRU data to convert, pg_upgrade took 11.5 seconds to complete.

It took 6.7 seconds to convert 1.5GB SLRU files for pg_upgrade.

====

For clog, 2048 segments can host about 2 billion transactions, right at the limit for wraparound.
That’s the maximum we can have.  2048 segments are also big for pg_multixact SLRUs.

Therefore, on a modern hardware, in the worst case, pg_upgrade will run for 7 seconds longer.


Regards,

Yong

Re: Proposal to add page headers to SLRU pages

От
Stephen Frost
Дата:
Greetings,

* Li, Yong (yoli@ebay.com) wrote:
> > On Mar 7, 2024, at 03:09, Stephen Frost <sfrost@snowman.net> wrote:
> > * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> >> I suppose this is important to do if we ever want to move SLRUs into
> >> shared buffers.  However, I wonder about the extra time this adds to
> >> pg_upgrade.  Is this something we should be concerned about?  Is there
> >> any measurement/estimates to tell us how long this would be?  Right now,
> >> if you use a cloning strategy for the data files, the upgrade should be
> >> pretty quick ... but the amount of data in pg_xact and pg_multixact
> >> could be massive, and the rewrite is likely to take considerable time.
> >
> > While I definitely agree that there should be some consideration of
> > this concern, it feels on-par with the visibility-map rewrite which was
> > done previously.  Larger systems will likely have more to deal with than
> > smaller systems, but it's still a relatively small portion of the data
> > overall.
> >
> > The benefit of this change, beyond just the possibility of moving them
> > into shared buffers some day in the future, is that this would mean that
> > SLRUs will have checksums (if the cluster has them enabled).  That
> > benefit strikes me as well worth the cost of the rewrite taking some
> > time and the minor loss of space due to the page header.
> >
> > Would it be useful to consider parallelizing this work?  There's already
> > parts of pg_upgrade which can be parallelized and so this isn't,
> > hopefully, a big lift to add, but I'm not sure if there's enough work
> > being done here CPU-wise, compared to the amount of IO being done, to
> > have it make sense to run it in parallel.  Might be worth looking into
> > though, at least, as disks have gotten to be quite fast.
>
> Thank Alvaro and Stephen for your thoughtful comments.
>
> I did a quick benchmark regarding pg_upgrade time, and here are the results.

> For clog, 2048 segments can host about 2 billion transactions, right at the limit for wraparound.
> That’s the maximum we can have.  2048 segments are also big for pg_multixact SLRUs.
>
> Therefore, on a modern hardware, in the worst case, pg_upgrade will run for 7 seconds longer.

Thanks for testing!  That strikes me as perfectly reasonable and seems
unlikely that we'd get much benefit from parallelizing it, so I'd say it
makes sense to keep this code simple.

Thanks again!

Stephen

Вложения

Re: Proposal to add page headers to SLRU pages

От
Alvaro Herrera
Дата:
On 2024-Mar-08, Stephen Frost wrote:

> Thanks for testing!  That strikes me as perfectly reasonable and seems
> unlikely that we'd get much benefit from parallelizing it, so I'd say it
> makes sense to keep this code simple.

Okay, agreed, that amount of time sounds reasonable to me too; but I
don't want to be responsible for this at least for pg17.  If some other
committer wants to take it, be my guest.  However, I think this is
mostly a foundation for building other things on top, so committing
during the last commitfest is perhaps not very useful.


Another aspect of this patch is the removal of the LSN groups.  There's
an explanation of the LSN groups in src/backend/access/transam/README,
and while this patch removes the LSN group feature, it doesn't update
that text.  That's already a problem which needs fixed, but the text
says

: In fact, we store more than one LSN for each clog page.  This relates to
: the way we set transaction status hint bits during visibility tests.
: We must not set a transaction-committed hint bit on a relation page and
: have that record make it to disk prior to the WAL record of the commit.
: Since visibility tests are normally made while holding buffer share locks,
: we do not have the option of changing the page's LSN to guarantee WAL
: synchronization.  Instead, we defer the setting of the hint bit if we have
: not yet flushed WAL as far as the LSN associated with the transaction.
: This requires tracking the LSN of each unflushed async commit.
: It is convenient to associate this data with clog buffers: because we
: will flush WAL before writing a clog page, we know that we do not need
: to remember a transaction's LSN longer than the clog page holding its
: commit status remains in memory.  However, the naive approach of storing
: an LSN for each clog position is unattractive: the LSNs are 32x bigger
: than the two-bit commit status fields, and so we'd need 256K of
: additional shared memory for each 8K clog buffer page.  We choose
: instead to store a smaller number of LSNs per page, where each LSN is
: the highest LSN associated with any transaction commit in a contiguous
: range of transaction IDs on that page.  This saves storage at the price
: of some possibly-unnecessary delay in setting transaction hint bits.

In the new code we effectively store only one LSN per page, which I
understand is strictly worse.  Maybe the idea of doing away with these
LSN groups should be reconsidered ... unless I completely misunderstand
the whole thing.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Proposal to add page headers to SLRU pages

От
Jeff Davis
Дата:
On Fri, 2024-03-08 at 13:58 +0100, Alvaro Herrera wrote:
> In the new code we effectively store only one LSN per page, which I
> understand is strictly worse.

To quote from the README:

"Instead, we defer the setting of the hint bit if we have not yet
flushed WAL as far as the LSN associated with the transaction. This
requires tracking the LSN of each unflushed async commit."

In other words, the problem the group_lsns are solving is that we can't
set the hint bit on a tuple until the commit record for that
transaction has actually been flushed. For ordinary sync commit, that's
fine, because the CLOG bit isn't set until after the commit record is
flushed. But for async commit, the CLOG may be updated before the WAL
is flushed, and group_lsns are one way to track the right information
to hold off updating the hint bits.

"It is convenient to associate this data with clog buffers: because we
will flush WAL before writing a clog page, we know that we do not need
to remember a transaction's LSN longer than the clog page holding its
commit status remains in memory."

It's not clear to me that it is so convenient, if it's preventing the
SLRU from fitting in with the rest of the system architecture.

"The worst case is where a sync-commit xact shares a cached LSN with an
async-commit xact that commits a bit later; even though we paid to sync
the first xact to disk, we won't be able to hint its outputs until the
second xact is sync'd, up to three walwriter cycles later."

Perhaps we can revisit alternatives to the group_lsn? If we accept
Yong's proposal, and the SLRU has a normal LSN and was used in the
normal way, we would just need some kind of secondary structure to hold
a mapping from XID->LSN only for async transactions.

The characteristics we are looking for in this secondary structure are:

  1. cheap to test if it's empty, so it doesn't interfere with a purely
sync workload at all
  2. expire old entries (where the LSN has already been flushed)
cheaply enough so the data structure doesn't bloat
  3. look up an LSN given an XID cheaply enough that it doesn't
interfere with setting hint bits

Making a better secondary structure seems doable to me. Just to
brainstorm:

  * Have an open-addressed hash table mapping async XIDs to their
commit LSN. If you have a hash collision, opportunistically see if the
entry is old and can be removed. Try K probes, and if they are all
recent, then you need to XLogFlush. The table could get pretty big,
because it needs to hold enough async transactions for a wal writer
cycle or two, but it seems reasonable to make async workloads pay that
memory cost.

  * Another idea, if the size of the structure is a problem, is to
group K async xids into a bloom filter that points at a single LSN.
When more transactions come along, create another bloom filter for the
next K async xids. This could interfere with setting hint bits for sync
xids if the bloom filters are undersized, but that doesn't seem like a
big problem.

Regards,
    Jeff Davis




Re: Proposal to add page headers to SLRU pages

От
Jeff Davis
Дата:
On Fri, 2024-03-08 at 12:39 -0800, Jeff Davis wrote:
> Making a better secondary structure seems doable to me. Just to
> brainstorm:

We can also keep the group_lsns, and then just update both the CLOG
page LSN and the group_lsn when setting the transaction status. The
former would be used for all of the normal WAL-related stuff, and the
latter would be used by TransactionIdGetStatus() to return the more
precise LSN for that group.

Regards,
    Jeff Davis




Re: Proposal to add page headers to SLRU pages

От
Jeff Davis
Дата:
On Wed, 2024-03-06 at 12:01 +0000, Li, Yong wrote:
> Rebase the patch against the latest HEAD.

The upgrade logic could use more comments explaining what's going on
and why. As I understand it, it's a one-time conversion that needs to
happen between 16 and 17. Is that right?

Was the way CLOG is upgraded already decided in some earlier
discussion?

Given that the CLOG is append-only and gets truncated occasionally, I
wonder whether we can just have some marker that xids before some
number are the old CLOG, and xids beyond that number are in the new
CLOG. I'm not necessarily suggesting that; just an idea.

Regards,
    Jeff Davis




Re: Proposal to add page headers to SLRU pages

От
"Li, Yong"
Дата:
> On Mar 9, 2024, at 05:22, Jeff Davis <pgsql@j-davis.com> wrote:
> 
> External Email
> 
> On Wed, 2024-03-06 at 12:01 +0000, Li, Yong wrote:
>> Rebase the patch against the latest HEAD.
> 
> The upgrade logic could use more comments explaining what's going on
> and why. As I understand it, it's a one-time conversion that needs to
> happen between 16 and 17. Is that right?
> 
> Regards,
>        Jeff Davis
> 

> In the new code we effectively store only one LSN per page, which I
> understand is strictly worse.  Maybe the idea of doing away with these
> LSN groups should be reconsidered ... unless I completely misunderstand
> the whole thing.
> 
> --
> Álvaro Herrera         PostgreSQL Developer  —


Thanks for the comments on LSN groups and pg_upgrade.

I have updated the patch to address both comments:
- The clog LSN group has been brought back.
  Now the page LSN on each clog page is used for honoring the write-ahead rule
  and it is always the highest LSN of all the LSN groups on the page.
  The LSN groups are used by TransactionIdGetStatus() as before.
- New comments have been added to pg_upgrade to mention the SLRU
  page header change as the reason for upgrading clog files.

Regards,
Yong
Вложения

Re: Proposal to add page headers to SLRU pages

От
Jeff Davis
Дата:
On Mon, 2024-03-11 at 10:01 +0000, Li, Yong wrote:
> - The clog LSN group has been brought back.
>   Now the page LSN on each clog page is used for honoring the write-
> ahead rule
>   and it is always the highest LSN of all the LSN groups on the page.
>   The LSN groups are used by TransactionIdGetStatus() as before.

I like where this is going.

Álvaro, do you still see a problem with this approach?

> - New comments have been added to pg_upgrade to mention the SLRU
>   page header change as the reason for upgrading clog files.

That seems reasonable, but were any alternatives discussed? Do we have
consensus that this is the right thing to do?

And if we use this approach, is there extra validation or testing that
can be done?

Regards,
    Jeff Davis




Re: Proposal to add page headers to SLRU pages

От
"Li, Yong"
Дата:
>> - New comments have been added to pg_upgrade to mention the SLRU
>>  page header change as the reason for upgrading clog files.
> 
> That seems reasonable, but were any alternatives discussed? Do we have
> consensus that this is the right thing to do?

In general, there are two approaches. Either we convert the existing clog files,
or we don’t.  The patch chooses to convert.

If we don’t, then the clog file code must be able to handle both formats. For,
XIDs in the range where the clog is written in the old format, segment and offset
computation must be done in one way, and for XIDs in a different range, it must
be computed in a different way.  To avoid changing the format in the middle of a
page, which must not happen, the new format must start from a clean page, 
possibly in a clean new segment.  If the database is extremely small and has only
a few transactions on the first page of clog, then we must either convert the whole
page (effectively the whole clog file), or we must skip the rest of the XIDs on the
page and ask the database to start from XIDs on the second page on restart.
Also, we need to consider where to store the cut-off XID and when to remove it.
All these details feel very complex and error prone to me.  Performing a one-time
conversion is the most efficient and straightforward approach to me. 

> 
> And if we use this approach, is there extra validation or testing that
> can be done?
> 
> Regards,
>        Jeff Davis

Unfortunately, the test requires a setup of two different versions of PG. I am not
aware of an existing test infrastructure which can run automated tests using two
PGs. I did manually verify the output of pg_upgrade. 


Regards,
Yong