Обсуждение: pg_upgrade's interaction with pg_resetwal seems confusing

Поиск
Список
Период
Сортировка

pg_upgrade's interaction with pg_resetwal seems confusing

От
Amit Kapila
Дата:
In pg_upgrade, we reset WAL archives (remove WAL), transaction id,
etc. in copy_xact_xlog_xid() for the new cluster. Then, we create new
objects in the new cluster, and again towards the end of the upgrade
we invoke pg_resetwal with the -o option to reset the next OID. Now,
along with resetting the OID, pg_resetwal will again reset the WAL. I
am not sure if that is intentional and it may not have any problem
today except that it seems redundant to reset WAL again.

However, this can be problematic for the ongoing work to upgrade the
logical replication slots [1]. We want to create/migrate logical slots
in the new cluster before calling the final pg_resetwal (which resets
the next OID) to ensure that there is no new WAL inserted by
background processes or otherwise between resetwal location and
creation of slots. So, we thought that we would compute the next WAL
location by doing the computation similar to what pg_resetwal does to
reset a new WAL location, create slots using that location, and pass
the same to pg_resetwal using the -l option. However, that doesn't
work because pg_resetwal uses the passed -l option only as a hint but
can reset the later WAL if present which can remove the WAL position
we have decided as restart_lsn (point to start reading WAL) for slots.
So, we came up with another idea that we will reset the WAL just
before creating slots and use that location to create slots and then
invent a new option in pg_resetwal where it won't reset the WAL.

Now, as mentioned in the first paragraph, it seems we anyway don't
need to reset the WAL at the end when setting the next OID for the new
cluster with the -o option. If that is true, then I think even without
slots work it will be helpful to have such an option in pg_resetwal.

Thoughts?

[1] - https://commitfest.postgresql.org/45/4273/

-- 
With Regards,
Amit Kapila.



Re: pg_upgrade's interaction with pg_resetwal seems confusing

От
Robert Haas
Дата:
On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Now, as mentioned in the first paragraph, it seems we anyway don't
> need to reset the WAL at the end when setting the next OID for the new
> cluster with the -o option. If that is true, then I think even without
> slots work it will be helpful to have such an option in pg_resetwal.
>
> Thoughts?

I wonder if we should instead provide a way to reset the OID counter
with a function call inside the database, gated by IsBinaryUpgrade.
Having something like pg_resetwal --but-dont-actually-reset-the-wal
seems both self-contradictory and vulnerable to abuse that we might be
better off not inviting.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_upgrade's interaction with pg_resetwal seems confusing

От
Amit Kapila
Дата:
On Fri, Oct 13, 2023 at 12:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Now, as mentioned in the first paragraph, it seems we anyway don't
> > need to reset the WAL at the end when setting the next OID for the new
> > cluster with the -o option. If that is true, then I think even without
> > slots work it will be helpful to have such an option in pg_resetwal.
> >
> > Thoughts?
>
> I wonder if we should instead provide a way to reset the OID counter
> with a function call inside the database, gated by IsBinaryUpgrade.
>

I think the challenge in doing so would be that when the server is
running, a concurrent checkpoint can also update the OID counter value
in the control file. See below code:

CreateCheckPoint()
{
...
LWLockAcquire(OidGenLock, LW_SHARED);
checkPoint.nextOid = ShmemVariableCache->nextOid;
if (!shutdown)
checkPoint.nextOid += ShmemVariableCache->oidCount;
LWLockRelease(OidGenLock);
...
UpdateControlFile()
...
}

Now, we can try to pass some startup options like checkpoint_timeout
with a large value to ensure that checkpoint won't interfere but not
sure if that would be bulletproof. Instead, how about allowing
pg_upgrade to update the control file of the new cluster (with the
required value of OID) following the same method as pg_resetwal does
in RewriteControlFile()?

> Having something like pg_resetwal --but-dont-actually-reset-the-wal
> seems both self-contradictory and vulnerable to abuse that we might be
> better off not inviting.
>

Fair point.

--
With Regards,
Amit Kapila.



Re: pg_upgrade's interaction with pg_resetwal seems confusing

От
Dilip Kumar
Дата:
On Fri, Oct 13, 2023 at 9:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 13, 2023 at 12:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Now, as mentioned in the first paragraph, it seems we anyway don't
> > > need to reset the WAL at the end when setting the next OID for the new
> > > cluster with the -o option. If that is true, then I think even without
> > > slots work it will be helpful to have such an option in pg_resetwal.
> > >
> > > Thoughts?
> >
> > I wonder if we should instead provide a way to reset the OID counter
> > with a function call inside the database, gated by IsBinaryUpgrade.
> >
>
> I think the challenge in doing so would be that when the server is
> running, a concurrent checkpoint can also update the OID counter value
> in the control file. See below code:
>
> CreateCheckPoint()
> {
> ...
> LWLockAcquire(OidGenLock, LW_SHARED);
> checkPoint.nextOid = ShmemVariableCache->nextOid;
> if (!shutdown)
> checkPoint.nextOid += ShmemVariableCache->oidCount;
> LWLockRelease(OidGenLock);
> ...
> UpdateControlFile()
> ...
> }
>

But is this a problem? basically, we will set the
ShmemVariableCache->nextOid counter to the value that we want in the
IsBinaryUpgrade-specific function.  And then the shutdown checkpoint
will flush that value to the control file and that is what we want no?
 I mean instead of resetwal directly modifying the control file we
will modify that value in the server using the binary_upgrade function
and then have that value flush to the disk by shutdown checkpoint.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: pg_upgrade's interaction with pg_resetwal seems confusing

От
Amit Kapila
Дата:
On Fri, Oct 13, 2023 at 10:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 13, 2023 at 9:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Oct 13, 2023 at 12:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > Now, as mentioned in the first paragraph, it seems we anyway don't
> > > > need to reset the WAL at the end when setting the next OID for the new
> > > > cluster with the -o option. If that is true, then I think even without
> > > > slots work it will be helpful to have such an option in pg_resetwal.
> > > >
> > > > Thoughts?
> > >
> > > I wonder if we should instead provide a way to reset the OID counter
> > > with a function call inside the database, gated by IsBinaryUpgrade.
> > >
> >
> > I think the challenge in doing so would be that when the server is
> > running, a concurrent checkpoint can also update the OID counter value
> > in the control file. See below code:
> >
> > CreateCheckPoint()
> > {
> > ...
> > LWLockAcquire(OidGenLock, LW_SHARED);
> > checkPoint.nextOid = ShmemVariableCache->nextOid;
> > if (!shutdown)
> > checkPoint.nextOid += ShmemVariableCache->oidCount;
> > LWLockRelease(OidGenLock);
> > ...
> > UpdateControlFile()
> > ...
> > }
> >
>
> But is this a problem? basically, we will set the
> ShmemVariableCache->nextOid counter to the value that we want in the
> IsBinaryUpgrade-specific function.  And then the shutdown checkpoint
> will flush that value to the control file and that is what we want no?
>

I think that can work. Basically, we need to do something like what
SetNextObjectId() does and then let the shutdown checkpoint update the
actual value in the control file.

>  I mean instead of resetwal directly modifying the control file we
> will modify that value in the server using the binary_upgrade function
> and then have that value flush to the disk by shutdown checkpoint.
>

True, the alternative to consider is to let pg_upgrade update the
control file by itself with the required value of OID. The point I am
slightly worried about doing via server-side function is that some
online and or shutdown checkpoint can update other values in the
control file as well whereas if we do via pg_upgrade, we can have
better control over just updating the OID.

--
With Regards,
Amit Kapila.



Re: pg_upgrade's interaction with pg_resetwal seems confusing

От
Dilip Kumar
Дата:
On Fri, Oct 13, 2023 at 11:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > But is this a problem? basically, we will set the
> > ShmemVariableCache->nextOid counter to the value that we want in the
> > IsBinaryUpgrade-specific function.  And then the shutdown checkpoint
> > will flush that value to the control file and that is what we want no?
> >
>
> I think that can work. Basically, we need to do something like what
> SetNextObjectId() does and then let the shutdown checkpoint update the
> actual value in the control file.

Right.

> >  I mean instead of resetwal directly modifying the control file we
> > will modify that value in the server using the binary_upgrade function
> > and then have that value flush to the disk by shutdown checkpoint.
> >
>
> True, the alternative to consider is to let pg_upgrade update the
> control file by itself with the required value of OID. The point I am
> slightly worried about doing via server-side function is that some
> online and or shutdown checkpoint can update other values in the
> control file as well whereas if we do via pg_upgrade, we can have
> better control over just updating the OID.

Yeah, thats a valid point.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: pg_upgrade's interaction with pg_resetwal seems confusing

От
Amit Kapila
Дата:
On Fri, Oct 13, 2023 at 2:03 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Oct 13, 2023 at 11:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > But is this a problem? basically, we will set the
> > > ShmemVariableCache->nextOid counter to the value that we want in the
> > > IsBinaryUpgrade-specific function.  And then the shutdown checkpoint
> > > will flush that value to the control file and that is what we want no?
> > >
> >
> > I think that can work. Basically, we need to do something like what
> > SetNextObjectId() does and then let the shutdown checkpoint update the
> > actual value in the control file.
>
> Right.
>
> > >  I mean instead of resetwal directly modifying the control file we
> > > will modify that value in the server using the binary_upgrade function
> > > and then have that value flush to the disk by shutdown checkpoint.
> > >
> >
> > True, the alternative to consider is to let pg_upgrade update the
> > control file by itself with the required value of OID. The point I am
> > slightly worried about doing via server-side function is that some
> > online and or shutdown checkpoint can update other values in the
> > control file as well whereas if we do via pg_upgrade, we can have
> > better control over just updating the OID.
>
> Yeah, thats a valid point.
>

But OTOH, just updating the control file via pg_upgrade may not be
sufficient because we should keep the shutdown checkpoint record also
updated with that value. See how pg_resetwal achieves it via
RewriteControlFile() and WriteEmptyXLOG(). So, considering both the
approaches it seems better to go with a server-side function as Robert
suggested.

--
With Regards,
Amit Kapila.



RE: pg_upgrade's interaction with pg_resetwal seems confusing

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear hackers,

> >
> > > >  I mean instead of resetwal directly modifying the control file we
> > > > will modify that value in the server using the binary_upgrade function
> > > > and then have that value flush to the disk by shutdown checkpoint.
> > > >
> > >
> > > True, the alternative to consider is to let pg_upgrade update the
> > > control file by itself with the required value of OID. The point I am
> > > slightly worried about doing via server-side function is that some
> > > online and or shutdown checkpoint can update other values in the
> > > control file as well whereas if we do via pg_upgrade, we can have
> > > better control over just updating the OID.
> >
> > Yeah, thats a valid point.
> >
> 
> But OTOH, just updating the control file via pg_upgrade may not be
> sufficient because we should keep the shutdown checkpoint record also
> updated with that value. See how pg_resetwal achieves it via
> RewriteControlFile() and WriteEmptyXLOG(). So, considering both the
> approaches it seems better to go with a server-side function as Robert
> suggested.

Based on these discussion, I implemented a server-side approach. An attached patch
adds a upgrade function which sets ShmemVariableCache->nextOid. It is called at
the end of the upgrade. Comments and name of issue_warnings_and_set_wal_level()
is also updated because they become outdated.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Вложения

Re: pg_upgrade's interaction with pg_resetwal seems confusing

От
vignesh C
Дата:
On Fri, 13 Oct 2023 at 17:15, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > >
> > > > >  I mean instead of resetwal directly modifying the control file we
> > > > > will modify that value in the server using the binary_upgrade function
> > > > > and then have that value flush to the disk by shutdown checkpoint.
> > > > >
> > > >
> > > > True, the alternative to consider is to let pg_upgrade update the
> > > > control file by itself with the required value of OID. The point I am
> > > > slightly worried about doing via server-side function is that some
> > > > online and or shutdown checkpoint can update other values in the
> > > > control file as well whereas if we do via pg_upgrade, we can have
> > > > better control over just updating the OID.
> > >
> > > Yeah, thats a valid point.
> > >
> >
> > But OTOH, just updating the control file via pg_upgrade may not be
> > sufficient because we should keep the shutdown checkpoint record also
> > updated with that value. See how pg_resetwal achieves it via
> > RewriteControlFile() and WriteEmptyXLOG(). So, considering both the
> > approaches it seems better to go with a server-side function as Robert
> > suggested.
>
> Based on these discussion, I implemented a server-side approach. An attached patch
> adds a upgrade function which sets ShmemVariableCache->nextOid. It is called at
> the end of the upgrade. Comments and name of issue_warnings_and_set_wal_level()
> is also updated because they become outdated.

Few comments:
1) Most of the code in binary_upgrade_set_next_oid is similar to
SetNextObjectId, but SetNextObjectId has the error handling to report
an error if an invalid nextOid is specified:
if (ShmemVariableCache->nextOid > nextOid)
elog(ERROR, "too late to advance OID counter to %u, it is now %u",
nextOid, ShmemVariableCache->nextOid);

Is this check been left out from binary_upgrade_set_next_oid function
intentionally? Have you left this because it could be like a dead
code. If so, should we have an assert for this here?

2) How about changing issue_warnings_and_set_oid function name to
issue_warnings_and_set_next_oid?
 void
-issue_warnings_and_set_wal_level(void)
+issue_warnings_and_set_oid(void)
 {

3) We have removed these comments, is there any change to the rsync
instructions? If so we could update the comments accordingly.
-        * We unconditionally start/stop the new server because
pg_resetwal -o set
-        * wal_level to 'minimum'.  If the user is upgrading standby
servers using
-        * the rsync instructions, they will need pg_upgrade to write its final
-        * WAL record showing wal_level as 'replica'.

Regards,
Vignesh



RE: pg_upgrade's interaction with pg_resetwal seems confusing

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Vignesh,

Thank you for reviewing! PSA new version.

> 
> Few comments:
> 1) Most of the code in binary_upgrade_set_next_oid is similar to
> SetNextObjectId, but SetNextObjectId has the error handling to report
> an error if an invalid nextOid is specified:
> if (ShmemVariableCache->nextOid > nextOid)
> elog(ERROR, "too late to advance OID counter to %u, it is now %u",
> nextOid, ShmemVariableCache->nextOid);
> 
> Is this check been left out from binary_upgrade_set_next_oid function
> intentionally? Have you left this because it could be like a dead
> code. If so, should we have an assert for this here?

Yeah, they were removed intentionally, but I did rethink that they could be
combined. ereport() would be skipped during the upgrade mode. Thought?

Regarding the first ereport(ERROR), it just requires that we are doing initdb.

As for the second ereport(ERROR), it requires that next OID is not rollbacked.
The restriction seems OK during the initialization, but it is not appropriate for
upgrading: wraparound of OID counter might be occurred on old cluster but we try
to restore the counter anyway.

> 2) How about changing issue_warnings_and_set_oid function name to
> issue_warnings_and_set_next_oid?
>  void
> -issue_warnings_and_set_wal_level(void)
> +issue_warnings_and_set_oid(void)
>  {

Fixed.

> 3) We have removed these comments, is there any change to the rsync
> instructions? If so we could update the comments accordingly.
> -        * We unconditionally start/stop the new server because
> pg_resetwal -o set
> -        * wal_level to 'minimum'.  If the user is upgrading standby
> servers using
> -        * the rsync instructions, they will need pg_upgrade to write its final
> -        * WAL record showing wal_level as 'replica'.
>

Hmm, I thought comments for rsync seemed outdated so that removed. I still think
this is not needed. Since controlfile->wal_level is not updated to 'minimal'
anymore, the unconditional startup is not required for physical standby.


[1] :
https://www.postgresql.org/docs/devel/pgupgrade.html#:~:text=the%20next%20step.-,Run%20rsync,-When%20using%20link

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Вложения

Re: pg_upgrade's interaction with pg_resetwal seems confusing

От
Alvaro Herrera
Дата:
Note that this patch falsifies the comment in SetNextObjectId that
taking the lock is pro forma only -- it no longer is, since in upgrade
mode there can be multiple subprocesses running -- so I think it should
be updated.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: pg_upgrade's interaction with pg_resetwal seems confusing

От
vignesh C
Дата:
On Mon, 16 Oct 2023 at 10:33, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> Thank you for reviewing! PSA new version.
>
> >
> > Few comments:
> > 1) Most of the code in binary_upgrade_set_next_oid is similar to
> > SetNextObjectId, but SetNextObjectId has the error handling to report
> > an error if an invalid nextOid is specified:
> > if (ShmemVariableCache->nextOid > nextOid)
> > elog(ERROR, "too late to advance OID counter to %u, it is now %u",
> > nextOid, ShmemVariableCache->nextOid);
> >
> > Is this check been left out from binary_upgrade_set_next_oid function
> > intentionally? Have you left this because it could be like a dead
> > code. If so, should we have an assert for this here?
>
> Yeah, they were removed intentionally, but I did rethink that they could be
> combined. ereport() would be skipped during the upgrade mode. Thought?
>
> Regarding the first ereport(ERROR), it just requires that we are doing initdb.
>
> As for the second ereport(ERROR), it requires that next OID is not rollbacked.
> The restriction seems OK during the initialization, but it is not appropriate for
> upgrading: wraparound of OID counter might be occurred on old cluster but we try
> to restore the counter anyway.
>
> > 2) How about changing issue_warnings_and_set_oid function name to
> > issue_warnings_and_set_next_oid?
> >  void
> > -issue_warnings_and_set_wal_level(void)
> > +issue_warnings_and_set_oid(void)
> >  {
>
> Fixed.
>
> > 3) We have removed these comments, is there any change to the rsync
> > instructions? If so we could update the comments accordingly.
> > -        * We unconditionally start/stop the new server because
> > pg_resetwal -o set
> > -        * wal_level to 'minimum'.  If the user is upgrading standby
> > servers using
> > -        * the rsync instructions, they will need pg_upgrade to write its final
> > -        * WAL record showing wal_level as 'replica'.
> >
>
> Hmm, I thought comments for rsync seemed outdated so that removed. I still think
> this is not needed. Since controlfile->wal_level is not updated to 'minimal'
> anymore, the unconditional startup is not required for physical standby.

We can update the commit message with the details of the same, it will
help to understand that it is intentionally done.

There are couple of typos with the new patch:
1) "uprade logical replication slot" should be "upgrade logical
replication slot":
Previously, the OID counter is restored by invoking pg_resetwal with the -o
option, at the end of upgrade. This is not problematic for now, but WAL removals
are not happy if we want to uprade logical replication slot. Therefore, a new
upgrade function is introduced to reset next OID.
2) "becasue the value" should be "because the value":
Note that we only update the on-memory data to avoid concurrent update of
control with the chekpointer. It is harmless becasue the value would be set at
shutdown checkpoint.

Regards,
Vignesh



RE: pg_upgrade's interaction with pg_resetwal seems confusing

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Alvaro,

Thank you for updating! PSA new version.

> Note that this patch falsifies the comment in SetNextObjectId that
> taking the lock is pro forma only -- it no longer is, since in upgrade
> mode there can be multiple subprocesses running -- so I think it should
> be updated.

Indeed, some comments were updated.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Вложения

RE: pg_upgrade's interaction with pg_resetwal seems confusing

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Vignesh,

Thank you for reviewing! New patch can be available in [1].

> We can update the commit message with the details of the same, it will
> help to understand that it is intentionally done.

Both comments and a commit message were updated related.

> There are couple of typos with the new patch:
> 1) "uprade logical replication slot" should be "upgrade logical
> replication slot":
> Previously, the OID counter is restored by invoking pg_resetwal with the -o
> option, at the end of upgrade. This is not problematic for now, but WAL removals
> are not happy if we want to uprade logical replication slot. Therefore, a new
> upgrade function is introduced to reset next OID.

Fixed.

> 2) "becasue the value" should be "because the value":
> Note that we only update the on-memory data to avoid concurrent update of
> control with the chekpointer. It is harmless becasue the value would be set at
> shutdown checkpoint.

Fixed.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB5866DAFE000F8677C49ACD66F5D8A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED