Thread: pg_rewind problem: cannot find WAL

pg_rewind problem: cannot find WAL

From
Luca Ferrari
Date:
Hi all,
I'm running PostgreSQL 17.4 on Ubuntu 24.04 machines. I have three
hosts: pg-1 (primary) and two physical replicas.
I promoted host pg-3 to primary (pg_promote()) and now want to rewind
pg-1 so that it follows the new primary, so:

ssh pg-3 'sudo -u postgres /usr/lib/postgresql/17/bin/pg_rewind -D
/var/lib/postgresql/17/main --source-server="user=replica_fluca
host=pg-3 dbname=replica_fluca"'
pg_rewind: servers diverged at WAL location 0/B8550F8 on timeline 1
pg_rewind: error: could not open file
"/var/lib/postgresql/17/main/pg_wal/00000001000000000000000A": No such
file or directory
pg_rewind: error: could not find previous WAL record at 0/AFFF4E8

But the file 00000001000000000000000A is not there, on either host:


 % ssh pg-3 'sudo ls /var/lib/postgresql/17/main/pg_wal'
00000001000000000000000B.partial
00000002.history
00000002000000000000000B
00000002000000000000000C
00000002000000000000000D
00000002000000000000000E
archive_status
summaries

% ssh pg-1 'sudo ls /var/lib/postgresql/17/main/pg_wal'
000000010000000000000005.00000028.backup
00000001000000000000000B
00000001000000000000000C
00000001000000000000000D
00000001000000000000000E
archive_status
summaries

Do I have to make sure the old primary pg-1 does a WAL switch before
promoting the other host and trying to rewind?

Thanks,
Luca
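
Editor's note: the error above names segment 00000001000000000000000A because that is where LSN 0/AFFF4E8 lives on timeline 1. A minimal sketch of that mapping, assuming the default 16 MB WAL segment size (the thread does not state the segment size); the function name wal_segment_name is made up for illustration:

```shell
# Map a timeline ID and an LSN to the WAL segment file that contains it.
# Assumes 16 MB segments (2^24 bytes), the PostgreSQL default.
wal_segment_name() {
    tli=$1          # timeline ID as a decimal number, e.g. 1
    lsn=$2          # LSN in hi/lo hex form, e.g. 0/AFFF4E8
    hi=${lsn%%/*}   # hex digits before the slash
    lo=${lsn##*/}   # hex digits after the slash
    # File name = timeline, high LSN word, low word divided by the segment size.
    printf '%08X%08X%08X\n' "$tli" "$((0x$hi))" "$(( 0x$lo >> 24 ))"
}

wal_segment_name 1 0/AFFF4E8   # prints 00000001000000000000000A
wal_segment_name 1 0/B8550F8   # prints 00000001000000000000000B
```

So the divergence point sits in segment ...0B, which pg-1 still has, while the preceding record that pg_rewind looks for sits in segment ...0A, which has already been removed from pg_wal.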



Re: pg_rewind problem: cannot find WAL

From
Laurenz Albe
Date:
On Wed, 2025-05-07 at 12:51 +0200, Luca Ferrari wrote:
> running 17.4 on ubuntu 24.04 machines. I've three hosts, pg-1
> (primary) and two physical replicas.
> I then promote host pg-3 as a master (pg_promote()) and want to rewind
> the pg-1 to follow the new master, so:
>
> ssh pg-3 'sudo -u postgres /usr/lib/postgresql/17/bin/pg_rewind -D
> /var/lib/postgresql/17/main --source-server="user=replica_fluca
> host=pg-3 dbname=replica_fluca"'
> pg_rewind: servers diverged at WAL location 0/B8550F8 on timeline 1
> pg_rewind: error: could not open file
> "/var/lib/postgresql/17/main/pg_wal/00000001000000000000000A": No such
> file or directory
> pg_rewind: error: could not find previous WAL record at 0/AFFF4E8
>
> But the file 00000001000000000000000A is not there, on either host:
>
>
>  % ssh pg-3 'sudo ls /var/lib/postgresql/17/main/pg_wal'
> 00000001000000000000000B.partial
> 00000002.history
> 00000002000000000000000B
> 00000002000000000000000C
> 00000002000000000000000D
> 00000002000000000000000E
> archive_status
> summaries
>
> % ssh pg-1 'sudo ls /var/lib/postgresql/17/main/pg_wal'
> 000000010000000000000005.00000028.backup
> 00000001000000000000000B
> 00000001000000000000000C
> 00000001000000000000000D
> 00000001000000000000000E
> archive_status
> summaries
>
> Do I have to make sure the old primary pg-1 does a WAL switch before
> promoting the other host and trying to rewind?

I don't think it is connected to a WAL switch.

I'd say that you should set "wal_keep_size" high enough that all the WAL
needed for pg_rewind is still present.

If you have a WAL archive, you could define a restore_command on the server
you want to rewind.

Yours,
Laurenz Albe
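
Editor's note: a rough sketch of the two suggestions above, written as a helper that prints SQL to feed to psql on the server that will later be rewound (pg-1 here), while it is still running. The function name, the size, and the archive directory are placeholders, not values from the thread; the cp form of restore_command only suits a plain directory archive.

```shell
# Print SQL that raises wal_keep_size and defines a restore_command.
# Review the output, then pipe it to psql as a superuser.
wal_retention_sql() {
    keep_size=$1     # e.g. 1GB; should cover the WAL written since the servers diverged
    archive_dir=$2   # hypothetical directory that holds archived WAL segments
    cat <<EOF
ALTER SYSTEM SET wal_keep_size = '${keep_size}';
ALTER SYSTEM SET restore_command = 'cp ${archive_dir}/%f %p';
SELECT pg_reload_conf();
EOF
}

wal_retention_sql 1GB /mnt/wal_archive
```

ALTER SYSTEM writes to postgresql.auto.conf inside the data directory. Note that wal_keep_size only protects WAL that has not been removed yet, so it has to be in place before the failover.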



Re: pg_rewind problem: cannot find WAL

From
Luca Ferrari
Date:
On Wed, May 7, 2025 at 3:55 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> I don't think it is connected to a WAL switch.
>

Thanks.

> I'd say that you should set "wal_keep_size" high enough that all the WAL
> needed for pg_rewind is still present.
>
> If you have a WAL archive, you could define a restore_command on the server
> you want to rewind.

I've pgbackrest making backups, so I have an archive_command. I'm
going to see if putting a restore_command can fix the problem.

Thanks for the suggestion.

Luca



Re: pg_rewind problem: cannot find WAL

From
Luca Ferrari
Date:
On Thu, May 8, 2025 at 8:54 AM Luca Ferrari <fluca1978@gmail.com> wrote:
>
> I've pgbackrest making backups, so I have an archive_command. I'm
> going to see if putting a restore_command can fix the problem.
>

But I'm facing a quite trivial problem: on an Ubuntu installation the
configuration files are kept separate from PGDATA.
Apparently pg_rewind tries to read postgresql.conf to get the
restore_command, and I don't know how to point it at a different
location for postgresql.conf (I cannot specify -c as with postgres):

$ /usr/lib/postgresql/17/bin/pg_rewind -D /var/lib/postgresql/17/main
--source-server="user=replica_fluca host=dev-psqlha3
dbname=replica_fluca" -R -P --debug -c
postgres: could not access the server configuration file
"/var/lib/postgresql/17/main/postgresql.conf": No such file or
directory
no data was returned by command "/usr/lib/postgresql/17/bin/postgres
-D /var/lib/postgresql/17/main -C restore_command"
child process exited with exit code 2
pg_rewind: error: could not read restore_command from target cluster

Any idea?
Clearly, postgresql.auto.conf is within PGDATA, and since my
restore_command is there, one trick could be to touch an empty
PGDATA/postgresql.conf, run pg_rewind, then remove the fake
configuration file. But I'm sure there is a smarter solution.

Thanks,
Luca



Re: pg_rewind problem: cannot find WAL

From
Rob Sargent
Date:
> 
> Any idea?
> Clearly, postgresql.auto.conf is within PGDATA, and since my
> restore_command is there, one trick could be to touch an empty
> PGDATA/postgresql.conf, run pg_rewind, then remove the fake
> configuration file. But I'm sure there is a smarter solution.
> 
> Thanks,
> Luca
> 
> 
A symlink in $PGDATA pointing to where the actual file is?




Re: pg_rewind problem: cannot find WAL

From
Luca Ferrari
Date:
On Thu, May 8, 2025 at 4:04 PM Rob Sargent <robjsargent@gmail.com> wrote:
>
>
> A symlink in $PGDATA pointing to where the actual file is?
>

Could be. I need to experiment with pg_basebackup to make sure the
symlink does not conflict with the configuration file under /etc/
when creating a clone.

Luca



Re: pg_rewind problem: cannot find WAL

From
Adrian Klaver
Date:
On 5/8/25 04:26, Luca Ferrari wrote:
> On Thu, May 8, 2025 at 8:54 AM Luca Ferrari <fluca1978@gmail.com> wrote:
>>
>> I've pgbackrest making backups, so I have an archive_command. I'm
>> going to see if putting a restore_command can fix the problem.
>>
> 
> But I'm facing a quite trivial problem: on an Ubuntu installation the
> configuration files are kept separate from PGDATA.
> Apparently pg_rewind tries to read postgresql.conf to get the
> restore_command, and I don't know how to point it at a different
> location for postgresql.conf (I cannot specify -c as with postgres):
> 
> $ /usr/lib/postgresql/17/bin/pg_rewind -D /var/lib/postgresql/17/main
> --source-server="user=replica_fluca host=dev-psqlha3
> dbname=replica_fluca" -R -P --debug -c
> postgres: could not access the server configuration file
> "/var/lib/postgresql/17/main/postgresql.conf": No such file or
> directory
> no data was returned by command "/usr/lib/postgresql/17/bin/postgres
> -D /var/lib/postgresql/17/main -C restore_command"
> child process exited with exit code 2
> pg_rewind: error: could not read restore_command from target cluster
> 
> Any idea?

/usr/lib/postgresql/17/bin/pg_rewind  --help
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.

Usage:
   pg_rewind [OPTION]...

Options:
   -c, --restore-target-wal       use "restore_command" in target configuration to
                                  retrieve WAL files from archives
   -D, --target-pgdata=DIRECTORY  existing data directory to modify
       --source-pgdata=DIRECTORY  source data directory to synchronize with
       --source-server=CONNSTR    source server to synchronize with
   -n, --dry-run                  stop before modifying anything
   -N, --no-sync                  do not wait for changes to be written
                                  safely to disk
   -P, --progress                 write progress messages
   -R, --write-recovery-conf      write configuration for replication
                                  (requires --source-server)
       --config-file=FILENAME     use specified main server configuration
                                  file when running target cluster
       --debug                    write a lot of debug messages
       --no-ensure-shutdown       do not automatically fix unclean shutdown
       --sync-method=METHOD       set method for syncing files to disk
   -V, --version                  output version information, then exit
   -?, --help                     show this help, then exit


So use --config-file=FILENAME?

> Clearly, postgresql.auto.conf is within PGDATA, and since my
> restore_command is there, one trick could be to touch an empty
> PGDATA/postgresql.conf, run pg_rewind, then remove the fake
> configuration file. But I'm sure there is a smarter solution.
> 
> Thanks,
> Luca
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: pg_rewind problem: cannot find WAL

From
Luca Ferrari
Date:
On Thu, May 8, 2025 at 5:11 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> /usr/lib/postgresql/17/bin/pg_rewind  --help
> pg_rewind resynchronizes a PostgreSQL cluster with another copy of the
> cluster.
>        --config-file=FILENAME     use specified main server configuration

Shame on me! I was grepping for config_file, as used with pg_ctl...

Thanks!

Luca
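
Editor's note: putting the thread together, the invocation would combine -c (--restore-target-wal) with --config-file. The sketch below only prints the command line for review; it does not run pg_rewind. The configuration path is an assumption based on the usual Debian/Ubuntu layout and is not stated in the thread, and the variable and function names are made up for illustration.

```shell
# Paths for the target cluster (the old primary, pg-1).
PG_BIN=/usr/lib/postgresql/17/bin
TARGET_PGDATA=/var/lib/postgresql/17/main
TARGET_CONF=/etc/postgresql/17/main/postgresql.conf   # assumption: Ubuntu packaging default

# Print a pg_rewind command line that reads restore_command from the
# configuration file kept outside PGDATA. --dry-run keeps it harmless.
rewind_command() {
    source_conninfo=$1   # libpq connection string for the new primary
    printf '%s ' \
        "$PG_BIN/pg_rewind" \
        "--target-pgdata=$TARGET_PGDATA" \
        "--config-file=$TARGET_CONF" \
        "--source-server=\"$source_conninfo\"" \
        --restore-target-wal --write-recovery-conf --progress --dry-run
    printf '\n'
}

rewind_command "user=replica_fluca host=pg-3 dbname=replica_fluca"
```

The printed command is meant to be run as the postgres user on the host being rewound, with the target cluster shut down; drop --dry-run once the output looks right.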