Discussion: pg_rewind problem: cannot find WAL
Hi all,
running 17.4 on Ubuntu 24.04 machines. I have three hosts: pg-1 (primary) and two physical replicas.
I then promote host pg-3 as a master (pg_promote()) and want to rewind pg-1 to follow the new master, so:

ssh pg-3 'sudo -u postgres /usr/lib/postgresql/17/bin/pg_rewind -D /var/lib/postgresql/17/main --source-server="user=replica_fluca host=pg-3 dbname=replica_fluca"'
pg_rewind: servers diverged at WAL location 0/B8550F8 on timeline 1
pg_rewind: error: could not open file "/var/lib/postgresql/17/main/pg_wal/00000001000000000000000A": No such file or directory
pg_rewind: error: could not find previous WAL record at 0/AFFF4E8

Indeed, the file 00000001000000000000000A is not there on either host:

% ssh pg-3 'sudo ls /var/lib/postgresql/17/main/pg_wal'
00000001000000000000000B.partial
00000002.history
00000002000000000000000B
00000002000000000000000C
00000002000000000000000D
00000002000000000000000E
archive_status
summaries

% ssh pg-1 'sudo ls /var/lib/postgresql/17/main/pg_wal'
000000010000000000000005.00000028.backup
00000001000000000000000B
00000001000000000000000C
00000001000000000000000D
00000001000000000000000E
archive_status
summaries

Do I have to ensure that the old primary pg-1 does a WAL switch before promoting the other one and trying to rewind?

Thanks,
Luca
On Wed, 2025-05-07 at 12:51 +0200, Luca Ferrari wrote:
> running 17.4 on ubuntu 24.04 machines. I've three hosts, pg-1
> (primary) and two physical replicas.
> I then promote host pg-3 as a master (pg_promote()) and want to rewind
> the pg-1 to follow the new master, so:
>
> ssh pg-3 'sudo -u postgres /usr/lib/postgresql/17/bin/pg_rewind -D
> /var/lib/postgresql/17/main --source-server="user=replica_fluca
> host=pg-3 dbname=replica_fluca"'
> pg_rewind: servers diverged at WAL location 0/B8550F8 on timeline 1
> pg_rewind: error: could not open file
> "/var/lib/postgresql/17/main/pg_wal/00000001000000000000000A": No such
> file or directory
> pg_rewind: error: could not find previous WAL record at 0/AFFF4E8
>
> But the file 0x010000A is not there:
>
> % ssh pg-3 'sudo ls /var/lib/postgresql/17/main/pg_wal'
> 00000001000000000000000B.partial
> 00000002.history
> 00000002000000000000000B
> 00000002000000000000000C
> 00000002000000000000000D
> 00000002000000000000000E
> archive_status
> summaries
>
> % ssh pg-1 'sudo ls /var/lib/postgresql/17/main/pg_wal'
> 000000010000000000000005.00000028.backup
> 00000001000000000000000B
> 00000001000000000000000C
> 00000001000000000000000D
> 00000001000000000000000E
> archive_status
> summaries
>
> Do i have to ensure the old primary pg-1 does a wal switch before
> promoting the other one and try to rewind?

I don't think it is connected to a WAL switch.

I'd say that you should set "wal_keep_size" high enough that all the WAL
needed for pg_rewind is still present.

If you have a WAL archive, you could define a restore_command on the server
you want to rewind.

Yours,
Laurenz Albe
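The two suggestions above might look roughly like this in the configuration of the server that will later be rewound. This is only a sketch: the 1GB value is a placeholder to be sized for the local WAL volume, and the pgBackRest stanza name "main" is hypothetical (pgBackRest is the archiving tool mentioned later in the thread).

```
# postgresql.conf on the server that will later be rewound (pg-1 in this thread)

# Keep extra WAL in pg_wal so that pg_rewind can read back to the last
# checkpoint before the divergence point. 1GB is a placeholder value.
wal_keep_size = '1GB'

# Read by pg_rewind only when it is run with -c / --restore-target-wal.
# "main" is a hypothetical pgBackRest stanza name.
restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'
```

With a restore_command in place, a missing segment such as 00000001000000000000000A can be fetched from the archive instead of having to remain in pg_wal.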
On Wed, May 7, 2025 at 3:55 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> I don't think it is connected to a WAL switch.

Thanks.

> I'd say that you should set "wal_keep_size" high enough that all the WAL
> needed for pg_rewind is still present.
>
> If you have a WAL archive, you could define a restore_command on the server
> you want to rewind.

I've pgbackrest making backups, so I have an archive_command. I'm
going to see if putting a restore_command can fix the problem.
Thanks for the suggestion.

Luca
On Thu, May 8, 2025 at 8:54 AM Luca Ferrari <fluca1978@gmail.com> wrote:
>
> I've pgbackrest making backups, so I have an archive_command. I'm
> going to see if putting a restore_command can fix the problem.
>

But I'm facing a quite trivial problem: in the Ubuntu installation the
configuration files are kept separate from PGDATA.
Apparently pg_rewind is trying to read postgresql.conf to get the
restore_command, and I don't know how to specify the different
location of postgresql.conf (I cannot specify -c as in postgres):

$ /usr/lib/postgresql/17/bin/pg_rewind -D /var/lib/postgresql/17/main --source-server="user=replica_fluca host=dev-psqlha3 dbname=replica_fluca" -R -P --debug -c
postgres: could not access the server configuration file "/var/lib/postgresql/17/main/postgresql.conf": No such file or directory
no data was returned by command "/usr/lib/postgresql/17/bin/postgres -D /var/lib/postgresql/17/main -C restore_command"
child process exited with exit code 2
pg_rewind: error: could not read restore_command from target cluster

Any idea?
Clearly, postgresql.auto.conf is within PGDATA, and since my
restore_command is there, one trick could be to touch an empty
PGDATA/postgresql.conf, run pg_rewind, and then remove the fake configuration file.
But I'm sure there is a smarter solution.

Thanks,
Luca
> Any idea?
> Clearly, postgresql.auto.conf is within PGDATA, and since my
> restore_command is there, one trick could be to touch an empty
> PGDATA/postgresql.conf, run pg_rewind, and then remove the fake configuration file.
> But I'm sure there is a smarter solution.
>
> Thanks,
> Luca

A symlink from $PGDATA to where the actual file is?
On Thu, May 8, 2025 at 4:04 PM Rob Sargent <robjsargent@gmail.com> wrote:
>
> A symlink from $PGDATA to where actual file?
>

Could be, I need to experiment with pg_basebackup to ensure it is not
conflicting with the /etc/ configuration file when creating a clone.

Luca
On 5/8/25 04:26, Luca Ferrari wrote:
> On Thu, May 8, 2025 at 8:54 AM Luca Ferrari <fluca1978@gmail.com> wrote:
>>
>> I've pgbackrest making backups, so I have an archive_command. I'm
>> going to see if putting a restore_command can fix the problem.
>>
>
> But I'm facing a quite trivial problem: in the Ubuntu installation the
> configuration files are kept separate from PGDATA.
> Apparently pg_rewind is trying to read postgresql.conf to get the
> restore_command, and I don't know how to specify the different
> location of postgresql.conf (I cannot specify -c as in postgres):
>
> $ /usr/lib/postgresql/17/bin/pg_rewind -D /var/lib/postgresql/17/main
> --source-server="user=replica_fluca host=dev-psqlha3
> dbname=replica_fluca" -R -P --debug -c
> postgres: could not access the server configuration file
> "/var/lib/postgresql/17/main/postgresql.conf": No such file or
> directory
> no data was returned by command "/usr/lib/postgresql/17/bin/postgres
> -D /var/lib/postgresql/17/main -C restore_command"
> child process exited with exit code 2
> pg_rewind: error: could not read restore_command from target cluster
>
> Any idea?

/usr/lib/postgresql/17/bin/pg_rewind --help
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.

Usage:
  pg_rewind [OPTION]...

Options:
  -c, --restore-target-wal       use "restore_command" in target configuration to
                                 retrieve WAL files from archives
  -D, --target-pgdata=DIRECTORY  existing data directory to modify
      --source-pgdata=DIRECTORY  source data directory to synchronize with
      --source-server=CONNSTR    source server to synchronize with
  -n, --dry-run                  stop before modifying anything
  -N, --no-sync                  do not wait for changes to be written
                                 safely to disk
  -P, --progress                 write progress messages
  -R, --write-recovery-conf      write configuration for replication
                                 (requires --source-server)
      --config-file=FILENAME     use specified main server configuration
                                 file when running target cluster
      --debug                    write a lot of debug messages
      --no-ensure-shutdown       do not automatically fix unclean shutdown
      --sync-method=METHOD       set method for syncing files to disk
  -V, --version                  output version information, then exit
  -?, --help                     show this help, then exit

So use --config-file=FILENAME?

> Clearly, postgresql.auto.conf is within PGDATA, and since my
> restore_command is there, one trick could be to touch an empty
> PGDATA/postgresql.conf, run pg_rewind, and then remove the fake configuration file.
> But I'm sure there is a smarter solution.
>
> Thanks,
> Luca

--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, May 8, 2025 at 5:11 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> /usr/lib/postgresql/17/bin/pg_rewind --help
> pg_rewind resynchronizes a PostgreSQL cluster with another copy of the
> cluster.
>       --config-file=FILENAME     use specified main server configuration

Shame on me! I was grepping for config_file, as in pg_ctl...
Thanks!

Luca
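Putting the thread's pieces together, the invocation would combine -c with --config-file. The sketch below only prints the command line so it can be reviewed before anything is run. The configuration path /etc/postgresql/17/main/postgresql.conf is an assumption (the usual Debian/Ubuntu location for a cluster named "main") and is not stated in the thread. The rewind_cmd helper is hypothetical, and --dry-run is added as a precaution.

```shell
#!/bin/sh
# Assumed locations for a Debian/Ubuntu PostgreSQL 17 "main" cluster.
PGBIN=/usr/lib/postgresql/17/bin
PGDATA=/var/lib/postgresql/17/main
PGCONF=/etc/postgresql/17/main/postgresql.conf   # assumption, not from the thread
SOURCE='user=replica_fluca host=pg-3 dbname=replica_fluca'

# Hypothetical helper: print the pg_rewind command line instead of running it.
# -c fetches missing WAL via restore_command; --config-file tells pg_rewind
# where postgresql.conf lives; --dry-run stops before modifying anything.
rewind_cmd() {
    printf '%s -D %s --source-server="%s" --config-file=%s -c -R -P --dry-run\n' \
        "$PGBIN/pg_rewind" "$PGDATA" "$SOURCE" "$PGCONF"
}

rewind_cmd
```

The printed command is meant to be run as the postgres user on the old primary (pg-1) while its server is stopped. Once the dry run reports no errors, dropping --dry-run performs the actual rewind.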