Discussion: ...
I'm using PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20070115 (SUSE Linux), 64-bit.
I set up streaming replication for a read/write primary and a read/only standby. The replication works fine for a while, and then out of the blue BOTH machines become read/write, but with no replication from the original primary to the newly read/write standby.
The only log entry that seems relevant is as follows: FATAL,57P01,"terminating walreceiver process due to administrator command",,,,,,,,"ProcessWalRcvInterrupts, walreceiver.c:150",""
Any help/guidance would be appreciated. Thanks in advance.
Sounds like you might have a "trigger_file" set in your recovery.conf. Do you? That or someone is issuing a pg_ctl promote command.
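One quick way to answer that question is to read the standby's recovery.conf and see whether trigger_file is set. The following is only a minimal sketch: the sample settings, host name, and trigger path are made up for illustration, and the parser handles only the simple name = 'value' form.

```python
import re

# Matches an uncommented line of the form: trigger_file = '/some/path'
TRIGGER_LINE = re.compile(r"^\s*trigger_file\s*=\s*'([^']*)'")

def find_trigger_file(recovery_conf_text):
    """Return the trigger_file path found in recovery.conf text, or None."""
    found = None
    for line in recovery_conf_text.splitlines():
        match = TRIGGER_LINE.match(line)
        if match:
            found = match.group(1)  # keep the last uncommented setting
    return found

# Hypothetical recovery.conf contents for a 9.2 streaming standby.
sample = """
standby_mode = 'on'
primary_conninfo = 'host=primary.example port=5432 user=replicator'
# trigger_file = '/tmp/old.trigger'
trigger_file = '/tmp/pg_failover.trigger'
"""

print(find_trigger_file(sample))  # prints /tmp/pg_failover.trigger
```

If this returns a path, anything on the standby host that creates a file at that path will promote the standby.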
I would check for any automated jobs that touch the trigger file on the slave
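A rough way to do that check is to search the usual job and script directories for anything that mentions the trigger file's path. This is a sketch under assumptions: the directories and the trigger path in the usage part are hypothetical, and it only finds jobs that spell the path out literally.

```python
import os

def find_references(search_dirs, needle):
    """Return (file, line_number, line) for every line that mentions needle."""
    hits = []
    for top in search_dirs:
        # os.walk silently yields nothing for missing or unreadable directories.
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                full = os.path.join(dirpath, name)
                try:
                    with open(full, "r", errors="replace") as handle:
                        for number, line in enumerate(handle, start=1):
                            if needle in line:
                                hits.append((full, number, line.rstrip("\n")))
                except OSError:
                    continue  # unreadable file: skip it and keep going
    return hits

if __name__ == "__main__":
    # Hypothetical locations and trigger path; adjust to the standby host.
    job_dirs = ["/etc/cron.d", "/etc/cron.daily", "/var/spool/cron"]
    for path, number, text in find_references(job_dirs, "/tmp/pg_failover.trigger"):
        print("%s:%d: %s" % (path, number, text))
```

HA or monitoring tools that build the path at run time will not show up this way, so an empty result does not rule them out.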
On Fri, Apr 25, 2014 at 1:37 PM, Henry Korszun <henryk302@yahoo.com> wrote:
I'm using PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20070115 (SUSE Linux), 64-bit.
I set up streaming replication for a read/write primary and a read/only standby. The replication works fine for a while, and then out of the blue BOTH machines become read/write, but with no replication from the original primary to the newly read/write standby.
The only log entry that seems relevant is as follows: FATAL,57P01,"terminating walreceiver process due to administrator command",,,,,,,,"ProcessWalRcvInterrupts, walreceiver.c:150",""
Any help/guidance would be appreciated. Thanks in advance.
Also, if you are using Chef or Puppet to automate the configuration, the recovery.conf file may be getting overwritten or removed.
On Fri, Apr 25, 2014 at 1:46 PM, Payal Singh <payal@omniti.com> wrote:
I would check for any automated jobs that touch the trigger file on the slave
There IS a trigger file, which does appear to have been "touch"ed. But the problem is that a fail-over hasn't really occurred since the original read/write primary continues to be a fully functioning read/write machine. But it's no longer replicating to the erstwhile standby, which has become read/write. Bottom line, I now have 2 read/write machines, but with no replication between them.
On Friday, April 25, 2014 1:43 PM, Scott Whitney <scott@journyx.com> wrote:
Sounds like you might have a "trigger_file" set in your recovery.conf. Do you? That or someone is issuing a pg_ctl promote command.
Once a trigger file is touched on the slave, the slave becomes a standalone server, but the old primary is not stopped automatically. You have to handle that yourself, either by stopping the old primary altogether or by pointing a virtual IP at the newly promoted slave.
On Fri, Apr 25, 2014 at 10:57:16AM -0700, Henry Korszun wrote:
> There IS a trigger file, which does appear to have been "touch"ed. But the problem is that a fail-over hasn't really occurred since the original read/write primary continues to be a fully functioning read/write machine. But it's no longer replicating to the erstwhile standby, which has become read/write. Bottom line, I now have 2 read/write machines, but with no replication between them.
The slave doesn't "turn off" the master. The trigger file is intended to be touched _when the master is down_.
Since the master never WENT down (or came back up) and the trigger file was touched, the slave got promoted.
You'll need to stop the slave and rebuild it from the master (select pg_start_backup(), rsync, etc.) to get it back into slave mode.
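Before rebuilding, it can help to confirm which servers are actually accepting writes. A minimal sketch, assuming psql is on the PATH and can connect without prompting; the host names are hypothetical. pg_is_in_recovery() returns true on a standby and false on a server that has been promoted or was always the primary.

```python
import subprocess

def parse_recovery_flag(psql_output):
    """Map the 't'/'f' printed by psql -At to True/False."""
    value = psql_output.strip()
    if value == "t":
        return True
    if value == "f":
        return False
    raise ValueError("unexpected psql output: %r" % psql_output)

def is_in_recovery(host):
    """Ask one server whether it is still running as a standby."""
    result = subprocess.run(
        ["psql", "-h", host, "-At", "-c", "SELECT pg_is_in_recovery()"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True)
    return parse_recovery_flag(result.stdout)

def writable_nodes(recovery_flags):
    """Return the hosts that are not in recovery, i.e. accepting writes."""
    return sorted(host for host, in_recovery in recovery_flags.items()
                  if not in_recovery)

# Hypothetical results for the situation described in this thread.
flags = {"db1.example": False, "db2.example": False}
if len(writable_nodes(flags)) > 1:
    print("more than one read/write server:", writable_nodes(flags))
```

In a live check, flags would be built with is_in_recovery(host) for each server rather than hard-coded.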
I understand what you're saying, but I don't know what's causing the "touch" in the first place. I guess I need to further examine/debug. Thanks for your help.
On Friday, April 25, 2014 2:00 PM, Scott Whitney <scott@journyx.com> wrote:
The slave doesn't "turn off" the master. The trigger file is intended to be touched _when the master is down_.
Since the master never WENT down (or came back up) and the trigger file was touched, the slave got promoted.
You'll need to stop the slave and rebuild it from the master (select pg_start_backup(), rsync, etc.) to get it back into slave mode.
Henry Korszun <henryk302@yahoo.com> writes:
> I understand what you're saying, but I don't know what's causing the "touch" in the first place. I guess I need to further examine/debug. Thanks for your help.
Nor do we. Possibly your system is running some HA software and it's onlining your standby due to a false positive.
--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
On Fri, Apr 25, 2014 at 01:23:30PM -0500, Jerry Sievers wrote:
> Henry Korszun <henryk302@yahoo.com> writes:
> > I understand what you're saying, but I don't know what's causing the "touch" in the first place. I guess I need to further examine/debug. Thanks for your help.
It may be a semantic difference, but does leaving recovery mode depend on the existence of the trigger file, or on its timestamp?
"Touch" in the above context could mean either creating the file or simply updating the timestamp of an existing file.
I suspect that it is triggered by the mere existence of the file, while Henry might be talking in the "update timestamp" context.
--jim
--
Jim Mercer Reptilian Research jim@reptiles.org +1 416 410-5633
"He who dies with the most toys is nonetheless dead"
Jim Mercer wrote
> it may be a semantic difference, but is recovery_mode dependent on the existence of the trigger file, or the timestamp.
> "touch" in the above context could either be the creation of the file, or simply updating the timestamp of the file.
> i suspect that the recovery is triggered by the mere existence of the file, while henry might be talking in the 'update timestamp' context.
I seriously doubt the timestamp matters. The timestamp would almost always be in the past (specifying a future recovery date doesn't make sense anyway), and there is no reasonable age at which the file should become invalid; it would be extremely confusing if there were one.
"Touch" is shorthand for "a file whose mere existence is all that is necessary", and by convention it implies that what is in the file doesn't matter (since actually touching a non-existent file creates a new empty file). If the file had already existed (not that the timeframes are being defined all that well here), the system would never have been in recovery mode.
Henry's last comment is that it is not known what process is creating the empty trigger file in the first place. Whether that process uses touch or some other means to create the file is irrelevant to the issue at hand.
David J.
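The two meanings of "touch" discussed above can be shown in a few lines. This sketch assumes, as David suggests, that only the file's existence is checked; trigger_present is a hypothetical stand-in for that check, not PostgreSQL's own code, and the file name is made up.

```python
import os
import tempfile
from pathlib import Path

def trigger_present(path):
    """Existence check only: timestamp and contents are never consulted."""
    return os.path.exists(path)

with tempfile.TemporaryDirectory() as workdir:
    trigger = Path(workdir) / "failover.trigger"  # hypothetical file name
    before = trigger_present(trigger)   # the file does not exist yet
    trigger.touch()                     # first touch creates an empty file
    created = trigger_present(trigger)
    size = trigger.stat().st_size
    trigger.touch()                     # second touch only updates the mtime
    retouched = trigger_present(trigger)

print(before, created, size, retouched)  # prints: False True 0 True
```

Under that assumption, re-touching an existing file changes nothing the check can see, so the event that matters is the file's creation.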