Обсуждение: WAL Replication query
List,
I am trying WAL replication,the setup is working with a primary and Standby server.
Query is : When the primary crashes due an unforeseen reason, what may happen to the uncommitted transactions at primary ?
Those won't replicate to the standby right ?
How much data loss occurs in this case ?
What is the mechanism to overcome this uncommitted transaction loss due to primary server crash which is not replicated to the standby server ?
Thank you,
Krishane
On Tue, 2022-11-01 at 11:25 +0530, KK CHN wrote: > I am trying WAL replication,the setup is working with a primary and Standby server. > > Query is : When the primary crashes due an unforeseen reason, > what may happen to the uncommitted transactions at primary ? Their effects remain invisible. > Those won't replicate to the standby right ? They may be replicated, but their effects will remain invisible there too. > How much data loss occurs in this case ? Data loss? I don't understand. There is no data loss with replication. If an incomplete transaction is rolled back, that is no data loss. > What is the mechanism to overcome this uncommitted transaction loss due to primary > server crash which is not replicated to the standby server ? I don't understand. Are you perhaps talking about failover? Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
If replication is in asynchronous then there will be data loss, right Laurenz? Thanks, Sacheen Birhade. -----Original Message----- From: Laurenz Albe <laurenz.albe@cybertec.at> Sent: Tuesday, November 1, 2022 12:11 PM To: KK CHN <kkchn.in@gmail.com>; pgsql-admin@lists.postgresql.org Subject: [External] Re: WAL Replication query On Tue, 2022-11-01 at 11:25 +0530, KK CHN wrote: > I am trying WAL replication,the setup is working with a primary and Standby server. > > Query is : When the primary crashes due an unforeseen reason, what > may happen to the uncommitted transactions at primary ? Their effects remain invisible. > Those won't replicate to the standby right ? They may be replicated, but their effects will remain invisible there too. > How much data loss occurs in this case ? Data loss? I don't understand. There is no data loss with replication. If an incomplete transaction is rolled back, that is no data loss. > What is the mechanism to overcome this uncommitted transaction loss > due to primary server crash which is not replicated to the standby server ? I don't understand. Are you perhaps talking about failover? Yours, Laurenz Albe -- Cybertec | https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.cybertec-postgresql.com%2F&data=05%7C01%7Csacheen.birhade%40veritas.com%7Ce6d7b7ce704e44eaa52208dabbd41e93%7Cfc8e13c0422c4c55b3eaca318e6cac32%7C0%7C0%7C638028816981291760%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=e5mP%2BJjxdGUSMFXWUowX0CUZJYDOu%2FRneDQ0jSSK3nA%3D&reserved=0
On Tue, 2022-11-01 at 06:44 +0000, Sacheen Birhade wrote: > If replication is in asynchronous then there will be data loss, right Laurenz? Why? The data will perhaps show up on the standby a little later, but why is that data loss? Remember that the question was about replication, and there was no mention of failover. Yours, Laurenz Albe
Yes, Laurenz. You are right. -----Original Message----- From: Laurenz Albe <laurenz.albe@cybertec.at> Sent: Tuesday, November 1, 2022 12:23 PM To: Sacheen Birhade <sacheen.birhade@veritas.com>; KK CHN <kkchn.in@gmail.com>; pgsql-admin@lists.postgresql.org Subject: Re: [External] Re: WAL Replication query On Tue, 2022-11-01 at 06:44 +0000, Sacheen Birhade wrote: > If replication is in asynchronous then there will be data loss, right Laurenz? Why? The data will perhaps show up on the standby a little later, but why is that data loss? Remember that the questionwas about replication, and there was no mention of failover. Yours, Laurenz Albe
On 11/1/22 01:53, Laurenz Albe wrote:
No, the question was about a crash during replication: OP (not Sacheen, unless that person is using two email addresses) explicitly asked "When the primary crashes due an unforeseen reason (what happens)?"
If the two database systems are really busy, and especially if the network connection isn't fast enough, async replication means there might be some transactions committed on Primary which were queued for transmission, but hadn't yet made it to the Secondary, right?
On Tue, 2022-11-01 at 06:44 +0000, Sacheen Birhade wrote:If replication is in asynchronous then there will be data loss, right Laurenz?Why? The data will perhaps show up on the standby a little later, but why is that data loss? Remember that the question was about replication, and there was no mention of failover.
No, the question was about a crash during replication: OP (not Sacheen, unless that person is using two email addresses) explicitly asked "When the primary crashes due an unforeseen reason (what happens)?"
If the two database systems are really busy, and especially if the network connection isn't fast enough, async replication means there might be some transactions committed on Primary which were queued for transmission, but hadn't yet made it to the Secondary, right?
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On Tue, 2022-11-01 at 02:43 -0500, Ron wrote: > On 11/1/22 01:53, Laurenz Albe wrote: > > On Tue, 2022-11-01 at 06:44 +0000, Sacheen Birhade wrote: > > > If replication is in asynchronous then there will be data loss, right Laurenz? > > > > Why? The data will perhaps show up on the standby a little later, but why is > > that data loss? Remember that the question was about replication, and there > > was no mention of failover. > > No, the question was about a crash during replication: OP (not Sacheen, unless that person > is using two email addresses) explicitly asked "When the primary crashes due an unforeseen > reason (what happens)?" > > If the two database systems are really busy, and especially if the network connection > isn't fast enough, async replication means there might be some transactions committed > on Primary which were queued for transmission, but hadn't yet made it to the Secondary, right? Right. And how does that constitute data loss? If you start the primary again, the transaction will be replicated just fine. Now if you call it *data delay*, I would agree. Yours, Laurenz Albe
On 11/1/22 08:34, Laurenz Albe wrote:
Primary has crashed, according to OP; it's not coming back any time soon. And thus you promote Secondary to be New Primary, and go about your work. When the Old Primary comes back up (hours or days later), you do a pg_basebackup to make it the New Secondary.
On Tue, 2022-11-01 at 02:43 -0500, Ron wrote:On 11/1/22 01:53, Laurenz Albe wrote:On Tue, 2022-11-01 at 06:44 +0000, Sacheen Birhade wrote:If replication is in asynchronous then there will be data loss, right Laurenz?Why? The data will perhaps show up on the standby a little later, but why is that data loss? Remember that the question was about replication, and there was no mention of failover.No, the question was about a crash during replication: OP (not Sacheen, unless that person is using two email addresses) explicitly asked "When the primary crashes due an unforeseen reason (what happens)?" If the two database systems are really busy, and especially if the network connection isn't fast enough, async replication means there might be some transactions committed on Primary which were queued for transmission, but hadn't yet made it to the Secondary, right?Right. And how does that constitute data loss? If you start the primary again, the transaction will be replicated just fine. Now if you call it *data delay*, I would agree.
Primary has crashed, according to OP; it's not coming back any time soon. And thus you promote Secondary to be New Primary, and go about your work. When the Old Primary comes back up (hours or days later), you do a pg_basebackup to make it the New Secondary.
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
> On Nov 1, 2022, at 7:34 AM, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > Right. And how does that constitute data loss? If you start the primary again, the transaction > will be replicated just fine. Now if you call it *data delay*, I would agree. It comes down to this: there is no magic. Replication is async with delay between primary & secondary, or replication issynchronous with the delay visible to clients. Transaction commit completes, or it doesn't. And so on.
On Tue, 2022-11-01 at 08:49 -0500, Ron wrote: > Primary has crashed, according to OP; it's not coming back any time soon. And thus you promote > Secondary to be New Primary, and go about your work. When the Old Primary comes back up > (hours or days later), you do a pg_basebackup to make it the New Secondary. There was no word about failover or "not coming back any time soon" (that's why I asked for clarification in my answer). In fact, PostgreSQL will often undergo crash recovery and come back up automatically. See the parameter "restart_after_crash", which defaults to "on". Yours, Laurenz Albe
On 11/1/22 08:56, Laurenz Albe wrote:
On of the purposes of replication is so that you can fail over to it when Primary crashes, right?
It does of course depend on what crashes, and how long it takes to restart.
Just the other day, Veeam somehow (for at least the second time) broke all connections to a VM running under ESX. It took six hours to get fixed. That VM was running Postgresql. If the Pg cluster was replicated, I'd have promoted the Secondary to Primary for my customer.
On Tue, 2022-11-01 at 08:49 -0500, Ron wrote:Primary has crashed, according to OP; it's not coming back any time soon. And thus you promote Secondary to be New Primary, and go about your work. When the Old Primary comes back up (hours or days later), you do a pg_basebackup to make it the New Secondary.There was no word about failover
On of the purposes of replication is so that you can fail over to it when Primary crashes, right?
or "not coming back any time soon" (that's why I asked for clarification in my answer).
It does of course depend on what crashes, and how long it takes to restart.
Just the other day, Veeam somehow (for at least the second time) broke all connections to a VM running under ESX. It took six hours to get fixed. That VM was running Postgresql. If the Pg cluster was replicated, I'd have promoted the Secondary to Primary for my customer.
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On Tue, Nov 1, 2022 at 7:26 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2022-11-01 at 08:49 -0500, Ron wrote:
> Primary has crashed, according to OP; it's not coming back any time soon. And thus you promote
> Secondary to be New Primary, and go about your work. When the Old Primary comes back up
> (hours or days later), you do a pg_basebackup to make it the New Secondary.
There was no word about failover or "not coming back any time soon" (that's why I asked for
clarification in my answer). In fact, PostgreSQL will often undergo crash recovery and come
back up automatically. See the parameter "restart_after_crash", which defaults to "on".
Sorry, It was my mistake not to mention the failover to the Standby required.
I mean when the Primary Server crashes due to a H/W failure or whatever other reason, Then in my case I am forced to promote the Standby server as Primary until the old primary server H/W vendor replaces the hardware component( it may vary from hours to days as pre SLA terms with H/W vendor).
In this scenario any transactions happened in crashed Primary server which were not transferred due to the delay in transfer to the Standby server. But I am in need to promote the Standby to Primary and it will run for another one day or two, until my Old crashed server failed component is replaced with a new H/W part.
So there will be an inconsistency in the data between the current Primary ( promoted standby) and my recovered /repaired old server ( which was primary before the crash).
How to handle this data inconsistency or what best mechanism needs to be implemented if you were me in this use case scenario?
Yours,
Laurenz Albe
On 11/1/22 23:00, KK CHN wrote:
[snip]
Before promoting Secondary to New Primary, I'd record it's current pg_stat_replication.replay_lsn value somewhere and then compare it to the relevant LSN on Old Primary when it comes back up.
If they're the same, then there's no data inconsistency.
If there is, then you've got to manually check tables, since every application and database structure is different
[snip]
So there will be an inconsistency in the data between the current Primary ( promoted standby) and my recovered /repaired old server ( which was primary before the crash).How to handle this data inconsistency or what best mechanism needs to be implemented if you were me in this use case scenario?
Before promoting Secondary to New Primary, I'd record it's current pg_stat_replication.replay_lsn value somewhere and then compare it to the relevant LSN on Old Primary when it comes back up.
If they're the same, then there's no data inconsistency.
If there is, then you've got to manually check tables, since every application and database structure is different
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On Wed, 2022-11-02 at 09:30 +0530, KK CHN wrote: > I mean when the Primary Server crashes due to a H/W failure or whatever other reason, > Then in my case I am forced to promote the Standby server as Primary until the > old primary server H/W vendor replaces the hardware component( it may vary from hours to days as pre SLA terms withH/W vendor). > > In this scenario any transactions happened in crashed Primary server which were not transferred due to the > delay in transfer to the Standby server. But I am in need to promote the Standby to Primary and > it will run for another one day or two, until my Old crashed server failed component is replaced with a new H/W part. > > So there will be an inconsistency in the data between the current Primary ( promoted standby) > and my recovered /repaired old server ( which was primary before the crash). > > How to handle this data inconsistency or what best mechanism needs to be implemented if you were me in this use casescenario? Thanks for the clarification. Yes, in that case you can lose committed data. The normal way to avoid that is to use synchronous replication. That is easy to set up, but has some implications: - it will slow down data modifications considerably - you should have a low network latency between the machines - the whole system becomes unavailable if the standby goes down, so have at least two synchronous standbys if you want good availability Yours, Laurenz Albe
Yes, in that case you can lose committed data. The normal way to avoid that is to use
synchronous replication. That is easy to set up, but has some implications:
- it will slow down data modifications considerably
- you should have a low network latency between the machines
- the whole system becomes unavailable if the standby goes down, so have at
least two synchronous standbys if you want good availability
Thanking all for the prompt replies and guidance for clarifying the doubts and queries.
Krishane