Thread: Would you ever recommend Shared Disk Failover for HA?


Would you ever recommend Shared Disk Failover for HA?

From:
"norbert poellmann"
Date:
Admins,

https://www.postgresql.org/docs/current/different-replication-solutions.html
lists a shared disk solution for HA.

It also mentions "that the standby server should never access the shared storage while the primary server is
running."

In a datacenter where we have PostgreSQL servers running on VMware VMs, the shared disk configuration sounds like an
appealing solution: simple configuration, a single server at any given time, simple failover, fully all-or-nothing write
mechanics, no hassle with replication slots during/after failover, etc...
But "Attempts to use PostgreSQL in multi-master shared storage configurations will result in extremely severe data
corruption" (https://wiki.postgresql.org/wiki/Shared_Storage).

So it seems to me that the comfort of a single-server solution, which in a failover is replaced by another
single server, is bought at the price of a low risk of severe damage.

I know of the provisions for fencing, STONITH, etc. - but in practice, what is a robust solution?

For example: how can I STONITH a node while having network problems?
Without reaching the host, I can neither shoot it nor shut it down.
I also cannot wait for it to become visible on the network again: another client might have interfered and committed a
transaction on the retired master server faster than I can do any fencing/STONITH.
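
One practical answer to this network-partition problem (illustrated here as an aside, not taken from the thread) is storage- or watchdog-based self-fencing: rather than the survivor shooting the unreachable node over the network, the node shoots itself once it can no longer renew a lease, and the standby only promotes after that lease has provably expired. A minimal Python sketch of the idea; all names, timings, and the lease store are invented for illustration:

```python
LEASE_TTL = 5.0  # seconds a node may act as primary without renewing

class Node:
    """A node that may serve writes only while it holds a fresh lease."""
    def __init__(self, now=0.0):
        self.lease_expiry = now + LEASE_TTL
        self.fenced = False

    def renew(self, now):
        # Renewal succeeds only while the node can reach the lease store
        # (with SBD, that store is a slot on the shared disk itself).
        self.lease_expiry = now + LEASE_TTL

    def tick(self, now):
        # Self-fence: once the lease has expired, stop serving at once.
        if now >= self.lease_expiry:
            self.fenced = True
        return not self.fenced        # True while still serving

def safe_to_promote(last_renewal_seen, now):
    # The standby may promote once the old primary's lease *must* have
    # expired -- even though the old primary is unreachable.
    return now >= last_renewal_seen + LEASE_TTL

primary = Node(now=0.0)               # partition begins right after t=0
assert primary.tick(4.0)              # still inside its lease: serving
assert not primary.tick(6.0)          # lease expired: self-fenced
assert not safe_to_promote(0.0, 4.0)  # standby must still wait
assert safe_to_promote(0.0, 5.0)      # old lease provably dead: promote
```

This is the essence of SBD ("storage-based death") fencing in Pacemaker clusters: the shared disk doubles as the fencing channel, and a hardware watchdog guarantees the self-fence fires even if the OS is wedged. The sketch above is a simplification of that idea, not SBD's actual protocol.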
Would you share your opinions or practical business experiences on this topic?

Thanks a lot

cheers

Norbert Poellmann

--
Norbert Poellmann EDV-Beratung             email  : np@ibu.de
Severinstrasse 5                           telefon: 089 38469995  
81541 Muenchen, Germany                    telefon: 0179 2133436




Re: Would you ever recommend Shared Disk Failover for HA?

From:
Ron Johnson
Date:
On Thu, Feb 22, 2024 at 2:35 PM norbert poellmann <np@ibu.de> wrote:
Admins,

https://www.postgresql.org/docs/current/different-replication-solutions.html
is listing a shared disk solution for HA.

It also mentions, "that the standby server should never access the shared storage while the primary server is running."

In a datacenter, where we have postgresql servers running on vmware VMs, the shared disk configuration sounds like an appealing solution: simple configuration, single server at a given time, simple fail-over, fully all-or-nothing write mechanics, no hassle with replication_slots during/after failover, etc...

But "Attempts to use PostgreSQL in multi-master shared storage configurations will result in extremely severe data corruption" (https://wiki.postgresql.org/wiki/Shared_Storage).

Our DB servers are also VMware VMs, with the disks managed by VMware, too.  If a blade dies, the VM automatically restarts on a different blade.  (Heck, ESX might automagically migrate it with no downtime. I've never *known* this to happen, but I don't have access to the VMware console; they just stay up for months, getting migrated around as necessary for load management.)

Re: Would you ever recommend Shared Disk Failover for HA?

From:
Laurenz Albe
Date:
On Thu, 2024-02-22 at 20:34 +0100, norbert poellmann wrote:
> https://www.postgresql.org/docs/current/different-replication-solutions.html
> is listing a shared disk solution for HA.
>
> It also mentions, "that the standby server should never access the shared storage
> while the primary server is running."
>
> In a datacenter, where we have postgresql servers running on vmware VMs, the
> shared disk configuration sounds like an appealing solution
>
> But [...]
>
> So it seems to me, getting the comfort of a single server solution, which, in a
> failover, gets replaced by another single server, is paid by getting the low risk
> of high damage.
>
> I know of the provisions of fencing, STONITH, etc. - but in practice, what is a robust solution?
>
> For example: How can I STONITH a node while having network problems?
> Without reaching the host, I cannot shoot it, nor shut it.
>
> Would you share your opinions or practical business experiences on this topic?

Back in the old days, we had special hardware devices for STONITH.

Anyway, my personal experience with a shared disk setup is a bad one.

Imagine two nodes, redundantly attached to disks mirrored across data
centers with fibrechannel.  No single point of failure, right?
Well, one day one of the fibrechannel cables had intermittent failures,
which led to a corrupted file system.
So we ended up with a corrupted file system, nicely mirrored across
data centers.  We had to restore the 3TB database from backup.

Yours,
Laurenz Albe



Re: Would you ever recommend Shared Disk Failover for HA?

From:
Ron Johnson
Date:
On Fri, Feb 23, 2024 at 2:02 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
[snip] 

Anyway, my personal experience with a shared disk setup is a bad one.

Imagine two nodes, redundantly attached to disks mirrored across data
centers with fibrechannel.  No single point of failure, right?
Well, one day one of the fibrechannel cables had intermittent failures,
which led to a corrupted file system.
So we ended up with a corrupted file system, nicely mirrored across
data centers.  We had to restore the 3TB database from backup.

1. Sounds like both nodes were turned on. 
2. Couldn't this happen in /any/ SAN with redundant cabling?


Re: Would you ever recommend Shared Disk Failover for HA?

From:
vignesh kumar
Date:
Given the knowledge and design architecture of the Postgres engine, shared storage is not a safe solution for a huge database cluster. Here is why:

  • When a cluster directory is initialized, it is stamped with checksums based on the system and the WAL block size.
  • The database engine's shared-object files and their related system calls read from and write to that directory directly,
    • so a file lock is held on the data cluster directory.
      • Now imagine that you have shared storage:
      • with more than one postmaster process trying to write to it, you get file buffer conflicts,
        • and you will end up with inode corruption.
  • By design, it is simply not possible to enable this:
    • that is the reason the engine is designed to hold the data cluster directory exclusively.
  • Now consider the scenario below:
    • Assume we have shared storage.
      • A buffered page write is attempted on the cluster database directory,
        • say at an allocated memory address [x000Fxxxxx].
        • This buffer is consulted before writing to disk, as the inode index is mapped first in order to flush the data.
      • So when another postmaster process attempts to write to the file system,
        • it overrides that buffer,
        • leading to page corruption / conflicts.

Repairing data corrupted at the page level would be even worse:
  • do a stack trace,
  • fetch the affected disk sector,
  • fetch all system calls touching the data page,
    • make the page invisible,
    • fix / remove the data page.

Hence Postgres does not allow it by default.

Just sharing my thoughts; I hope this helps.

Thanks & Regards,
Viggu


From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: Friday, February 23, 2024 12:32 PM
To: norbert poellmann <np@ibu.de>; pgsql-admin@lists.postgresql.org <pgsql-admin@lists.postgresql.org>
Subject: Re: Would you ever recommend Shared Disk Failover for HA?
 
On Thu, 2024-02-22 at 20:34 +0100, norbert poellmann wrote:
> https://www.postgresql.org/docs/current/different-replication-solutions.html
> is listing a shared disk solution for HA.
>
> It also mentions, "that the standby server should never access the shared storage
> while the primary server is running."
>
> In a datacenter, where we have postgresql servers running on vmware VMs, the
> shared disk configuration sounds like an appealing solution
>
> But [...]
>
> So it seems to me, getting the comfort of a single server solution, which, in a
> failover, gets replaced by another single server, is paid by getting the low risk
> of high damage.
>
> I know of the provisions of fencing, STONITH, etc. - but in practise, what is a robust solution?
>
> For example: How can I STONITH a node while having network problems?
> Whithout reaching the host, I cannot shoot it, nor shut it.
>
> Would you share your opinions or practical business experiences on this topic?

Back in the old days, we had special hardware devices for STONITH.

Anyway, my personal experience with a shared disk setup is a bad one.

Imagine two nodes, redundantly attached to disks mirrored across data
centers with fibrechannel.  No single point of failure, right?
Well, one day one of the fibrechannel cables had intermittent failures,
which led to a corrupted file system.
So we ended up with a corrupted file system, nicely mirrored across
data centers.  We had to restore the 3TB database from backup.

Yours,
Laurenz Albe


Re: Would you ever recommend Shared Disk Failover for HA?

From:
Laurenz Albe
Date:
On Fri, 2024-02-23 at 02:21 -0500, Ron Johnson wrote:
> On Fri, Feb 23, 2024 at 2:02 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> [snip] 
> >
> > Anyway, my personal experience with a shared disk setup is a bad one.
> >
> > Imagine two nodes, redundantly attached to disks mirrored across data
> > centers with fibrechannel.  No single point of failure, right?
> > Well, one day one of the fibrechannel cables had intermittent failures,
> > which led to a corrupted file system.
> > So we ended up with a corrupted file system, nicely mirrored across
> > data centers.  We had to restore the 3TB database from backup.
>
> 1. Sounds like both nodes were turned on.

Possible, but I think that's not relevant.

> 2. Couldn't this happen in /any/ SAN with redundant cabling?

I'd say yes, and the redundant cabling is irrelevant too - I just
wanted to emphasize that hardware redundancy is not enough.

The single point of failure was the file system.

Yours,
Laurenz Albe



Re: Would you ever recommend Shared Disk Failover for HA?

From:
Stephen Frost
Date:
Greetings,

* norbert poellmann (np@ibu.de) wrote:
> https://www.postgresql.org/docs/current/different-replication-solutions.html
> is listing a shared disk solution for HA.

Yeah.  Frankly, it's bad advice and we should remove it.  "Rapid
failover" is a bit laughable compared to replication when you consider
that crash recovery can take a very, very long time (depending on how
much outstanding WAL has been written since the last checkpoint but with
extended checkpoints and single-process WAL replay, crash recovery could
be on the order of hours ...) and promoting an online replica takes only
moments.
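
[Editorial aside: to put rough numbers on that replay-time argument; all figures below are invented for illustration, not taken from the thread:]

```python
# Back-of-the-envelope comparison: crash recovery after shared-disk
# failover vs. promoting an already-streaming physical replica.
# All numbers here are illustrative assumptions, not measurements.

wal_since_checkpoint_gb = 300     # long checkpoint_timeout on a busy server
replay_rate_mb_per_s = 50         # single-process WAL replay, random I/O

crash_recovery_s = wal_since_checkpoint_gb * 1024 / replay_rate_mb_per_s
print(f"shared-disk failover: ~{crash_recovery_s / 3600:.1f} h of WAL replay")

# An online replica has already replayed that WAL continuously, so
# promotion is essentially a timeline switch: seconds, not hours.
```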

Ditto for block-based replication.

Probably should talk about WAL shipping more as "Physical Replication".

At the least, physical replication should really be listed first and
then logical replication, perhaps even in a distinct "included as part
of PostgreSQL" section with everything else pushed down to "some other
things exist that you could try"...

Thanks,

Stephen
