Discussion: Running PostgreSQL in Kubernetes?
Dear Colleagues,

Is anyone running PostgreSQL in Kubernetes? If you do, what solution do you prefer?

What I've tried myself:

1. Running the official postgres:12.6 images in a StatefulSet. It works fine but there is neither failover nor replication.

2. Zalando's postgres-operator. Replication and failover work out of the box, but the documentation is not the best. Things like wal-g backup and PITR are probably possible, but you need to be an egghead to figure out how to make them work. Why, I cannot even figure out how to add a team after installing the postgres-operator from the official Helm repo.

I'd be especially grateful if you shared your own personal experience with Kubernetes and PostgreSQL.

--
Victor Sudakov VAS4-RIPE
http://vas.tomsk.ru/
2:5005/49@fidonet
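For reference, the plain-StatefulSet approach from point 1 looks roughly like this. This is a minimal sketch, assuming a matching headless Service named "pg" and a Secret "pg-secret" holding the superuser password; all names and sizes are illustrative, and there is no replication or failover here, which is exactly the limitation described above.

```yaml
# Minimal single-instance PostgreSQL StatefulSet; all names and sizes are
# illustrative. Assumes a headless Service named "pg" and a Secret "pg-secret"
# holding the superuser password. No replication, no failover.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg
spec:
  serviceName: pg
  replicas: 1
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
      - name: postgres
        image: postgres:12.6
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: pg-secret
              key: password
        - name: PGDATA                      # keep data out of the volume root
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: pgdata
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```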
> Is anyone running PostgreSQL in Kubernetes? If you do, what solution do you prefer?
>
> What I've tried myself:
>
> 1. Running the official postgres:12.6 images in a StatefulSet. It works fine but there is neither failover nor replication.
>
> 2. Zalando's postgres-operator. Replication and failover work out of the box, but the documentation is not the best. Things like wal-g backup and PITR are probably possible, but you need to be an egghead to figure out how to make them work. Why, I cannot even figure out how to add a team after installing the postgres-operator from the official Helm repo.
>
> I'd be especially grateful if you shared your own personal experience with Kubernetes and PostgreSQL.

We're using CrunchyData's operator, which in turn uses Patroni & pgbackrest...
Scott Ribe wrote:

> > Is anyone running PostgreSQL in Kubernetes? If you do, what solution do you prefer?
> >
> > What I've tried myself:
> >
> > 1. Running the official postgres:12.6 images in a StatefulSet. It works fine but there is neither failover nor replication.
> >
> > 2. Zalando's postgres-operator. Replication and failover work out of the box, but the documentation is not the best. Things like wal-g backup and PITR are probably possible, but you need to be an egghead to figure out how to make them work. Why, I cannot even figure out how to add a team after installing the postgres-operator from the official Helm repo.
> >
> > I'd be especially grateful if you shared your own personal experience with Kubernetes and PostgreSQL.
>
> We're using CrunchyData's operator, which in turn uses Patroni & pgbackrest...

Hello Scott,

Thank you for the hint! The CrunchyData operator looks like a cleaner implementation than Zalando's. I'll continue testing.

Have you been able to a) upload backups to S3 and b) perform PITR from S3 with CrunchyData?

--
Victor Sudakov VAS4-RIPE
http://vas.tomsk.ru/
2:5005/49@fidonet
> On Jul 27, 2021, at 12:57 AM, Victor Sudakov <vas@sibptus.ru> wrote:
>
> Have you been able to a) upload backups to S3 and b) perform PITR from S3 with CrunchyData?

I have never tried either.

I have recovered from the backups, but never used PITR from the WAL archives.
Scott Ribe wrote:

> > On Jul 27, 2021, at 12:57 AM, Victor Sudakov <vas@sibptus.ru> wrote:
> >
> > Have you been able to a) upload backups to S3 and b) perform PITR from S3 with CrunchyData?
>
> I have never tried either.
>
> I have recovered from the backups,

Using the CrunchyData operator's built-in backup service?

> but never used PITR from the WAL archives.

It's really very efficient and elegant with wal-g. wal-g fetches data from S3 in parallel and can prefetch.

I've done PITR from S3 many times in a lab, and a couple of times on real data. But to use wal-g inside Kubernetes, I probably need Zalando's operator, which I dislike.

--
Victor Sudakov VAS4-RIPE
http://vas.tomsk.ru/
2:5005/49@fidonet
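For anyone curious what that wal-g PITR workflow looks like, here is a minimal sketch outside Kubernetes. The bucket/prefix, data directory and target time are placeholders, and it assumes wal-g is installed and S3 credentials are already present in the environment.

```sh
# Illustrative only: the bucket/prefix, data directory and target time are
# placeholders; wal-g and S3 credentials are assumed to be configured already.
export WALG_S3_PREFIX=s3://my-backup-bucket/pg

# 1. Fetch the latest base backup into an empty data directory (parallel download).
wal-g backup-fetch /var/lib/postgresql/12/main LATEST

# 2. Tell PostgreSQL (12+) to replay WAL through wal-g up to the target time,
#    e.g. in postgresql.conf:
#      restore_command        = 'wal-g wal-fetch "%f" "%p"'
#      recovery_target_time   = '2021-07-27 10:00:00+00'
#      recovery_target_action = 'promote'

# 3. Create recovery.signal and start the server; WAL segments are fetched
#    (and prefetched) from S3 during recovery.
touch /var/lib/postgresql/12/main/recovery.signal
pg_ctl -D /var/lib/postgresql/12/main start
```

wal-g downloads the base backup in parallel, and the restore_command pulls each WAL segment it needs from S3 during recovery, which is where the prefetching pays off.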
> On Jul 27, 2021, at 10:41 AM, Victor Sudakov <vas@sibptus.ru> wrote:
>
>>> On Jul 27, 2021, at 12:57 AM, Victor Sudakov <vas@sibptus.ru> wrote:
>>>
>>> Have you been able to a) upload backups to S3 and b) perform PITR from S3 with CrunchyData?
>>
>> I have never tried either.
>>
>> I have recovered from the backups,
>
> Using the CrunchyData operator's built-in backup service?

Using its built-in backup, then restoring both with the built-in restore and by dropping down to pgbackrest directly to restore to a PG instance not managed by Crunchy/Patroni.
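For context, dropping down to pgbackrest for a restore outside the operator looks roughly like this. A sketch only: the stanza name, target time and data directory are placeholders, and it assumes an /etc/pgbackrest/pgbackrest.conf that already defines the stanza and the repository location.

```sh
# Illustrative pgbackrest PITR restore; the stanza name, target time and data
# directory are placeholders. Assumes /etc/pgbackrest/pgbackrest.conf already
# defines the [db] stanza and the repository location.
# --delta reuses any files already present in the data directory.
pgbackrest --stanza=db \
           --type=time --target="2021-07-27 10:00:00+00" \
           --target-action=promote \
           --delta \
           --process-max=4 \
           restore

# pgbackrest writes the matching restore_command into postgresql.auto.conf and
# creates recovery.signal (PostgreSQL 12+), so the server can simply be started:
pg_ctl -D /var/lib/postgresql/12/main start
```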
Greetings,

* Victor Sudakov (vas@sibptus.ru) wrote:
> Scott Ribe wrote:
> > > On Jul 27, 2021, at 12:57 AM, Victor Sudakov <vas@sibptus.ru> wrote:
> > >
> > > Have you been able to a) upload backups to S3 and b) perform PITR from S3 with CrunchyData?
> >
> > I have never tried either.
> >
> > I have recovered from the backups,
>
> Using the CrunchyData operator's built-in backup service?
>
> > but never used PITR from the WAL archives.
>
> It's really very efficient and elegant with wal-g. wal-g fetches data from S3 in parallel and can prefetch.
>
> I've done PITR from S3 many times in a lab, and a couple of times on real data. But to use wal-g inside Kubernetes, I probably need Zalando's operator, which I dislike.

The Crunchy operator uses pgbackrest, which also supports parallel restore and pre-fetching.

Thanks,

Stephen
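For completeness, the parallelism and prefetching Stephen mentions map to pgBackRest settings along these lines. A sketch only; bucket, region, paths and sizes are placeholders, credentials are omitted, and with the Crunchy operator these options would be supplied through the operator's configuration rather than edited by hand.

```ini
# Illustrative pgbackrest.conf fragment; bucket, region, paths and sizes are
# placeholders, and S3 credentials (repo1-s3-key / repo1-s3-key-secret) are omitted.
[global]
repo1-type=s3
repo1-s3-bucket=my-backup-bucket
repo1-s3-endpoint=s3.us-east-1.amazonaws.com
repo1-s3-region=us-east-1
repo1-path=/pgbackrest
process-max=4                  # parallel backup/restore processes
archive-async=y                # asynchronous archive-push/archive-get
archive-get-queue-max=1GiB     # WAL prefetch queue used during recovery
spool-path=/var/spool/pgbackrest

[db]
pg1-path=/var/lib/postgresql/12/main
```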
I am just asking in a different thread so as to not derail the original conversation.
What was the real need to have a postgresql cluster running in kubernetes?
I have experience with apache mesos and have done some beginner tutorials in kubernetes, but in my case those were good for stateless services.
For dbs, we used to have dedicated vms for postgresql clusters on lvm and vmware. It was very rare that we lost a physical server beyond recovery; most of the time a reboot of the physical server would resolve the issue. If we were able to figure out the problem with the physical server, we would just vmotion (live migrate) the vm from one server to another and survive fine.
So a large, TB-sized db did not really require a resync on server problems.
Now with k8s, every time the server sneezes, the pods would get spawned onto different servers, which would result in a full resync unless volumes could be moved.
Since containers are now in a shared environment, and if it is mostly overcommitted, then tuning of various params of an instance would be totally different compared to what it was on a dedicated vm.
Noisy neighbours: typical heavy activity, like a bot attack on services that do not touch the db, will have a serious impact on the db on the same server due to a shortage of resources.
In our case dns was under huge stress due to constant bouncing of services and discovery compared to the original monoliths; it was not tuned to handle that amount of change and suffered stale cache lookups. For apps it would be OK, as they implement circuit breakers, but wouldn't an intra-pg setup for barman or logical replication or pgbackrest suffer a longer outage?
Lastly, shared resources can result in a poor query plan, and things like slow vacuuming may degrade the db.
Now, I have 0 exp in kubernetes, but I tried to understand the basics and found most of them similar to apache mesos. My use case was dbs that grow, and they grow really fast, so they cannot be treated the same as immutable containers, but idk. For example, when in need of increased memory, it was OK to do that for a vm and reboot, but for a pod there is a risk of redeployment and moving the instance to another server? Or else all the pods on that server would get bounced?
The point of these questions is just to have a conversation and understand the need to put dbs on k8s, not to make it about right or wrong.
Maybe, just like for ssd vs hdd, the postgresql community will come up with a cost param to generate plans when running on k8s vs dedicated servers :).
Feel free to ignore, it's perfectly fine.
Vijaykumar Jain wrote:

> I am just asking in a different thread so as to not derail the original conversation.
>
> What was the real need to have a postgresql cluster running in kubernetes?
>
> I have experience with apache mesos and have done some beginner tutorials in kubernetes, but in my case those were good for stateless services.
>
> For dbs, we used to have dedicated vms for postgresql clusters on lvm and vmware. It was very rare that we lost a physical server beyond recovery,

I think you may want to run Postgres in Kubernetes when your Postgres clusters are cattle, not pets. For example, if you create and tear down many Postgres clusters daily for testing or development purposes. Like, a dedicated test Postgres DB for each commit in your CI/CD system.

If your database clusters are pets, I see no reason to run them in Kubernetes. Maybe someone will prove me wrong and name some reasons.

> Now with k8s, every time the server sneezes, the pods would get spawned onto different servers, which would result in a full resync unless volumes could be moved.

Not really. The core of both Zalando's and CrunchyData's operators is Patroni, which relies on Postgres' own physical replication. As soon as the master Pod dies, a standby will be promoted by Patroni; the failover process is really quick. You should set up some kind of anti-affinity, though, so that the leader and the standby Pods do not end up on the same node. At least I *hope* the operators work that way.

> Since containers are now in a shared environment, and if it is mostly overcommitted, then tuning of various params of an instance would be totally different compared to what it was on a dedicated vm.

Your other questions about shared resources in a Kubernetes cluster are very interesting for me too. I hope someone can reply. I feel that the degree of control you have over your Postgres cluster when it's running on VMs (EC2 instances etc.) cannot be achieved in Kubernetes.

[dd]

> Maybe, just like for ssd vs hdd, the postgresql community will come up with a cost param to generate plans when running on k8s vs dedicated servers :).

I think Postgres itself can neither know nor care whether it's running in a Pod. What tunables or cost params can you think of?

--
Victor Sudakov VAS4-RIPE
http://vas.tomsk.ru/
2:5005/49@fidonet
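On the anti-affinity point above, the rule itself is plain Kubernetes scheduling configuration. A minimal sketch of the pod-template fragment, assuming a hypothetical cluster-name: pg label on the Postgres pods (an operator would render something along these lines):

```yaml
# Fragment of a StatefulSet pod template (or of what an operator would render):
# pods carrying the hypothetical label cluster-name=pg must land on different nodes.
spec:
  template:
    metadata:
      labels:
        cluster-name: pg
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                cluster-name: pg
            topologyKey: kubernetes.io/hostname
```

The required variant refuses to co-locate the pods at all; preferredDuringSchedulingIgnoredDuringExecution is the softer option when the cluster has fewer nodes than replicas.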
Yes, in today's rich world pets and cattle have no clear definition :)
When I was working, we had 120+ pg clusters per env (all puppet-managed, fdw, shards, multiple replicas, LR, PITR, and more) with sizes varying from 2 GB to 1.5 TB, and none were use-and-throw.
But I get your point: if we have many pg nodes, given a one-db-per-app kind of design, we need some kind of automation to scale to that level, and given k8s marketing and sidecar systems, I appreciate that opinion.
And it seems k8s can handle persistent-storage-based designs well, much better than apache mesos.
Of course, my experience in a postgres dba role is less than 2 years, so I ask too many questions :), as I was mostly exposed to stateless services in container-based environments. But what do I have to lose by asking :)
My point of concern was how pg instances are tuned for heavy workloads in a shared environment. We used to tune kernel params based on typical workload requirements.
Autoscaling for pg is not the same as for stateless systems. A connection bump requires a restart (yes, pgbouncer helps, but when apps autoscale they hammer the db hard), and that restart has to be orchestrated in such a way that the cluster survives, or else the nodes shut down due to discrepancies in param values between primary and replica.
But since Crunchy and Zalando both have operators, I think I should learn to deploy them in a minikube kind of setup to play with my concerns (a rough sketch of such a setup follows after this message).
Anyways, thanks for answering. That helped.
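Regarding the minikube experiment mentioned above, getting the Zalando operator up locally might look roughly like this. The Helm repository URL, chart name and example manifest are taken from the project's quickstart as remembered here and should be verified against the current documentation.

```sh
# Rough local-test sketch; the chart repository URL and names follow the Zalando
# quickstart as remembered here; verify against the current documentation.
minikube start --cpus=2 --memory=4096

helm repo add postgres-operator-charts \
  https://opensource.zalando.com/postgres-operator/charts/postgres-operator
helm repo update
helm install postgres-operator postgres-operator-charts/postgres-operator

# Create a small test cluster from the example CR shipped in the operator repo
# (manifests/minimal-postgres-manifest.yaml at the time of writing), then watch
# the pods come up:
kubectl apply -f minimal-postgres-manifest.yaml
kubectl get pods -w
```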
> On Jul 28, 2021, at 11:18 PM, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
>
> What was the real need to have a postgresql cluster running in kubernetes?

When you have everything else running in K8s, it's awkward to keep around a cluster of VMs just for your db--and running on dedicated hardware has its own tradeoffs.

> Now with k8s, every time the server sneezes, the pods would get spawned onto different servers, which would result in a full resync unless volumes could be moved.

You either have shared network volumes (persistent volume claims in K8s terminology), in which case the migrated server re-mounts the same volume. Or you can use local storage, in which case the servers are bound to specific nodes with that storage and don't migrate (you have to manage this a bit manually, and it's a tradeoff for likely higher-performing storage).

Also, what makes you think the server will "sneeze" often? I cannot remember the last time postgres quit unexpectedly.

> Since containers are now in a shared environment, and if it is mostly overcommitted, then tuning of various params of an instance would be totally different compared to what it was on a dedicated vm.

We don't find params to be different, but we are not really over-committed.

> Noisy neighbours: typical heavy activity, like a bot attack on services that do not touch the db, will have a serious impact on the db on the same server due to a shortage of resources.

This is not really different than VMs. You either are able to manage this reasonably, or you need dedicated hardware.

> In our case dns was under huge stress due to constant bouncing of services and discovery compared to the original monoliths; it was not tuned to handle that amount of change and suffered stale cache lookups. For apps it would be OK, as they implement circuit breakers, but wouldn't an intra-pg setup for barman or logical replication or pgbackrest suffer a longer outage?

You needed to fix your services. If your DNS is overloaded because your apps are moving so much, then something is terribly, terribly wrong. Anyway, your postgres instances certainly should not be moving, so stale lookups should not be a problem, even in such a circumstance.

> Lastly, shared resources can result in a poor query plan, and things like slow vacuuming may degrade the db.

Shared resources can slow things down, but I have no experience of that affecting what the appropriate query plan should be.

> Now, I have 0 exp in kubernetes, but I tried to understand the basics and found most of them similar to apache mesos. My use case was dbs that grow, and they grow really fast, so they cannot be treated the same as immutable containers, but idk. For example, when in need of increased memory, it was OK to do that for a vm and reboot, but for a pod there is a risk of redeployment and moving the instance to another server? Or else all the pods on that server would get bounced?

As discussed above, movement of pods is not the problem you think it is. Network volumes, no problem moving; local storage, can't move.
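To illustrate the local-storage option Scott describes: a StorageClass with delayed binding plus a pre-created local PersistentVolume ties the data, and therefore the pod bound to it, to one node. A sketch only; the node name, path and size are placeholders.

```yaml
# Illustrative local-storage setup: the PersistentVolume is tied to one node,
# so a pod bound to it cannot migrate. Node name, path and size are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-pg
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pg-data-node1
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-pg
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /mnt/disks/pg          # hypothetical local disk mount
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["node1"]      # hypothetical node name
```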
Top posting, inline editing on phone is something I need to learn.
I was not aware that complete infrastructures are now k8s-managed. I have missed the k8s bus for sure.
Of course we had problematic services (20 years old and still running) and, worse, there was little monitoring around them; it's just that the overloaded services exposed those problems. Hopefully it is now more stable after upgrading network devices and patching servers. But I read that EC2 instances crash a lot; maybe I generalized from some blogs.
-- movement is not required.
That makes it perfect, if they do not bounce on server glitches. We had old servers and old raid controllers, and we had no control over where the dbs would be deployed, old or new; hence we used vmware-based migration for live movement.
We did not really have split brain, but we did have zombie instances.
The instances used to detach from the service registry due to network outages, so new instances would spin up to maintain the desired number of instances in the registry, but the old instance could not kill itself, so it remained available for connections from some clients. It was only when we introduced replication lag as a health check at the haproxy layer that we were able to get rid of that problem (a rough sketch of that kind of check follows at the end of this message).
Any customized setup will require baby sitting initially, but can be automated later once stable.
But it looks like dba roles are old-fashioned now. I'll have to upgrade myself to k8s to survive.
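Regarding the replication-lag health check mentioned above, here is a minimal sketch of the idea, not the original setup: a small script that a load balancer such as haproxy could call (for example via an agent-check or a tiny HTTP wrapper) to drop a replica from the pool when replay lag grows too large. The threshold and connection settings are placeholders.

```sh
#!/bin/sh
# Illustrative replica health check: report unhealthy when WAL replay lag
# exceeds a threshold. Threshold and connection settings are placeholders;
# psql is assumed to connect via PG* environment variables or a service file.
MAX_LAG_SECONDS=30

LAG=$(psql -tAc "SELECT COALESCE(EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::int, 0)") || exit 1

# On a primary pg_last_xact_replay_timestamp() is NULL, so COALESCE reports 0.
if [ "$LAG" -gt "$MAX_LAG_SECONDS" ]; then
  exit 1    # too far behind the primary: drop it from the pool
fi
exit 0      # healthy
```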