[MASSMAIL]CSN snapshots in hot standby
From | Heikki Linnakangas |
---|---|
Subject | [MASSMAIL]CSN snapshots in hot standby |
Date | |
Msg-id | 08da26cc-95ef-4c0e-9573-8b930f80ce27@iki.fi |
Replies | Re: CSN snapshots in hot standby |
List | pgsql-hackers |
You cannot run queries on a Hot Standby server until the standby has seen a running-xacts record. Furthermore, if the subxids cache had overflowed, you also need to wait for those transactions to finish. That is usually not a problem, because we write a running-xacts record after each checkpoint, and most systems don't use so many subtransactions that the cache would overflow. Still, you can run into it if you're unlucky, and it's annoying when you do.

It occurred to me that we could replace the known-assigned-xids machinery with CSN snapshots. We've talked about CSN snapshots many times in the past, and I think it would make sense on the primary too, but for starters, we could use it just during Hot Standby. With CSN-based snapshots, you don't have the limitation of the fixed-size known-assigned-xids array, and overflowed sub-XIDs are not a problem either. You can always enter Hot Standby and start accepting queries as soon as the standby is in a physically consistent state.

I dusted off and rebased the last CSN patch that I found on the mailing list [1], and modified it so that it's only used during recovery. That makes some things simpler and less scary. There are no changes to how transaction commit happens in the primary; the CSN log is kept up to date only in the standby, as commit/abort records are replayed. The CSN of each transaction is the LSN of its commit record.

The CSN approach is much simpler than the existing known-assigned-XIDs machinery, as you can see from "git diff --stat" with this patch:

    32 files changed, 773 insertions(+), 1711 deletions(-)

With CSN snapshots, we don't need the known-assigned-XIDs machinery, and we can get rid of the xact-assignment records altogether. We no longer need the running-xacts records for Hot Standby either, but I wasn't able to remove them because they're still used by logical replication, in snapbuild.c. I have a feeling that that could be simplified somehow too, but I didn't look into it.
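To make the visibility rule concrete, here is a toy model in Python of the recovery-only scheme described above. It is not the patch's C code, and every name in it (`CSNLog`, `replay_commit`, `replay_abort`, `take_snapshot`, `xid_visible`) is hypothetical. Assumptions: a plain dict stands in for the CSN log, XIDs with no entry are treated as still in progress, and `replayed_lsn` only advances on commit/abort records (real replay advances it on every record). The point it illustrates is that a snapshot is a single number, the replay position, so no running-xacts record or fixed-size XID array is needed before queries can start.

```python
# Toy model of CSN snapshots during recovery. All names are hypothetical
# and illustrative; this is not PostgreSQL's API or the patch's code.

ABORTED = -1  # sentinel stored for aborted transactions


class CSNLog:
    """Maps XID -> CSN, filled in only as WAL is replayed on the standby.

    The CSN of a committed transaction is the LSN of its commit record.
    An XID with no entry is, as far as the standby knows, in progress.
    """

    def __init__(self):
        self._csn = {}         # xid -> commit LSN, or ABORTED
        self.replayed_lsn = 0  # LSN of the last replayed commit/abort record

    def replay_commit(self, lsn, xid, subxids=()):
        # A commit record lists the top-level XID and its sub-XIDs, and
        # they all get the same CSN, so there is no sub-XID cache to overflow.
        for x in (xid, *subxids):
            self._csn[x] = lsn
        self.replayed_lsn = lsn

    def replay_abort(self, lsn, xid, subxids=()):
        for x in (xid, *subxids):
            self._csn[x] = ABORTED
        self.replayed_lsn = lsn

    def take_snapshot(self):
        # The whole snapshot is the current replay position.
        return self.replayed_lsn

    def xid_visible(self, xid, snapshot_csn):
        csn = self._csn.get(xid)
        if csn is None or csn == ABORTED:
            return False  # still in progress, or aborted
        # Committed at or before the snapshot's replay position.
        return csn <= snapshot_csn


if __name__ == "__main__":
    log = CSNLog()
    log.replay_commit(100, xid=10)
    snap = log.take_snapshot()
    log.replay_commit(200, xid=11, subxids=(12,))
    print(log.xid_visible(10, snap))  # True: committed before the snapshot
    print(log.xid_visible(11, snap))  # False: committed after the snapshot
```

The design choice this highlights: because visibility is decided by comparing a transaction's commit LSN against the snapshot's LSN, the standby never has to know the full set of running XIDs up front, which is exactly what the running-xacts record and the known-assigned-xids array provide in the current design.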
This is obviously v18 material, so I'll park this at the July commitfest for now. There are a bunch of little FIXMEs in the code, and it needs performance testing, but overall I was surprised how easy this was.

(We ran into this issue particularly hard with Neon, because with Neon you don't need to perform WAL replay at standby startup. However, when you don't perform WAL replay, you don't get to see the running-xacts record after the checkpoint either. If the primary is idle, it doesn't generate new running-xacts records, and the standby cannot start Hot Standby until the next time something happens in the primary. That is always a potential problem with an overflowed sub-XIDs cache, but the lack of WAL replay made it happen even when no subtransactions were involved.)

[1] https://www.postgresql.org/message-id/2020081009525213277261%40highgo.ca

-- 
Heikki Linnakangas
Neon (https://neon.tech)