I want to reactivate $subject. I took Petr Jelinek's patch from [0],
rebased it, and added a bit of testing. It basically works, but as
mentioned in [0], there are various issues to work out.

The idea is that the standby runs a background worker to periodically
fetch replication slot information from the primary. On failover, a
logical subscriber would then ideally find up-to-date replication slots
on the new publisher and could just continue normally.
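
As a rough illustration of that idea (this is not code from the patch),
one sync cycle of such a worker could be sketched as a comparison between
the slots reported by the primary and the slots present on the standby.
The names SlotInfo, parse_lsn, and plan_slot_sync are hypothetical, and
dropping slots that no longer exist on the primary is an assumption of
this sketch, not a statement about what the patch does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotInfo:
    """Subset of pg_replication_slots columns used by this sketch."""
    name: str
    plugin: str
    database: str
    confirmed_flush_lsn: int  # LSN as an integer, for easy comparison


def parse_lsn(text: str) -> int:
    """Convert the 'X/Y' hexadecimal text form of an LSN to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def plan_slot_sync(primary: dict, standby: dict):
    """Return (create, advance, drop) lists of slot names for one cycle.

    Both arguments map slot name -> SlotInfo. A slot is created if it is
    missing on the standby, advanced if the primary is further ahead, and
    dropped if it no longer exists on the primary (assumed policy).
    """
    create, advance = [], []
    for name, slot in primary.items():
        local = standby.get(name)
        if local is None:
            create.append(name)
        elif slot.confirmed_flush_lsn > local.confirmed_flush_lsn:
            advance.append(name)
    drop = [name for name in standby if name not in primary]
    return sorted(create), sorted(advance), sorted(drop)
```

A real worker would additionally have to deal with WAL that the standby
has not yet replayed, catalog_xmin, and error handling, none of which is
modeled here.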

The previous thread didn't have a lot of discussion, but I have gathered
from offline conversations that there is wider agreement on this
approach. So the next steps would be to make it more robust,
configurable, and documented. As I said, I added a small test case to
show that it works at all, but I think a lot more tests should be added.
I have also found that this breaks some seemingly unrelated tests in
the recovery test suite. I have disabled those tests in this patch. I'm
not sure whether the patch actually breaks anything or whether these are
just differences in timing or implementation dependencies. This patch
adds a LIST_SLOTS replication command, but I think this could now be
replaced with a plain SELECT ... FROM pg_replication_slots query. (The
patch predates the ability to run SELECT queries over the replication
protocol.)
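
To make the SELECT alternative a bit more concrete, here is a minimal
sketch of what the worker might send and how it could consume the result.
The column list, the WHERE clause, and the names SLOT_QUERY, COLUMNS, and
rows_to_slots are assumptions for illustration only. How the connection
is obtained and how the query is executed are left out.

```python
# Hypothetical replacement for LIST_SLOTS; the column choice and the
# filter on logical, non-temporary slots are assumptions of this sketch.
SLOT_QUERY = (
    "SELECT slot_name, plugin, database, restart_lsn, confirmed_flush_lsn "
    "FROM pg_replication_slots "
    "WHERE slot_type = 'logical' AND NOT temporary"
)

# Must match the order of the select list above.
COLUMNS = ("slot_name", "plugin", "database", "restart_lsn",
           "confirmed_flush_lsn")


def rows_to_slots(rows):
    """Map result rows (sequences of text values or None) to dicts.

    Returns a dict keyed by slot name; each value maps column name to
    the value as received, without further type conversion.
    """
    slots = {}
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        record = dict(zip(COLUMNS, row))
        slots[record["slot_name"]] = record
    return slots
```

One advantage of this shape is that adding another column only means
extending the query and COLUMNS, with no change to the replication
grammar.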

So, again, this isn't anywhere near ready, but there is already a lot
here to gather feedback about how it works, how it should work, how to
configure it, and how it fits into an overall replication and HA
architecture.

[0]: https://www.postgresql.org/message-id/flat/3095349b-44d4-bf11-1b33-7eefb585d578%402ndquadrant.com