Re: Synchronizing slots from primary to standby
От | Masahiko Sawada |
---|---|
Тема | Re: Synchronizing slots from primary to standby |
Дата | |
Msg-id | CAD21AoD23jkCwvHkrGOyXRs5Ugh_SF0g0-aJpNTekCT0HHYcrg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
Ответы |
Re: Synchronizing slots from primary to standby
(shveta malik <shveta.malik@gmail.com>)
Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>) |
Список | pgsql-hackers |
Hi, Thank you for working on this. On Tue, Dec 26, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Dec 26, 2023 at 4:41 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Wednesday, December 20, 2023 7:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Dec 20, 2023 at 3:29 PM shveta malik <shveta.malik@gmail.com> > > > wrote: > > > > > > > > On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> > > > wrote: > > > > > > > > > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik <shveta.malik@gmail.com> > > > wrote: > > > > > > > > > > > > Thanks for reviewing. I have addressed these in v50. > > > > > > > > > > > > > > > > I was looking at this patch to see if something smaller could be > > > > > independently committable. I think we can extract > > > > > pg_get_slot_invalidation_cause() and commit it as that function > > > > > could be independently useful as well. What do you think? > > > > > > > > > > > > > Sure, forked another thread [1] > > > > [1]: > > > > > > > https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6 > > > N%3D > > > > anX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com > > > > > > > > > > Thanks, thinking more, we can split the patch into the following three patches > > > which can be committed separately (a) Allowing the failover property to be set > > > for a slot via SQL API and subscription commands > > > (b) sync slot worker infrastructure (c) GUC standby_slot_names and the the > > > corresponding wait logic in server-side. > > > > > > Thoughts? > > > > I agree. Here is the V54 patch set which was split based on the suggestion. > > The commit message in each patch is also improved. > > > > I would like to revisit the current dependency of slotsync worker on > dbname used in 002 patch. Currently we accept dbname in > primary_conninfo and thus the user has to make sure to provide one (by > manually altering it) even in case of a conf file auto-generated by > "pg_basebackup -R". > Thus I would like to discuss if there are better ways to do it. > Complete background is as follow: > > We need dbname for 2 purposes: > > 1) to connect to remote db in order to run SELECT queries to fetch the > info needed by slotsync worker. > 2) to make connection in slot-sync worker itself in order to be able > to use libpq APIs for 1) > > We run 3 kind of select queries in slot-sync worker currently: > > a) To fetch all failover slots (logical slots) info at once in > synchronize_slots(). > b) To fetch a particular slot info during > wait_for_primary_slot_catchup() logic (logical slot). > c) To validate primary slot (physical one) and also to distinguish > between standby and cascading standby by running pg_is_in_recovery(). > > 1) One approach to avoid dependency on dbname is using commands > instead of SELECT. This will need implementing LIST_SLOTS command for > a), and for b) we can use LIST_SLOTS and fetch everything (even though > it is not needed) or have LIST_SLOTS with a filter on slot-name or > extend READ_REPLICATION_SLOT, and for c) we can have some other > command to get pg_is_in_recovery() info. But, I feel by relying on > commands we will be making the extension of the slot-sync feature > difficult. In future, if there is some more requirement to fetch any > other info, > then there too we have to implement a command. I am not sure if it is > good and extensible approach. > > 2) Another way to avoid asking for a dbname in primary_conninfo is to > use the default dbname internally. This brings us to two questions: > 'How' and 'Which default db'? > > 2.1) To answer 'How': > Using default dbname is simpler for the purpose of slot-sync worker > having its own db-connection, but is a little tricky for the purpose > of connection to remote_db. This is because we have to inject this > dbname internally in our connection-info. > > 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname), > then currently it could have 2 formats: > > a) The simple "=" format for key-value pairs, example: > 'user=replication host=127.0.0.1 port=5433 dbname=postgres'. > b) URI format, example: > postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp > > We can distinguish between the 2 formats using 'uri_prefix_length' but > injecting the dbname part will be messy specially for URI format. If > we want to do it w/o injecting and only by changing libpq interfaces > to accept dbname separately apart from conninfo, then there is no > current simpler way available. It will need a good amount of changes > in libpq. > > 2.1.2) Another way is to not rely on primary_conninfo directly but > rely on 'WalRcv->conninfo' in order to connect to remote_db. This is > because the latter is never URI format, it is some parsed format and > appending may work. As an example, primary_conninfo = > 'postgresql://replication@localhost:5433', WalRcv->conninfo loaded > internally is: > "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer > dbname=replication host=localhost port=5433 > fallback_application_name=walreceiver sslmode=prefer sslcompression=0 > sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 > gssencmode=disable krbsrvname=postgres gssdelegation=0 > target_session_attrs=any load_balance_hosts=disable", '\000' > > So we can try appending our default dbname to this. But all the > defaults loaded in WalRcv->conninfo need some careful analysis to > figure out if they work for slot-sync worker case. > > 2.2) Now coming to 'Which default db': > > 2.2.1) If we use 'template1' as default db, it may block 'create db' > operations on primary for the time when the slot-sync worker is > connected to remote using this dbname. Example: > > postgres=# create database newdb1; > ERROR: source database "template1" is being accessed by other users > DETAIL: There is 1 other session using the database. > > 2.2.2) If we use 'postgres' as default db, there are chances that it > can be dropped as unlike 'template1', it is allowed to be dropped by > user, and if slotsync worker is connected to it, user may see: > newdb1=# drop database postgres; > ERROR: database "postgres" is being accessed by other users > DETAIL: There is 1 other session using the database. > > But once the slot-sync worker or standby goes down, user can always > drop this and next time slot-sync worker may not be able to come up. > Other random ideas for discussion are: 3) The slotsync worker uses primary_conninfo but also uses a new GUC parameter, say slot_sync_dbname, to specify the database to connect. The slot_sync_dbname overwrites the dbname if primary_conninfo also specifies it. If both don't have a dbname, raise an error. 4) The slotsync worker uses a new GUC parameter, say slot_sync_conninfo, to specify the connection string to the primary aside from primary_conninfo. And pg_basebackup -R generates slot_sync_conninfo as well if required (new option required). BTW given that the slotsync worker executes only normal SQL queries, is there any reason why it uses a replication connection? It's slightly odd to me that the pg_stat_replication view shows one entry that remains in the "startup" state. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
В списке pgsql-hackers по дате отправления:
Следующее
От: Japin LiДата:
Сообщение: Re: Tab complete for CREATE SUBSCRIPTION ... CONECTION does not work