Re: speed up a logical replica setup
От | Shubham Khanna |
---|---|
Тема | Re: speed up a logical replica setup |
Дата | |
Msg-id | CAHv8RjK35FQ7UJ6=Xf12aN8aAaaHRadiMRvCMp5+QKBVeSb+TQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: speed up a logical replica setup ("Euler Taveira" <euler@eulerto.com>) |
Ответы |
RE: speed up a logical replica setup
("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
|
Список | pgsql-hackers |
On Wed, Feb 7, 2024 at 10:24 AM Euler Taveira <euler@eulerto.com> wrote: > > On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote: > > Thanks for updating the patch! > > > Thanks for taking a look. > > > > I'm still working on the data structures to group options. I don't like the way > it was grouped in v13-0005. There is too many levels to reach database name. > The setup_subscriber() function requires the 3 data structures. > > > > Right, your refactoring looks fewer stack. So I pause to revise my refactoring > patch. > > > I didn't complete this task yet so I didn't include it in this patch. > > > > The documentation update is almost there. I will include the modifications in > the next patch. > > > > OK. I think it should be modified before native speakers will attend to the > thread. > > > Same for this one. > > > > Regarding v13-0004, it seems a good UI that's why I wrote a comment about it. > However, it comes with a restriction that requires a similar HBA rule for both > regular and replication connections. Is it an acceptable restriction? We might > paint ourselves into the corner. A reasonable proposal is not to remove this > option. Instead, it should be optional. If it is not provided, primary_conninfo > is used. > > > > I didn't have such a point of view. However, it is not related whether -P exists > or not. Even v14-0001 requires primary to accept both normal/replication connections. > If we want to avoid it, the connection from pg_createsubscriber can be restored > to replication-connection. > (I felt we do not have to use replication protocol even if we change the connection mode) > > > That's correct. We made a decision to use non physical replication connections > (besides the one used to connect primary <-> standby). Hence, my concern about > a HBA rule falls apart. I'm not convinced that using a modified > primary_conninfo is the only/right answer to fill the subscription connection > string. Physical vs logical replication has different requirements when we talk > about users. The physical replication requires only the REPLICATION privilege. > On the other hand, to create a subscription you must have the privileges of > pg_create_subscription role and also CREATE privilege on the specified > database. Unless, you are always recommending the superuser for this tool, I'm > afraid there will be cases that reusing primary_conninfo will be an issue. The > more I think about this UI, the more I think that, if it is not hundreds of > lines of code, it uses the primary_conninfo the -P is not specified. > > The motivation why -P is not needed is to ensure the consistency of nodes. > pg_createsubscriber assumes that the -P option can connect to the upstream node, > but no one checks it. Parsing two connection strings may be a solution but be > confusing. E.g., what if some options are different? > I think using a same parameter is a simplest solution. > > > Ugh. An error will occur the first time (get_primary_sysid) it tries to connect > to primary. > > I found that no one refers the name of temporary slot. Can we remove the variable? > > > It is gone. I did a refactor in the create_logical_replication_slot function. > Slot name is assigned internally (no need for slot_name or temp_replslot) and > temporary parameter is included. > > Initialization by `CreateSubscriberOptions opt = {0};` seems enough. > All values are set to 0x0. > > > It is. However, I keep the assignments for 2 reasons: (a) there might be > parameters whose default value is not zero, (b) the standard does not say that > a null pointer must be represented by zero and (c) there is no harm in being > paranoid during initial assignment. > > You said "target server must be a standby" in [1], but I cannot find checks for it. > IIUC, there are two approaches: > a) check the existence "standby.signal" in the data directory > b) call an SQL function "pg_is_in_recovery" > > > I applied v16-0004 that implements option (b). > > I still think they can be combined as "bindir". > > > I applied a patch that has a single variable for BINDIR. > > WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but not > sure it is good. These settings would remain on new subscriber even after the > pg_createsubscriber. Can we avoid it? I come up with passing these parameters > via pg_ctl -o option, but it requires parsing output from GenerateRecoveryConfig() > (all GUCs must be allign like "-c XXX -c XXX -c XXX..."). > > > I applied a modified version of v16-0006. > > Functions arguments should not be struct because they are passing by value. > They should be a pointer. Or, for modify_subscriber_sysid and wait_for_end_recovery, > we can pass a value which would be really used. > > > Done. > > 07. > ``` > static char *get_base_conninfo(char *conninfo, char *dbname, > const char *noderole); > ``` > > Not sure noderole should be passed here. It is used only for the logging. > Can we output string before calling the function? > (The parameter is not needed anymore if -P is removed) > > > Done. > > 08. > The terminology is still not consistent. Some functions call the target as standby, > but others call it as subscriber. > > > The terminology should reflect the actual server role. I'm calling it "standby" > if it is a physical replica and "subscriber" if it is a logical replica. Maybe > "standby" isn't clear enough. > > 09. > v14 does not work if the standby server has already been set recovery_target* > options. PSA the reproducer. I considered two approaches: > > a) raise an ERROR when these parameter were set. check_subscriber() can do it > b) overwrite these GUCs as empty strings. > > > I prefer (b) that's exactly what you provided in v16-0006. > > 10. > The execution always fails if users execute --dry-run just before. Because > pg_createsubscriber stops the standby anyway. Doing dry run first is quite normal > use-case, so current implementation seems not user-friendly. How should we fix? > Below bullets are my idea: > > a) avoid stopping the standby in case of dry_run: seems possible. > b) accept even if the standby is stopped: seems possible. > c) start the standby at the end of run: how arguments like pg_ctl -l should be specified? > > > I prefer (a). I applied a slightly modified version of v16-0005. > > This new patch contains the following changes: > > * check whether the target is really a standby server (0004) > * refactor: pg_create_logical_replication_slot function > * use a single variable for pg_ctl and pg_resetwal directory > * avoid recovery errors applying default settings for some GUCs (0006) > * don't stop/start the standby in dry run mode (0005) > * miscellaneous fixes > > I don't understand why v16-0002 is required. In a previous version, this patch > was using connections in logical replication mode. After some discussion we > decided to change it to regular connections and use SQL functions (instead of > replication commands). Is it a requirement for v16-0003? > > I started reviewing v16-0007 but didn't finish yet. The general idea is ok. > However, I'm still worried about preventing some use cases if it provides only > the local connection option. What if you want to keep monitoring this instance > while the transformation is happening? Let's say it has a backlog that will > take some time to apply. Unless, you have a local agent, you have no data about > this server until pg_createsubscriber terminates. Even the local agent might > not connect to the server unless you use the current port. I tried verifying few scenarios by using 5 databases and came across the following errors: ./pg_createsubscriber -D ../new_standby -P "host=localhost port=5432 dbname=postgres" -S "host=localhost port=9000 dbname=postgres" -d db1 -d db2 -d db3 -d db4 -d db5 pg_createsubscriber: error: publisher requires 6 wal sender processes, but only 5 remain pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 7. It is successful only with 7 wal senders, so we can change error messages accordingly. pg_createsubscriber: error: publisher requires 6 replication slots, but only 5 remain pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 7. It is successful only with 7 replication slots, so we can change error messages accordingly. Thanks and Regards, Shubham Khanna,
В списке pgsql-hackers по дате отправления: