Re: Perform streaming logical transactions by background workers and parallel apply
От | Peter Smith |
---|---|
Тема | Re: Perform streaming logical transactions by background workers and parallel apply |
Дата | |
Msg-id | CAHut+PvA10Bp9Jaw9OS2+puKHr7ry_xB3Tf2-bbv5gyxD5E_gw@mail.gmail.com обсуждение исходный текст |
Ответ на | RE: Perform streaming logical transactions by background workers and parallel apply ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>) |
Ответы |
Re: Perform streaming logical transactions by background workers and parallel apply
(Amit Kapila <amit.kapila16@gmail.com>)
RE: Perform streaming logical transactions by background workers and parallel apply ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>) |
Список | pgsql-hackers |
Hi, here are some review comments for patch v78-0001. ====== General 1. (terminology) AFAIK everywhere until now we’ve been referring everywhere (docs/comments/code) to the parent apply worker as the "leader apply worker". Not the "main apply worker". Not the "apply leader worker". Not any other variations... From this POV I think the worker member "apply_leader_pid" would be better named "leader_apply_pid", but I see that this was already committed to HEAD differently. Maybe it is not possible (or you don't want) to change that internal member name but IMO at least all the new code and docs should try to be using consistent terminology (e.g. leader_apply_XXX) where possible. ====== Commit message 2. main_worker_pid is Process ID of the leader apply worker, if this process is a apply parallel worker. NULL if this process is a leader apply worker or a synchronization worker. IIUC, this text is just cut/paste from the monitoring.sgml. In a review comment below I suggest some changes to that text, so then this commit message should also change to be the same. ~~ 3. The new column can make it easier to distinguish leader apply worker and apply parallel worker which is also similar to the 'leader_pid' column in pg_stat_activity. SUGGESTION The new column makes it easier to distinguish parallel apply workers from other kinds of workers. It is implemented this way to be similar to the 'leader_pid' column in pg_stat_activity. ====== doc/src/sgml/logical-replication.sgml 4. + being synchronized. Moreover, if the streaming transaction is applied in + parallel, there will be additional workers. SUGGESTION there will be additional workers -> there may be additional parallel apply workers ====== doc/src/sgml/monitoring.sgml 5. pg_stat_subscription @@ -3198,11 +3198,22 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i <row> <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>apply_leader_pid</structfield> <type>integer</type> + </para> + <para> + Process ID of the leader apply worker, if this process is a apply + parallel worker. NULL if this process is a leader apply worker or a + synchronization worker. + </para></entry> + </row> + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> <structfield>relid</structfield> <type>oid</type> </para> <para> OID of the relation that the worker is synchronizing; null for the - main apply worker + main apply worker and the parallel apply worker </para></entry> </row> 5a. (Same as general comment #1 about terminology) "apply_leader_pid" --> "leader_apply_pid" ~~ 5b. The current text feels awkward. I see it was copied from the similar text of 'pg_stat_activity' but perhaps it can be simplified a bit. SUGGESTION Process ID of the leader apply worker if this process is a parallel apply worker; otherwise NULL. ~~ 5c. BEFORE null for the main apply worker and the parallel apply worker AFTER null for the leader apply worker and parallel apply workers ~~ 5c. <structfield>relid</structfield> <type>oid</type> </para> <para> OID of the relation that the worker is synchronizing; null for the - main apply worker + main apply worker and the parallel apply worker </para></entry> main apply worker -> leader apply worker ~~~ 6. @@ -3212,7 +3223,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i </para> <para> Last write-ahead log location received, the initial value of - this field being 0 + this field being 0; null for the parallel apply worker </para></entry> </row> BEFORE null for the parallel apply worker AFTER null for parallel apply workers ~~~ 7. @@ -3221,7 +3232,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i <structfield>last_msg_send_time</structfield> <type>timestamp with time zone</type> </para> <para> - Send time of last message received from origin WAL sender + Send time of last message received from origin WAL sender; null for the + parallel apply worker </para></entry> </row> (same as #6) BEFORE null for the parallel apply worker AFTER null for parallel apply workers ~~~ 8. @@ -3230,7 +3242,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i <structfield>last_msg_receipt_time</structfield> <type>timestamp with time zone</type> </para> <para> - Receipt time of last message received from origin WAL sender + Receipt time of last message received from origin WAL sender; null for + the parallel apply worker </para></entry> </row> (same as #6) BEFORE null for the parallel apply worker AFTER null for parallel apply workers ~~~ 9. @@ -3239,7 +3252,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i <structfield>latest_end_lsn</structfield> <type>pg_lsn</type> </para> <para> - Last write-ahead log location reported to origin WAL sender + Last write-ahead log location reported to origin WAL sender; null for + the parallel apply worker </para></entry> </row> (same as #6) BEFORE null for the parallel apply worker AFTER null for parallel apply workers ~~~ 10. @@ -3249,7 +3263,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i </para> <para> Time of last write-ahead log location reported to origin WAL - sender + sender; null for the parallel apply worker </para></entry> </row> </tbody> (same as #6) BEFORE null for the parallel apply worker AFTER null for parallel apply workers ====== src/backend/catalog/system_views.sql 11. @@ -949,6 +949,7 @@ CREATE VIEW pg_stat_subscription AS su.oid AS subid, su.subname, st.pid, + st.apply_leader_pid, st.relid, st.received_lsn, st.last_msg_send_time, (Same as general comment #1 about terminology) "apply_leader_pid" --> "leader_apply_pid" ====== src/backend/replication/logical/launcher.c 12. + if (worker.apply_leader_pid == InvalidPid) nulls[3] = true; else - values[3] = LSNGetDatum(worker.last_lsn); - if (worker.last_send_time == 0) + values[3] = Int32GetDatum(worker.apply_leader_pid); + 12a. (Same as general comment #1 about terminology) "apply_leader_pid" --> "leader_apply_pid" ~~ 12b. I wondered if here the code should be using the isParallelApplyWorker(worker) macro here for readability. e.g. if (isParallelApplyWorker(worker)) values[3] = Int32GetDatum(worker.apply_leader_pid); else nulls[3] = true; ====== src/include/catalog/pg_proc.dat 13. + proallargtypes => '{oid,oid,oid,int4,int4,pg_lsn,timestamptz,timestamptz,pg_lsn,timestamptz}', + proargmodes => '{i,o,o,o,o,o,o,o,o,o}', + proargnames => '{subid,subid,relid,pid,apply_leader_pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}', (Same as general comment #1 about terminology) "apply_leader_pid" --> "leader_apply_pid" ====== src/test/regress/expected/rules.out 14. @@ -2094,6 +2094,7 @@ pg_stat_ssl| SELECT s.pid, pg_stat_subscription| SELECT su.oid AS subid, su.subname, st.pid, + st.apply_leader_pid, st.relid, st.received_lsn, st.last_msg_send_time, @@ -2101,7 +2102,7 @@ pg_stat_subscription| SELECT su.oid AS subid, st.latest_end_lsn, st.latest_end_time FROM (pg_subscription su - LEFT JOIN pg_stat_get_subscription(NULL::oid) st(subid, relid, pid, received_lsn, last_msg_send_time, last_msg_receipt_time, latest_end_lsn, latest_end_time) ON ((st.subid = su.oid))); + LEFT JOIN pg_stat_get_subscription(NULL::oid) st(subid, relid, pid, apply_leader_pid, received_lsn, last_msg_send_time, last_msg_receipt_time, latest_end_lsn, latest_end_time) ON ((st.subid = su.oid))); pg_stat_subscription_stats| SELECT ss.subid, s.subname, ss.apply_error_count, (Same comment as elsewhere) "apply_leader_pid" --> "leader_apply_pid" ------ Kind Regards, Peter Smith. Fujitsu Australia
В списке pgsql-hackers по дате отправления: