RE: Failed transaction statistics to measure the logical replication progress
From | osumi.takamichi@fujitsu.com |
---|---|
Subject | RE: Failed transaction statistics to measure the logical replication progress |
Date | |
Msg-id | TYCPR01MB83739F01E478BED5EDECD0DFED3B9@TYCPR01MB8373.jpnprd01.prod.outlook.com |
In response to | RE: Failed transaction statistics to measure the logical replication progress ("tanghy.fnst@fujitsu.com" <tanghy.fnst@fujitsu.com>) |
List | pgsql-hackers |
On Tuesday, February 22, 2022 10:15 AM Tang, Haiying/唐 海英 <tanghy.fnst@fujitsu.com> wrote:
> On Mon, Feb 21, 2022 11:46 AM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> >
> > On Saturday, February 19, 2022 12:00 AM osumi.takamichi@fujitsu.com
> > <osumi.takamichi@fujitsu.com> wrote:
> > > On Friday, February 18, 2022 3:34 PM Tang, Haiying/唐 海英
> > > <tanghy.fnst@fujitsu.com> wrote:
> > > > On Wed, Jan 12, 2022 8:35 PM osumi.takamichi@fujitsu.com
> > > > <osumi.takamichi@fujitsu.com> wrote:
> > > > 4) I noticed that the abort_count doesn't include aborted
> > > > streaming transactions.
> > > > Should we take this case into consideration?
> > > Hmm, we can add this into this column, when there's no objection.
> > > I'm not sure but someone might say those should be separate columns.
> > I've addressed this point in a new v23 patch, since there was no
> > opinion on this so far.
> >
> > Kindly have a look at the attached one.
> >
>
> Thanks for updating the patch.
>
> I found a problem when using it. When a replication workers exits, the
> transaction stats should be sent to stats collector if they were not sent before
> because it didn't reach PGSTAT_STAT_INTERVAL. But I saw that the stats
> weren't updated as expected.
>
> I looked into it and found that the replication worker would send the transaction
> stats (if any) before it exits. But it got invalid subid in
> pgstat_send_subworker_xact_stats(), which led to the following result:
>
> postgres=# select pg_stat_get_subscription_worker(0, null);
>  pg_stat_get_subscription_worker
> ---------------------------------
>  (0,,2,0,0,,,,0,"",)
> (1 row)
>
> I think that's because subid has already been cleaned when trying to send the
> stats. I printed the value of before_shmem_exit_list, the functions in this list
> would be called in shmem_exit() when the worker exits.
> logicalrep_worker_onexit() would clean up the worker info (including subid),
> and
> pgstat_shutdown_hook() would send stats if any. logicalrep_worker_onexit()
> was called before calling pgstat_shutdown_hook().
>
> (gdb) p before_shmem_exit_list
> $1 = {{function = 0xa88f1e <pgstat_shutdown_hook>, arg = 0}, {function =
> 0xb619e7 <BeforeShmemExit_Files>, arg = 0}, {function = 0xb07b5c
> <ReplicationSlotShmemExit>, arg = 0}, {
>     function = 0xabdd93 <logicalrep_worker_onexit>, arg = 0}, {function =
> 0xe30c89 <ShutdownPostgres>, arg = 0}, {function = 0x0, arg = 0} <repeats 15
> times>}
>
> Maybe we should make some modification to fix it.

Thank you for letting me know this issue.
I'll investigate this and will report the result.

Best Regards,
Takamichi Osumi
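
For readers following the ordering issue in the quoted report: the callbacks in before_shmem_exit_list are run in reverse order of registration, so an entry near the front of the list (pgstat_shutdown_hook in the gdb output) runs after one registered later (logicalrep_worker_onexit). The following is only a toy model of that behaviour, not PostgreSQL's code. The names register_exit_callback, run_exit_callbacks, stats_shutdown_hook, worker_onexit and simulate_worker_exit, and the subscription OID 16394, are all made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model of a LIFO exit-callback list; NOT PostgreSQL's actual code. */
#define MAX_CALLBACKS 20

typedef void (*exit_callback)(int arg);

struct callback_entry
{
    exit_callback function;
    int           arg;
};

struct callback_entry callback_list[MAX_CALLBACKS];
int callback_count = 0;

/* Hypothetical stand-ins for the worker's state and what the stats hook sees. */
int worker_subid = 16394;       /* made-up subscription OID */
int subid_seen_by_stats = -1;   /* -1 means the stats hook has not run yet */

/* Append a callback; later registrations run earlier at exit. */
void register_exit_callback(exit_callback function, int arg)
{
    assert(callback_count < MAX_CALLBACKS);
    callback_list[callback_count].function = function;
    callback_list[callback_count].arg = arg;
    callback_count++;
}

/* Run callbacks from the last registered to the first, then empty the list. */
void run_exit_callbacks(void)
{
    while (--callback_count >= 0)
        callback_list[callback_count].function(callback_list[callback_count].arg);
    callback_count = 0;
}

/* Stand-in for the stats hook: records the subid it would report. */
void stats_shutdown_hook(int arg)
{
    (void) arg;
    subid_seen_by_stats = worker_subid;
    printf("stats hook sees subid = %d\n", subid_seen_by_stats);
}

/* Stand-in for the worker cleanup: clears the worker info, including subid. */
void worker_onexit(int arg)
{
    (void) arg;
    worker_subid = 0;
}

/* Register in the same relative order as in the gdb output, then "exit". */
int simulate_worker_exit(void)
{
    callback_count = 0;
    worker_subid = 16394;
    subid_seen_by_stats = -1;

    register_exit_callback(stats_shutdown_hook, 0);  /* registered first */
    register_exit_callback(worker_onexit, 0);        /* registered later */
    run_exit_callbacks();

    return subid_seen_by_stats;
}
```

Calling simulate_worker_exit() from a main() returns the subid that the stats hook observed. In this model the cleanup runs first, so the hook sees the already-cleared value, which corresponds to the subid of 0 in the reported pg_stat_get_subscription_worker output. Whether the fix should change the ordering or flush the stats before the cleanup is left open here, as it is in the thread.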