Re: pgsql: Add a view to show the stats of subscription workers.
От | Masahiko Sawada |
---|---|
Тема | Re: pgsql: Add a view to show the stats of subscription workers. |
Дата | |
Msg-id | CAD21AoChP5wOT2AYziF+-j7vvThF2NyAs7wr+yy+8hsnu=8Rgg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: pgsql: Add a view to show the stats of subscription workers. (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: pgsql: Add a view to show the stats of subscription workers.
Re: pgsql: Add a view to show the stats of subscription workers. |
Список | pgsql-committers |
On Tue, Nov 30, 2021 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Nov 30, 2021 at 9:05 AM Amit Kapila <akapila@postgresql.org> wrote: > > > > Add a view to show the stats of subscription workers. > > > > sidewinder is showing failure in the test added by this patch: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2021-11-30%2005%3A05%3A25 > > I'll investigate and share my findings. I've looked at the issue. Here is my analysis: Looking at the result the test actually got, we had two error entries for test_tab1 instead of one: # Failed test 'check the error reported by the apply worker' # at t/026_worker_stats.pl line 33. # got: 'tap_sub|INSERT|test_tab1|t # tap_sub||test_tab1|t' # expected: 'tap_sub|INSERT|test_tab1|t' The possible scenarios are: The table sync worker for test_tab1 failed due to an error unrelated to apply changes: 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with OID 2 is already active for PID 23706 At this time, the view had one error entry for the table sync worker. After retrying table sync, it succeeded: 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table synchronization worker for subscription "tap_sub", table "test_tab1" has finished Then after inserting a row on the publisher, the apply worker inserted the row but failed due to violating a unique key violation, which is expected: 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value violates unique constraint "test_tab1_pkey" 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists. 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data during "INSERT" for replication target relation "public.test_tab1" in transaction 721 at 2021-11-30 06:24:04.305096+01 As a result, we had two error entries for test_tab1: the table sync worker error and the apply worker error. I didn't expect that the table sync worker for test_tab1 failed due to "replication origin with OID 2 is already active for PID 23706” error. Looking at test_subscription_error() in 026_worker_stats.pl, we have two checks; in the first check, we wait for the view to show the error entry for the given relation name and xid. This check was passed since we had the second error (i.g., apply worker error). In the second check, we get error entries from pg_stat_subscription_workers by specifying only the relation name. Therefore, we ended up getting two entries and failed the tests. To fix this issue, I think that in the second check, we can get the error from pg_stat_subscription_workers by specifying the relation name *and* xid like the first check does. I've attached the patch. What do you think? FWIW, I'm still not investigating why "replication origin with OID 2 is already active for PID 23706" occurred. But regardless of the reason why this error occurred, the attached fix is necessary to make the tests robust. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Вложения
В списке pgsql-committers по дате отправления: