Re: RFC: Logging plan of the running query
From:        torikoshia
Subject:     Re: RFC: Logging plan of the running query
Date:
Msg-id:      00dac323d6732d59ba2c03be729966dc@oss.nttdata.com
In reply to: Re: RFC: Logging plan of the running query  (James Coleman <jtc331@gmail.com>)
Responses:   Re: RFC: Logging plan of the running query  (torikoshia <torikoshia@oss.nttdata.com>)
List:        pgsql-hackers
On 2023-06-16 01:34, James Coleman wrote:
> On Thu, Jun 15, 2023 at 9:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>>
>> On 2023-06-15 01:48, James Coleman wrote:
>> > On Tue, Jun 13, 2023 at 11:53 AM James Coleman <jtc331@gmail.com> wrote:
>> >>
>> >> ...
>> >> I'm going to re-run tests with my patch version + resetting the flag
>> >> on SIGINT (and any other error condition) to be certain that the issue
>> >> you uncovered (where backends get stuck after a SIGINT, not responding
>> >> to the requested plan logging) wasn't masking any other issues.
>> >>
>> >> As long as that run is also clean, I believe the patch is safe
>> >> as-is even without the re-entrancy guard.
>> >>
>> >> I'll report back with the results of that testing.
>> >
>> > The tests have been running since last night, but have apparently been
>> > hung now for many hours. I haven't been able to fully look into it,
>> > but so far I know the hung (100% CPU) backend last logged this:
>> >
>> > 2023-06-14 02:00:30.045 UTC client backend[84461]
>> > pg_regress/updatable_views LOG:  query plan running on backend with
>> > PID 84461 is:
>> >         Query Text: SELECT table_name, column_name, is_updatable
>> >           FROM information_schema.columns
>> >          WHERE table_name LIKE E'r_\\_view%'
>> >          ORDER BY table_name, ordinal_position;
>> >
>> > The last output from the regression test harness was:
>> >
>> > # parallel group (5 tests):  index_including create_view
>> > index_including_gist create_index create_index_spgist
>> > ok 66        + create_index               36508 ms
>> > ok 67        + create_index_spgist        38588 ms
>> > ok 68        + create_view                 1394 ms
>> > ok 69        + index_including              654 ms
>> > ok 70        + index_including_gist        1701 ms
>> > # parallel group (16 tests):  errors create_cast drop_if_exists
>> > create_aggregate roleattributes constraints hash_func typed_table
>> > infinite_recurse
>> >
>> > Attaching gdb to the hung backend shows this:
>> >
>> > #0  0x00005601ab1f9529 in ProcLockWakeup
>> >     (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 <default_lockmethod>,
>> >     lock=lock@entry=0x7f5325c913f0) at proc.c:1655
>> > #1  0x00005601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0,
>> >     proclock=proclock@entry=0x7f5325d40d60,
>> >     lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 <default_lockmethod>,
>> >     hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>)
>> >     at lock.c:1673
>> > #2  0x00005601ab1e9e21 in LockRefindAndRelease
>> >     (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 <default_lockmethod>,
>> >     proc=<optimized out>, locktag=locktag@entry=0x5601ac3d7998,
>> >     lockmode=lockmode@entry=1,
>> >     decrement_strong_lock_count=decrement_strong_lock_count@entry=false)
>> >     at lock.c:3150
>> > #3  0x00005601ab1edb27 in LockReleaseAll
>> >     (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295
>> > #4  0x00005601ab1f8599 in ProcReleaseLocks
>> >     (isCommit=isCommit@entry=true) at proc.c:781
>> > #5  0x00005601ab37f1f4 in ResourceOwnerReleaseInternal
>> >     (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS,
>> >     isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true)
>> >     at resowner.c:618
>> > #6  0x00005601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>,
>> >     phase=phase@entry=RESOURCE_RELEASE_LOCKS,
>> >     isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true)
>> >     at resowner.c:494
>> > #7  0x00005601aaec1d84 in CommitTransaction () at xact.c:2334
>> > #8  0x00005601aaec2b22 in CommitTransactionCommand () at xact.c:3067
>> > #9  0x00005601ab200a66 in finish_xact_command () at postgres.c:2783
>> > #10 0x00005601ab20338f in exec_simple_query (
>> >     query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name,
>> >     column_name, is_updatable\n  FROM information_schema.columns\n WHERE
>> >     table_name LIKE E'r_\\\\_view%'\n ORDER BY table_name,
>> >     ordinal_position;") at postgres.c:1300
>> >
>> > I am unable to connect to the regression test Postgres instance --
>> > psql just hangs, so the lock seems to have affected the postmaster
>> > also.
>> >
>> > I'm wondering if this might represent a bug in the current patch.
>>
>> Thanks for running and analyzing the test!
>
> Sure thing!
>
>> Could you share how you are running the test?
>>
>> I imagined something like the following, but so far couldn't reproduce it:
>> - apply both v26-0001 and v27-0002 and build
>> - run PostgreSQL with default GUCs
>> - make installcheck-world
>> - run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1'
>>   during make installcheck-world
>
> Apologies, I should have attached my updated patch (with the fix for
> the bug you'd reported with the re-entrancy guard). Attached is v28,
> which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> when necessary. Once built with those two patches I'm simply running
> `make check`.
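If I read the v28 change correctly, the guard itself stays, and
errfinish() just clears it when an ERROR escapes mid-logging. In sketch
form (my own illustration for this mail, not the actual patch hunk; only
the flag name ProcessLogQueryPlanInterruptActive and the errfinish()
location come from your description):

#include <stdbool.h>

/* Guard against re-entering plan logging from a nested interrupt. */
static bool ProcessLogQueryPlanInterruptActive = false;

void
ProcessLogQueryPlanInterrupt(void)
{
    if (ProcessLogQueryPlanInterruptActive)
        return;                 /* a previous attempt is still active */
    ProcessLogQueryPlanInterruptActive = true;

    /* ... EXPLAIN the currently running query and emit it at LOG ... */

    ProcessLogQueryPlanInterruptActive = false;
}

/*
 * If an ERROR is raised between the set and the reset above, the reset
 * never runs and the backend ignores every later pg_log_query_plan()
 * request -- the stuck-after-SIGINT symptom. Hence, conceptually, the
 * v28 addition to errfinish() (sketched here as a separate function):
 */
void
errfinish_reset_sketch(void)
{
    if (ProcessLogQueryPlanInterruptActive)
        ProcessLogQueryPlanInterruptActive = false;
}

With that reset in place, a failed logging attempt should no longer
block later requests.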
Thanks! However, I haven't succeeded in reproducing the problem described
below:

> The tests have been running since last night, but have apparently been
> hung now for many hours. I haven't been able to fully look into it,
> but so far I know the hung (100% CPU) backend last logged this:

Did you do something like this?

$ ./configure --prefix=/home/ubuntu/pgsql/master --enable-cassert
$ git apply ../patch/v28-0001-Add-function-to-log-the-plan-of-the-query.patch \
    ../patch/v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch
$ make
$ make check
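By the way, my understanding of the 0002 patch -- going only by its file
name -- is that it turns every CHECK_FOR_INTERRUPTS() into an attempted
plan logging, so that unsafe call sites surface under the regression
tests. Roughly like this; the names below are my guesses for
illustration, not necessarily what the patch uses:

#include <stdbool.h>

/* Hypothetical pending flag, by analogy with LogMemoryContextPending. */
static bool LogQueryPlanPending = false;

extern void ProcessLogQueryPlanInterrupt(void);

/*
 * Excerpt-style sketch of ProcessInterrupts(), which is what
 * CHECK_FOR_INTERRUPTS() ends up calling when InterruptPending is set.
 */
static void
ProcessInterrupts_sketch(void)
{
    /* For testing only: pretend every CFI carries a logging request. */
    LogQueryPlanPending = true;

    if (LogQueryPlanPending)
    {
        LogQueryPlanPending = false;
        ProcessLogQueryPlanInterrupt();
    }
}

If that reading is right, it would also explain why plain `make check`
can reach windows -- such as releasing locks at commit time, as in your
backtrace -- that an external pg_log_query_plan() call would rarely hit.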
--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION