Re: adding wait_start column to pg_locks
От | torikoshia |
---|---|
Тема | Re: adding wait_start column to pg_locks |
Дата | |
Msg-id | 62fe634bce475fb7e38e6fb3a2fff124@oss.nttdata.com обсуждение исходный текст |
Ответ на | Re: adding wait_start column to pg_locks (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Список | pgsql-hackers |
On 2021-02-16 16:59, Fujii Masao wrote: > On 2021/02/15 15:17, Fujii Masao wrote: >> >> >> On 2021/02/10 10:43, Fujii Masao wrote: >>> >>> >>> On 2021/02/09 23:31, torikoshia wrote: >>>> On 2021-02-09 22:54, Fujii Masao wrote: >>>>> On 2021/02/09 19:11, Fujii Masao wrote: >>>>>> >>>>>> >>>>>> On 2021/02/09 18:13, Fujii Masao wrote: >>>>>>> >>>>>>> >>>>>>> On 2021/02/09 17:48, torikoshia wrote: >>>>>>>> On 2021-02-05 18:49, Fujii Masao wrote: >>>>>>>>> On 2021/02/05 0:03, torikoshia wrote: >>>>>>>>>> On 2021-02-03 11:23, Fujii Masao wrote: >>>>>>>>>>>> 64-bit fetches are not atomic on some platforms. So spinlock >>>>>>>>>>>> is necessary when updating "waitStart" without holding the >>>>>>>>>>>> partition lock? Also GetLockStatusData() needs spinlock when >>>>>>>>>>>> reading "waitStart"? >>>>>>>>>>> >>>>>>>>>>> Also it might be worth thinking to use 64-bit atomic >>>>>>>>>>> operations like >>>>>>>>>>> pg_atomic_read_u64(), for that. >>>>>>>>>> >>>>>>>>>> Thanks for your suggestion and advice! >>>>>>>>>> >>>>>>>>>> In the attached patch I used pg_atomic_read_u64() and >>>>>>>>>> pg_atomic_write_u64(). >>>>>>>>>> >>>>>>>>>> waitStart is TimestampTz i.e., int64, but it seems >>>>>>>>>> pg_atomic_read_xxx and pg_atomic_write_xxx only supports >>>>>>>>>> unsigned int, so I cast the type. >>>>>>>>>> >>>>>>>>>> I may be using these functions not correctly, so if something >>>>>>>>>> is wrong, I would appreciate any comments. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> About the documentation, since your suggestion seems better >>>>>>>>>> than v6, I used it as is. >>>>>>>>> >>>>>>>>> Thanks for updating the patch! >>>>>>>>> >>>>>>>>> + if (pg_atomic_read_u64(&MyProc->waitStart) == 0) >>>>>>>>> + pg_atomic_write_u64(&MyProc->waitStart, >>>>>>>>> + >>>>>>>>> pg_atomic_read_u64((pg_atomic_uint64 *) &now)); >>>>>>>>> >>>>>>>>> pg_atomic_read_u64() is really necessary? I think that >>>>>>>>> "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough. >>>>>>>>> >>>>>>>>> + deadlockStart = >>>>>>>>> get_timeout_start_time(DEADLOCK_TIMEOUT); >>>>>>>>> + pg_atomic_write_u64(&MyProc->waitStart, >>>>>>>>> + pg_atomic_read_u64((pg_atomic_uint64 *) >>>>>>>>> &deadlockStart)); >>>>>>>>> >>>>>>>>> Same as above. >>>>>>>>> >>>>>>>>> + /* >>>>>>>>> + * Record waitStart reusing the deadlock timeout >>>>>>>>> timer. >>>>>>>>> + * >>>>>>>>> + * It would be ideal this can be synchronously done >>>>>>>>> with updating >>>>>>>>> + * lock information. Howerver, since it gives >>>>>>>>> performance impacts >>>>>>>>> + * to hold partitionLock longer time, we do it here >>>>>>>>> asynchronously. >>>>>>>>> + */ >>>>>>>>> >>>>>>>>> IMO it's better to comment why we reuse the deadlock timeout >>>>>>>>> timer. >>>>>>>>> >>>>>>>>> proc->waitStatus = waitStatus; >>>>>>>>> + pg_atomic_init_u64(&MyProc->waitStart, 0); >>>>>>>>> >>>>>>>>> pg_atomic_write_u64() should be used instead? Because waitStart >>>>>>>>> can be >>>>>>>>> accessed concurrently there. >>>>>>>>> >>>>>>>>> I updated the patch and addressed the above review comments. >>>>>>>>> Patch attached. >>>>>>>>> Barring any objection, I will commit this version. >>>>>>>> >>>>>>>> Thanks for modifying the patch! >>>>>>>> I agree with your comments. >>>>>>>> >>>>>>>> BTW, I ran pgbench several times before and after applying >>>>>>>> this patch. >>>>>>>> >>>>>>>> The environment is virtual machine(CentOS 8), so this is >>>>>>>> just for reference, but there were no significant difference >>>>>>>> in latency or tps(both are below 1%). >>>>>>> >>>>>>> Thanks for the test! I pushed the patch. >>>>>> >>>>>> But I reverted the patch because buildfarm members rorqual and >>>>>> prion don't like the patch. I'm trying to investigate the cause >>>>>> of this failures. >>>>>> >>>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10 >>>>> >>>>> - relation | locktype | mode >>>>> ------------------+----------+--------------------- >>>>> - test_prepared_1 | relation | RowExclusiveLock >>>>> - test_prepared_1 | relation | AccessExclusiveLock >>>>> -(2 rows) >>>>> - >>>>> +ERROR: invalid spinlock number: 0 >>>>> >>>>> "rorqual" reported that the above error happened in the server >>>>> built with >>>>> --disable-atomics --disable-spinlocks when reading pg_locks after >>>>> the transaction was prepared. The cause of this issue is that >>>>> "waitStart" >>>>> atomic variable in the dummy proc created at the end of prepare >>>>> transaction >>>>> was not initialized. I updated the patch so that >>>>> pg_atomic_init_u64() is >>>>> called for the "waitStart" in the dummy proc for prepared >>>>> transaction. >>>>> Patch attached. I confirmed that the patched server built with >>>>> --disable-atomics --disable-spinlocks passed all the regression >>>>> tests. >>>> >>>> Thanks for fixing the bug, I also tested v9.patch configured with >>>> --disable-atomics --disable-spinlocks on my environment and >>>> confirmed >>>> that all tests have passed. >>> >>> Thanks for the test! >>> >>> I found another bug in the patch. InitProcess() initializes >>> "waitStart", >>> but previously InitAuxiliaryProcess() did not. This could cause >>> "invalid >>> spinlock number" error when reading pg_locks in the standby server. >>> I fixed that. Attached is the updated version of the patch. >> >> I pushed this version. Thanks! > > While reading the patch again, I found two minor things. > > 1. As discussed in another thread [1], the atomic variable "waitStart" > should > be initialized at the postmaster startup rather than the startup of > each > child process. I changed "waitStart" so that it's initialized in > InitProcGlobal() and also reset to 0 by using pg_atomic_write_u64() > in > InitProcess() and InitAuxiliaryProcess(). > > 2. Thanks to the above change, InitProcGlobal() initializes "waitStart" > even in PGPROC entries for prepare transactions. But those entries > are > zeroed in MarkAsPreparingGuts(), so "waitStart" needs to be > initialized > again. Currently TwoPhaseGetDummyProc() initializes "waitStart" in > the > PGPROC entry for prepare transaction. But it's better to do that in > MarkAsPreparingGuts() instead because that function initializes other > PGPROC variables. So I moved that initialization code from > TwoPhaseGetDummyProc() to MarkAsPreparingGuts(). > > Patch attached. Thought? Thanks for updating the patch! It seems to me that the modification is right. I ran some regression tests but didn't find problems. Regards, -- Atsushi Torikoshi
В списке pgsql-hackers по дате отправления: