Re: Optimize LISTEN/NOTIFY
| From | Arseniy Mukhin |
|---|---|
| Subject | Re: Optimize LISTEN/NOTIFY |
| Date | |
| Msg-id | CAE7r3MK-3AOdh1mpZ8hw9h6F_i0D5RMoAy7CttnfCJRpB8GJDA@mail.gmail.com |
| In reply to | Re: Optimize LISTEN/NOTIFY (Chao Li <li.evan.chao@gmail.com>) |
| Responses | Re: Optimize LISTEN/NOTIFY |
| List | pgsql-hackers |
Hi,

On Thu, Oct 23, 2025 at 11:17 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
> > On Oct 21, 2025, at 00:43, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
> >
> > I managed to reproduce the race with v20-alt3. I tried to write a TAP
> > test reproducing the issue, so it was easier to validate changes.
> > Please find the attached TAP test. I added it to some random package
> > for simplicity.
> >
>
> With alt3, as we have acquired the notification lock after reading every message to update the POS, I think we can do a little bit more optimization:
>
> The notifier: in SignalBackend()
> * Now we check if a listener’s pos equals beforeWritePos, then we do “directly advancement”
> * We can change to if a listener’s pos is between beforeWritePos and afterWritePos, then we can do the advancement.
>
> The listener: in asyncQueueReadAllNotifications():
> * With alt3, we only lock and update pos
> * We can do more. If current pos in shared memory is after that local pos, then meaning some notifier has done an advancement, so it can stop reading.
>

I think this would be a reasonable optimization if it weren't for the race condition mentioned above. The problem is that if the local pos lags behind the shared memory pos, it could point to a truncated queue segment, so we shouldn't allow that.

> I tried to run your TAP test on my MacBook, but failed:
>
> ```
> t/008_listen-pos-race.pl .. Dubious, test returned 32 (wstat 8192, 0x2000)
> No subtests run
>
> Test Summary Report
> -------------------
> t/008_listen-pos-race.pl (Wstat: 8192 (exited 32) Tests: 0 Failed: 0)
>   Non-zero exit status: 32
>   Parse errors: No plan found in TAP output
> Files=1, Tests=0, 3 wallclock secs ( 0.01 usr 0.01 sys + 0.10 cusr 0.29 csys = 0.41 CPU)
> Result: FAIL
> ```
>
> I didn’t spend time debugging the problem. If you can figure out the problem, maybe I can run the test from my side.
>

Thank you for trying the test.

I think the test works for you as expected: it should fail with an error, and I get the same exit status. Sorry, I didn't realize this could be confusing. It probably would have been better to fail on an assert instead, but I thought an error was enough for a temporary reproducer. Please see 008_listen-pos-race_test.log for details.

Best regards,
Arseniy Mukhin
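
To make the concern about a lagging local pos concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not PostgreSQL code: `ToyQueue`, `shared_pos`, and the use of whole page numbers as positions are all hypothetical stand-ins for the real shared-memory queue, which lives in C and tracks page/offset positions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical stand-in for the notification queue: page number -> messages."""
    pages: dict = field(default_factory=dict)
    shared_pos: dict = field(default_factory=dict)  # listener id -> page number

    def truncate(self):
        """Drop every page that lies before the slowest listener's shared position."""
        tail = min(self.shared_pos.values())
        for page in [p for p in self.pages if p < tail]:
            del self.pages[page]

    def read_page(self, page):
        """Read a page; fails if that page has already been truncated."""
        if page not in self.pages:
            raise LookupError(f"page {page} was truncated")
        return self.pages[page]


def demo_race():
    q = ToyQueue(pages={0: ["a"], 1: ["b"], 2: ["c"]}, shared_pos={"listener": 0})
    local_pos = 0                 # listener's private copy of its position
    q.shared_pos["listener"] = 2  # a notifier advances the listener directly
    q.truncate()                  # pages 0 and 1 are removed
    try:
        q.read_page(local_pos)    # listener still reads from its stale local pos
    except LookupError as exc:
        return str(exc)
    return "no error"
```

In this model the truncation only looks at the shared position, so once a notifier moves it forward, nothing protects the pages that the listener's stale local copy still points to. Calling `demo_race()` shows the listener hitting a page that no longer exists.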