Re: Problem with synchronous replication
From | Dongming Liu |
---|---|
Subject | Re: Problem with synchronous replication |
Date | |
Msg-id | b573dd70-a77a-4315-a918-3d51259bf704.lingce.ldm@alibaba-inc.com |
In response to | Problem with synchronous replication ("Dongming Liu" <lingce.ldm@alibaba-inc.com>) |
Responses |
Re: Problem with synchronous replication
|
List | pgsql-hackers |
Can someone help me confirm whether these two problems are bugs?
If they are, please help review the patch or suggest a better fix.
Thanks.
Best regards,
--
Dongming Liu
------------------------------------------------------------------
From: LIU Dongming <lingce.ldm@alibaba-inc.com>
Sent At: 2019 Oct. 25 (Fri.) 15:18
To: pgsql-hackers <pgsql-hackers@postgresql.org>
Subject: Problem with synchronous replication

Hi,

I recently discovered two possible bugs in synchronous replication.

1. SyncRepCleanupAtProcExit may delete an element that has already been deleted

SyncRepCleanupAtProcExit first checks whether the queue element is detached; if it is not, it acquires SyncRepLock and deletes the element. If the walsender has already deleted this element in the meantime, it is deleted a second time, and SHMQueueDelete crashes with a segmentation fault.

IMO, like SyncRepCancelWait, we should acquire SyncRepLock first and then check whether the element is detached.

2. SyncRepWaitForLSN may not call SyncRepCancelWait if ereport processes an interrupt

In SyncRepWaitForLSN, if a query cancel interrupt arrives, we just terminate the wait with a suitable warning, as follows:

a. set QueryCancelPending to false
b. ereport outputs a warning
c. call SyncRepCancelWait to delete the element from the queue

If another cancel interrupt arrives while we are outputting the warning at step b, errfinish calls CHECK_FOR_INTERRUPTS, which raises an ERROR such as "canceling autovacuum task", and the process jumps to the sigsetjmp. Unfortunately, step c is skipped, and the element that SyncRepCancelWait should have deleted remains in the queue.

The easiest way to fix this is to swap the order of steps b and c. Alternatively, letting the sigsetjmp handler clean up the queue may also be a good choice. What do you think?

The patch is attached; any feedback is greatly appreciated.

Best regards,
--
Dongming Liu
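
The two fixes proposed in the message can be illustrated with a small single-process model in C. This is only a sketch under stated assumptions: it is not PostgreSQL code and not the attached patch. Node, queue_lock, unlink_node, cleanup_at_exit, report_warning and cancel_wait are hypothetical names standing in for the queue element, SyncRepLock, SHMQueueDelete, SyncRepCleanupAtProcExit, ereport and the cancel path of SyncRepWaitForLSN; a pthread mutex stands in for the LWLock, and longjmp stands in for the jump to sigsetjmp.

```c
#include <assert.h>
#include <pthread.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a wait-queue element; not PostgreSQL's SHM_QUEUE. */
typedef struct Node {
    struct Node *prev;
    struct Node *next;
} Node;

static Node queue_head = { &queue_head, &queue_head };          /* empty circular queue */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;  /* plays the role of SyncRepLock */
static jmp_buf error_context;                                   /* plays the role of the sigsetjmp target */

bool is_detached(const Node *n) { return n->prev == NULL; }

void enqueue(Node *n)
{
    n->prev = queue_head.prev;
    n->next = &queue_head;
    queue_head.prev->next = n;
    queue_head.prev = n;
}

/* Unlinks n. Unlinking a detached node would dereference NULL, which models
 * the crash in problem 1; the assert makes that precondition explicit. */
void unlink_node(Node *n)
{
    assert(!is_detached(n));
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
}

/* Problem 1: take the lock first, then test for detachment, so nobody else
 * can unlink the node between the check and the delete. */
bool cleanup_at_exit(Node *n)
{
    bool removed = false;

    pthread_mutex_lock(&queue_lock);
    if (!is_detached(n)) {
        unlink_node(n);
        removed = true;
    }
    pthread_mutex_unlock(&queue_lock);
    return removed;
}

/* Stand-in for emitting the warning: if a second interrupt is pending, it
 * escalates to an error and jumps away instead of returning. */
void report_warning(bool second_interrupt_pending)
{
    if (second_interrupt_pending)
        longjmp(error_context, 1);
}

/* Problem 2: remove the element (step c) before reporting (step b), so the
 * element is gone even if report_warning never returns. */
void cancel_wait(Node *n, bool second_interrupt_pending)
{
    cleanup_at_exit(n);
    report_warning(second_interrupt_pending);
}

/* Returns true if the node is detached after a cancel that was hit by a
 * second interrupt while the warning was being reported. */
bool demo_cancel_with_second_interrupt(void)
{
    static Node n;  /* static storage, so its value is well defined after longjmp */

    enqueue(&n);
    if (setjmp(error_context) == 0)
        cancel_wait(&n, true);
    return is_detached(&n);
}
```

In this model, calling cleanup_at_exit twice on the same node is harmless because the second call sees the node as detached while holding the lock, and demo_cancel_with_second_interrupt leaves the queue empty even though the reporting step never returns. A real fix would also have to consider whether checking without the lock is ever safe as a fast path, which this sketch does not address.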