[BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24
From | marko@joh.to |
---|---|
Subject | [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24 |
Date | |
Msg-id | 20170926182935.14128.65278@wrigleys.postgresql.org |
Replies |
Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24
Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24 |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference: 14830
Logged by: Marko Tiikkaja
Email address: marko@joh.to
PostgreSQL version: Unsupported/Unknown
Operating system: Ubuntu 14.04
Description:

Hey,

I understand this is not much information to go on (but the problem is extremely difficult to reproduce), and that 9.1 is technically out of support (but I don't think the relevant code has changed significantly, either), so I fully expect that nobody will be able to figure out what's wrong based on that. But I thought I'd post anyway.

For the past two days I've been tracking down a bug where it would appear that some NOTIFications are simply lost. Then a minute later, when the notification is resent by a different transaction, it comes through just fine. We have a single program connected to the database all the time, which LISTENs on around 800 channels and delivers the notifications to its own clients. The problem seems to only start happening, or perhaps gets worse, the longer this application is connected to the database.

I'm attaching two excerpts from the strace which, if I'm reading this correctly, would suggest that there's a bug in postgres here. Here's how I read this:

1) In strace2.txt, the send on line #1 corresponds to 28:3486 in strace1.txt. I know this because notification payloads on that channel are unique.

2) In strace2.txt, on line #5 something slightly out of the ordinary happens. We have around 75 semop calls compared to 5400 semop calls in the full strace, so no biggie, but perhaps noteworthy. Contention with another backend, perhaps.

3) The send on line #6 seems to correspond to 28:3600 in strace1.txt.

4) Then here's where the problem seems to occur: the next send, on line 25, corresponds to 28:4458 in strace1.txt. Within the ~850 bytes that the sending backend seemingly jumped over, we have multiple notifications on channels we know the backend was listening on. That includes a notification on channel "workerid48101842", which is the one our application was desperately missing in this case.

PostgreSQL's logs and the state of the database indicate that at least the transaction which wrote the "workerid48101842" notification committed, and I have no reason to believe that any of the other ones near it did not commit.

So.. any ideas? Unfortunately I can't reproduce this in an isolated environment, and in production this seems to take some time before it builds up into a proper issue.
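
As a quick check of the "~850 bytes" figure, the three positions quoted from strace1.txt can be subtracted from one another. This is only a sketch: reading "28:3486" as page 28, byte offset 3486 is an assumption, and `delivered` and `gaps` are hypothetical names used here for illustration.

```python
# Positions quoted in the report, read as (page, byte offset).
# That interpretation is an assumption; the report only gives "28:3486" etc.
delivered = [(28, 3486), (28, 3600), (28, 4458)]


def gaps(positions):
    """Return the byte distance between each pair of consecutive positions."""
    distances = []
    for (page_a, off_a), (page_b, off_b) in zip(positions, positions[1:]):
        if page_a != page_b:
            # Offsets on different pages are not directly comparable here.
            raise ValueError("positions span different pages")
        distances.append(off_b - off_a)
    return distances


print(gaps(delivered))  # [114, 858]
```

The second distance, 858 bytes, spans from the start of the entry at 28:3600 to the entry at 28:4458, so it includes the entry that was delivered at 28:3600. The region actually skipped would be somewhat smaller, which is consistent with the "~850 bytes" estimate.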