Re: BUG #3504: Some listening sessions never return from writing, problems ensue

Поиск

Список

Период

Сортировка

От	Peter Koczan
Тема	Re: BUG #3504: Some listening sessions never return from writing, problems ensue
Дата	7 августа 2007 г. 04:36:40
Msg-id	4544e0330708061805wcafa72br623aad2e50fe250b@mail.gmail.com обсуждение исходный текст
Ответ на	Re: BUG #3504: Some listening sessions never return from writing, problems ensue (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-bugs

Дерево обсуждения

Not quite, here's what happens every second when idling, for both good and
bad connections (with a different values to each syscall). It is trying to
read data, I am a little concerned about the EAGAIN return value, though.

[koczan@ator] ~ $ strace -p 6987
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLPRI}, {fd=8,
events=POLLIN}], 3, 869) = 0
gettimeofday({1186445557, 631779}, NULL) = 0
read(5, 0x9651540, 5)                   = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1186445557, 632197}, NULL) = 0
ioctl(3, FIONREAD, [0])                 = 0
poll(<unfinished ...>
Process 6987 detached

However, looking at straces when an async notify actually happens, I notice
something interesting. The number of syscalls over the same interrupt on the
bad connection is nearly 5 times that of the good connection. By the time
the good connection is finished, the bad connection is still mucking with
data. I can send you the actual straces if you think it would help.

Here's my theory (and feel free to tell me that I'm full of it)...somehow, a
lot of notifies happened at once, or in a very short period of time, to the
point where the app was still processing notifies when the timer clicked off
another second. The connection (or app, or perl module) never marked those
notifies as being processed, or never updated its timestamp of when it
finished, so when the next notify came around, it tried to reprocess the old
data (or data since the last time it finished), and yet again couldn't
finish. Lather, rinse, repeat. In sum, it might be that trying to call
pg_notifies while processing notifies tickles a race condition and tricks
the connection into thinking its in a bad state.

The holes in my theory are that the app doesn't try to read everything every
second for bad connections (so notifies are being processed somehow), and it
doesn't explain why the recv-q is filling up even for good connections.
Hopefully this will help somehow, though.

Peter

P.S. Just in case its useful, the contents of pg_listener for bad
connections (I verified that notification = 0 for good connections).
csdb=# select * from pg_listener where notification > 0;
 relname | listenerpid | notification
---------+-------------+--------------
 req     |        3292 |        19141
 req     |       25848 |        16308
 req     |       17004 |        23815
 req     |       19583 |         1703
(4 rows)

On 8/6/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "Peter Koczan" <pjkoczan@gmail.com> writes:
> > So, it looks like the app polls for async notifies every second.
> Hopefully
> > you can glean some insight from that.
>
> My guess is that some condition causes that to stop happening.
>
> One way to monitor this from outside is to use strace to watch the app.
> According to the above you should see a recv() attempt on the libpq
> socket every second.  I suspect that you won't see any more once the app
> gets into the state where the recv-q is backing up.  Looking at what
> happens around the time of the last recv() might give some clues.
>
>                         regards, tom lane
>

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: BUG #3504: Some listening sessions never return from writing, problems ensue