Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
Дата
Msg-id 20240208215920.myib264liwbkdndp@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
Hi,

On 2024-02-08 12:04:24 -0500, Tom Lane wrote:
> Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> > IIUC I think the assertion failure was caused by an
> > error-during-error-recovery loop caused by the "epoll_create1 failed:
> > Too many open files" error raised in WaitLatchOrSocket called from
> > pgfdw_get_cleanup_result, which is called during abort cleanup.  I
> > think a simple fix to avoid such a loop is to modify the PG_CATCH
> > block in pgfdw_get_cleanup_result so that it just ignores the passed
> > error, not re-throwing it, and restores InterruptHoldoffCount and the
> > memory context, like the attached.  In the patch I also modified
> > callers of pgfdw_get_cleanup_result to issue a warning when ignoring
> > the error.  I might be missing something, though.
> 
> I do not think ignoring the passed error is *ever* acceptable.
> You have no idea what the error condition is or whether your
> hack is sufficient to recover from it.

+1


I think we ought to understand *why* we are getting the "Too many open
files". The AcquireExternalFD() in CreateWaitEventSet() should prevent
that.

One annoying bit is that AcquireExternalFD() failing emits the same error as
if epoll_create1() itself failing, including the same errno. So we don't know
if the problem is that there are too many connections and because of that we
are running out of "file descriptor slots", or whether something was using up
file descriptors outside of our system, or ...

I also wonder if postgres_fdw should strive to use a longer lived wait event
set. For efficiency, if nothing else? That'd avoid the need to create one
during error handling.

Greetings,

Andres Freund



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds