Re: Funny hang on PostgreSQL 10 during parallel index scan on slave
From | Chris Travers |
---|---|
Subject | Re: Funny hang on PostgreSQL 10 during parallel index scan on slave |
Date | |
Msg-id | CAN-RpxB4iVAkGFowRSh=Sj8ShYHJE7nmbpT=Z4iKO7JKZgQi5A@mail.gmail.com |
In reply to | Re: Funny hang on PostgreSQL 10 during parallel index scan on slave (Andres Freund <andres@anarazel.de>) |
Responses | Re: Funny hang on PostgreSQL 10 during parallel index scan on slave |
List | pgsql-hackers |
On Wed, Sep 5, 2018 at 6:55 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-09-05 18:48:44 +0200, Chris Travers wrote:
> Will submit a patch here shortly. Thanks! Should we do it for master and
> 10? Or 9.6 too?
Please don't top-post on this list. This needs to be done in all
branches where the posix_fallocate call is present.
> > Yep, maybe we should check for signals there.
> >
> > On Wed, Sep 5, 2018 at 5:27 PM Thomas Munro <thomas.munro@enterprisedb.com>
> > wrote:
> >
> >> On Wed, Sep 5, 2018 at 8:23 AM Chris Travers <chris.travers@adjust.com>
> >> wrote:
> >> > 1. The query is in a parallel index scan or similar
> >> > 2. A process is executing a parallel plan and allocating a significant
> >> chunk of memory (2MB for example) in dynamic shared memory.
> >> > 3. The startup process goes into a loop where it sends a sigusr1,
> >> sleeps 5ms, and sends another sigusr1 etc.
> >> > 4. The sigusr1 aborts the system call, which is then retried.
> >> > 5. Because the system call takes more than 5ms, we end up in an
> >> endless loop
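The five steps above describe a livelock: an all-or-nothing call that needs more uninterrupted time than the gap between signals is interrupted on every attempt and retried from scratch, so it never completes. The C sketch below reproduces that pattern in a single process under stated assumptions: nanosleep() with its remaining time discarded stands in for posix_fallocate(), a periodic SIGALRM from setitimer() stands in for the startup process's SIGUSR1, and the number of attempts is capped so the demo terminates. The helper names are hypothetical and this is not PostgreSQL code.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Empty handler: it exists only so SIGALRM interrupts blocking calls
 * (returning EINTR) instead of terminating the process. */
static void on_alarm(int signo)
{
    (void) signo;
}

/* Stand-in for an all-or-nothing call: it needs work_ms of uninterrupted
 * time. Passing NULL discards the remaining time, so a retry starts over. */
static int all_or_nothing_call(long work_ms)
{
    struct timespec ts;
    ts.tv_sec = work_ms / 1000;
    ts.tv_nsec = (work_ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL) == 0 ? 0 : errno;
}

/* Retry the call while it fails with EINTR, as the quoted loop does, while a
 * periodic timer raises SIGALRM every interval_ms. Returns the number of
 * attempts used, or -1 if all max_attempts attempts failed. */
int retry_under_signals(long work_ms, long interval_ms, int max_attempts)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART: interrupted calls fail with EINTR */
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval it;
    memset(&it, 0, sizeof it);
    it.it_interval.tv_sec = interval_ms / 1000;
    it.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    it.it_value = it.it_interval;   /* first signal after one interval */
    setitimer(ITIMER_REAL, &it, NULL);

    int attempts = 0;
    int rc;
    do
    {
        attempts++;
        rc = all_or_nothing_call(work_ms);
    } while (rc == EINTR && attempts < max_attempts);

    /* Disarm the timer before returning. */
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);

    return rc == 0 ? attempts : -1;
}
```

Under these assumptions, retry_under_signals(50, 5, 10) is expected to exhaust all ten attempts and return -1, while retry_under_signals(1, 50, 10) should succeed on its first attempt. Raising the retry cap does not help; only the length of the interrupt-free window matters.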
What you're presumably encountering here is a recovery conflict.
Agreed, but the question is how to correct what is a fairly interesting race condition.
> On Wed, Sep 5, 2018 at 6:40 PM Chris Travers <chris.travers@adjust.com>
> wrote:
> >> Do you mean this loop in dsm_impl_posix_resize() is getting
> >> interrupted constantly and never completing?
> >>
> >> /* We may get interrupted, if so just retry. */
> >> do
> >> {
> >>     rc = posix_fallocate(fd, 0, size);
> >> } while (rc == EINTR);
> >>
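One way to act on the suggestion to check for signals in this loop is to stop retrying once an interrupt is pending and hand EINTR back to the caller, instead of raising an error from inside the loop. The sketch below assumes a hypothetical interrupt_pending flag (standing in for PostgreSQL's pending-interrupt state) and an injectable alloc_fn in place of posix_fallocate(); it is an illustration of the idea, not the patch discussed in this thread.

```c
#include <errno.h>
#include <signal.h>

/* Hypothetical stand-in for "an interrupt is pending"; a signal handler
 * would set it, and the retry loop only reads it. */
static volatile sig_atomic_t interrupt_pending = 0;

/* The call being retried, e.g. a wrapper around posix_fallocate(). */
typedef int (*alloc_fn)(void *arg);

/* Retry alloc() while it fails with EINTR, but give up as soon as an
 * interrupt is pending and return EINTR, so the caller can clean up and
 * then service the interrupt. */
int alloc_retry_interruptible(alloc_fn alloc, void *arg)
{
    int rc;

    do
    {
        rc = alloc(arg);
    } while (rc == EINTR && !interrupt_pending);

    return rc;
}

/* Fake allocator for demonstration only. */
struct fake_state
{
    int eintr_left;             /* how many more calls fail with EINTR */
    int calls;                  /* how many times the fake was called */
    int raise_interrupt_after;  /* set interrupt_pending on this call (0 = never) */
};

static int fake_alloc(void *arg)
{
    struct fake_state *f = arg;

    f->calls++;
    if (f->raise_interrupt_after > 0 && f->calls >= f->raise_interrupt_after)
        interrupt_pending = 1;  /* pretend a cancel request arrived */
    if (f->eintr_left > 0)
    {
        f->eintr_left--;
        return EINTR;
    }
    return 0;
}
```

Returning EINTR rather than jumping out of the loop keeps the cleanup decision with the caller, which is where the descriptor and the segment name are known.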
Probably worthwhile to check that the dsm code is properly robust if
errors are thrown from within here.
Will check that too. Thanks!
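The robustness concern can be illustrated with the usual create-then-clean-up pattern: if reserving space fails (including an EINTR that a retry loop chose to give up on), close the descriptor and remove the backing object before reporting the error. This is a minimal sketch under assumptions: it needs a POSIX system that provides posix_fallocate() (for example Linux, not macOS), it uses a mkstemp() temporary file as a stand-in for the shm_open() segment that the POSIX DSM implementation creates, and create_segment is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a file at a unique path built from template_path (which must end in
 * "XXXXXX" and is modified in place) and reserve size bytes for it. On any
 * failure, close the descriptor and unlink the file before returning, so no
 * half-initialized object is left behind. Returns the open descriptor, or -1
 * with errno set. */
int create_segment(char *template_path, off_t size)
{
    int fd = mkstemp(template_path);
    if (fd < 0)
        return -1;

    /* posix_fallocate() returns an error number instead of setting errno. */
    int rc = posix_fallocate(fd, 0, size);
    if (rc != 0)
    {
        close(fd);
        unlink(template_path);
        errno = rc;             /* set last, since close/unlink may change errno */
        return -1;
    }
    return fd;
}
```

For example, create_segment(path, 2 * 1024 * 1024) mirrors the 2MB allocation from the report, while a negative size makes posix_fallocate() fail with EINVAL and exercises the cleanup path.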
Greetings,
Andres Freund
Best Regards,
Chris Travers
Head of Database
Saarbrücker Straße 37a, 10405 Berlin