Re: select_parallel test failure: gather sometimes losing tuples(maybe during rescans)?

From Thomas Munro
Subject Re: select_parallel test failure: gather sometimes losing tuples(maybe during rescans)?
Msg-id CAEepm=0pAqV8Nqdt2D6+nH=bkTkchKjL46tokcGUmGo5P7+4zg@mail.gmail.com
In response to Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?  (Andres Freund <andres@anarazel.de>)
Responses Re: select_parallel test failure: gather sometimes losing tuples(maybe during rescans)?  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers

On Sun, Mar 4, 2018 at 3:40 PM, Andres Freund <andres@anarazel.de> wrote:
> On March 3, 2018 6:36:51 PM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>On 03/04/2018 03:20 AM, Thomas Munro wrote:
>>> Hi,
>>>
>>> I saw a one-off failure like this:
>>>
>>>                                   QUERY PLAN
>>>
>>--------------------------------------------------------------------------
>>>    Aggregate (actual rows=1 loops=1)
>>> !    ->  Nested Loop (actual rows=98000 loops=1)
>>>            ->  Seq Scan on tenk2 (actual rows=10 loops=1)
>>>                  Filter: (thousand = 0)
>>>                  Rows Removed by Filter: 9990
>>> !          ->  Gather (actual rows=9800 loops=10)
>>>                  Workers Planned: 4
>>>                  Workers Launched: 4
>>>                  ->  Parallel Seq Scan on tenk1 (actual rows=1960
>>loops=50)
>>> --- 485,495 ----
>>>                                   QUERY PLAN
>>>
>>--------------------------------------------------------------------------
>>>    Aggregate (actual rows=1 loops=1)
>>> !    ->  Nested Loop (actual rows=97984 loops=1)
>>>            ->  Seq Scan on tenk2 (actual rows=10 loops=1)
>>>                  Filter: (thousand = 0)
>>>                  Rows Removed by Filter: 9990
>>> !          ->  Gather (actual rows=9798 loops=10)
>>>                  Workers Planned: 4
>>>                  Workers Launched: 4
>>>                  ->  Parallel Seq Scan on tenk1 (actual rows=1960
>>loops=50)
>>>
>>>
>>> Two tuples apparently went missing.
>>>
>>> Similar failures on the build farm:
>>>
>>>
>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=okapi&dt=2018-03-03%2020%3A15%3A01
>>>
>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2018-03-03%2018%3A13%3A32
>>>
>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2018-03-03%2017%3A55%3A11
>>>
>>> Could this be related to commit
>>> 34db06ef9a1d7f36391c64293bf1e0ce44a33915 or commit
>>> 497171d3e2aaeea3b30d710b4e368645ad07ae43?
>>>
>>
>>I think the same failure (or at least very similar plan diff) was
>>already mentioned here:
>>
>>https://www.postgresql.org/message-id/17385.1520018934@sss.pgh.pa.us
>>
>>So I guess someone else already noticed, but I don't see the cause
>>identified in that thread.

Oh.  Sorry, I didn't recognise that as the same thing, from the title.
Doesn't seem to be related to number of workers launched at all... it
looks more like the tuple queue is misbehaving.  Though I haven't got
any proof of anything yet.
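
A quick sanity check of the numbers in the quoted plan diff, as a sketch only. It assumes that EXPLAIN ANALYZE reports the Gather row count as a rounded per-loop average, and that the Nested Loop total is simply the Gather output summed over its 10 loops; the variable names are made up for illustration. Under those assumptions the total shortfall is 16 rows, which appears as a difference of 2 in the rounded per-loop figure.

```python
# Row counts copied from the quoted plan diff.
expected_nested_loop_rows = 98000   # expected output
observed_nested_loop_rows = 97984   # failing run
gather_loops = 10                   # "loops=10" on the Gather node

# Assumption: the Nested Loop total equals the Gather output summed
# over all loops, so the shortfall is the number of lost tuples.
missing_total = expected_nested_loop_rows - observed_nested_loop_rows

# Assumption: the Gather "actual rows" figure is a per-loop average,
# rounded to an integer for display.
avg_gather_rows = observed_nested_loop_rows / gather_loops
displayed_gather_rows = round(avg_gather_rows)

print(missing_total)          # 16
print(displayed_gather_rows)  # 9798
```

If that reading is right, the loss is about 1.6 tuples per rescan on average rather than exactly two tuples in total.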

> Robert and I started discussing it a bit over IM. No conclusion. Robert
> tried to reproduce locally, including disabling atomics, without luck.
>
> Can anybody reproduce locally?

I've seen it several times on Travis CI.  (So I would normally have
been able to tell you about this problem before it was committed,
except that the email thread was too long and the mail archive app
cuts long threads off!)  Will try on some different kinds of computers
that I have local control of...  I suspect (knowing how we run it on
Travis CI) that being way overloaded might be helpful...

-- 
Thomas Munro
http://www.enterprisedb.com

