Re: crashes due to setting max_parallel_workers=0

Поиск
Список
Период
Сортировка
От Rushabh Lathia
Тема Re: crashes due to setting max_parallel_workers=0
Дата
Msg-id CAGPqQf0CcH+cyU=WK+bUvwgFEVFdGPFaBC2RMRz_hPCEKEFhwg@mail.gmail.com
обсуждение исходный текст
Ответ на crashes due to setting max_parallel_workers=0  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: crashes due to setting max_parallel_workers=0  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers


On Tue, Mar 28, 2017 at 10:00 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 03/27/2017 01:40 PM, Rushabh Lathia wrote:

...
I was doing more testing with the patch and I found one more server
crash with the patch around same area, when we forced the gather
merge for the scan having zero rows.

create table dept ( deptno numeric, dname varchar(20);
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;

This is because for leader we don't initialize the slot into gm_slots. So
in case where launched worker is zero and table having zero rows, we
end up having NULL slot into gm_slots array.

Currently gather_merge_clear_slots() clear out the tuple table slots for each
gather merge input and returns clear slot. In the patch I modified function
gather_merge_clear_slots() to just clear out the tuple table slots and
always return NULL when All the queues and heap us exhausted.


Isn't that just another sign the code might be a bit too confusing? I see two main issues in the code:

1) allocating 'slots' as 'nreaders+1' elements, which seems like a good way to cause off-by-one errors

2) mixing objects with different life spans (slots for readers vs. slot for the leader) seems like a bad idea too

I wonder how much we gain by reusing the slot from the leader (I'd be surprised if it was at all measurable). David posted a patch reworking this, and significantly simplifying the GatherMerge node. Why not to accept that?



I think we all agree that we should get rid of nreaders from the GatherMergeState
and need to do some code re-factor. But if I understood correctly that Robert's
concern was to do that re-factor as separate commit.

I picked David's patch and started reviewing the changes. I applied that patch
on top of my v2 patch (which does the re-factor of gather_merge_clear_slots).

In David's patch, into gather_merge_init(), a loop where tuple array is getting
allocate, that loop need to only up to nworkers_launched. Because we don't
hold the tuple array for leader. I changed that and did some other simple
changes based on mine v2 patch. I also performed manual testing with
the changes.

Please find attached re-factor patch, which is v2 patch submitted for the
server crash fix. (Attaching both the patch here again, for easy of access).

Thanks,


--
Rushabh Lathia
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mithun Cy
Дата:
Сообщение: Re: [POC] A better way to expand hash indexes.
Следующее
От: Pavel Stehule
Дата:
Сообщение: xmltable doc fix and example for XMLNAMESPACES