On 03/27/2017 01:40 PM, Rushabh Lathia wrote:
>
> ...
> I was doing more testing with the patch and I found one more server
> crash with the patch around same area, when we forced the gather
> merge for the scan having zero rows.
>
> create table dept ( deptno numeric, dname varchar(20);
> set parallel_tuple_cost =0;
> set parallel_setup_cost =0;
> set min_parallel_table_scan_size =0;
> set min_parallel_index_scan_size =0;
> set force_parallel_mode=regress;
> explain analyze select * from dept order by deptno;
>
> This is because for leader we don't initialize the slot into gm_slots. So
> in case where launched worker is zero and table having zero rows, we
> end up having NULL slot into gm_slots array.
>
> Currently gather_merge_clear_slots() clear out the tuple table slots for
> each
> gather merge input and returns clear slot. In the patch I modified function
> gather_merge_clear_slots() to just clear out the tuple table slots and
> always return NULL when All the queues and heap us exhausted.
>
Isn't that just another sign the code might be a bit too confusing? I
see two main issues in the code:
1) allocating 'slots' as 'nreaders+1' elements, which seems like a good
way to cause off-by-one errors
2) mixing objects with different life spans (slots for readers vs. slot
for the leader) seems like a bad idea too
I wonder how much we gain by reusing the slot from the leader (I'd be
surprised if it was at all measurable). David posted a patch reworking
this, and significantly simplifying the GatherMerge node. Why not to
accept that?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services