infinite loop in parallel hash joins / DSA / get_best_segment
From | Tomas Vondra |
---|---|
Subject | infinite loop in parallel hash joins / DSA / get_best_segment |
Date | |
Msg-id | 194c0706-c65b-7d81-ab32-2c248c3e2344@2ndquadrant.com |
Responses | Re: infinite loop in parallel hash joins / DSA / get_best_segment |
List | pgsql-hackers |
Hi,

While performing some benchmarks on REL_11_STABLE (at 444455c2d9), I've repeatedly hit an apparent infinite loop on TPC-H query 4. I don't know what exactly the triggering conditions are, but the symptoms are these:

1) A "parallel worker" process is consuming 100% CPU, with perf reporting a profile like this:

    34.66%  postgres   [.] get_segment_by_index
    29.44%  postgres   [.] get_best_segment
    29.22%  postgres   [.] unlink_segment.isra.2
     6.66%  postgres   [.] fls
     0.02%  [unknown]  [k] 0xffffffffb10014b0

So all the time seems to be spent within get_best_segment.

2) The backtrace looks like this (full backtrace attached):

    #0  0x0000561a748c4f89 in get_segment_by_index
    #1  0x0000561a748c5653 in get_best_segment
    #2  0x0000561a748c67a9 in dsa_allocate_extended
    #3  0x0000561a7466ddb4 in ExecParallelHashTupleAlloc
    #4  0x0000561a7466e00a in ExecParallelHashTableInsertCurrentBatch
    #5  0x0000561a7466fe00 in ExecParallelHashJoinNewBatch
    #6  ExecHashJoinImpl
    #7  ExecParallelHashJoin
    #8  ExecProcNode
    ...

3) The infinite loop seems to be pretty obvious. After setting a breakpoint on get_segment_by_index we get this:

    Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3)
    ...
    (gdb) c
    Continuing.

    Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3)
    ...
    (gdb) c
    Continuing.

    Breakpoint 1, get_segment_by_index (area=0x560c03626e58, index=3)
    ...
    (gdb) c
    Continuing.

That is, we call the function with the same index over and over. Why is that? Well:

    (gdb) print *area->segment_maps[3].header
    $1 = {magic = 216163851, usable_pages = 512, size = 2105344, prev = 3, next = 3, bin = 0, freed = false}

Segment 3 has both prev and next pointing back at itself, so we loop forever.

I don't know exactly what the triggering conditions are here. I've only ever observed the issue on TPC-H at scale 16GB, with a partitioned lineitem table, work_mem set to 8MB, and query #4. And it seems I can reproduce it pretty reliably.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
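To illustrate why a header with prev = 3 and next = 3 stored at index 3 cannot be walked to completion, here is a minimal toy model in C. It is only a sketch and not the actual dsa.c code: the names toy_segment_header, SEGMENT_INDEX_NONE, NSEGMENTS and toy_walk_terminates are hypothetical, and the end-of-list sentinel is an assumption made for the example. The walk follows next links and gives up after more steps than there are segments, which is one simple way to notice a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical end-of-list sentinel and table size, chosen for this sketch. */
#define SEGMENT_INDEX_NONE ((size_t) -1)
#define NSEGMENTS 8

/* Toy stand-in for a segment header: only the list links are modelled. */
typedef struct
{
    size_t prev;
    size_t next;
} toy_segment_header;

/*
 * Follow 'next' links starting at 'start'.
 *
 * Returns true if the sentinel is reached, false if the walk takes more
 * than NSEGMENTS steps, which can only happen if the links form a cycle.
 */
static bool
toy_walk_terminates(const toy_segment_header *segs, size_t start)
{
    size_t index = start;
    size_t steps = 0;

    while (index != SEGMENT_INDEX_NONE)
    {
        assert(index < NSEGMENTS);  /* guard against out-of-range links */
        if (++steps > NSEGMENTS)
            return false;           /* visited more nodes than exist: cycle */
        index = segs[index].next;
    }
    return true;
}
```

With segment 3 linked to itself, as in the gdb output above, the walk never reaches the sentinel and the function reports a cycle. An unbounded loop over the same links would spin forever, which matches the repeated breakpoint hits with index=3.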
Attachments