Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
From | Alexander Lakhin |
---|---|
Subject | Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) |
Date | |
Msg-id | 7f006842-975a-bb0a-d8cf-ffa4cc2bbe36@gmail.com |
In reply to | Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
List | pgsql-hackers |
Hello Tomas,

01.09.2023 16:00, Tomas Vondra wrote:
> Hmmm, I'm not very good at reading the binary code, but here's what
> objdump produced for WaitEventSetWait. Maybe someone will see what the
> issue is.

At first glance, I can't see anything suspicious in the disassembly.
IIUC, waiting = true is represented there as:

    805c38: b902ad18 str w24, [x8, #684]  // pgstat_report_wait_start(): proc->wait_event_info = wait_event_info;
                                          // end of pgstat_report_wait_start(wait_event_info);
    805c3c: b0ffdb09 adrp x9, 0x366000 <dsm_segment_address+0x24>
    805c40: b0ffdb0a adrp x10, 0x366000 <dsm_segment_address+0x28>
    805c44: f0000eeb adrp x11, 0x9e4000 <PMSignalShmemInit+0x4>
    805c48: 52800028 mov w8, #1           // true
    805c4c: 52800319 mov w25, #24
    805c50: 5280073a mov w26, #57
    805c54: fd446128 ldr d8, [x9, #2240]
    805c58: 90000d7b adrp x27, 0x9b1000 <ModifyWaitEvent+0xb0>
    805c5c: fd415949 ldr d9, [x10, #688]
    805c60: f9071d68 str x8, [x11, #3640] // waiting = true (x8 = w8)

So there are two simple mov's and two load operations performed in
parallel, but I don't think it's similar to what we had in that case.

> I thought about maybe just adding the barrier in the code, but then how
> would we know it's the issue and this fixed it? It happens so rarely we
> can't make any conclusions from a couple runs of tests.

Probably I could construct a reproducer for the lockup if I had access
to such a machine for a day or two.

Best regards,
Alexander
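
For reference, the "adding the barrier" idea quoted above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the actual latch code: the flag wakeup_requested and the helper prepare_to_wait() are hypothetical names invented for illustration, and the fence is the standard C11 atomic_thread_fence() rather than PostgreSQL's own pg_memory_barrier(). It shows the general pattern of publishing a "waiting" flag and forcing that store to be ordered before a later load.

```c
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the flags involved; not the real latch state. */
static volatile sig_atomic_t waiting = false;
static volatile sig_atomic_t wakeup_requested = false;

/*
 * Hypothetical helper: announce that we are about to sleep, then check
 * whether a wakeup has already been requested.
 * Returns true if the caller should skip sleeping.
 */
static bool
prepare_to_wait(void)
{
    /* Publish the intent to sleep (the "waiting = true" store). */
    waiting = true;

    /*
     * Full fence: neither the compiler nor the CPU may reorder the
     * store above past the load below.
     */
    atomic_thread_fence(memory_order_seq_cst);

    /* Only now look at the wakeup flag. */
    return wakeup_requested != 0;
}
```

As noted in the quoted text, a change like this would not by itself show that missing ordering is the cause, since the lockup is too rare to judge from a few test runs.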