Re: BUG #17619: AllocSizeIsValid violation in parallel hash join
From | Thomas Munro |
---|---|
Subject | Re: BUG #17619: AllocSizeIsValid violation in parallel hash join |
Date | |
Msg-id | CA+hUKGJV54w8jVqdBcpP7LaCL8PhcEhT97-nfrTcD2rdKCcteA@mail.gmail.com |
In response to | Re: BUG #17619: AllocSizeIsValid violation in parallel hash join (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: BUG #17619: AllocSizeIsValid violation in parallel hash join |
List | pgsql-bugs |
On Wed, Sep 28, 2022 at 7:33 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Sep 27, 2022 at 9:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Right, the missing piece is the intentional clobber.
>
> That does seem like the best place to start. The attached patch adds
> clobbering that works exactly as you'd expect. This approach is
> obviously correct. It also doesn't require any reasoning about
> Valgrind's treatment of memory mappings for shared memory, which is
> quite complicated given the inconsistent rules about who initializes
> what memory (if it's leader or workers).
>
> I find that the tests pass with this patch -- so it probably won't
> catch the bug that Thomas mentioned via running the tests (at least
> not reliably). However, if I revert parallel VACUUM bugfix commit
> 662ba729 and then run the tests, they fail very reliably, in several
> places. That seems like a big improvement.

The reason it doesn't catch that bug on master is because that npages
shmem variable is only used to prevent further reading once a scan hits
the end of a shared tuplestore chunk and needs to decide whether to read
a new one, but if a chunk is partially filled then we end the scan
sooner because there's a number-of-items counter in the chunk header. I
noticed because the test module I wrote to study Dmitry's report fills
chunks exactly to the end, so I assume the clobber patch + that test
module patch would reveal the problem. I was assuming it didn't break
the case you mentioned because that's just stats counters (maybe those
finish up wrong but that's probably not a failure), but now it sounds
like you've seen another reason.

> I believe that Thomas was going to do something like this anyway. I'm
> happy to leave it up to him, but I can pursue this separately if that
> makes sense.

Why not clobber "lower down" in dsm_create(), as I showed? You don't
have to use the table-of-contents mechanism to use DSM memory.
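
The point above about partially filled chunks can be illustrated with a toy model in C. This is only a sketch and not the code in PostgreSQL's shared tuplestore: `ToyChunk`, `toy_scan`, `toy_clobbered_npages`, the capacity of 4 items per chunk, and the 0x7F fill byte are all names and values assumed for the example. It shows why a clobbered page count can go unnoticed when the last chunk is partially filled, and why it is exposed when every chunk is filled exactly to the end.

```c
#include <assert.h>
#include <string.h>

#define CHUNK_CAPACITY 4    /* hypothetical number of items per chunk */
#define CLOBBER_BYTE   0x7F /* hypothetical fill pattern for clobbered memory */

/* Toy chunk: a header counter followed by a fixed-size item array. */
typedef struct ToyChunk
{
    int nitems;                 /* number-of-items counter in the header */
    int items[CHUNK_CAPACITY];
} ToyChunk;

/* Simulate a shared variable that was clobbered and never re-initialized. */
static int
toy_clobbered_npages(void)
{
    int npages;

    memset(&npages, CLOBBER_BYTE, sizeof(npages));
    return npages;
}

/*
 * Scan chunks in order and return the number of items read, or -1 if the
 * scan tried to read past the 'written' chunks that really exist.
 *
 * npages is consulted only after a completely full chunk; a partially
 * filled chunk ends the scan through its own header counter.
 */
static int
toy_scan(int npages, const ToyChunk *chunks, int written)
{
    int nread = 0;

    assert(written >= 0);

    for (int page = 0;; page++)
    {
        if (page >= written)
            return -1;          /* read past the data: the bad npages shows */

        nread += chunks[page].nitems;

        if (chunks[page].nitems < CHUNK_CAPACITY)
            return nread;       /* partial chunk: header counter ends scan */

        if (page + 1 >= npages)
            return nread;       /* full chunk: npages decides */
    }
}
```

In this model, calling `toy_scan` with two chunks where the second holds only two items gives the same answer for a correct `npages` and for the value from `toy_clobbered_npages()`, so a test of that shape cannot detect the problem. With two exactly full chunks, the clobbered value sends the scan past the end of the data and `toy_scan` reports -1.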