Re: BUG #17705: Segmentation fault in BufFileLoadBuffer
From | Vahur Sinijärv |
---|---|
Subject | Re: BUG #17705: Segmentation fault in BufFileLoadBuffer |
Date | |
Msg-id | B6CDA783-A78E-4FC4-A359-DE4316325D58@icloud.com |
In reply to | Re: BUG #17705: Segmentation fault in BufFileLoadBuffer (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: BUG #17705: Segmentation fault in BufFileLoadBuffer |
List | pgsql-bugs |
Hi!

Where are these files normally created and what names do they have? We may still have them since last crash.

Vahur

> On 5. Dec 2022, at 05:41, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sun, Dec 4, 2022 at 10:57 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> We started having random segmentation faults with our postgres 13.4 server,
>> running on RHEL 8.7. It was upgraded to 13.9, but the issue persists. The
>> database is fairly large, about 250GB on disk.
>>
>> I got core dump of one of the crashes and it shows SIGSEGV in
>> BufFileLoadBuffer. I tried to investigate this a little and it seems the
>> reason for it is that ExecHashJoinGetSavedTuple reads {0, 0} as header,
>> meaning hashvalue is 0 and tuple length is 0. Line 1277 in nodeHashjoin.c
>> subtracts sizeof(uint32) from 0 and passes it as size to BufFileRead(). GDB
>> shows size=18446744073627287632 at frame #1 which is not ((uint64_t) -4),
>> but -82263984. I think this is caused by BufFileRead which decrements
>> parameter 'size' by bytes read, so apparently it has read 82263980 bytes,
>> overwriting BufFile struct passed to BufFileLoadBuffer. Its files field now
>> contains ascii instead of pointer and file->files[file->curFile]; causes
>> SIGSEGV.
>>
>> Why it has read {0, 0} as saved tuple header, or what could have written
>> these zeroes there, I could not find out...
>
> Are you able to reproduce this on demand? Can you get your hands on
> the temporary file(s) it's reading? How large is it/are they?
> Perhaps we could write a little Python/whatever script to read the
> tuples back one at a time until it hits this {0, 0} header to confirm
> that it's definitely there, ie the bad header has actually been
> written out, which would help narrow down the location of the bug.
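The check suggested in the quoted message could look roughly like the sketch below. It is not from the thread and rests on several assumptions: that each saved tuple is stored as a uint32 hash value followed by a minimal tuple whose first uint32 is its own total length (which matches the {hashvalue, length} header described in the report), that the script runs on a machine with the same byte order as the server, and that the batch data sits in one file (a spill split across several segment files would have to be concatenated in order first). The function name `find_bad_header` is made up for illustration.

```python
import io
import struct
import sys

# Assumed on-disk header: uint32 hashvalue, uint32 tuple length,
# in the machine's native byte order with no padding.
HEADER = struct.Struct("=II")


def find_bad_header(stream):
    """Walk saved tuples one at a time.

    Returns (offset, good_count): offset is the byte position of the first
    header that is truncated or has a length too small to be valid, or None
    if the whole stream parses cleanly; good_count is the number of tuples
    read successfully before that point.
    """
    offset = 0
    good_count = 0
    while True:
        raw = stream.read(HEADER.size)
        if not raw:
            return None, good_count      # clean end of file
        if len(raw) < HEADER.size:
            return offset, good_count    # truncated header
        _hashvalue, t_len = HEADER.unpack(raw)
        # The length field counts itself, so anything below 4 would make
        # "t_len - sizeof(uint32)" wrap around, as in the reported crash.
        if t_len < 4:
            return offset, good_count
        body = stream.read(t_len - 4)
        if len(body) < t_len - 4:
            return offset, good_count    # tuple body runs past end of file
        offset += 4 + t_len              # 4-byte hash + t_len bytes of tuple
        good_count += 1


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        bad_offset, count = find_bad_header(f)
    if bad_offset is None:
        print(f"{count} tuples read, no bad header found")
    else:
        print(f"bad header at byte offset {bad_offset} after {count} good tuples")
```

Run against a copy of a leftover temporary file, a reported offset would indicate that the bad header was really written out; a clean pass would point at the read side instead.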