Re: [sqlsmith] crashes in RestoreSnapshot on hot standby
From | Amit Kapila |
---|---|
Subject | Re: [sqlsmith] crashes in RestoreSnapshot on hot standby |
Date | |
Msg-id | CAA4eK1J9QOMhEAnOysFQrca_x3Ea4r+N6Vrz5EgYMmc0+zr67Q@mail.gmail.com |
In response to | Re: [sqlsmith] crashes in RestoreSnapshot on hot standby (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses | Re: [sqlsmith] crashes in RestoreSnapshot on hot standby |
List | pgsql-hackers |
On Fri, Jul 1, 2016 at 8:48 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Fri, Jul 1, 2016 at 2:17 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Fri, Jul 1, 2016 at 6:26 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> >> #1 0x0000000000822032 in RestoreSnapshot (start_address=start_address@entry=0x7f2701d5a110 <error: Cannot access memory at address 0x7f2701d5a110>) at snapmgr.c:2020
> >
> > memcpy(snapshot->subxip, serialized_xids + serialized_snapshot->xcnt,
> > serialized_snapshot->subxcnt * sizeof(TransactionId));
> > So this is choking here? Is one of those pointers NULL?
>
> Theory 1:
> If serialized_snapshot->xcnt == 0, then snapshot->xip never gets
> initialized to a non-NULL value. Then if serialized_snapshot->subxcnt
> > 0, we set snapshot->subxip = snapshot->xip +
> serialized_snapshot->xcnt (so that's NULL too). Then in the line
> you show we call memcpy(snapshot->subxip, ...). The fix might be
> something like the attached.
>
I was just typing a mail when I saw this one. I also reached the conclusion that this is the reason for the crash. You can see how CopySnapshot calculates subxipoff; maybe writing the code that way would be more consistent. In case of recovery, I think serialized_snapshot->xcnt will always be zero, as we fill everything into the subxip array (refer to the code below in GetSnapshotData).
GetSnapshotData()
{
/*
* We're in hot standby, so get XIDs from KnownAssignedXids.
..
..
}
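
To make the failure mode concrete, here is a minimal standalone sketch of the restore logic with the subxip pointer derived from the start of the XID area instead of from snapshot->xip. It is not the snapmgr.c code or the attached patch: DemoSerializedSnapshot, DemoSnapshot and demo_restore_snapshot are hypothetical, heavily simplified stand-ins, and the buffer layout (header followed by xcnt top-level XIDs, then subxcnt subtransaction XIDs) is an assumption taken from the memcpy calls quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Hypothetical, simplified stand-in for the serialized snapshot header. */
typedef struct
{
    uint32_t    xcnt;       /* number of top-level XIDs that follow */
    uint32_t    subxcnt;    /* number of subtransaction XIDs after those */
} DemoSerializedSnapshot;

/* Hypothetical, simplified stand-in for the in-memory snapshot. */
typedef struct
{
    uint32_t        xcnt;
    uint32_t        subxcnt;
    TransactionId  *xip;    /* stays NULL when xcnt == 0 */
    TransactionId  *subxip; /* stays NULL when subxcnt == 0 */
} DemoSnapshot;

/*
 * Rebuild a snapshot from a buffer laid out as:
 *   header | xcnt top-level XIDs | subxcnt subtransaction XIDs
 * The caller frees the result with free().
 */
DemoSnapshot *
demo_restore_snapshot(const char *start_address)
{
    DemoSerializedSnapshot hdr;
    const char     *serialized_xids;
    DemoSnapshot   *snapshot;
    TransactionId  *xid_area;
    size_t          size;

    assert(start_address != NULL);

    /* Copy the header out so the source buffer needs no alignment. */
    memcpy(&hdr, start_address, sizeof(hdr));
    serialized_xids = start_address + sizeof(hdr);

    size = sizeof(DemoSnapshot) +
        ((size_t) hdr.xcnt + hdr.subxcnt) * sizeof(TransactionId);
    snapshot = malloc(size);
    if (snapshot == NULL)
        return NULL;

    /* Both XID arrays live directly after the struct. */
    xid_area = (TransactionId *) (snapshot + 1);

    snapshot->xcnt = hdr.xcnt;
    snapshot->subxcnt = hdr.subxcnt;
    snapshot->xip = NULL;
    snapshot->subxip = NULL;

    if (hdr.xcnt > 0)
    {
        snapshot->xip = xid_area;
        memcpy(snapshot->xip, serialized_xids,
               hdr.xcnt * sizeof(TransactionId));
    }

    if (hdr.subxcnt > 0)
    {
        /*
         * Computing this as snapshot->xip + hdr.xcnt would give NULL
         * when xcnt == 0 (the hot standby case discussed above), so
         * offset from the start of the XID area instead.
         */
        snapshot->subxip = xid_area + hdr.xcnt;
        memcpy(snapshot->subxip,
               serialized_xids + hdr.xcnt * sizeof(TransactionId),
               hdr.subxcnt * sizeof(TransactionId));
    }

    return snapshot;
}
```

With xcnt == 0 and subxcnt > 0, this version leaves xip as NULL but still gives subxip a valid address, which is the same offset-from-base idea as the subxipoff calculation in CopySnapshot.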