Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
From | Andres Freund |
---|---|
Subject | Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication |
Date | |
Msg-id | 20191118222416.dkn5cdmbxmtcemaf@alap3.anarazel.de |
In response to | Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
List | pgsql-bugs |
Hi,

On 2019-11-18 21:58:16 +0100, Tomas Vondra wrote:
> and the ReorderBufferToastReplace does this:
>
>     newtup = change->data.tp.newtuple;
>
>     heap_deform_tuple(&newtup->tuple, desc, attrs, isnull);
>
> but that fails, because the tuple pointer happens to be 0x8, which is
> clearly bogus. Not sure where that comes from, I don't recognize that as
> a typical pattern.

It indicates that change->data.tp.newtuple is NULL, afaict.
&newtup->tuple boils down to
((char *) newtup) + offsetof(ReorderBufferTupleBuf, tuple)
and offsetof(ReorderBufferTupleBuf, tuple) is 0x8.

> Can you create a core dump (see [1]), and print 'change' and 'txn' in
> frame #2? I wonder if some of the other fields are bogus too (but it can't
> be entirely true ...), and if the transaction got serialized.

Please print change and *change, both, please.

I suspect what's happening is that somehow a change that shouldn't have
toast changes - e.g. a DELETE - somehow has toast changes. Which then
triggers a failure in ReorderBufferToastReplace(), which expects
newtuple to be valid.

It's probably worthwhile to add an elog(ERROR) check for this, even if
this does not turn out to be the case.

> > This behaviour does not depend on defined data in tables, because we see it
> > in different databases with different sets of tables in publications.
>
> I'm not sure I really believe that. Surely there has to be something
> special about your schema, or possibly access pattern that triggers this
> bug in your environment and not elsewhere.

Yea. Are there any C triggers present? Any unusual extensions? Users of
the transaction hook, for example?

> > Looks like a real issue in logical replication.
> > I will be happy to provide additional information about that issue, but I
> > need to know what else to collect to help solve this
> > problem.
> >
>
> Well, if you can create a reproducer, that'd be the best option, because
> then we can investigate locally instead of the ping-pong here.
>
> But if that's not possible, let's start with the schema and the
> additional information from the core file.
>
> I'd also like to see the contents of the WAL, particularly for the XID
> triggering this issue. Please run pg_waldump and see how much data is
> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
> where it starts. It may have subtransactions, so don't do just grep.

Yea, that'd be helpful.

Greetings,

Andres Freund
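For illustration, the 0x8 pointer value and the suggested sanity check can be sketched in stand-alone C. Everything named Mock* or mock_* below is a hypothetical stand-in and not the PostgreSQL definition; the layout only assumes, as the message implies, that a single list-node pointer precedes the tuple member. The check returns an error code where the real code would presumably call elog(ERROR).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins, NOT the real PostgreSQL structs. */
typedef struct MockSlistNode
{
    struct MockSlistNode *next;     /* one pointer, like a list link */
} MockSlistNode;

typedef struct MockHeapTupleData
{
    uint32_t    t_len;              /* simplified tuple header */
    void       *t_data;
} MockHeapTupleData;

typedef struct MockTupleBuf
{
    MockSlistNode     node;         /* precedes the tuple member */
    MockHeapTupleData tuple;
} MockTupleBuf;

typedef struct MockChange
{
    MockTupleBuf *newtuple;         /* may be NULL, e.g. for a DELETE */
} MockChange;

/*
 * Address that &buf->tuple evaluates to for a buffer located at `base`.
 * Integer arithmetic is used so that no NULL pointer is dereferenced.
 * With base == 0 the result is just the member offset.
 */
static uintptr_t
mock_tuple_member_address(uintptr_t base)
{
    return base + offsetof(MockTupleBuf, tuple);
}

/*
 * Defensive check: refuse to hand out a tuple pointer when the change
 * carries no new tuple. Returns 0 on success, -1 on failure (the -1
 * path stands in for an elog(ERROR) call).
 */
static int
mock_get_new_tuple(const MockChange *change, MockHeapTupleData **out)
{
    if (change == NULL || change->newtuple == NULL)
    {
        fprintf(stderr, "unexpected change without a new tuple\n");
        return -1;
    }
    *out = &change->newtuple->tuple;
    return 0;
}
```

On a typical 64-bit build, mock_tuple_member_address(0) yields the size of one pointer, which matches the 0x8 seen in the backtrace when newtuple is NULL.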