The lightbulb just went on...
From | Tom Lane |
---|---|
Subject | The lightbulb just went on... |
Date | |
Msg-id | 28096.971742986@sss.pgh.pa.us |
Replies | Re: The lightbulb just went on...; Re: The lightbulb just went on... |
List | pgsql-hackers |
... with a blinding flash ...

The VACUUM funnies I was complaining about before may or may not be real bugs, but they are not what's biting Alfred. None of them can lead to the observed crashes AFAICT.

What's biting Alfred is the code that moves a tuple update chain, lines 1541 ff in REL7_0_PATCHES. This sets up a pointer to a source tuple in "tuple". Then it gets the destination page it plans to move the tuple to, and applies vc_vacpage to that page if it hasn't been done already. But when we're moving a tuple chain, *it is possible for the destination page to be the same as the source page*. Since vc_vacpage applies PageRepairFragmentation, all the live tuples on the page may get moved. Afterwards, tuple.t_data is out of date and pointing at some random chunk of some other tuple. The subsequent copy of the tuple copies garbage, which explains Alfred's several crashes in constructing index entries for the copied tuple (all of which bombed out from the index-build calls at lines 1634 ff, ie, for tuples being moved as part of a chain).

Once in a while, the obsolete pointer will be pointing at the real header of a different tuple --- perhaps even the place where we are about to put the copy. This improbable case explains the one observed Assert crash in which a copied tuple's HEAP_MOVED_IN bit mysteriously got turned off. Reason: it was cleared through the old-tuple pointer just after being set via the new-tuple one.

Proof that this is happening can be seen in the core dumps for Alfred's index-construction-crash cases: tuple.t_data does not point at the same place that the tuple.ip_posid'th page line item points at. This could only happen if the page was reshuffled since the tuple pointer was set up. The explanation for the Assert crash is a bit of a leap of faith, but I feel confident that it's right.
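To make the failure mode concrete, here is a minimal Python sketch of a toy page. Everything in it (the `Page` class, `add_item`, `item_offset`, `read`, `repair_fragmentation`) is hypothetical and only loosely modeled on a heap page and on what PageRepairFragmentation does; a raw `tuple.t_data` pointer is modeled as a byte offset captured before compaction. It shows the stale reference reading another tuple's bytes and disagreeing with the line pointer, which is the mismatch described above.

```python
# Hypothetical, heavily simplified model of a heap page. It is NOT
# PostgreSQL's real page layout; it keeps only the two properties that
# matter here: tuples live at byte offsets inside one buffer, and line
# pointers record those offsets.
PAGE_SIZE = 64


class Page:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.items = []          # line pointers: [offset, length, live]
        self.upper = PAGE_SIZE   # tuples are stored from the end backwards

    def add_item(self, tup: bytes) -> int:
        """Store a tuple and return its 1-based item number."""
        self.upper -= len(tup)
        self.data[self.upper:self.upper + len(tup)] = tup
        self.items.append([self.upper, len(tup), True])
        return len(self.items)

    def item_offset(self, itemno: int) -> int:
        """Where the line pointer says the tuple is *now*."""
        return self.items[itemno - 1][0]

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])

    def repair_fragmentation(self):
        """Squeeze out dead tuples, in the spirit of PageRepairFragmentation:
        every live tuple may end up at a different offset."""
        upper = PAGE_SIZE
        # Items were added at decreasing offsets, so list order walks from the
        # end of the page toward the front; each move goes toward the end and
        # never overwrites a tuple that has not been processed yet.
        for lp in self.items:
            off, length, live = lp
            if not live:
                continue
            upper -= length
            self.data[upper:upper + length] = self.data[off:off + length]
            lp[0] = upper
        self.upper = upper


page = Page()
page.add_item(b"AAAAAAAA")
b = page.add_item(b"BBBBBBBB")
c = page.add_item(b"CCCCCCCC")
page.add_item(b"DDDDDDDD")
page.items[b - 1][2] = False   # B is dead: a hole in the page

# Source and destination are the same page. Set up the source "pointer"...
t_data = page.item_offset(c)
# ...then "vacuum" the destination page before using it.
page.repair_fragmentation()

print(page.read(t_data, 8))               # b'DDDDDDDD': another tuple's bytes
print(page.read(page.item_offset(c), 8))  # b'CCCCCCCC': the real source tuple
print(t_data == page.item_offset(c))      # False: stale pointer vs line pointer
```

In this toy run, compaction slides C into B's hole and D into C's old slot, so the saved offset now lands squarely on D. That is the "random chunk of some other tuple" case in miniature.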
The solution is to do everything we're going to do with the source tuple, especially copying it and updating its state, *before* we apply vc_vacpage to the destination page. Then we don't care if the source gets moved during vc_vacpage. I will prepare a patch along this line and send it to Alfred for testing.

regards, tom lane
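The proposed reordering can be sketched against a toy page (hypothetical, not PostgreSQL's data structures). Here byte 0 of each tuple stands in for the header flags, and the made-up constants `MOVED_OFF` and `MOVED_IN` loosely play the role of HEAP_MOVED_OFF and HEAP_MOVED_IN. The hypothetical `move_tuple` helper copies the source tuple and updates its flags before compacting the destination page, so it no longer matters whether the source moves.

```python
# Hypothetical toy page, NOT PostgreSQL's real layout: tuples sit at byte
# offsets in one buffer, line pointers are [offset, length, live], and byte 0
# of each tuple stands in for its header flags.
PAGE_SIZE = 64
MOVED_OFF = 0x01  # made-up stand-in for HEAP_MOVED_OFF
MOVED_IN = 0x02   # made-up stand-in for HEAP_MOVED_IN


class Page:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.items = []          # line pointers: [offset, length, live]
        self.upper = PAGE_SIZE   # tuples are stored from the end backwards

    def add_item(self, tup: bytes) -> int:
        self.upper -= len(tup)
        self.data[self.upper:self.upper + len(tup)] = tup
        self.items.append([self.upper, len(tup), True])
        return len(self.items)

    def item_offset(self, itemno: int) -> int:
        return self.items[itemno - 1][0]

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])

    def repair_fragmentation(self):
        # Compact live tuples toward the end of the page; any of them may move.
        upper = PAGE_SIZE
        for lp in self.items:
            off, length, live = lp
            if live:
                upper -= length
                self.data[upper:upper + length] = self.data[off:off + length]
                lp[0] = upper
        self.upper = upper


def move_tuple(src: Page, src_itemno: int, dst: Page) -> int:
    """Hypothetical helper: finish all work on the source tuple first."""
    off, length, _ = src.items[src_itemno - 1]
    # 1. Copy the source tuple while `off` is still valid.
    copy = bytearray(src.read(off, length))
    # 2. Update the source tuple's state through the same offset.
    src.data[off] |= MOVED_OFF
    # 3. Only now compact the destination page. If dst is src, the source
    #    tuple may move, but `off` is never used again.
    dst.repair_fragmentation()
    copy[0] = MOVED_IN  # the new copy gets its own header flags
    return dst.add_item(bytes(copy))


page = Page()
page.add_item(b"\x00AAAAAAA")
b = page.add_item(b"\x00BBBBBBB")
c = page.add_item(b"\x00CCCCCCC")
d = page.add_item(b"\x00DDDDDDD")
page.items[b - 1][2] = False   # dead tuple: compaction will move C and D

new = move_tuple(page, c, page)  # source page == destination page
print(page.read(page.item_offset(c), 8))    # b'\x01CCCCCCC': source marked moved-off
print(page.read(page.item_offset(new), 8))  # b'\x02CCCCCCC': intact copy, marked moved-in
print(page.read(page.item_offset(d), 8))    # b'\x00DDDDDDD': neighbour untouched
```

Moving step 3 ahead of steps 1 and 2 in this sketch brings the bug back: after compaction `off` points at D's bytes, so the copy would contain D and the flag update would land in D's header.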