Re: B-tree parent pointer and checkpoints
От | Tom Lane |
---|---|
Тема | Re: B-tree parent pointer and checkpoints |
Дата | |
Msg-id | 14543.1289488571@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: B-tree parent pointer and checkpoints (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Ответы |
Re: B-tree parent pointer and checkpoints
|
Список | pgsql-hackers |
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > GiST is different. When you insert a key to a leaf page, you (sometimes) > need to adjust the parent pointer to reflect the new key as well. B-tree > tolerates incomplete splits with the 'next page' pointer, but that is > not applicable to gist. Teodor described the issue back in 2005 when > WAL-logging was added to GiST > (http://archives.postgresql.org/pgsql-hackers/2005-06/msg00555.php): > Reading that I wonder: what harm would an incomplete insert cause if we > just left it in the tree? Imagine that you insert a key to a leaf page, > but crash before updating the parent. If you search for the key starting > from the root, you'll fail to find it, because the parent pointer claims > that there are no entries with such a key on the child page. But that's > OK, the inserting transaction aborted with the crash! I think it'd be okay as far as that one entry is concerned, since as you say it doesn't matter whether a search finds it. (We'd have to be sure that VACUUM would still find it to remove it, of course, but that doesn't use a normal search.) You're right that it poses a hazard of subsequent inserts deciding that they don't need to do work on upper levels because the lower ones look OK already. But depending on the details of the search algorithm, this might be a non-problem: if you remember that the upper level entries didn't cover your key when you descended, you'd still know you need to recompute them. Something else I just noticed is that WAL replay isn't capable of completely fixing the index anyway: * To complete insert we can't use basic insertion algorithm because* during insertion we can't call user-defined supportfunctions of opclass.* So, we insert 'invalid' tuples without real key and do it by separate algorithm.* 'invalid'tuple should be updated by vacuum full. Given that there's no more vacuum full, and nobody has been expected to run it routinely for a long time anyway, this fixup approach seems pretty completely broken anyhow. regards, tom lane
В списке pgsql-hackers по дате отправления: