_bt_split(), and the risk of OOM before its critical section

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема _bt_split(), and the risk of OOM before its critical section
Дата
Msg-id CAH2-WzkcWT_-NH7EeL=Az4efg0KCV+wArygW8zKB=+HoP=VWMw@mail.gmail.com
обсуждение исходный текст
Ответы Re: _bt_split(), and the risk of OOM before its critical section  (Peter Geoghegan <pg@bowt.ie>)
Re: _bt_split(), and the risk of OOM before its critical section  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Commit 8fa30f906be reduced the elevel of a number of "can't happen"
errors from PANIC to ERROR. These were all critical-section-adjacent
errors involved in nbtree page splits, and nbtree page deletion. It
also established the following convention within _bt_split(), which
allowed Tom to keep the length of the critical section just as short
as it had always been:

/*
 * origpage is the original page to be split.  leftpage is a temporary
 * buffer that receives the left-sibling data, which will be copied back
 * into origpage on success.  rightpage is the new page that receives the
 * right-sibling data.  If we fail before reaching the critical section,
 * origpage hasn't been modified and leftpage is only workspace. In
 * principle we shouldn't need to worry about rightpage either, because it
 * hasn't been linked into the btree page structure; but to avoid leaving
 * possibly-confusing junk behind, we are careful to rewrite rightpage as
 * zeroes before throwing any error.
 */

The INCLUDE indexes work looks like it subtly broke this, because it
allocated memory after the initialization of the right page --
allocating memory can always fail. On the other hand, even when
8fa30f906be went in back in 2010 this "rule" was arguably broken,
because we were already calling PageGetTempPage() after the right
sibling page is initialized, which palloc()s a full BLCKSZ, which is
far more than truncation is every likely to allocate.

On the other other hand, it seems to me that the PageGetTempPage()
thing might have been okay, because it happens before the high key is
inserted on the new right buffer page. The same cannot be said for the
way we generate a new high key for the left/old page via suffix
truncation, which happens to occur after the right buffer page is
first modified by inserted its high key (the original/left page's
original high key). I think that there may be a risk that VACUUM's
page deletion code will get confused by finding an errant right
sibling page from a failed page split when there is a high key. If so,
that would be a risk that was introduced in Postgres 11, and made much
more likely in practice in Postgres 12. (I haven't got as far as doing
an analysis of the risks to page deletion, though. The "fastpath"
rightmost page insertion optimization that was also added to Postgres
11 seems like it also might need to be considered here.)

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Attempt to consolidate reading of XLOG page
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Attempt to consolidate reading of XLOG page