Re: FSM Corruption (was: Could not read block at end of the relation)
От | Noah Misch |
---|---|
Тема | Re: FSM Corruption (was: Could not read block at end of the relation) |
Дата | |
Msg-id | 20240313165523.6a@rfd.leadboat.com обсуждение исходный текст |
Ответ на | Re: FSM Corruption (was: Could not read block at end of the relation) (Ronan Dunklau <ronan.dunklau@aiven.io>) |
Ответы |
Re: FSM Corruption (was: Could not read block at end of the relation)
(Ronan Dunklau <ronan.dunklau@aiven.io>)
Re: FSM Corruption (was: Could not read block at end of the relation) (Peter Geoghegan <pg@bowt.ie>) |
Список | pgsql-bugs |
On Wed, Mar 13, 2024 at 10:37:21AM +0100, Ronan Dunklau wrote: > Le mardi 12 mars 2024, 23:08:07 CET Noah Misch a écrit : > > > I would argue this is fine, as corruptions don't happen often, and FSM is > > > not an exact thing anyway. If we record a block as full, it will just be > > > unused until the next vacuum adds it back to the FSM with a correct > > > value. > > If this happened only under FSM corruption, it would be fine. I was > > thinking about normal extension, with an event sequence like this: > > > > - RelationExtensionLockWaiterCount() returns 10. > > - Backend B1 extends by 10 pages. > > - B1 records 9 pages in the FSM. > > - B2, B3, ... B11 wake up and fetch from the FSM. In each of those > > backends, fsm_does_block_exist() returns false, because the cached relation > > size is stale. Those pages remain unused until VACUUM re-records them in > > the FSM. > > > > PostgreSQL added the multiple-block extension logic (commit 719c84c) from > > hard-won experience bottlenecking there. If fixing $SUBJECT will lose 1% of > > that change's benefit, that's probably okay. If it loses 20%, we should > > work to do better. While we could rerun the 2016 benchmark to see how the > > patch regresses it, there might be a simple fix. fsm_does_block_exist() > > could skip the cache when blknumber>=cached, like this: > > > > return((BlockNumberIsValid(smgr->smgr_cached_nblocks[MAIN_FORKNUM]) > && > > blknumber < smgr- > >smgr_cached_nblocks[MAIN_FORKNUM]) || > > blknumber < RelationGetNumberOfBlocks(rel)); > > > > How do you see it? That still leaves us with an avoidable lseek in B2, B3, > > ... B11, but that feels tolerable. I don't know how to do better without > > switching to the ReadBufferMode approach. > That makes sense. I'm a bit worried about triggering additional lseeks when we > in fact have other free pages in the map, that would pass the test. I'm not > sure how relevant it is given the way we search the FSM with fp_next_slot > though... That's a reasonable thing to worry about. We could do wrong by trying too hard to use an FSM slot, and we could do wrong by not trying hard enough. > To address that, I've given a bit of thought about enabling / disabling the > auto-repair behaviour with a flag in GetPageWithFreeSpace to distinguish the > cases where we know we have a somewhat up-to-date value compared to the case > where we don't (as in, for heap, try without repair, then get an uptodate > value to try the last block, and if we need to look at the FSM again then ask > for it to be repaired) but it brings way too much complexity and would need > careful thought for each AM. > > So please find attached a patch with the change you propose. > > Do you have a link to the benchmark you mention though to evaluate the patch > against it ? Here is one from the thread that created commit 719c84c: https://www.postgresql.org/message-id/flat/CA%2BTgmob7xED4AhoqLspSOF0wCMYEomgHfuVdzNJnwWVoE_c60g%40mail.gmail.com There may be other benchmarks earlier in that thread.
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: BUG #18390: exponentiation produces float datatype, but nth-root produces integer
Следующее
От: Bruce MomjianДата:
Сообщение: Re: BUG #18387: Erroneous permission checks and/or misleading error messages with refresh materialized view