Re: BUG #17949: Adding an index introduces serialisation anomalies.
От | Heikki Linnakangas |
---|---|
Тема | Re: BUG #17949: Adding an index introduces serialisation anomalies. |
Дата | |
Msg-id | 4638fef6-ea51-0e0e-d463-d5269575b021@iki.fi обсуждение исходный текст |
Ответ на | Re: BUG #17949: Adding an index introduces serialisation anomalies. (Thomas Munro <thomas.munro@gmail.com>) |
Ответы |
Re: BUG #17949: Adding an index introduces serialisation anomalies.
|
Список | pgsql-bugs |
On 26/06/2023 03:25, Thomas Munro wrote: > On Sun, Jun 25, 2023 at 1:59 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote: >> I've managed to resolve it, or at least reduce the chances for the issue to >> appear, via semi-randomly adding more CheckForSerializableConflictIn / >> PredicateLock around the new sublist that has to be created in >> ginHeapTupleFastInsert. I haven't seen the reproducer failing with this >> changeset after running it multiple times for a couple of minutes, where on the >> main branch, with the two fixes from Thomas included, it was failing within a >> couple of seconds. > > Ahh, right, thanks. I don't think we need to lock all those pages as > you showed, as this whole "fast" path is supposed to be covered by the > meta page (in other words, GIN is expected to have the highest > possible serialisation failure rate under SSI unless you turn fast > updates off). But there is an ordering bug with the existing > predicate lock in that code, which allows this to happen: > > S1: CheckForSerializableConflictIn(meta) > > S2: PredicateLockPage(meta) > S2: scan, find no tuples > > S1: BufferLock(EXCLUSIVE > S1: modify stuff... > > CheckForSerializableConflictIn() was written with the assumption that > you're inserting a tuple (ie you have the page containing the tuple > locked), so you'll either conflict with a reader who already has a > predicate lock at that point OR you'll insert first and then the > reader will see your (invisible-to-snapshot) tuples, but here we're > doing some fancy footwork with a meta page, and we screwed up the > ordering and left a window where neither of those things happens. > Perhaps it was coded that way because there is drop-then-reacquire > dance, but it's easy enough to move the check in both branches. Does > that make sense? Yes, +1 on the patches. Any chance of constructing test cases for these? The above race condition is hard to reach, but some of these other bugs seem more testable. Some minor nits: In v3-0001-Fix-race-in-SSI-interaction-with-empty-btrees.patch: > /* > - * We only get here if the index is completely empty. Lock relation > - * because nothing finer to lock exists. > + * Since we have no pages locked, it's possible for another > + * transaction to insert data between _bt_search() and > + * PredicateLockRelation(). We have to try again after taking a > + * relation-level predicate lock, to close a narrow window where we > + * wouldn't scan concurrently inserted tuples, but the writer wouldn't > + * see our predicate lock. > */ I'd like to keep the old comment here, it's good context, and add the new text in addition to the old. v3-0002-Fix-race-in-SSI-interaction-with-bitmap-heap-scan.patch: Can we keep the optimization when not using SSI? -- Heikki Linnakangas Neon (https://neon.tech)
В списке pgsql-bugs по дате отправления: