Re: BUG #17116: Assert failed in SerialSetActiveSerXmin() on commit of parallelized serializable transaction
От | Thomas Munro |
---|---|
Тема | Re: BUG #17116: Assert failed in SerialSetActiveSerXmin() on commit of parallelized serializable transaction |
Дата | |
Msg-id | CA+hUKGJd-=mdUdS+Gh-FN4rZgAg4M-V==G7FMCU-KbUffyPJDw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #17116: Assert failed in SerialSetActiveSerXmin() on commit of parallelized serializable transaction (Thomas Munro <thomas.munro@gmail.com>) |
Ответы |
Re: BUG #17116: Assert failed in SerialSetActiveSerXmin() on commit of parallelized serializable transaction
|
Список | pgsql-bugs |
On Fri, Jul 30, 2021 at 9:41 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Wed, Jul 28, 2021 at 5:02 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Jul 28, 2021 at 9:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: > > > (Except for read-only-anomaly-3, that causes another > > > assertion fail but I believe that it's not related to the initial issue > > > and deserves another bug report.) The problem is that if we don't take the fast exit here: if (XactReadOnly && PredXact->WritableSxactCount == 0) { ReleasePredXact(sxact); LWLockRelease(SerializableXactHashLock); return snapshot; } ... then we search for writable SSI transactions here: for (othersxact = FirstPredXact(); othersxact != NULL; othersxact = NextPredXact(othersxact)) { if (!SxactIsCommitted(othersxact) && !SxactIsDoomed(othersxact) && !SxactIsReadOnly(othersxact)) { SetPossibleUnsafeConflict(sxact, othersxact); } } ... but that can find nothing if all writers happen to be doomed. WritableSxactCount doesn't count read-only or committed transactions so we don't have to worry about those confusing us, but it does count doomed transactions. Having an empty list here is mostly fine, except that nobody will ever set our RO_SAFE flag, and in the case of DEFERRABLE, the assertion that someone has set it will fail. This is the case since: === commit bdaabb9b22caa71021754d3967b4032b194d9880 Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> Date: Fri Jul 8 00:36:30 2011 +0300 There's a small window wherein a transaction is committed but not yet on the finished list, and we shouldn't flag it as a potential conflict if so. We can also skip adding a doomed transaction to the list of possible conflicts because we know it won't commit. Dan Ports and Kevin Grittner. === I see three fixes: 1. Check if the list is empty, and if so, set our own SXACT_FLAG_RO_SAFE. Then the assertion in GetSafeSnapshot() will pass, and also non-DEFERRABLE callers will eventually see the flag and opt out. 2. Check if the list is empty, and if so, opt out immediately (as we do when WriteableSxactCount == 0). This would be strictly better than #1, because it happens sooner while we still have the lock. On the other hand, we'd have to undo more effects, and that involves writing more fragile and very rarely run code. 3. Just delete the assertion in GetSafeSnapshot(). This is attractively simple but it means that READ ONLY non-DEFERRABLE transactions arbitrarily miss out on the RO_SAFE optimisation in this (admittedly rare) case. The attached 0002 has idea #1, which I prefer for back-patching. I think #2 might be a good idea if we take the filtering logic further so that it becomes more likely that you get an empty list (for example, why don't we skip transactions that are running in a different database?), but then that'd be new code, not something we back-patch. Thoughts? If someone is wanting to reproduce this case, try removing everything from isolation_schedule except read-only-anomaly* and read-write-unique*, and then run something like this while you go out for lunch. It should work as far back as 11 (before that, those tests didn't exist and the isolation tester couldn't run them due to lack of handling for DEFERRABLE blocking): rounds=1000 concurrency=100 output=/tmp/junk for i in `seq $rounds` ; do echo "=== round $i ===" for c in `seq $concurrency` ; do mkdir -p $output/results_${i}_${c} (make -C src/test/isolation installcheck \ EXTRA_REGRESS_OPTS="--dbname=regress_${c} --outputdir=$output/results_${i}_${c}" || echo FAIL) \ > junk/out.${i}.${c}.log 2>&1 & pids[${c}]=$! done for c in `seq $concurrency` ; do wait ${pids[${c}]} done done
Вложения
В списке pgsql-bugs по дате отправления: