Re: brin index vacuum versus transaction snapshots
От | Alvaro Herrera |
---|---|
Тема | Re: brin index vacuum versus transaction snapshots |
Дата | |
Msg-id | 20150731175700.GX2441@postgresql.org обсуждение исходный текст |
Ответ на | Re: brin index vacuum versus transaction snapshots (Kevin Grittner <kgrittn@ymail.com>) |
Ответы |
Re: brin index vacuum versus transaction snapshots
Re: brin index vacuum versus transaction snapshots |
Список | pgsql-hackers |
Let me explain a bit more what's the mechanism at play. When scanning a block range to summarize it, we use an MVCC snapshot. Any tuples that are inserted by transactions that appear as in-progress to that snapshot will not be seen by that range scan; therefore we need another mechanism to include their values in the summary tuple. This mechanism is a "placeholder tuple." We insert this tuple into the index; any process that tries to insert into the range will update the placeholder tuple. All insertions that are not visible according to the MVCC scan must update the placeholder tuple. So there are two events that are time-critical regarding each other: the insertion of the placeholder tuple, and the acquisition of the snapshot for the MVCC scan. All transactions in progress for the snapshot *must* see the placeholder tuple; therefore, we make sure we insert the placeholder tuple first, and acquire the snapshot later. This is all regardless of a transaction being a user-declared transaction block, or an implicit transaction opened by some command; regardless of vacuum being user-invoked or autovacuum. The description of the above is critical for correctness in all those cases. I assumed (wrongly) that vacuum would not acquire a snapshot. (For one thing, we already rely for correctness on lazy vacuum not holding xmin back, so this is not entirely without sense.) The problem is that PortalRunUtility does it inconditionally. This is not a problem in read-committed mode, because we simply freshen the snapshot each time and all is well. In transaction-snapshot mode, taking another snapshot is a no-op and thus the whole thing is broken. The code is right to raise an error. (I don't really give a damn whether the error message is 100% correct or not. Fixing that part is trivial.) Even assuming that PortalRunUtility would not grab a snapshot at start, things would still be broken if vacuum processed more than one range in transaction-snapshot isolation mode: insert placeholder tuple, grab snapshot for the first range, scan range: all is well up to this point. Then we need to process the second range: insert placeholder tuple, grab snapshot ... which reuses the previous snapshot, older than the placeholder tuple. Oops. I think the real solution to this problem is to avoid use of GetTransactionSnapshot(), and instead use GetLatestSnapshot(). As far as I can see, that should completely close the hole. This requires patching IndexBuildHeapRangeScan() to allow for that. Untested patch attached. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
В списке pgsql-hackers по дате отправления: