GIN fast-insert vs autovacuum scheduling
From | Tom Lane |
---|---|
Subject | GIN fast-insert vs autovacuum scheduling |
Date | |
Msg-id | 29127.1237830982@sss.pgh.pa.us |
Replies | Re: GIN fast-insert vs autovacuum scheduling |
List | pgsql-hackers |
I'm looking again at the fast-insert patch, and I find myself still desperately unhappy about the mechanism for scheduling autovacuum cleanup of pending insertions. I complained about that before, but I think I only cited a worry about adding overhead to statistics tracking in order to have the "recently inserted tuples" counts. It's got worse problems though:

1. The "recently inserted tuples" count is simply the wrong measurement if the index is partial --- it could be a drastic overestimate.

2. Since the patch has pgstats unconditionally resetting the count to zero after every vacuum, it's not safe for an index AM to use any other cleanup policy except "flush all pending insertions on every vacuum". This doesn't seem particularly optimal to me; isn't the idea to make sure we insert lots of tuples at once? Seems like if there's not very much in the pending list it'd be better to do nothing.

3. Given that ginHeapTupleFastInsert forces a cleanup cycle whenever the pending list gets too big, it's far from clear why we should have to force autovacuum just because of pending list size at all. I also note that such cleanups aren't being accounted for in the "recently inserted tuples" stat, anyhow.

On top of those issues, there are implementation problems in the proposed relation_has_pending_indexes() check: it has hard-wired knowledge about GIN indexes, which means the feature cannot be extended to add-on index AMs; and it's examining indexes without any lock whatsoever on either the indexes or their parent table. (And we really would rather not let autovacuum take a lock here.)

So I'm fairly strongly tempted to just rip out the whole mechanism, and rely on existing autovacuum rules plus the ginHeapTupleFastInsert-driven cleanups. The only case that I can see where this is really any step backwards is that following a bulk insert operation, autovacuum will only think it needs to ANALYZE the table, but we would like it to clean out the pending insertion lists too. But even then, the patch's mechanism isn't all that desirable because it forces a useless VACUUM pass over the heap.

ISTM what might be a better, more flexible approach is to allow the amvacuumcleanup hook to be called at the end of ANALYZE too, letting the index AM make its own decision about whether it needs to do anything then. A decision at that point could be made on the actual size of the index's pending list, rather than any stats-driven guess.

Comments?

regards, tom lane
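
The closing suggestion (let the index AM decide for itself, based on the real pending-list size, when its cleanup hook runs after VACUUM or ANALYZE) could be modelled roughly as below. This is only an illustrative sketch, not PostgreSQL code: SketchIndexState, flush_threshold, sketch_should_flush and sketch_cleanup are hypothetical names invented here, and the threshold policy is an assumption about what "do nothing if there's not very much in the pending list" might look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-index state; not a PostgreSQL struct. */
typedef struct SketchIndexState
{
    size_t pending_pages;    /* pages currently sitting in the pending list */
    size_t flush_threshold;  /* assumed policy knob: minimum size worth flushing */
} SketchIndexState;

/*
 * Decide from the actual pending-list size, not from a statistics counter.
 * A short list is left alone so that a later flush can insert many tuples
 * at once.
 */
bool
sketch_should_flush(const SketchIndexState *idx)
{
    assert(idx != NULL);
    return idx->pending_pages >= idx->flush_threshold;
}

/*
 * Hypothetical cleanup hook that a caller would invoke at the end of either
 * VACUUM or ANALYZE. Returns true if the pending list was flushed.
 */
bool
sketch_cleanup(SketchIndexState *idx)
{
    if (!sketch_should_flush(idx))
        return false;        /* too little pending work: do nothing */

    /* Stand-in for moving pending entries into the main index structure. */
    idx->pending_pages = 0;
    return true;
}
```

In this model, an index with 3 pending pages and a threshold of 16 is left untouched, while one with 40 pending pages is flushed. No "recently inserted tuples" count is consulted, so a partial index cannot be overestimated.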