Hi Tom,
On 2017-09-29 20:26:42 +0000, Tom Lane wrote:
> get_rel_oids used to not take any relation locks at all, but that stopped
> being a good idea with commit 3c3bb9933, which inserted a syscache lookup
> into the function. A concurrent DROP TABLE could now produce "cache lookup
> failed", which we don't want to have happen in normal operation. The best
> solution seems to be to transiently take a lock on the relation named by
> the RangeVar (which also makes the result of RangeVarGetRelid a lot less
> spongy). But we shouldn't hold the lock beyond this function, because we
> don't want VACUUM to lock more than one table at a time. (That would not
> be a big problem right now, but it will become one after the pending
> feature patch to allow multiple tables to be named in VACUUM.)
I'm not sure how big a problem this is, but I suspect it is
one. Autovacuum used to skip relations when they're locked, and the
vacuum isn't an anti-wraparound one. But after this change it appears
we'll get stuck behind this new lock, even if VACOPT_NOWAIT is
specified. That's bad because now relations that are AEL locked
(e.g. because of some longrunning DDL) can block global autovacuum
progress.
I noticed this when reviewing a patch that adds NOWAIT (now renamed to
SKIP LOCKED) as a user visible knob and it getting stuck behind an
AEL. The updated version of the patch
http://archives.postgresql.org/message-id/7327B413-1A57-477F-A6A0-6FD80CB5482D%40amazon.com
has an attempt at fixing the issue, although I've not yet looked at it
in any sort of detail.
I suspect we should break out that part once reviewed, and backpatch to
10?
Greetings,
Andres Freund