Обсуждение: BUG #17686: SELECT pg_advisory_lock(...) with low lock_timeout sometimes times out but the lock is acquired

Поиск
Список
Период
Сортировка

BUG #17686: SELECT pg_advisory_lock(...) with low lock_timeout sometimes times out but the lock is acquired

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17686
Logged by:          Mike Adelson
Email address:      mike.adelson314@gmail.com
PostgreSQL version: 12.2
Operating system:   Windows 10
Description:

I am trying to use the pg_advisory_lock function in combination with setting
lock_timeout to wait on the lock with a timeout.

Here is my query:
```
SET LOCAL statement_timeout = 0;
SET LOCAL lock_timeout = 200;
SELECT pg_catalog.pg_advisory_lock(12345)
```

I'm finding that with relatively small values of lock_timeout and when the
system is under load (e.g. 8 connections acquiring concurrently), I will
encounter a case where the query exits with state 55P03 (lock_not_available)
and yet the lock was actually acquired (I can tell it has been acquired by
querying pg_locks and because other connections' calls to pg_advisory_lock
block).

Here's a standalone repro application which reliably recreates the behavior
for me: https://github.com/Tzachi009/distributed-locks-issue
More discussion here: https://github.com/madelson/DistributedLock/issues/147


PG Bug reporting form <noreply@postgresql.org> writes:
> I'm finding that with relatively small values of lock_timeout and when the
> system is under load (e.g. 8 connections acquiring concurrently), I will
> encounter a case where the query exits with state 55P03 (lock_not_available)
> and yet the lock was actually acquired (I can tell it has been acquired by
> querying pg_locks and because other connections' calls to pg_advisory_lock
> block).

Yeah, there's a fairly basic race condition there, which is that the
lock might get granted to us immediately after the timeout fires.
In ordinary usage this isn't very problematic because the lock would
get released anyway during the transaction abort resulting from the
timeout error.  However, when you're asking for a session-level
advisory lock, that doesn't happen.

I spent a little time studying whether there's a way to close the race,
but if it's possible at all it'd take major restructuring of what's
already quite complex and delicate code.  I can't get excited about
putting such effort into it, because related problems will exist
no matter what: if you get an error from the pg_advisory_lock command,
was the lock granted before that error occurred?  I don't think there's
any bulletproof way to deal with that except to check the lock status
afterwards (e.g., via pg_locks).  It's very much like the inherent issue
with errors late in COMMIT --- there's an atomic point where the commit
is logged, but the error report isn't going to be very clear about
whether we got past that, if indeed the error report gets to the client
at all.

            regards, tom lane



Mike Adelson <mike.adelson314@gmail.com> writes:
> Glad to have the race condition confirmed. In my view the “real” fix
> here would be to offer an advisory locking function that accepts a
> timeout.

How would that remove the low-level race condition?

            regards, tom lane