Обсуждение: SELECT FOR UPDATE on rows that don't exist

Поиск
Список
Период
Сортировка

SELECT FOR UPDATE on rows that don't exist

От
Joe
Дата:
Hi,
I'd like to implement a distributed mutex, and was thinking of this:
CREATE TABLE locks (section VARCHAR PRIMARY KEY, holder VARCHAR);

And then do:
BEGIN;
SELECT * FROM locks WHERE section = $1 FOR UPDATE;
INSERT INTO locks (section, holder) VALUES ($1, $2);  # *1

... critical section ...

DELETE FROM locks WHERE section = $1 and holder = $2; #*2
COMMIT;
 
A few questions:
1) What are the semantics of SELECT FOR UPDATE when the row doesn't exist yet?

Without #*1, a simple experiment shows that two processes can be in the critical section at the same time. Add #*1 seems to achieve the desired behavior, but is it really? I didn't find much on the web (it looks like MySQL locks the index meaning the INSERT wouldn't be necessary). If Postgresql was also locking the index, the INSERT would not add anything, but the experiment without the INSERT would have worked. If it's the row being locked, since the row doesn't exist outside the transaction, the second process shouldn't be able to see it and wouldn't block waiting for the first transaction.

2) The DELETE @ #2 is so that the row is never present when not executing in the critical section mainly so that #1 can be a simple insert rather than an upsert. Is there a more standard pattern for this?

3)  Using the DB as a distributed mutex seems like a common application but nothing came up in various DB and PostgreSQL books I consulted or on the web. Is this a bad idea, or are there gotchas I'm missing?

Thanks!
Joe

Re: SELECT FOR UPDATE on rows that don't exist

От
"David G. Johnston"
Дата:
On Wed, Mar 23, 2022 at 12:52 PM Joe <symphony.red+pg@gmail.com> wrote:
1) What are the semantics of SELECT FOR UPDATE when the row doesn't exist yet?

You've informed the system you are going to be updating rows on the table but as yet have not given it specific rows to protect.

Without #*1, a simple experiment shows that two processes can be in the critical section at the same time. Add #*1 seems to achieve the desired behavior, but is it really?

The index will not allow duplicates to be inserted and the first one to try forces all other potential insertions to wait until the first one commits;

"Short-term share/exclusive page-level locks are used for read/write access. Locks are released immediately after each index row is fetched or inserted. These index types provide the highest concurrency without deadlock conditions."

But, once the insertion happens the second transaction sees the potentially conflicting record and so continues waiting to see whether it commits with the conflict in place (at which point it returns the duplicate key error) or not (in which case one of them then gets to proceed with their insertion).  This manifests as a lock on the transactionid of the first session as is most easily seen in in pg_stat_activity wait_event/wait_event_type fields.

I picked up the last point through my own experimentation, if there is documentation someone else will hopefully point it out.


2) The DELETE @ #2 is so that the row is never present when not executing in the critical section mainly so that #1 can be a simple insert rather than an upsert. Is there a more standard pattern for this?

Not sure.  But while definitely not standard, and somewhat limited, there is the advisory locks feature.



3)  Using the DB as a distributed mutex seems like a common application but nothing came up in various DB and PostgreSQL books I consulted or on the web. Is this a bad idea, or are there gotchas I'm missing?


For a job queue typically you'd fire off an update to claim the work - ensuring only one session gets the assignment - and another to indicate completion.  The "critical section" is usually so long as to be harmful to the database to be performed while holding a transaction open specifically for locking purposes.  I have to imagine there are better tools, especially in-memory ones, designed to handle distributed process coordination. That said, the goodness or badness of doing it in the database heavily depends on knowing more than just "distributed mutex".  The critical section details, not just existence, factors into such a subjective evaluation.  The fact you have a functioning database does make things appealing.

David J.

Re: SELECT FOR UPDATE on rows that don't exist

От
Joe
Дата:
David,
Thank you very much. I missed the advisory locks feature. That feels like the perfect thing; the BIGINT keys are a bit clunky but that's minor.

Your description of the sequence of events for the two SELECT FOR UPDATE transactions makes sense. To apply it to my pseudo code, the second transaction would unblock when the row was deleted in the first rather than when the first transaction commits or rolls back.

One question though:

You've informed the system you are going to be updating rows on the table but as yet have not given it specific rows to protect.

Without #*1, a simple experiment shows that two processes can be in the critical section at the same time. Add #*1 seems to achieve the desired behavior, but is it really?

The index will not allow duplicates to be inserted and the first one to try forces all other potential insertions to wait until the first one commits;

Is this true in all situations? That is, will an index insertion in one transaction block the index insertion in another transaction? Presumably only only if the index entries match, but how does this work in practice? Does the second transaction block when it sees the conflict? Or does it proceed on its own version of the index proceeding until it tries to commit?

Thanks again,
Joe