Обсуждение: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

Поиск
Список
Период
Сортировка

BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17141
Logged by:          Emil Iggland
Email address:      emil@iggland.com
PostgreSQL version: 13.3
Operating system:   Windows
Description:

I am trying to create a queue which should assign multiple tasks to a
worker. I use row locking with FOR UPDATE SKIP LOCKED, but the number of
rows returned are inconsistent with what I expect. 

Minimum example:
 
CREATE TABLE queue (task INTEGER); 
INSERT INTO queue (task)
VALUES (180),(280),(380),(480),(580),(180),(280),(380),(480),(580);

BEGIN; 
SELECT * FROM queue
ORDER BY task DESC
FETCH FIRST 1 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;
/* Some work to be done here */
COMMIT;

select version();
PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit

Expected result Worker 1: (580), (580), Actual result Worker 1: (580),
(580)
Expected result Worker 2: (480), (480), Actual result Worker 2: (480)


Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
Emil Iggland
Дата:
FWIW, the following query exhibits the behaviour that I expect, but
isn't really usable in production as I do not know how many rows to
expect per worker.

BEGIN;
SELECT * FROM queue
ORDER BY task DESC
LIMIT 2 OFFSET 0
FOR UPDATE SKIP LOCKED;
COMMIT;





Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
Emil Iggland
Дата:
As does

SELECT * FROM queue
ORDER BY task DESC
FETCH NEXT 1 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;

Note the use of NEXT instead of FIRST


On 2021-08-11 18:38, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      17141
> Logged by:          Emil Iggland
> Email address:      emil@iggland.com
> PostgreSQL version: 13.3
> Operating system:   Windows
> Description:        
> 
> I am trying to create a queue which should assign multiple tasks to a
> worker. I use row locking with FOR UPDATE SKIP LOCKED, but the number of
> rows returned are inconsistent with what I expect. 
> 
> Minimum example:
>  
> CREATE TABLE queue (task INTEGER); 
> INSERT INTO queue (task)
> VALUES (180),(280),(380),(480),(580),(180),(280),(380),(480),(580);
> 
> BEGIN; 
> SELECT * FROM queue
> ORDER BY task DESC
> FETCH FIRST 1 ROWS WITH TIES
> FOR UPDATE SKIP LOCKED;
> /* Some work to be done here */
> COMMIT;
> 
> select version();
> PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit
> 
> Expected result Worker 1: (580), (580), Actual result Worker 1: (580),
> (580)
> Expected result Worker 2: (480), (480), Actual result Worker 2: (480)
> 



Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
Alvaro Herrera
Дата:
On 2021-Aug-11, PG Bug reporting form wrote:

> BEGIN; 
> SELECT * FROM queue
> ORDER BY task DESC
> FETCH FIRST 1 ROWS WITH TIES
> FOR UPDATE SKIP LOCKED;
> /* Some work to be done here */
> COMMIT;
> 
> select version();
> PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit
> 
> Expected result Worker 1: (580), (580), Actual result Worker 1: (580), (580)
> Expected result Worker 2: (480), (480), Actual result Worker 2: (480)

Ouch, we already saw this actually:
https://postgr.es/m/16676-fd62c3c835880da6@postgresql.org
The problem is that the first worker locks the first (480) row (even
though it does not return it), so the second worker skips it due to SKIP
LOCKED.

I have this on my list of things to look at, but it's not at the top
yet sadly ...

-- 
Álvaro Herrera



Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
David Christensen
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

> On 2021-Aug-11, PG Bug reporting form wrote:
>
>> BEGIN; 
>> SELECT * FROM queue
>> ORDER BY task DESC
>> FETCH FIRST 1 ROWS WITH TIES
>> FOR UPDATE SKIP LOCKED;
>> /* Some work to be done here */
>> COMMIT;
>> 
>> select version();
>> PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit
>> 
>> Expected result Worker 1: (580), (580), Actual result Worker 1: (580), (580)
>> Expected result Worker 2: (480), (480), Actual result Worker 2: (480)
>
> Ouch, we already saw this actually:
> https://postgr.es/m/16676-fd62c3c835880da6@postgresql.org
> The problem is that the first worker locks the first (480) row (even
> though it does not return it), so the second worker skips it due to SKIP
> LOCKED.
>
> I have this on my list of things to look at, but it's not at the top
> yet sadly ...

Yeah, I'd looked at this when I found it, and short of detecting the situation "WITH TIES FOR UPDATE
SKIP LOCKED" and erroring out, it seems like it would require adding in infrastructure that we don't
support (AFAIK) with unlocking an already locked row inside a transaction or reworking the order of
LockRows and Limit such that Limit comes first (and itself handles the WITH TIES) before handing to
LockRows.  Either way (other than the error), it seems to be a fairly invasive change.

If someone has another idea on how to handle this, I could take a stab at things.  Detecting the
situation and erroring seems like the easiest way to handle so you're at least not getting back bad
results, though I agree that the functionality would be useful if we *could* support it somehow.

David
-- 



Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
Emil Iggland
Дата:
I continued trying the variations in the linked thread.

* Sub-query
BEGIN;
SELECT * FROM (SELECT * FROM queue
ORDER BY task DESC
FETCH NEXT 1 ROWS WITH TIES) t
FOR UPDATE SKIP LOCKED;
COMMIT;

This behaves the same way, this does not work around the bug.
The same goes for my previous "find" with NEXT. I can not replicate the
working state, I must have done something wrong last night.

I added some more tasks with the same number in order to see if there
was a problem with the first row, or with the count.

I now have the following counts:
task    count(*)
180    2
280    2
380    4
480    3
580    2

I attempted to select multiple tasks at the same time, representing a
case where a worker might select multiple tasks.

SELECT * FROM queue
ORDER BY task DESC
FETCH FIRST 3 ROWS WITH TIES
FOR UPDATE SKIP LOCKED;

Here I get three rows back (580), (580), (480)

If I run
SELECT * FROM queue ORDER BY task DESC
FETCH FIRST 3 ROWS WITH TIES;
I get back 5 rows (580), (580), (480), (480), (480) as expected.

/Emil




On 2021-08-11 23:39, David Christensen wrote:
> 
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> 
>> On 2021-Aug-11, PG Bug reporting form wrote:
>>
>>> BEGIN; 
>>> SELECT * FROM queue
>>> ORDER BY task DESC
>>> FETCH FIRST 1 ROWS WITH TIES
>>> FOR UPDATE SKIP LOCKED;
>>> /* Some work to be done here */
>>> COMMIT;
>>>
>>> select version();
>>> PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit
>>>
>>> Expected result Worker 1: (580), (580), Actual result Worker 1: (580), (580)
>>> Expected result Worker 2: (480), (480), Actual result Worker 2: (480)
>>
>> Ouch, we already saw this actually:
>> https://postgr.es/m/16676-fd62c3c835880da6@postgresql.org
>> The problem is that the first worker locks the first (480) row (even
>> though it does not return it), so the second worker skips it due to SKIP
>> LOCKED.
>>
>> I have this on my list of things to look at, but it's not at the top
>> yet sadly ...
> 
> Yeah, I'd looked at this when I found it, and short of detecting the situation "WITH TIES FOR UPDATE
> SKIP LOCKED" and erroring out, it seems like it would require adding in infrastructure that we don't
> support (AFAIK) with unlocking an already locked row inside a transaction or reworking the order of
> LockRows and Limit such that Limit comes first (and itself handles the WITH TIES) before handing to
> LockRows.  Either way (other than the error), it seems to be a fairly invasive change.
> 
> If someone has another idea on how to handle this, I could take a stab at things.  Detecting the
> situation and erroring seems like the easiest way to handle so you're at least not getting back bad
> results, though I agree that the functionality would be useful if we *could* support it somehow.
> 
> David
> 



Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
David Christensen
Дата:
On Wed, Aug 11, 2021 at 3:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Aug-11, PG Bug reporting form wrote:
>
> > BEGIN;
> > SELECT * FROM queue
> > ORDER BY task DESC
> > FETCH FIRST 1 ROWS WITH TIES
> > FOR UPDATE SKIP LOCKED;
> > /* Some work to be done here */
> > COMMIT;
> >
> > select version();
> > PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit
> >
> > Expected result Worker 1: (580), (580), Actual result Worker 1: (580), (580)
> > Expected result Worker 2: (480), (480), Actual result Worker 2: (480)
>
> Ouch, we already saw this actually:
> https://postgr.es/m/16676-fd62c3c835880da6@postgresql.org
> The problem is that the first worker locks the first (480) row (even
> though it does not return it), so the second worker skips it due to SKIP
> LOCKED.
>
> I have this on my list of things to look at, but it's not at the top
> yet sadly ...

I have written a patch[1] to detect this situation and error out
instead of silently not returning some of the rows it ostensibly
should have.  I'm not convinced it's the *right* solution, as we may
want to allow the existing types of SELECT that currently trigger this
to run instead, but it is at least a solution and the start of a
discussion.

Best,

David

[1] https://www.postgresql.org/message-id/CAOxo6XLPccCKru3xPMaYDpa%2BAXyPeWFs%2BSskrrL%2BHKwDjJnLhg%40mail.gmail.com



Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
Emil Iggland
Дата:
I tried to read the other thread to understand the underlying problem of
an extra row being locked, but couldn't find any good explanation.

I see a lot of thought being put into how this issue can be worked
around, but very little discussion on if this behaviour is correct or
not. Without having thought about it much deeper, this seems to only be
a problem with the "WITH TIES" clause which provokes this extra row
being locked. Perhaps that is where the problem should be attacked.


On 2021-08-13 23:46, David Christensen wrote:
> On Wed, Aug 11, 2021 at 3:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>
>> On 2021-Aug-11, PG Bug reporting form wrote:
>>
>>> BEGIN;
>>> SELECT * FROM queue
>>> ORDER BY task DESC
>>> FETCH FIRST 1 ROWS WITH TIES
>>> FOR UPDATE SKIP LOCKED;
>>> /* Some work to be done here */
>>> COMMIT;
>>>
>>> select version();
>>> PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit
>>>
>>> Expected result Worker 1: (580), (580), Actual result Worker 1: (580), (580)
>>> Expected result Worker 2: (480), (480), Actual result Worker 2: (480)
>>
>> Ouch, we already saw this actually:
>> https://postgr.es/m/16676-fd62c3c835880da6@postgresql.org
>> The problem is that the first worker locks the first (480) row (even
>> though it does not return it), so the second worker skips it due to SKIP
>> LOCKED.
>>
>> I have this on my list of things to look at, but it's not at the top
>> yet sadly ...
> 
> I have written a patch[1] to detect this situation and error out
> instead of silently not returning some of the rows it ostensibly
> should have.  I'm not convinced it's the *right* solution, as we may
> want to allow the existing types of SELECT that currently trigger this
> to run instead, but it is at least a solution and the start of a
> discussion.
> 
> Best,
> 
> David
> 
> [1] https://www.postgresql.org/message-id/CAOxo6XLPccCKru3xPMaYDpa%2BAXyPeWFs%2BSskrrL%2BHKwDjJnLhg%40mail.gmail.com
> 



Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
Alvaro Herrera
Дата:
On 2021-Aug-25, Emil Iggland wrote:

> I see a lot of thought being put into how this issue can be worked
> around, but very little discussion on if this behaviour is correct or
> not. Without having thought about it much deeper, this seems to only be
> a problem with the "WITH TIES" clause which provokes this extra row
> being locked. Perhaps that is where the problem should be attacked.

The problem is that when the WITH TIES clause is added, we need to read
one extra row after the one that reaches the LIMIT count, in order to
verify whether it (the next one) should be included due to a tie.  With
the executor structure that we currently have, there is no way to read
that row and not lock it.  So a good fix would be to separate the act of
locking the row from the act of reading it.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/



Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2021-Aug-25, Emil Iggland wrote:
>> I see a lot of thought being put into how this issue can be worked
>> around, but very little discussion on if this behaviour is correct or
>> not. Without having thought about it much deeper, this seems to only be
>> a problem with the "WITH TIES" clause which provokes this extra row
>> being locked. Perhaps that is where the problem should be attacked.

> The problem is that when the WITH TIES clause is added, we need to read
> one extra row after the one that reaches the LIMIT count, in order to
> verify whether it (the next one) should be included due to a tie.  With
> the executor structure that we currently have, there is no way to read
> that row and not lock it.  So a good fix would be to separate the act of
> locking the row from the act of reading it.

I'm not convinced that's a "good fix".  SELECT FOR UPDATE generally
considers that it should return rows only if they still match the
WHERE condition after they're locked.  The natural extension of that
to WITH TIES is that you ought to check for equality after locking.
So these features seem rather fundamentally in conflict.

            regards, tom lane



Re: BUG #17141: SELECT LIMIT WITH TIES FOR UPDATE SKIP LOCKED returns wrong number of rows

От
Emil Iggland
Дата:
To be it would seem that a "peek()" type function would do a good job
also, you can always peek at the next row to determine if it is not a
candidate, if it is the "look and lock"-it.
Then you can keep both features.


On 2021-10-03 00:30, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> On 2021-Aug-25, Emil Iggland wrote:
>>> I see a lot of thought being put into how this issue can be worked
>>> around, but very little discussion on if this behaviour is correct or
>>> not. Without having thought about it much deeper, this seems to only be
>>> a problem with the "WITH TIES" clause which provokes this extra row
>>> being locked. Perhaps that is where the problem should be attacked.
> 
>> The problem is that when the WITH TIES clause is added, we need to read
>> one extra row after the one that reaches the LIMIT count, in order to
>> verify whether it (the next one) should be included due to a tie.  With
>> the executor structure that we currently have, there is no way to read
>> that row and not lock it.  So a good fix would be to separate the act of
>> locking the row from the act of reading it.
> 
> I'm not convinced that's a "good fix".  SELECT FOR UPDATE generally
> considers that it should return rows only if they still match the
> WHERE condition after they're locked.  The natural extension of that
> to WITH TIES is that you ought to check for equality after locking.
> So these features seem rather fundamentally in conflict.
> 
>             regards, tom lane
>