Обсуждение: BUG #13523: Unexplained deadlocks (possible race condition)

Поиск
Список
Период
Сортировка

BUG #13523: Unexplained deadlocks (possible race condition)

От
jack@douglastechnology.co.uk
Дата:
The following bug has been logged on the website:

Bug reference:      13523
Logged by:          Jack Douglas
Email address:      jack@douglastechnology.co.uk
PostgreSQL version: 9.4.3
Operating system:   Debian Jessie
Description:

Hi

I'm getting intermittent (but easily reproducible) deadlocks showing in my
error log, for example:

2015-07-28 16:46:19 BST ERROR:  deadlock detected
2015-07-28 16:46:19 BST DETAIL:  Process 9394 waits for ExclusiveLock on
relation 65605 of database 12141; blocked by process 9393.
        Process 9393 waits for ExclusiveLock on relation 65605 of database
12141; blocked by process 9394.
        Process 9394: select f_test('blah')
        Process 9393: select f_test('blah')
2015-07-28 16:46:19 BST HINT:  See server log for query details.
2015-07-28 16:46:19 BST CONTEXT:  SQL function "f_test" statement 1
2015-07-28 16:46:19 BST STATEMENT:  select f_test('blah')

The above is from a minimal test case I've attempted to create rather than
my production code. The test case is as follows:

create table test( id serial primary key, val text );

create function f_test(v text) returns integer language sql security definer
set search_path = postgres,pg_temp as $$
  lock test in exclusive mode;
  insert into test(val) select v where not exists(select * from test where
val=v);
  select id from test where val=v;
$$;

then from two bash sessions simultaneously:

for i in {1..1000}; do psql postgres postgres -c "select f_test('blah')";
done

This produces a handful of deadlocks on my server, however using a VM on a
laptop another postgres user said he got far more.

More detail is on the question I posted at DBA.SE (and it's probably easier
to read the code there:

http://dba.stackexchange.com/q/108290/1396

Kind regards
Jack

Re: BUG #13523: Unexplained deadlocks (possible race condition)

От
Tom Lane
Дата:
jack@douglastechnology.co.uk writes:
> I'm getting intermittent (but easily reproducible) deadlocks showing in my
> error log, for example:

> The above is from a minimal test case I've attempted to create rather than
> my production code. The test case is as follows:

> create table test( id serial primary key, val text );

> create function f_test(v text) returns integer language sql security definer
> set search_path = postgres,pg_temp as $$
>   lock test in exclusive mode;
>   insert into test(val) select v where not exists(select * from test where
> val=v);
>   select id from test where val=v;
> $$;

I believe the issue with this is that a SQL function will do parsing (and
maybe planning too; don't feel like checking the code right now) for the
entire function body at once.  This means that due to the INSERT command
you acquire RowExclusiveLock on the "test" table during function body
parsing, before the LOCK command actually executes.  So the LOCK
represents a lock escalation attempt, and deadlocks are to be expected.

This coding technique would be safe in plpgsql, but not in a SQL-language
function.

There have been discussions of reimplementing SQL-language functions so
that parsing occurs one statement at a time, but don't hold your breath
about something happening in that direction; it doesn't seem to be a
high priority concern for anybody.

            regards, tom lane

Re: BUG #13523: Unexplained deadlocks (possible race condition)

От
"Jack Douglas"
Дата:
> I believe the issue with this is that a SQL function will do parsing (and
maybe planning too; don't feel like checking the code right now) for the
entire function body at once.  This means that due to the INSERT command you
acquire RowExclusiveLock on the "test" table during function body parsing,
before the LOCK command actually executes.  So the LOCK represents a lock
escalation attempt, and deadlocks are to be expected.

That makes perfect sense, many thanks for the explanation.

> This coding technique would be safe in plpgsql, but not in a SQL-language
function.

That's useful to know - I already worked around the issue with a retry-loop
(as I'm basically doing an upsert the 'bad' way with `lock table` - roll on
9.5!)

Kind regards
Jack

Re: BUG #13523: Unexplained deadlocks (possible race condition)

От
Andres Freund
Дата:
On 2015-07-28 16:28:23 +0000, jack@douglastechnology.co.uk wrote:
> 2015-07-28 16:46:19 BST ERROR:  deadlock detected
> 2015-07-28 16:46:19 BST DETAIL:  Process 9394 waits for ExclusiveLock on
> relation 65605 of database 12141; blocked by process 9393.
>         Process 9393 waits for ExclusiveLock on relation 65605 of database
> 12141; blocked by process 9394.
>         Process 9394: select f_test('blah')
>         Process 9393: select f_test('blah')
> 2015-07-28 16:46:19 BST HINT:  See server log for query details.
> 2015-07-28 16:46:19 BST CONTEXT:  SQL function "f_test" statement 1
> 2015-07-28 16:46:19 BST STATEMENT:  select f_test('blah')

I am wondering if the deadlock reports would be a bit easier to debug if
we didn't just say "blocked by process 9394", but also in which lockmode
it's held currently.

Greetings,

Andres Freund

Re: BUG #13523: Unexplained deadlocks (possible race condition)

От
"Jack Douglas"
Дата:
> There have been discussions of reimplementing SQL-language functions so
that parsing occurs one statement at a time, but don't hold your breath
about something happening in that direction; it doesn't seem to be a high
priority concern for anybody.

Perhaps it is worth considering a less drastic change if that is not on the
cards in the immediate future?

If parsing the INSERT aquires the RowExclusiveLock, perhaps parsing the LOCK
statement should also aquire the lock? That would mean the following
principle in the documentation ("...The best defense against deadlocks is
generally to avoid them by being certain that all applications using a
database acquire locks on multiple objects in a consistent order...",
http://www.postgresql.org/docs/9.4/static/explicit-locking.html#LOCKING-DEAD
LOCKS) would be possible (or at least more easily understood) when using
SQL-language functions.

Re: BUG #13523: Unexplained deadlocks (possible race condition)

От
Andres Freund
Дата:
On 2015-07-30 09:23:10 +0100, Jack Douglas wrote:
> If parsing the INSERT aquires the RowExclusiveLock, perhaps parsing the LOCK
> statement should also aquire the lock? That would mean the following
> principle in the documentation ("...The best defense against deadlocks is
> generally to avoid them by being certain that all applications using a
> database acquire locks on multiple objects in a consistent order...",
> http://www.postgresql.org/docs/9.4/static/explicit-locking.html#LOCKING-DEAD
> LOCKS) would be possible (or at least more easily understood) when using
> SQL-language functions.

I don't think that'd help at all? The problem here is the lock upgrade
from RowExclusiveLock to the exclusive lock, and that'll not be fixed by
that proposal?

Regards,

Andres

Re: BUG #13523: Unexplained deadlocks (possible race condition)

От
"Jack Douglas"
Дата:
> I am wondering if the deadlock reports would be a bit easier to debug if
we didn't just say "blocked by process 9394", but also in which lockmode
it's held currently.

That would certainly help, something like this would have been a big clue in
this case:

...
relation 65605 of database 12141; blocked by RowExclusiveLock held by
process 9393.
...

is that what you mean?

Re: BUG #13523: Unexplained deadlocks (possible race condition)

От
"Jack Douglas"
Дата:
> I don't think that'd help at all? The problem here is the lock upgrade
from RowExclusiveLock to the exclusive lock, and that'll not be fixed by
that proposal?

The problem is that the RowExclusiveLock is being aquired in one session
before the Exclusive lock, even though the LOCK TABLE statement is
physically first in the SQL function.

Because the locks are being acquired out-of-order, deadlocks become possible
as another session tries to escalate the lock and waits, then the first
session tries to escalate it's own lock and deadlocks.

Normally this is prevented by acquiring the most restrictive lock first, but
with a SQL-language function (unlike plpgsql for example) this is not
possible.

This is how I understand Tom's initial reply, is that not right?