Обсуждение: INSERT ON CONFLICT and RETURNING

Поиск
Список
Период
Сортировка

INSERT ON CONFLICT and RETURNING

От
Konstantin Knizhnik
Дата:
Hi hackers,

I am sorry for the question which may be already discussed multiple times.
But I have not found answer for it neither in internet neither in 
pgsql-hackers archieve.
UPSERT (INSERT ... IN CONFLICT...) clause was added to the Postgres a 
long time ago.
As far as I remember there was long discussions about its syntax and 
functionality.
But today I found that there is still no way to perform one of the most 
frequently needed operation:
locate record by key and return its autogenerated ID or insert new 
record if key is absent.

Something like this:

   create table jsonb_schemas(id serial, schema bytea primary key);
   create index on jsonb_schemas(id);
   insert into jsonb_schemas (schema) values (?) on conflict(schema) do 
nothing returning id;

But it doesn't work because in case of conflict no value is returned.
It is possible to do something like this:

   with ins as (insert into jsonb_schemas (schema) values (obj_schema) 
on conflict(schema) do nothing returning id) select coalesce((select id 
from ins),(select id from jsonb_schemas where schema=obj_schema));

but it requires extra lookup.
Or perform update:

   insert into jsonb_schemas (schema) values (?) on conflict(schema) do 
update set schema=excluded.schema returning id;

But it is even worse because we have to perform useless update and 
produce new version.

May be I missing something, but according to stackoverflow:
https://stackoverflow.com/questions/34708509/how-to-use-returning-with-on-conflict-in-postgresql
there is no better solution.

I wonder how it can happen that such popular use case ia not covered by 
Postgresql UPSERT?
Are there some principle problems with it?
Why it is not possible to add one more on-conflict action: SELECT, 
making it possible to return data when key is found?

Thanks in advance,
Konstantin







Re: INSERT ON CONFLICT and RETURNING

От
Geoff Winkless
Дата:
On Sat, 22 Aug 2020 at 08:16, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> It is possible to do something like this:
>
>    with ins as (insert into jsonb_schemas (schema) values (obj_schema)
> on conflict(schema) do nothing returning id) select coalesce((select id
> from ins),(select id from jsonb_schemas where schema=obj_schema));
>
> but it requires extra lookup.

But if

INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
  ON CONFLICT (schema) DO NOTHING RETURNING id

were to work then that would _also_ require a second lookup, since
"id" is not part of the conflict key that will be used to perform the
existence test, so the only difference is it's hidden by the syntax.

Geoff



Re: INSERT ON CONFLICT and RETURNING

От
Konstantin Knizhnik
Дата:

On 24.08.2020 13:37, Geoff Winkless wrote:
> On Sat, 22 Aug 2020 at 08:16, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> It is possible to do something like this:
>>
>>     with ins as (insert into jsonb_schemas (schema) values (obj_schema)
>> on conflict(schema) do nothing returning id) select coalesce((select id
>> from ins),(select id from jsonb_schemas where schema=obj_schema));
>>
>> but it requires extra lookup.
> But if
>
> INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
>    ON CONFLICT (schema) DO NOTHING RETURNING id
>
> were to work then that would _also_ require a second lookup, since
> "id" is not part of the conflict key that will be used to perform the
> existence test, so the only difference is it's hidden by the syntax.
>
> Geoff
Sorry, I didn't quite understand it.
If we are doing such query:

INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
   ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id


Then as far as I understand no extra lookup is used to return ID:

  Insert on jsonb_schemas  (cost=0.00..0.01 rows=1 width=36) (actual 
time=0.035..0.036 rows=0 loops=1)
    Conflict Resolution: UPDATE
    Conflict Arbiter Indexes:jsonb_schemas_schema_key
    Conflict Filter: false
    Rows Removed by Conflict Filter: 1
    Tuples Inserted: 0
    Conflicting Tuples: 1
    ->  Result  (cost=0.00..0.01 rows=1 width=36) (actual 
time=0.002..0.002 rows=1 loops=1)
  Planning Time: 0.034 ms
  Execution Time: 0.065 ms
(10 rows)

So if we are able to efficienty execute query above, why we can not 
write query:

INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
   ON CONFLICT (schema) DO SELECT ID RETURNING id



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: INSERT ON CONFLICT and RETURNING

От
Konstantin Knizhnik
Дата:

On 22.08.2020 10:16, Konstantin Knizhnik wrote:
> Hi hackers,
>
> I am sorry for the question which may be already discussed multiple 
> times.
> But I have not found answer for it neither in internet neither in 
> pgsql-hackers archieve.
> UPSERT (INSERT ... IN CONFLICT...) clause was added to the Postgres a 
> long time ago.
> As far as I remember there was long discussions about its syntax and 
> functionality.
> But today I found that there is still no way to perform one of the 
> most frequently needed operation:
> locate record by key and return its autogenerated ID or insert new 
> record if key is absent.
>
> Something like this:
>
>   create table jsonb_schemas(id serial, schema bytea primary key);
>   create index on jsonb_schemas(id);
>   insert into jsonb_schemas (schema) values (?) on conflict(schema) do 
> nothing returning id;
>
> But it doesn't work because in case of conflict no value is returned.
> It is possible to do something like this:
>
>   with ins as (insert into jsonb_schemas (schema) values (obj_schema) 
> on conflict(schema) do nothing returning id) select coalesce((select 
> id from ins),(select id from jsonb_schemas where schema=obj_schema));
>
> but it requires extra lookup.
> Or perform update:
>
>   insert into jsonb_schemas (schema) values (?) on conflict(schema) do 
> update set schema=excluded.schema returning id;
>
> But it is even worse because we have to perform useless update and 
> produce new version.
>
> May be I missing something, but according to stackoverflow:
> https://stackoverflow.com/questions/34708509/how-to-use-returning-with-on-conflict-in-postgresql 
>
> there is no better solution.
>
> I wonder how it can happen that such popular use case ia not covered 
> by Postgresql UPSERT?
> Are there some principle problems with it?
> Why it is not possible to add one more on-conflict action: SELECT, 
> making it possible to return data when key is found?
>
> Thanks in advance,
> Konstantin

I'm sorry for been intrusive.
But can somebody familiar with Postgres upsert mechanism explain me why 
current implementation doesn't support very popular use case:
locate record by some unique key and and return its primary 
(autogenerated) key if found otherwise insert new tuple.
I have explained the possible workarounds of the problem above.
But all of them looks awful or inefficient.

What I am suggesting is just add ON CONFLICT DO SELECT clause:

insert into jsonb_schemas (schema) values ('one') on conflict(schema) do 
select returning id;

I attached small patch with prototype implementation of this construction.
It seems to be very trivial. What's wring with it?
Are there some fundamental problems which I do not understand?

Below is small illustration of how this patch is working:

postgres=# create table jsonb_schemas(id serial, schema bytea primary key);
CREATE TABLE
postgres=# create index on jsonb_schemas(id);
CREATE INDEX
postgres=# insert into jsonb_schemas (schema) values ('some') on 
conflict(schema) do nothing returning id;
  id
----
   1
(1 row)

INSERT 0 1
postgres=# insert into jsonb_schemas (schema) values ('some') on 
conflict(schema) do nothing returning id;
  id
----
(0 rows)

INSERT 0 0
postgres=# insert into jsonb_schemas (schema) values ('some') on 
conflict(schema) do select returning id;
  id
----
   1
(1 row)

INSERT 0 1


Thanks in advance,
Konstantin


Вложения

Re: INSERT ON CONFLICT and RETURNING

От
Marko Tiikkaja
Дата:


.m

Re: INSERT ON CONFLICT and RETURNING

От
Konstantin Knizhnik
Дата:

On 03.09.2020 19:30, Marko Tiikkaja wrote:
> There's prior art on this: https://commitfest.postgresql.org/15/1241/
>
>
> .m
Ooops:(
Thank you.
I missed it.

But frankly speaking I still didn't find answer for my question in this 
thread: what are the dangerous scenarios with ON CONFLICT DO NOTHING/SELECT.
Yes, record is not exclusively locked. But I just want to obtain value 
of some column which is not a source of conflict. I do not understand 
what can be wrong if some
other transaction changed this column.

And I certainly can't agree with Peter's statement:
 > Whereas here, with ON CONFLICT DO SELECT,
 > I see a somewhat greater risk, and a much, much smaller benefit. A
 > benefit that might actually be indistinguishable from zero.

 From my point of view it is quite common use case when we need to 
convert some long key to small autogenerated record identifier.
Without UPSERT we have to perform two queries instead of just one . And 
even with current implementation of INSERT ON CONFLICT...
we have to either perform extra lookup, either produce new (useless) 
tuple version.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: INSERT ON CONFLICT and RETURNING

От
Geoff Winkless
Дата:
On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> If we are doing such query:
>
> INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
>    ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id
>
>
> Then as far as I understand no extra lookup is used to return ID:

The conflict resolution checks the unique index on (schema) and
decides whether or not a conflict will exist. For DO NOTHING it
doesn't have to get the actual row from the table; however in order
for it to return the ID it would have to go and get the existing row
from the table. That's the "extra lookup", as you term it. The only
difference from doing it with RETURNING id versus WITH... COALESCE()
as you described is the simpler syntax.

I'm not saying the simpler syntax isn't nice, mind you. I was just
pointing out that it's not inherently any less efficient.

Geoff



Re: INSERT ON CONFLICT and RETURNING

От
Alexander Korotkov
Дата:
On Thu, Sep 3, 2020 at 7:56 PM Geoff Winkless <pgsqladmin@geoff.dj> wrote:
>
> On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
> > If we are doing such query:
> >
> > INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
> >    ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id
> >
> >
> > Then as far as I understand no extra lookup is used to return ID:
>
> The conflict resolution checks the unique index on (schema) and
> decides whether or not a conflict will exist. For DO NOTHING it
> doesn't have to get the actual row from the table; however in order
> for it to return the ID it would have to go and get the existing row
> from the table. That's the "extra lookup", as you term it. The only
> difference from doing it with RETURNING id versus WITH... COALESCE()
> as you described is the simpler syntax.

As I know, conflict resolution still has to fetch heap tuples, see
_bt_check_unique().  As I understand it, the issues are as follows.
1) Conflict resolution uses the dirty snapshot.  It's unclear whether
we can return this tuple to the user, because the query has a
different snapshot.  Note, that CTE query by Konstantin at thead start
doesn't handle all the cases correctly, it can return no rows on
conflict. We probably should do the trick similar to the EPQ mechanism
for UPDATE.  For instance, UPDATE ... RETURNING old.* can return the
tuple, which doesn't match the query snapshot.  But INSERT ON CONFLICT
might have other caveats in this area, it needs careful analysis.
2) Checking unique conflicts inside the index am is already the
encapsulation-breaking hack.  Returning the heap tuple for index am
would be even worse hack.  We probably should refactor this whole area
before.

------
Regards,
Alexander Korotkov



Re: INSERT ON CONFLICT and RETURNING

От
Andreas Karlsson
Дата:
On 9/3/20 6:52 PM, Konstantin Knizhnik wrote:
> But frankly speaking I still didn't find answer for my question in this 
> thread: what are the dangerous scenarios with ON CONFLICT DO 
> NOTHING/SELECT.
> Yes, record is not exclusively locked. But I just want to obtain value 
> of some column which is not a source of conflict. I do not understand 
> what can be wrong if some
> other transaction changed this column.
> 
> And I certainly can't agree with Peter's statement:
>  > Whereas here, with ON CONFLICT DO SELECT,
>  > I see a somewhat greater risk, and a much, much smaller benefit. A
>  > benefit that might actually be indistinguishable from zero.
> 
>  From my point of view it is quite common use case when we need to 
> convert some long key to small autogenerated record identifier.
> Without UPSERT we have to perform two queries instead of just one . And 
> even with current implementation of INSERT ON CONFLICT...
> we have to either perform extra lookup, either produce new (useless) 
> tuple version.

I have no idea about the potential risks here since I am not very 
familiar with the ON CONFLICT code, but I will chime in and agree that 
this is indeed a common use case. Selecting and taking a SHARE lock 
would also be a nice feature.

Andreas




Re: INSERT ON CONFLICT and RETURNING

От
Konstantin Knizhnik
Дата:

On 03.09.2020 19:56, Geoff Winkless wrote:
> On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> If we are doing such query:
>>
>> INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
>>     ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id
>>
>>
>> Then as far as I understand no extra lookup is used to return ID:
> The conflict resolution checks the unique index on (schema) and
> decides whether or not a conflict will exist. For DO NOTHING it
> doesn't have to get the actual row from the table; however in order
> for it to return the ID it would have to go and get the existing row
> from the table. That's the "extra lookup", as you term it. The only
> difference from doing it with RETURNING id versus WITH... COALESCE()
> as you described is the simpler syntax.
Sorry, but there is no exrta lookup in this case.
By "lookup" I mean index search.
What we are doing in case ON CONFLICT SELECT is just fetching tuple from 
the buffer.
So we are not even loading any data from the disk.

By in case

    with ins as (insert into jsonb_schemas (schema) values (obj_schema) 
on conflict(schema) do nothing returning id)
    select coalesce((select id from ins),(select id from jsonb_schemas 
where   schema=obj_schema));

we actually execute extra subquery: select id from jsonb_schemas where 
schema=obj_schema:

explain with ins as (insert into jsonb_schemas (schema) values ('some') 
on conflict(schema) do nothing returning id) select coalesce((select id 
from ins),(select id from jsonb_schemas where schema='some'));
                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------
  Result  (cost=8.21..8.21 rows=1 width=4)
    CTE ins
      ->  Insert on jsonb_schemas  (cost=0.00..0.01 rows=1 width=36)
            Conflict Resolution: NOTHING
            Conflict Arbiter Indexes: jsonb_schemas_pkey
            ->  Result  (cost=0.00..0.01 rows=1 width=36)
    InitPlan 2 (returns $2)
      ->  CTE Scan on ins  (cost=0.00..0.02 rows=1 width=4)
    InitPlan 3 (returns $3)
      ->  Index Scan using jsonb_schemas_pkey on jsonb_schemas 
jsonb_schemas_1  (cost=0.15..8.17 rows=1 width=4)
            Index Cond: (schema = '\x736f6d65'::bytea)

Is it critical?
At my system average time of executing this query is 104 usec, and with 
ON CONFLICT SELECT fix - 82 usec.
The difference is no so large, because we in any case insert speculative 
tuple.
But it is incorrect to say that "it's not inherently any less efficient."

> I'm not saying the simpler syntax isn't nice, mind you. I was just
> pointing out that it's not inherently any less efficient.
>
> Geoff



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: INSERT ON CONFLICT and RETURNING

От
Konstantin Knizhnik
Дата:
I have performed comparison of different ways of implementing UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:

So first of all initialization step:

  create table jsonb_schemas(id serial, schema bytea primary key);
  create unique index on jsonb_schemas(id);
  insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;

Then I test performance of getting ID of exitsed schema:

1. Use plpgsql script to avoid unneeded database modifications:

create function upsert(obj_schema bytea) returns integer as $$
declare
  obj_id integer;
begin
  select id from jsonb_schemas where schema=obj_schema into obj_id;
  if obj_id is null then
    insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
    if obj_id is null then
      select id from jsonb_schemas where schema=obj_schema into obj_id;
    end if;
  end if;
  return obj_id;
end;
$$ language plpgsql;

------------------------
upsert-plpgsql.sql:
select upsert('some');
------------------------
pgbench -n -T 100 -M prepared -f upsert-plpgsql.sql postgres
tps = 45092.241350

2. Use ON CONFLICT DO UPDATE:

upsert-update.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do update set schema='some' returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-update.sql postgres
tps = 9222.344890


3.  Use ON CONFLICT DO NOTHING + COALESCE:

upsert-coalecsce.sql:
with ins as (insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id) select coalesce((select id from ins),(select id from jsonb_schemas where schema='some'));
------------------------
pgbench -n -T 100 -M prepared -f upsert-coalesce.sql postgres
tps = 28929.353732


4. Use ON CONFLICT DO SELECT

upsert-select.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do select returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-select.sql postgres
ps = 35788.362302



So, as you can see PLpgSQL version, which doesn't modify database if key is found is signficantly faster than others.
And version which always do update is  almost five times slower!
Proposed version of upsert with ON CONFLICT DO SELECT is slower than PLpgSQL version (because it has to insert speculative tuple),
but faster than "user-unfriendly" version with COALESCE:

Upsert implementation
TPS
PLpgSQL
45092
ON CONFLICT DO UPDATE9222
ON CONFLICT DO NOTHING 28929
ON CONFLICT DO SELECT35788


Slightly modified version of my ON CONFLICT DO SELECT patch is attached to this mail.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения

Re: INSERT ON CONFLICT and RETURNING

От
Pavel Stehule
Дата:


út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
I have performed comparison of different ways of implementing UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:

So first of all initialization step:

  create table jsonb_schemas(id serial, schema bytea primary key);
  create unique index on jsonb_schemas(id);
  insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;

Then I test performance of getting ID of exitsed schema:

1. Use plpgsql script to avoid unneeded database modifications:

create function upsert(obj_schema bytea) returns integer as $$
declare
  obj_id integer;
begin
  select id from jsonb_schemas where schema=obj_schema into obj_id;
  if obj_id is null then
    insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
    if obj_id is null then
      select id from jsonb_schemas where schema=obj_schema into obj_id;
    end if;
  end if;
  return obj_id;
end;
$$ language plpgsql;

In parallel execution the plpgsql variant can fail. The possible raise conditions are not handled.

So maybe this is the reason why this is really fast.

Regards

Pavel
 

------------------------
upsert-plpgsql.sql:
select upsert('some');
------------------------
pgbench -n -T 100 -M prepared -f upsert-plpgsql.sql postgres
tps = 45092.241350

2. Use ON CONFLICT DO UPDATE:

upsert-update.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do update set schema='some' returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-update.sql postgres
tps = 9222.344890


3.  Use ON CONFLICT DO NOTHING + COALESCE:

upsert-coalecsce.sql:
with ins as (insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id) select coalesce((select id from ins),(select id from jsonb_schemas where schema='some'));
------------------------
pgbench -n -T 100 -M prepared -f upsert-coalesce.sql postgres
tps = 28929.353732


4. Use ON CONFLICT DO SELECT

upsert-select.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do select returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-select.sql postgres
ps = 35788.362302



So, as you can see PLpgSQL version, which doesn't modify database if key is found is signficantly faster than others.
And version which always do update is  almost five times slower!
Proposed version of upsert with ON CONFLICT DO SELECT is slower than PLpgSQL version (because it has to insert speculative tuple),
but faster than "user-unfriendly" version with COALESCE:

Upsert implementation
TPS
PLpgSQL
45092
ON CONFLICT DO UPDATE9222
ON CONFLICT DO NOTHING 28929
ON CONFLICT DO SELECT35788


Slightly modified version of my ON CONFLICT DO SELECT patch is attached to this mail.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: INSERT ON CONFLICT and RETURNING

От
Konstantin Knizhnik
Дата:


On 08.09.2020 12:34, Pavel Stehule wrote:


út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
I have performed comparison of different ways of implementing UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:

So first of all initialization step:

  create table jsonb_schemas(id serial, schema bytea primary key);
  create unique index on jsonb_schemas(id);
  insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;

Then I test performance of getting ID of exitsed schema:

1. Use plpgsql script to avoid unneeded database modifications:

create function upsert(obj_schema bytea) returns integer as $$
declare
  obj_id integer;
begin
  select id from jsonb_schemas where schema=obj_schema into obj_id;
  if obj_id is null then
    insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
    if obj_id is null then
      select id from jsonb_schemas where schema=obj_schema into obj_id;
    end if;
  end if;
  return obj_id;
end;
$$ language plpgsql;

In parallel execution the plpgsql variant can fail. The possible raise conditions are not handled.

So maybe this is the reason why this is really fast.

With this example I model real use case, where we need to map long key (json schema in this case) to short  identifier (serial column in this case).
Rows of jsonb_schemas are never updated: it is append-only dictionary.
In this assumption no race condition can happen with this PLpgSQL implementation (and other implementations of UPSERT as well).


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: INSERT ON CONFLICT and RETURNING

От
Pavel Stehule
Дата:


út 8. 9. 2020 v 12:34 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:


On 08.09.2020 12:34, Pavel Stehule wrote:


út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
I have performed comparison of different ways of implementing UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:

So first of all initialization step:

  create table jsonb_schemas(id serial, schema bytea primary key);
  create unique index on jsonb_schemas(id);
  insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;

Then I test performance of getting ID of exitsed schema:

1. Use plpgsql script to avoid unneeded database modifications:

create function upsert(obj_schema bytea) returns integer as $$
declare
  obj_id integer;
begin
  select id from jsonb_schemas where schema=obj_schema into obj_id;
  if obj_id is null then
    insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
    if obj_id is null then
      select id from jsonb_schemas where schema=obj_schema into obj_id;
    end if;
  end if;
  return obj_id;
end;
$$ language plpgsql;

In parallel execution the plpgsql variant can fail. The possible raise conditions are not handled.

So maybe this is the reason why this is really fast.

With this example I model real use case, where we need to map long key (json schema in this case) to short  identifier (serial column in this case).
Rows of jsonb_schemas are never updated: it is append-only dictionary.
In this assumption no race condition can happen with this PLpgSQL implementation (and other implementations of UPSERT as well).

yes, the performance depends on possibilities - and if you can implement optimistic or pessimistic locking (or if you know so there is not race condition possibility)

Pavel



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: INSERT ON CONFLICT and RETURNING

От
Pavel Stehule
Дата:


út 8. 9. 2020 v 12:34 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:


On 08.09.2020 12:34, Pavel Stehule wrote:


út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
I have performed comparison of different ways of implementing UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:

So first of all initialization step:

  create table jsonb_schemas(id serial, schema bytea primary key);
  create unique index on jsonb_schemas(id);
  insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;

Then I test performance of getting ID of exitsed schema:

1. Use plpgsql script to avoid unneeded database modifications:

create function upsert(obj_schema bytea) returns integer as $$
declare
  obj_id integer;
begin
  select id from jsonb_schemas where schema=obj_schema into obj_id;
  if obj_id is null then
    insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
    if obj_id is null then
      select id from jsonb_schemas where schema=obj_schema into obj_id;
    end if;
  end if;
  return obj_id;
end;
$$ language plpgsql;

In parallel execution the plpgsql variant can fail. The possible raise conditions are not handled.

So maybe this is the reason why this is really fast.

With this example I model real use case, where we need to map long key (json schema in this case) to short  identifier (serial column in this case).
Rows of jsonb_schemas are never updated: it is append-only dictionary.
In this assumption no race condition can happen with this PLpgSQL implementation (and other implementations of UPSERT as well).

I am not sure, but I think this should  be a design and behavior of MERGE statement - it is designed for OLAP (and speed). Unfortunately, this feature stalled (and your benchmarks show so there is clean performance benefit).

Regards

Pavel


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company