Обсуждение: INSERT ON CONFLICT and RETURNING
Hi hackers, I am sorry for the question which may be already discussed multiple times. But I have not found answer for it neither in internet neither in pgsql-hackers archieve. UPSERT (INSERT ... IN CONFLICT...) clause was added to the Postgres a long time ago. As far as I remember there was long discussions about its syntax and functionality. But today I found that there is still no way to perform one of the most frequently needed operation: locate record by key and return its autogenerated ID or insert new record if key is absent. Something like this: create table jsonb_schemas(id serial, schema bytea primary key); create index on jsonb_schemas(id); insert into jsonb_schemas (schema) values (?) on conflict(schema) do nothing returning id; But it doesn't work because in case of conflict no value is returned. It is possible to do something like this: with ins as (insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id) select coalesce((select id from ins),(select id from jsonb_schemas where schema=obj_schema)); but it requires extra lookup. Or perform update: insert into jsonb_schemas (schema) values (?) on conflict(schema) do update set schema=excluded.schema returning id; But it is even worse because we have to perform useless update and produce new version. May be I missing something, but according to stackoverflow: https://stackoverflow.com/questions/34708509/how-to-use-returning-with-on-conflict-in-postgresql there is no better solution. I wonder how it can happen that such popular use case ia not covered by Postgresql UPSERT? Are there some principle problems with it? Why it is not possible to add one more on-conflict action: SELECT, making it possible to return data when key is found? Thanks in advance, Konstantin
On Sat, 22 Aug 2020 at 08:16, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > It is possible to do something like this: > > with ins as (insert into jsonb_schemas (schema) values (obj_schema) > on conflict(schema) do nothing returning id) select coalesce((select id > from ins),(select id from jsonb_schemas where schema=obj_schema)); > > but it requires extra lookup. But if INSERT INTO jsonb_schemas (schema) VALUES (obj_schema) ON CONFLICT (schema) DO NOTHING RETURNING id were to work then that would _also_ require a second lookup, since "id" is not part of the conflict key that will be used to perform the existence test, so the only difference is it's hidden by the syntax. Geoff
On 24.08.2020 13:37, Geoff Winkless wrote: > On Sat, 22 Aug 2020 at 08:16, Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: >> It is possible to do something like this: >> >> with ins as (insert into jsonb_schemas (schema) values (obj_schema) >> on conflict(schema) do nothing returning id) select coalesce((select id >> from ins),(select id from jsonb_schemas where schema=obj_schema)); >> >> but it requires extra lookup. > But if > > INSERT INTO jsonb_schemas (schema) VALUES (obj_schema) > ON CONFLICT (schema) DO NOTHING RETURNING id > > were to work then that would _also_ require a second lookup, since > "id" is not part of the conflict key that will be used to perform the > existence test, so the only difference is it's hidden by the syntax. > > Geoff Sorry, I didn't quite understand it. If we are doing such query: INSERT INTO jsonb_schemas (schema) VALUES (obj_schema) ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id Then as far as I understand no extra lookup is used to return ID: Insert on jsonb_schemas (cost=0.00..0.01 rows=1 width=36) (actual time=0.035..0.036 rows=0 loops=1) Conflict Resolution: UPDATE Conflict Arbiter Indexes:jsonb_schemas_schema_key Conflict Filter: false Rows Removed by Conflict Filter: 1 Tuples Inserted: 0 Conflicting Tuples: 1 -> Result (cost=0.00..0.01 rows=1 width=36) (actual time=0.002..0.002 rows=1 loops=1) Planning Time: 0.034 ms Execution Time: 0.065 ms (10 rows) So if we are able to efficienty execute query above, why we can not write query: INSERT INTO jsonb_schemas (schema) VALUES (obj_schema) ON CONFLICT (schema) DO SELECT ID RETURNING id -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 22.08.2020 10:16, Konstantin Knizhnik wrote: > Hi hackers, > > I am sorry for the question which may be already discussed multiple > times. > But I have not found answer for it neither in internet neither in > pgsql-hackers archieve. > UPSERT (INSERT ... IN CONFLICT...) clause was added to the Postgres a > long time ago. > As far as I remember there was long discussions about its syntax and > functionality. > But today I found that there is still no way to perform one of the > most frequently needed operation: > locate record by key and return its autogenerated ID or insert new > record if key is absent. > > Something like this: > > create table jsonb_schemas(id serial, schema bytea primary key); > create index on jsonb_schemas(id); > insert into jsonb_schemas (schema) values (?) on conflict(schema) do > nothing returning id; > > But it doesn't work because in case of conflict no value is returned. > It is possible to do something like this: > > with ins as (insert into jsonb_schemas (schema) values (obj_schema) > on conflict(schema) do nothing returning id) select coalesce((select > id from ins),(select id from jsonb_schemas where schema=obj_schema)); > > but it requires extra lookup. > Or perform update: > > insert into jsonb_schemas (schema) values (?) on conflict(schema) do > update set schema=excluded.schema returning id; > > But it is even worse because we have to perform useless update and > produce new version. > > May be I missing something, but according to stackoverflow: > https://stackoverflow.com/questions/34708509/how-to-use-returning-with-on-conflict-in-postgresql > > there is no better solution. > > I wonder how it can happen that such popular use case ia not covered > by Postgresql UPSERT? > Are there some principle problems with it? > Why it is not possible to add one more on-conflict action: SELECT, > making it possible to return data when key is found? > > Thanks in advance, > Konstantin I'm sorry for been intrusive. But can somebody familiar with Postgres upsert mechanism explain me why current implementation doesn't support very popular use case: locate record by some unique key and and return its primary (autogenerated) key if found otherwise insert new tuple. I have explained the possible workarounds of the problem above. But all of them looks awful or inefficient. What I am suggesting is just add ON CONFLICT DO SELECT clause: insert into jsonb_schemas (schema) values ('one') on conflict(schema) do select returning id; I attached small patch with prototype implementation of this construction. It seems to be very trivial. What's wring with it? Are there some fundamental problems which I do not understand? Below is small illustration of how this patch is working: postgres=# create table jsonb_schemas(id serial, schema bytea primary key); CREATE TABLE postgres=# create index on jsonb_schemas(id); CREATE INDEX postgres=# insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id; id ---- 1 (1 row) INSERT 0 1 postgres=# insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id; id ---- (0 rows) INSERT 0 0 postgres=# insert into jsonb_schemas (schema) values ('some') on conflict(schema) do select returning id; id ---- 1 (1 row) INSERT 0 1 Thanks in advance, Konstantin
Вложения
On 03.09.2020 19:30, Marko Tiikkaja wrote: > There's prior art on this: https://commitfest.postgresql.org/15/1241/ > > > .m Ooops:( Thank you. I missed it. But frankly speaking I still didn't find answer for my question in this thread: what are the dangerous scenarios with ON CONFLICT DO NOTHING/SELECT. Yes, record is not exclusively locked. But I just want to obtain value of some column which is not a source of conflict. I do not understand what can be wrong if some other transaction changed this column. And I certainly can't agree with Peter's statement: > Whereas here, with ON CONFLICT DO SELECT, > I see a somewhat greater risk, and a much, much smaller benefit. A > benefit that might actually be indistinguishable from zero. From my point of view it is quite common use case when we need to convert some long key to small autogenerated record identifier. Without UPSERT we have to perform two queries instead of just one . And even with current implementation of INSERT ON CONFLICT... we have to either perform extra lookup, either produce new (useless) tuple version. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > If we are doing such query: > > INSERT INTO jsonb_schemas (schema) VALUES (obj_schema) > ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id > > > Then as far as I understand no extra lookup is used to return ID: The conflict resolution checks the unique index on (schema) and decides whether or not a conflict will exist. For DO NOTHING it doesn't have to get the actual row from the table; however in order for it to return the ID it would have to go and get the existing row from the table. That's the "extra lookup", as you term it. The only difference from doing it with RETURNING id versus WITH... COALESCE() as you described is the simpler syntax. I'm not saying the simpler syntax isn't nice, mind you. I was just pointing out that it's not inherently any less efficient. Geoff
On Thu, Sep 3, 2020 at 7:56 PM Geoff Winkless <pgsqladmin@geoff.dj> wrote: > > On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: > > If we are doing such query: > > > > INSERT INTO jsonb_schemas (schema) VALUES (obj_schema) > > ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id > > > > > > Then as far as I understand no extra lookup is used to return ID: > > The conflict resolution checks the unique index on (schema) and > decides whether or not a conflict will exist. For DO NOTHING it > doesn't have to get the actual row from the table; however in order > for it to return the ID it would have to go and get the existing row > from the table. That's the "extra lookup", as you term it. The only > difference from doing it with RETURNING id versus WITH... COALESCE() > as you described is the simpler syntax. As I know, conflict resolution still has to fetch heap tuples, see _bt_check_unique(). As I understand it, the issues are as follows. 1) Conflict resolution uses the dirty snapshot. It's unclear whether we can return this tuple to the user, because the query has a different snapshot. Note, that CTE query by Konstantin at thead start doesn't handle all the cases correctly, it can return no rows on conflict. We probably should do the trick similar to the EPQ mechanism for UPDATE. For instance, UPDATE ... RETURNING old.* can return the tuple, which doesn't match the query snapshot. But INSERT ON CONFLICT might have other caveats in this area, it needs careful analysis. 2) Checking unique conflicts inside the index am is already the encapsulation-breaking hack. Returning the heap tuple for index am would be even worse hack. We probably should refactor this whole area before. ------ Regards, Alexander Korotkov
On 9/3/20 6:52 PM, Konstantin Knizhnik wrote: > But frankly speaking I still didn't find answer for my question in this > thread: what are the dangerous scenarios with ON CONFLICT DO > NOTHING/SELECT. > Yes, record is not exclusively locked. But I just want to obtain value > of some column which is not a source of conflict. I do not understand > what can be wrong if some > other transaction changed this column. > > And I certainly can't agree with Peter's statement: > > Whereas here, with ON CONFLICT DO SELECT, > > I see a somewhat greater risk, and a much, much smaller benefit. A > > benefit that might actually be indistinguishable from zero. > > From my point of view it is quite common use case when we need to > convert some long key to small autogenerated record identifier. > Without UPSERT we have to perform two queries instead of just one . And > even with current implementation of INSERT ON CONFLICT... > we have to either perform extra lookup, either produce new (useless) > tuple version. I have no idea about the potential risks here since I am not very familiar with the ON CONFLICT code, but I will chime in and agree that this is indeed a common use case. Selecting and taking a SHARE lock would also be a nice feature. Andreas
On 03.09.2020 19:56, Geoff Winkless wrote: > On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: >> If we are doing such query: >> >> INSERT INTO jsonb_schemas (schema) VALUES (obj_schema) >> ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id >> >> >> Then as far as I understand no extra lookup is used to return ID: > The conflict resolution checks the unique index on (schema) and > decides whether or not a conflict will exist. For DO NOTHING it > doesn't have to get the actual row from the table; however in order > for it to return the ID it would have to go and get the existing row > from the table. That's the "extra lookup", as you term it. The only > difference from doing it with RETURNING id versus WITH... COALESCE() > as you described is the simpler syntax. Sorry, but there is no exrta lookup in this case. By "lookup" I mean index search. What we are doing in case ON CONFLICT SELECT is just fetching tuple from the buffer. So we are not even loading any data from the disk. By in case with ins as (insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id) select coalesce((select id from ins),(select id from jsonb_schemas where schema=obj_schema)); we actually execute extra subquery: select id from jsonb_schemas where schema=obj_schema: explain with ins as (insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id) select coalesce((select id from ins),(select id from jsonb_schemas where schema='some')); QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Result (cost=8.21..8.21 rows=1 width=4) CTE ins -> Insert on jsonb_schemas (cost=0.00..0.01 rows=1 width=36) Conflict Resolution: NOTHING Conflict Arbiter Indexes: jsonb_schemas_pkey -> Result (cost=0.00..0.01 rows=1 width=36) InitPlan 2 (returns $2) -> CTE Scan on ins (cost=0.00..0.02 rows=1 width=4) InitPlan 3 (returns $3) -> Index Scan using jsonb_schemas_pkey on jsonb_schemas jsonb_schemas_1 (cost=0.15..8.17 rows=1 width=4) Index Cond: (schema = '\x736f6d65'::bytea) Is it critical? At my system average time of executing this query is 104 usec, and with ON CONFLICT SELECT fix - 82 usec. The difference is no so large, because we in any case insert speculative tuple. But it is incorrect to say that "it's not inherently any less efficient." > I'm not saying the simpler syntax isn't nice, mind you. I was just > pointing out that it's not inherently any less efficient. > > Geoff -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
I have performed comparison of different ways of implementing UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:
So first of all initialization step:
create table jsonb_schemas(id serial, schema bytea primary key);
create unique index on jsonb_schemas(id);
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;
Then I test performance of getting ID of exitsed schema:
1. Use plpgsql script to avoid unneeded database modifications:
create function upsert(obj_schema bytea) returns integer as $$
declare
obj_id integer;
begin
select id from jsonb_schemas where schema=obj_schema into obj_id;
if obj_id is null then
insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
if obj_id is null then
select id from jsonb_schemas where schema=obj_schema into obj_id;
end if;
end if;
return obj_id;
end;
$$ language plpgsql;
------------------------
upsert-plpgsql.sql:
select upsert('some');
------------------------
pgbench -n -T 100 -M prepared -f upsert-plpgsql.sql postgres
tps = 45092.241350
2. Use ON CONFLICT DO UPDATE:
upsert-update.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do update set schema='some' returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-update.sql postgres
tps = 9222.344890
3. Use ON CONFLICT DO NOTHING + COALESCE:
upsert-coalecsce.sql:
with ins as (insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id) select coalesce((select id from ins),(select id from jsonb_schemas where schema='some'));
------------------------
pgbench -n -T 100 -M prepared -f upsert-coalesce.sql postgres
tps = 28929.353732
4. Use ON CONFLICT DO SELECT
upsert-select.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do select returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-select.sql postgres
ps = 35788.362302
So, as you can see PLpgSQL version, which doesn't modify database if key is found is signficantly faster than others.
And version which always do update is almost five times slower!
Proposed version of upsert with ON CONFLICT DO SELECT is slower than PLpgSQL version (because it has to insert speculative tuple),
but faster than "user-unfriendly" version with COALESCE:
Slightly modified version of my ON CONFLICT DO SELECT patch is attached to this mail.
--
May be it will be interesting not only for me, so I share my results:
So first of all initialization step:
create table jsonb_schemas(id serial, schema bytea primary key);
create unique index on jsonb_schemas(id);
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;
Then I test performance of getting ID of exitsed schema:
1. Use plpgsql script to avoid unneeded database modifications:
create function upsert(obj_schema bytea) returns integer as $$
declare
obj_id integer;
begin
select id from jsonb_schemas where schema=obj_schema into obj_id;
if obj_id is null then
insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
if obj_id is null then
select id from jsonb_schemas where schema=obj_schema into obj_id;
end if;
end if;
return obj_id;
end;
$$ language plpgsql;
------------------------
upsert-plpgsql.sql:
select upsert('some');
------------------------
pgbench -n -T 100 -M prepared -f upsert-plpgsql.sql postgres
tps = 45092.241350
2. Use ON CONFLICT DO UPDATE:
upsert-update.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do update set schema='some' returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-update.sql postgres
tps = 9222.344890
3. Use ON CONFLICT DO NOTHING + COALESCE:
upsert-coalecsce.sql:
with ins as (insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id) select coalesce((select id from ins),(select id from jsonb_schemas where schema='some'));
------------------------
pgbench -n -T 100 -M prepared -f upsert-coalesce.sql postgres
tps = 28929.353732
4. Use ON CONFLICT DO SELECT
upsert-select.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do select returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-select.sql postgres
ps = 35788.362302
So, as you can see PLpgSQL version, which doesn't modify database if key is found is signficantly faster than others.
And version which always do update is almost five times slower!
Proposed version of upsert with ON CONFLICT DO SELECT is slower than PLpgSQL version (because it has to insert speculative tuple),
but faster than "user-unfriendly" version with COALESCE:
Upsert implementation | TPS |
PLpgSQL | 45092 |
ON CONFLICT DO UPDATE | 9222 |
ON CONFLICT DO NOTHING | 28929 |
ON CONFLICT DO SELECT | 35788 |
Slightly modified version of my ON CONFLICT DO SELECT patch is attached to this mail.
--
Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Вложения
út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
I have performed comparison of different ways of implementing UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:
So first of all initialization step:
create table jsonb_schemas(id serial, schema bytea primary key);
create unique index on jsonb_schemas(id);
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;
Then I test performance of getting ID of exitsed schema:
1. Use plpgsql script to avoid unneeded database modifications:
create function upsert(obj_schema bytea) returns integer as $$
declare
obj_id integer;
begin
select id from jsonb_schemas where schema=obj_schema into obj_id;
if obj_id is null then
insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
if obj_id is null then
select id from jsonb_schemas where schema=obj_schema into obj_id;
end if;
end if;
return obj_id;
end;
$$ language plpgsql;
In parallel execution the plpgsql variant can fail. The possible raise conditions are not handled.
So maybe this is the reason why this is really fast.
Regards
Pavel
------------------------
upsert-plpgsql.sql:
select upsert('some');
------------------------
pgbench -n -T 100 -M prepared -f upsert-plpgsql.sql postgres
tps = 45092.241350
2. Use ON CONFLICT DO UPDATE:
upsert-update.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do update set schema='some' returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-update.sql postgres
tps = 9222.344890
3. Use ON CONFLICT DO NOTHING + COALESCE:
upsert-coalecsce.sql:
with ins as (insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id) select coalesce((select id from ins),(select id from jsonb_schemas where schema='some'));
------------------------
pgbench -n -T 100 -M prepared -f upsert-coalesce.sql postgres
tps = 28929.353732
4. Use ON CONFLICT DO SELECT
upsert-select.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do select returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-select.sql postgres
ps = 35788.362302
So, as you can see PLpgSQL version, which doesn't modify database if key is found is signficantly faster than others.
And version which always do update is almost five times slower!
Proposed version of upsert with ON CONFLICT DO SELECT is slower than PLpgSQL version (because it has to insert speculative tuple),
but faster than "user-unfriendly" version with COALESCE:
Upsert implementation TPS PLpgSQL 45092 ON CONFLICT DO UPDATE 9222 ON CONFLICT DO NOTHING 28929 ON CONFLICT DO SELECT 35788
Slightly modified version of my ON CONFLICT DO SELECT patch is attached to this mail.
--Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 08.09.2020 12:34, Pavel Stehule wrote:
út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:I have performed comparison of different ways of implementing UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:
So first of all initialization step:
create table jsonb_schemas(id serial, schema bytea primary key);
create unique index on jsonb_schemas(id);
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;
Then I test performance of getting ID of exitsed schema:
1. Use plpgsql script to avoid unneeded database modifications:
create function upsert(obj_schema bytea) returns integer as $$
declare
obj_id integer;
begin
select id from jsonb_schemas where schema=obj_schema into obj_id;
if obj_id is null then
insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
if obj_id is null then
select id from jsonb_schemas where schema=obj_schema into obj_id;
end if;
end if;
return obj_id;
end;
$$ language plpgsql;In parallel execution the plpgsql variant can fail. The possible raise conditions are not handled.So maybe this is the reason why this is really fast.
With this example I model real use case, where we need to map long key (json schema in this case) to short identifier (serial column in this case).
Rows of jsonb_schemas are never updated: it is append-only dictionary.
In this assumption no race condition can happen with this PLpgSQL implementation (and other implementations of UPSERT as well).
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
út 8. 9. 2020 v 12:34 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
On 08.09.2020 12:34, Pavel Stehule wrote:út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:I have performed comparison of different ways of implementing UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:
So first of all initialization step:
create table jsonb_schemas(id serial, schema bytea primary key);
create unique index on jsonb_schemas(id);
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;
Then I test performance of getting ID of exitsed schema:
1. Use plpgsql script to avoid unneeded database modifications:
create function upsert(obj_schema bytea) returns integer as $$
declare
obj_id integer;
begin
select id from jsonb_schemas where schema=obj_schema into obj_id;
if obj_id is null then
insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
if obj_id is null then
select id from jsonb_schemas where schema=obj_schema into obj_id;
end if;
end if;
return obj_id;
end;
$$ language plpgsql;In parallel execution the plpgsql variant can fail. The possible raise conditions are not handled.So maybe this is the reason why this is really fast.
With this example I model real use case, where we need to map long key (json schema in this case) to short identifier (serial column in this case).
Rows of jsonb_schemas are never updated: it is append-only dictionary.
In this assumption no race condition can happen with this PLpgSQL implementation (and other implementations of UPSERT as well).
yes, the performance depends on possibilities - and if you can implement optimistic or pessimistic locking (or if you know so there is not race condition possibility)
Pavel
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
út 8. 9. 2020 v 12:34 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
On 08.09.2020 12:34, Pavel Stehule wrote:út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:I have performed comparison of different ways of implementing UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:
So first of all initialization step:
create table jsonb_schemas(id serial, schema bytea primary key);
create unique index on jsonb_schemas(id);
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do nothing returning id;
Then I test performance of getting ID of exitsed schema:
1. Use plpgsql script to avoid unneeded database modifications:
create function upsert(obj_schema bytea) returns integer as $$
declare
obj_id integer;
begin
select id from jsonb_schemas where schema=obj_schema into obj_id;
if obj_id is null then
insert into jsonb_schemas (schema) values (obj_schema) on conflict(schema) do nothing returning id into obj_id;
if obj_id is null then
select id from jsonb_schemas where schema=obj_schema into obj_id;
end if;
end if;
return obj_id;
end;
$$ language plpgsql;In parallel execution the plpgsql variant can fail. The possible raise conditions are not handled.So maybe this is the reason why this is really fast.
With this example I model real use case, where we need to map long key (json schema in this case) to short identifier (serial column in this case).
Rows of jsonb_schemas are never updated: it is append-only dictionary.
In this assumption no race condition can happen with this PLpgSQL implementation (and other implementations of UPSERT as well).
I am not sure, but I think this should be a design and behavior of MERGE statement - it is designed for OLAP (and speed). Unfortunately, this feature stalled (and your benchmarks show so there is clean performance benefit).
Regards
Pavel
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company