BUG #18166: 100 Gb 18000000 records table update

From: PG Bug reporting form
Subject: BUG #18166: 100 Gb 18000000 records table update
Date:
Msg-id: 18166-8d0f3718572475ef@postgresql.org
Replies: Re: BUG #18166: 100 Gb 18000000 records table update (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18166
Logged by:          Ruslan Ganeev
Email address:      ruslan.ganeev@list.ru
PostgreSQL version: 12.6
Operating system:   CentOS 8
Description:

Greetings,

Could you help us with the following problem?

We have a table named «svjul» that contains 18 million records and occupies
100 GB of disk space.

The table «svjul» contains 3 columns:
-  ID (Type: Integer) – the key
-  enddate (Type: Date)
-  Data (Type: JSONB). This column contains the fields «OGRN» and «DataVip».
The value of «OGRN» is not unique: several records may contain the same
OGRN.
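
Roughly, the table definition looks like this (a sketch reconstructed from
the description above; the real DDL may differ):

create table svjul (
    id      integer primary key,  -- the key
    enddate date,
    data    jsonb                 -- contains «OGRN» and «DataVip», among other fields
);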

We tried to write a script that sets enddate = '2022-12-31' for all records
whose «DataVip» value is not the maximum for their «OGRN»; for the remaining
records the script sets enddate = null.

The problem is that the script runs for 6 hours, and most of that time is
taken by the rebuilding of indexes.
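
For clarity, the intended logic could be expressed in one statement with a
window function (a sketch only, based on the description above; our actual
script is shown below):

update svjul s
set enddate = case when r.rn = 1 then null else date '2022-12-31' end
from (
    select id,
           row_number() over (partition by data->>'OGRN'
                              order by data->>'DataVip' desc, id desc) as rn
    from svjul
) r
where r.id = s.id;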

Would you be so kind as to tell us whether there is any way to update records
without rebuilding the indexes, or whether there is some programming
technique for solving this kind of problem?
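
For reference, a standard bulk-update technique is to drop the secondary
indexes, run the update, and recreate them afterwards, so each index is
built once rather than maintained row by row; in our case it could look
roughly like this (the index name and the simplified update are
placeholders):

-- drop the secondary indexes on svjul (hypothetical name)
drop index if exists idx_svjul_code;

-- run the bulk update with only the primary key in place
update svjul set enddate = '2023-10-05'::date;

-- rebuild once, after the update
create index idx_svjul_code on svjul using btree (code);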

Sorry for my English; I'm not a native speaker.

Best regards

--Our script--

-- 0 minutes: scratch table for the fields extracted from the JSONB
drop table if exists tempsvjul1;
create table tempsvjul1 (
    id bigint not null, code text not null,
    datapostuch text, datavip text,
    constraint tempsvjul1_pkey primary key (id)
);

-- 30 minutes: extract OGRN and the two dates from the JSONB column
insert into tempsvjul1 (id, code, datapostuch, datavip)
select id, data->>'OGRN', data->'SvUchetNO'->>'DataPostUch', data->>'DataVip'
from svjul;

-- 1 minute: index matching the DISTINCT ON ordering used below
create index idx_deduplicate_tempsvjul1_btree_codedpostdvipid on tempsvjul1
    using btree (code, datapostuch desc, datavip desc, id desc);

-- 0 minutes: table of the ids that should keep enddate = null
drop table if exists tempsvjul2;
create table tempsvjul2 (
    id bigint not null,
    constraint tempsvjul2_pkey primary key (id)
);

-- 2 minutes: keep one winning row per code (latest datapostuch, then datavip)
insert into tempsvjul2 (id)
select distinct on (code) id from tempsvjul1
order by code, datapostuch desc, datavip desc, id desc;

-- 4 minutes without indexes, 6 hours with indexes
update svjul set
    code = coalesce((select tempsvjul1.code from tempsvjul1
                     where tempsvjul1.id = svjul.id), '???'),
    enddate = case when exists (select 1 from tempsvjul2
                                where tempsvjul2.id = svjul.id)
              then null else '2023-10-05'::date end;

-- 0 minutes: drop the scratch tables
drop table tempsvjul1;
drop table tempsvjul2;
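
A further thought (an assumption on our side, not verified): if the update
touched only columns that are not covered by any index, PostgreSQL could use
HOT (heap-only tuple) updates and skip index maintenance entirely; that
requires free space on each heap page, which a lower fillfactor reserves:

-- reserve 30% free space per page for HOT updates
alter table svjul set (fillfactor = 70);
-- this only affects pages written afterwards; a table rewrite
-- (e.g. VACUUM FULL) is needed to apply it to existing pages
vacuum full svjul;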

