BUG #18166: 100 Gb 18000000 records table update

From: PG Bug reporting form
Subject: BUG #18166: 100 Gb 18000000 records table update
Date:
Msg-id: 18166-8d0f3718572475ef@postgresql.org
Replies: Re: BUG #18166: 100 Gb 18000000 records table update (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18166
Logged by:          Ruslan Ganeev
Email address:      ruslan.ganeev@list.ru
PostgreSQL version: 12.6
Operating system:   CentOS 8
Description:

Greetings,

Could you help us with the following problem?

We have a table named «svjul» that contains 18 million records and occupies
100 GB of disk space.

The table «svjul» contains 3 columns:
-  ID (Type: Integer) – the key
-  enddate (Type: Date)
-  Data (Type: JSONB). This column contains the fields «OGRN» and «DataVip».
The value of «OGRN» is not unique: several records may contain the same
OGRN.
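
Roughly, the table definition looks like this (a sketch reconstructed from
the description above; the real DDL may differ):

create table svjul (
    id      integer primary key,  -- the key
    enddate date,
    data    jsonb                 -- contains «OGRN» and «DataVip», among other fields
);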

We tried to write a script that sets enddate = '2022-12-31' for all records
whose «DataVip» value is not the maximum for their «OGRN»; for the remaining
records the script sets enddate = null.

The problem is that the script runs for 6 hours, and most of that time is
taken by the rebuilding of indexes.
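
For clarity, the intended logic could be expressed in one statement with a
window function (a sketch only, based on the description above; our actual
script is shown below):

update svjul s
set enddate = case when r.rn = 1 then null else date '2022-12-31' end
from (
    select id,
           row_number() over (partition by data->>'OGRN'
                              order by data->>'DataVip' desc, id desc) as rn
    from svjul
) r
where r.id = s.id;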

Would you be so kind as to tell us whether there is any way to update records
without rebuilding the indexes, or whether there is some programming
technique for solving this kind of problem?
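
For reference, a standard bulk-update technique is to drop the secondary
indexes, run the update, and recreate them afterwards, so each index is
built once rather than maintained row by row; in our case it could look
roughly like this (the index name and the simplified update are
placeholders):

-- drop the secondary indexes on svjul (hypothetical name)
drop index if exists idx_svjul_code;

-- run the bulk update with only the primary key in place
update svjul set enddate = '2023-10-05'::date;

-- rebuild once, after the update
create index idx_svjul_code on svjul using btree (code);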

Sorry for my English; I'm not a native speaker.

Best regards

--Our script--

-- 0 minutes: scratch table for the fields extracted from the JSONB
drop table if exists tempsvjul1;
create table tempsvjul1 (
    id bigint not null, code text not null,
    datapostuch text, datavip text,
    constraint tempsvjul1_pkey primary key (id)
);

-- 30 minutes: extract OGRN and the two dates from the JSONB column
insert into tempsvjul1 (id, code, datapostuch, datavip)
select id, data->>'OGRN', data->'SvUchetNO'->>'DataPostUch', data->>'DataVip'
from svjul;

-- 1 minute: index matching the DISTINCT ON ordering used below
create index idx_deduplicate_tempsvjul1_btree_codedpostdvipid on tempsvjul1
    using btree (code, datapostuch desc, datavip desc, id desc);

-- 0 minutes: table of the ids that should keep enddate = null
drop table if exists tempsvjul2;
create table tempsvjul2 (
    id bigint not null,
    constraint tempsvjul2_pkey primary key (id)
);

-- 2 minutes: keep one winning row per code (latest datapostuch, then datavip)
insert into tempsvjul2 (id)
select distinct on (code) id from tempsvjul1
order by code, datapostuch desc, datavip desc, id desc;

-- 4 minutes without indexes, 6 hours with indexes
update svjul set
    code = coalesce((select tempsvjul1.code from tempsvjul1
                     where tempsvjul1.id = svjul.id), '???'),
    enddate = case when exists (select 1 from tempsvjul2
                                where tempsvjul2.id = svjul.id)
              then null else '2023-10-05'::date end;

-- 0 minutes: drop the scratch tables
drop table tempsvjul1;
drop table tempsvjul2;
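
A further thought (an assumption on our side, not verified): if the update
touched only columns that are not covered by any index, PostgreSQL could use
HOT (heap-only tuple) updates and skip index maintenance entirely; that
requires free space on each heap page, which a lower fillfactor reserves:

-- reserve 30% free space per page for HOT updates
alter table svjul set (fillfactor = 70);
-- this only affects pages written afterwards; a table rewrite
-- (e.g. VACUUM FULL) is needed to apply it to existing pages
vacuum full svjul;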

