Обсуждение: BUG #18773: オートバキュームのリトライ時にreltuplesの値が減少する

Поиск
Список
Период
Сортировка

BUG #18773: オートバキュームのリトライ時にreltuplesの値が減少する

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      18773
Logged by:          勇次 安藤
Email address:      antengynnnn536@gmail.com
PostgreSQL version: 13.6
Operating system:   linax
Description:

現在、auroraposgres SQLを使用して、オートバキュームを起動して、運用を行っています。
オートバキュームの強度を確認したところ、あるテーブルを掴んでいる状態でオートバキュームのリトライが行われると、刀剣情報のreltuplesの値を減少させていることに気づきました。
マニュアル上は、オートバキューム実行時には書き換えられるものかと認識していましたが、リトライ時にも書き換えられており、テストでは、リトライを繰返すたびにあたいど徐々に小さくせいていることがわかりました。


reltuplesの値は、sql実行時に実行計画を作成するときに使用されるところなので、reltuples(テーブル行数)が小さいあたいで実行計画が作成されており、処理が終わりませんでした。analyzeをかけ直して、reltuplesの値を正規の値にしてsqlを実行したところ問題ない時間で処理が完了しています。

■再現テストの実施内容
1.aa_testというテーブルを50000件データで作成。
2.aa_testのテーブルに対して、1件deleteを行い、comitは行わずに掴んだ状態にしておく
3.更にaa_testに対して20000万件のdeleteを行い、こちらはcomitを行い完了しておく
4. 3.で20000万件deleteしたのをきっかけにオートバキュームが起動されるが、2.でテーブルを掴んでいるためリトライを繰返す
5.reltuplesの情報を確認すると、50000からリトライを繰返すたびに減少しており、529まで値が下がった


Re: BUG #18773: オートバキュームのリトライ時にreltuplesの値が減少する

От
Daniel Gustafsson
Дата:
> On 15 Jan 2025, at 06:59, PG Bug reporting form <noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18773
> Logged by:          勇次 安藤
> Email address:      antengynnnn536@gmail.com
> PostgreSQL version: 13.6
> Operating system:   linax
> Description:
>
> 現在、auroraposgres SQLを使用して、オートバキュームを起動して、運用を行っています。

Re-submitting this bugreport in English will greatly improve your chances of
getting help.  Also note that if the database in question is running Aurora
Postgres then you need to contact AWS support as this bugreport form is only
for open-source PostgreSQL.

--
Daniel Gustafsson





On 2025/01/15 22:32, Daniel Gustafsson wrote:
>> On 15 Jan 2025, at 06:59, PG Bug reporting form <noreply@postgresql.org> wrote:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference:      18773
>> Logged by:          勇次 安藤
>> Email address:      antengynnnn536@gmail.com
>> PostgreSQL version: 13.6
>> Operating system:   linax
>> Description:
>>
>> 現在、auroraposgres SQLを使用して、オートバキュームを起動して、運用を行っています。
> 
> Re-submitting this bugreport in English will greatly improve your chances of
> getting help.

+1

BTW, the reported issue is that when a table contains dead tuples that cannot be
removed due to a long-running transaction, repeated autovacuum runs gradually
decrease the table's reltuples value until it reaches a small number. I also
was able to reproduce this issue on the current HEAD as follows:

Session 1:
=# CREATE TABLE t AS SELECT i FROM generate_series(1, 50000) i;
=# BEGIN;
=# DELETE FROM t WHERE i = 1;
-- (Leave this session open)

Session 2:
=# DELETE FROM t WHERE i > 30000;
=# SELECT relpages, reltuples FROM pg_class WHERE relname = 't';

In my test, reltuples kept decreasing with each autovacuum run, as shown below:
------------
30000
15746
8452
4719
2809
1831
1331
1075
944
877
843
825
816
812
810
808
807
------------

After reltuples drops to 807, running ANALYZE restores the correct count of
live tuples. However, repeated autovacuum runs then cause reltuples to
decrease again.

 From my review of the code, this seems related to the logic in vac_estimate_reltuples().
When a table has many unremovable dead tuples, autovacuum scans pages containing
mostly dead tuples and finds only a small number of live tuples. These few live
tuples are counted toward reltuples, but this small count is indirectly used to
calculate subsequent reltuples values. This feedback loop causes reltuples to
continually decrease in such scenarios.

I'm not sure if this is a bug, and I currently don't have a good solution
for the issue.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





On 2025/01/15 22:32, Daniel Gustafsson wrote:
>> On 15 Jan 2025, at 06:59, PG Bug reporting form <noreply@postgresql.org> wrote:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference:      18773
>> Logged by:          勇次 安藤
>> Email address:      antengynnnn536@gmail.com
>> PostgreSQL version: 13.6
>> Operating system:   linax
>> Description:
>>
>> 現在、auroraposgres SQLを使用して、オートバキュームを起動して、運用を行っています。
> 
> Re-submitting this bugreport in English will greatly improve your chances of
> getting help.

+1

BTW, the reported issue is that when a table contains dead tuples that cannot be
removed due to a long-running transaction, repeated autovacuum runs gradually
decrease the table's reltuples value until it reaches a small number. I also
was able to reproduce this issue on the current HEAD as follows:

Session 1:
=# CREATE TABLE t AS SELECT i FROM generate_series(1, 50000) i;
=# BEGIN;
=# DELETE FROM t WHERE i = 1;
-- (Leave this session open)

Session 2:
=# DELETE FROM t WHERE i > 30000;
=# SELECT relpages, reltuples FROM pg_class WHERE relname = 't';

In my test, reltuples kept decreasing with each autovacuum run, as shown below:
------------
30000
15746
8452
4719
2809
1831
1331
1075
944
877
843
825
816
812
810
808
807
------------

After reltuples drops to 807, running ANALYZE restores the correct count of
live tuples. However, repeated autovacuum runs then cause reltuples to
decrease again.

 From my review of the code, this seems related to the logic in vac_estimate_reltuples().
When a table has many unremovable dead tuples, autovacuum scans pages containing
mostly dead tuples and finds only a small number of live tuples. These few live
tuples are counted toward reltuples, but this small count is indirectly used to
calculate subsequent reltuples values. This feedback loop causes reltuples to
continually decrease in such scenarios.

I'm not sure if this is a bug, and I currently don't have a good solution
for the issue.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




ダニエルさん
お世話になっております。

今回の事象はauroraで発生している事象になります。
動きとしてはバグの可能性が高いと考えています。ご確認いただけますと助かります。

必要な資料などあれば連絡ください。

まさおさん
情報のご連携ありがとうございます。同じ事象が再現できたとのことで、
現在のところ対応は、バキュームのリトライが発生した場合に、アナライズを実施する手立てしかない認識で、夜間バッチ時間帯の本番環境で起きていることから、長時間稼働したものがあればれんらくをもらって手動でアナライズを行う手立てしかおこなえず、後手後手になっています・・・オンライン前にすべてのテーブルに対してアナライズを行うことでオンラインへの影響をなくす手立てがあるるかと思うのですが、テーブル数が2000を超えているので、オンライン開始までに終わらせられるかなど検証を進めている状況です・・

よろしくお願いします。


From: Fujii Masao <masao.fujii@oss.nttdata.com>
Sent: Thursday, January 16, 2025 1:53 AM
To: Daniel Gustafsson <daniel@yesql.se>; antengynnnn536@gmail.com <antengynnnn536@gmail.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: reltuples kept decreasing with each autovacuum run Re: BUG #18773: オートバキュームのリトライ時にreltuplesの値が減少する
 


On 2025/01/15 22:32, Daniel Gustafsson wrote:
>> On 15 Jan 2025, at 06:59, PG Bug reporting form <noreply@postgresql.org> wrote:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference:      18773
>> Logged by:          勇次 安藤
>> Email address:      antengynnnn536@gmail.com
>> PostgreSQL version: 13.6
>> Operating system:   linax
>> Description:
>>
>> 現在、auroraposgres SQLを使用して、オートバキュームを起動して、運用を行っています。
>
> Re-submitting this bugreport in English will greatly improve your chances of
> getting help.

+1

BTW, the reported issue is that when a table contains dead tuples that cannot be
removed due to a long-running transaction, repeated autovacuum runs gradually
decrease the table's reltuples value until it reaches a small number. I also
was able to reproduce this issue on the current HEAD as follows:

Session 1:
=# CREATE TABLE t AS SELECT i FROM generate_series(1, 50000) i;
=# BEGIN;
=# DELETE FROM t WHERE i = 1;
-- (Leave this session open)

Session 2:
=# DELETE FROM t WHERE i > 30000;
=# SELECT relpages, reltuples FROM pg_class WHERE relname = 't';

In my test, reltuples kept decreasing with each autovacuum run, as shown below:
------------
30000
15746
8452
4719
2809
1831
1331
1075
944
877
843
825
816
812
810
808
807
------------

After reltuples drops to 807, running ANALYZE restores the correct count of
live tuples. However, repeated autovacuum runs then cause reltuples to
decrease again.

 From my review of the code, this seems related to the logic in vac_estimate_reltuples().
When a table has many unremovable dead tuples, autovacuum scans pages containing
mostly dead tuples and finds only a small number of live tuples. These few live
tuples are counted toward reltuples, but this small count is indirectly used to
calculate subsequent reltuples values. This feedback loop causes reltuples to
continually decrease in such scenarios.

I'm not sure if this is a bug, and I currently don't have a good solution
for the issue.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION