Thread: BUG #18283: vacuum full use a large amount of memory (may cause OOM)
The test case in the original report was flawed and I have updated it.
Step 1. Create a new session ("Sess 1"), then execute the following queries and keep the connection open:
```sql
-- create table and insert data for testing.
create table t1 (
c1 character varying(100),
c2 character varying(100),
c3 character varying(100),
c4 character varying(100),
c5 character varying(100),
c6 character varying(100)
);
create function randomtext(len int) returns text as $$
select string_agg(md5(random()::text),'') from generate_series(1,$1/32)
$$ language sql;
insert into t1 select
randomtext(34),randomtext(34),randomtext(34),randomtext(34),randomtext(34),'RST'
from generate_series(1,1000000);
```
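One detail of the helper above is worth noting: `generate_series(1,$1/32)` uses integer division, so `randomtext(34)` concatenates exactly one md5 hash. A quick sanity check, run in the same session:

```sql
-- 34/32 truncates to 1, so exactly one md5() hash (32 hex characters) is produced
select length(randomtext(34));  -- 32
```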
Step 2. Create another session ("Sess 2"). Start a long-running transaction, create any table, and keep the connection open:
```sql
BEGIN;
create table t2(a int);
```
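The open transaction in "Sess 2" is what makes the reproduction work: it holds back the oldest-transaction horizon, so the row versions obsoleted by the updates in Step 3 are "recently dead" rather than removable. The held-back horizon can be observed (a diagnostic aid, not part of the original report):

```sql
-- Run from any session: an idle-in-transaction backend holds back the xmin horizon
select pid, state, backend_xmin
from pg_stat_activity
where backend_xmin is not null;
```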
Step 3. Go back to "Sess 1" and continue:
```sql
-- Get the PID of the current backend
postgres=# select pg_backend_pid();
pg_backend_pid
----------------
10511
(1 row)
-- Update 5 times
update t1 set c2=randomtext(34) where c6='RST';
update t1 set c2=randomtext(34) where c6='RST';
update t1 set c2=randomtext(34) where c6='RST';
update t1 set c2=randomtext(34) where c6='RST';
update t1 set c2=randomtext(34) where c6='RST';
vacuum full t1;
```
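Besides top, PostgreSQL 14 and later can attribute the memory to a specific MemoryContext without attaching a debugger (a diagnostic aid, not part of the original report):

```sql
-- From another session, while VACUUM FULL is running in backend 10511:
select pg_log_backend_memory_contexts(10511);
-- Per-context usage, including the "Table rewrite" context, is then written to the server log.
```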
Step 4. Open a new bash terminal and watch the memory usage of PID 10511
during the execution of 'vacuum full':
```bash
[yz@bogon ~]$ top -p 10511
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10511 yz 20 0 1853340 1.6g 141488 D 15.7 21.4 0:14.92 postgres
```
Steps 3 and 4 can be repeated, and the peak memory usage of 'vacuum full' will continue to increase.
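As a counter-check (my assumption based on the analysis below, not something the original report tested): once the long transaction in "Sess 2" ends, the transaction horizon advances, the obsolete row versions become fully dead, and the rewrite should no longer accumulate them:

```sql
-- In "Sess 2":
COMMIT;
-- Back in "Sess 1": peak memory during the rewrite should now stay modest
vacuum full t1;
```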
From: PG Bug reporting form <noreply@postgresql.org>
Sent: January 11, 2024 14:47
To: pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Cc: fairyfar@msn.com <fairyfar@msn.com>
Subject: BUG #18283: vacuum full use a large amount of memory (may cause OOM)
The following bug has been logged on the website:
Bug reference: 18283
Logged by: Zhu Yang
Email address: fairyfar@msn.com
PostgreSQL version: 16.1
Operating system: Red Hat Enterprise Linux Server 7.6 (Maipo)
Description:
Under certain conditions, a vacuum full will use a large amount of memory. The
memory usage is unbounded, and an OOM may occur.
Steps to reproduce the behavior:
Step 1. Create a new session("Sess 1"), then execute the following query and
keep the connection:
```sql
-- create table and insert data for testing.
create table t1 (
c1 character varying(100),
c2 character varying(100),
c3 character varying(100),
c4 character varying(100),
c5 character varying(100),
c6 character varying(100)
);
create function randomtext(len int) returns text as $$
select string_agg(md5(random()::text),'') from generate_series(1,$1/32)
$$ language sql;
insert into t1 select
randomtext(34),randomtext(34),randomtext(34),randomtext(34),randomtext(34),'RST'
from generate_series(1,1000000);
```
Step 2. Create another session("Sess 2"). Start a long transaction, then
create any table, and keep the connection:
```sql
BEGIN;
create table t2(a int);
```
Step 3. Go back to "Sess 1" and continue:
```sql
-- Get the PID of the current backend
postgres=# select pg_backend_pid();
pg_backend_pid
----------------
10511
(1 row)
-- Can update multiple times
update t1 set c2=randomtext(34) where k1='RST';
update t1 set c2=randomtext(34) where k1='RST';
vacuum full t1;
```
Step 4. Create a new bash terminal and view the memory usage of PID 10511
during the execution of 'vacuum full':
```bash
[yz@bogon ~]$ top -p 10511
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10511 yz 20 0 1853340 1.6g 141488 D 15.7 21.4 0:14.92 postgres
```
You can observe that 'vacuum full' memory usage (VIRT & RES) is very high,
with the peak memory usage increasing with each Step 3 execution.
After analysis, the MemoryContext that consumes significant memory is "Table
rewrite", and the stack is:
```
#0 0x000000000050f700 in heap_copytuple ()
#1 0x000000000054f452 in rewrite_heap_tuple ()
#2 0x000000000054844f in reform_and_rewrite_tuple.isra.0 ()
#3 0x00000000005488e0 in heapam_relation_copy_for_cluster ()
#4 0x0000000000616760 in copy_table_data ()
#5 0x0000000000617846 in cluster_rel ()
#6 0x0000000000676973 in vacuum_rel ()
#7 0x0000000000677b9c in vacuum ()
#8 0x00000000006782dc in ExecVacuum ()
#9 0x0000000000808859 in standard_ProcessUtility ()
#10 0x0000000000806f5f in PortalRunUtility ()
#11 0x000000000080708b in PortalRunMulti ()
#12 0x000000000080755d in PortalRun ()
#13 0x0000000000803b28 in exec_simple_query ()
...
```
The code that causes the problem is in
src/backend/access/heap/rewriteheap.c:
```c
void
rewrite_heap_tuple(RewriteState state,
                   HeapTuple old_tuple, HeapTuple new_tuple)
{
    ...
    if (!((old_tuple->t_data->t_infomask & HEAP_XMAX_INVALID) ||
          HeapTupleHeaderIsOnlyLocked(old_tuple->t_data)) &&
        !HeapTupleHeaderIndicatesMovedPartitions(old_tuple->t_data) &&
        !(ItemPointerEquals(&(old_tuple->t_self),
                            &(old_tuple->t_data->t_ctid))))
    {
        /*
         * If the code executes inside this block, the allocated memory will
         * not be freed until the query ends.
         */
        ...
```