Обсуждение: OOM killer and rocky linux

Поиск
Список
Период
Сортировка

OOM killer and rocky linux

От
kaido vaikla
Дата:
Hi, 

The problem is probably somwhere between pg and linux. 
Seems like OOM killer on rocky linux behaves in a different way
than on RHEL. 

For OOM killer invoking I used:
set work_mem=50GB;
select * from generate_series(1,100000000) order by random();

pg runs as systemd service.

Example 1:
==========
pg 15.2
o/s Red Hat Enterprise Linux Server release 7.9 (Maipo)

pg log:

2023-07-05 12:56:28.848 EEST [:::119279:postmaster] LOG:  server process (PID 120401) was terminated by signal 9: Killed
2023-07-05 12:56:28.848 EEST [:::119279:postmaster] LOG:  terminating any other active server processes
2023-07-05 12:56:28.850 EEST [:::119279:postmaster] LOG:  all server processes terminated; reinitializing
2023-07-05 12:56:28.885 EEST [:::120421:startup] LOG:  database system was interrupted; last known up at 2023-07-05 12:55:34 EEST
2023-07-05 12:56:28.899 EEST [:::120421:startup] LOG:  database system was not properly shut down; automatic recovery in progress
2023-07-05 12:56:28.904 EEST [:::120421:startup] LOG:  redo starts at 1/D0002E8
2023-07-05 12:56:28.904 EEST [:::120421:startup] LOG:  invalid record length at 1/D000320: wanted 24, got 0
2023-07-05 12:56:28.904 EEST [:::120421:startup] LOG:  redo done at 1/D0002E8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-07-05 12:56:28.912 EEST [:::120422:checkpointer] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-07-05 12:56:28.918 EEST [:::120422:checkpointer] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.002 s, total=0.013 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB


Instance is reinitializing

Example 2:
==========
pg 15.2
o/s Rocky Linux release 9.2 (Blue Onyx)

pg log and systemd log in chronological order (juuni == 06 ):

juuni 30 14:19:11.491833 pgdb-forecast kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/postgresql-15.service,task=postmaster,pid=2575533,uid=26
juuni 30 14:19:11.491839 pgdb-forecast kernel: Out of memory: Killed process 2575533 (postmaster) total-vm:9716200kB, anon-rss:5113080kB, file-rss:0kB, shmem-rss:16828kB, UID:26 pgtables:10256kB oom_score_adj:0
juuni 30 14:19:11.491846 pgdb-forecast kernel: oom_reaper: reaped process 2575533 (postmaster), now anon-rss:0kB, file-rss:0kB, shmem-rss:16828kB
juuni 30 14:19:11.478410 pgdb-forecast systemd[1]: postgresql-15.service: A process of this unit has been killed by the OOM killer.
2023-06-30 14:19:11.505 EEST [:::2517202:postmaster] LOG:  received fast shutdown request
2023-06-30 14:19:11.518 EEST [:::2517202:postmaster] LOG:  aborting any active transactions
2023-06-30 14:19:11.564 EEST [:::2517202:postmaster] LOG:  server process (PID 2575533) was terminated by signal 9: Killed
2023-06-30 14:19:11.564 EEST [:::2517202:postmaster] LOG:  terminating any other active server processes
2023-06-30 14:19:11.633 EEST [:::2517202:postmaster] LOG:  abnormal database system shutdown
2023-06-30 14:19:11.939 EEST [:::2517202:postmaster] LOG:  database system is shut down
juuni 30 14:19:11.940216 pgdb-forecast systemd[1]: postgresql-15.service: Main process exited, code=exited, status=1/FAILURE
juuni 30 14:19:11.940255 pgdb-forecast systemd[1]: postgresql-15.service: Killing process 2517203 (postmaster) with signal SIGKILL.
juuni 30 14:19:11.940700 pgdb-forecast systemd[1]: postgresql-15.service: Failed with result 'oom-kill'.
juuni 30 14:19:11.940884 pgdb-forecast systemd[1]: postgresql-15.service: Consumed 46min 16.612s CPU time.


Instance received fast shutdown request. 
And kernel sends kill twice: process 2575533, process 2517203.

Googeling this problem gives me nothing, and i'm not sure, 
is it caused of pg or linux kernel.

br
Kaido





Re: OOM killer and rocky linux

От
Matti Linnanvuori
Дата:

Hello!

Disabling overcommit is recommended.

Best regards

kaido vaikla <kaido.vaikla@gmail.com> kirjoitti 9.11.2023 kello 14.57:

Hi, 

The problem is probably somwhere between pg and linux. 
Seems like OOM killer on rocky linux behaves in a different way
than on RHEL. 

For OOM killer invoking I used:
set work_mem=50GB;
select * from generate_series(1,100000000) order by random();

pg runs as systemd service.

Example 1:
==========
pg 15.2
o/s Red Hat Enterprise Linux Server release 7.9 (Maipo)

pg log:

2023-07-05 12:56:28.848 EEST [:::119279:postmaster] LOG:  server process (PID 120401) was terminated by signal 9: Killed
2023-07-05 12:56:28.848 EEST [:::119279:postmaster] LOG:  terminating any other active server processes
2023-07-05 12:56:28.850 EEST [:::119279:postmaster] LOG:  all server processes terminated; reinitializing
2023-07-05 12:56:28.885 EEST [:::120421:startup] LOG:  database system was interrupted; last known up at 2023-07-05 12:55:34 EEST
2023-07-05 12:56:28.899 EEST [:::120421:startup] LOG:  database system was not properly shut down; automatic recovery in progress
2023-07-05 12:56:28.904 EEST [:::120421:startup] LOG:  redo starts at 1/D0002E8
2023-07-05 12:56:28.904 EEST [:::120421:startup] LOG:  invalid record length at 1/D000320: wanted 24, got 0
2023-07-05 12:56:28.904 EEST [:::120421:startup] LOG:  redo done at 1/D0002E8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-07-05 12:56:28.912 EEST [:::120422:checkpointer] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-07-05 12:56:28.918 EEST [:::120422:checkpointer] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.002 s, total=0.013 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB


Instance is reinitializing

Example 2:
==========
pg 15.2
o/s Rocky Linux release 9.2 (Blue Onyx)

pg log and systemd log in chronological order (juuni == 06 ):

juuni 30 14:19:11.491833 pgdb-forecast kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/postgresql-15.service,task=postmaster,pid=2575533,uid=26
juuni 30 14:19:11.491839 pgdb-forecast kernel: Out of memory: Killed process 2575533 (postmaster) total-vm:9716200kB, anon-rss:5113080kB, file-rss:0kB, shmem-rss:16828kB, UID:26 pgtables:10256kB oom_score_adj:0
juuni 30 14:19:11.491846 pgdb-forecast kernel: oom_reaper: reaped process 2575533 (postmaster), now anon-rss:0kB, file-rss:0kB, shmem-rss:16828kB
juuni 30 14:19:11.478410 pgdb-forecast systemd[1]: postgresql-15.service: A process of this unit has been killed by the OOM killer.
2023-06-30 14:19:11.505 EEST [:::2517202:postmaster] LOG:  received fast shutdown request
2023-06-30 14:19:11.518 EEST [:::2517202:postmaster] LOG:  aborting any active transactions
2023-06-30 14:19:11.564 EEST [:::2517202:postmaster] LOG:  server process (PID 2575533) was terminated by signal 9: Killed
2023-06-30 14:19:11.564 EEST [:::2517202:postmaster] LOG:  terminating any other active server processes
2023-06-30 14:19:11.633 EEST [:::2517202:postmaster] LOG:  abnormal database system shutdown
2023-06-30 14:19:11.939 EEST [:::2517202:postmaster] LOG:  database system is shut down
juuni 30 14:19:11.940216 pgdb-forecast systemd[1]: postgresql-15.service: Main process exited, code=exited, status=1/FAILURE
juuni 30 14:19:11.940255 pgdb-forecast systemd[1]: postgresql-15.service: Killing process 2517203 (postmaster) with signal SIGKILL.
juuni 30 14:19:11.940700 pgdb-forecast systemd[1]: postgresql-15.service: Failed with result 'oom-kill'.
juuni 30 14:19:11.940884 pgdb-forecast systemd[1]: postgresql-15.service: Consumed 46min 16.612s CPU time.


Instance received fast shutdown request. 
And kernel sends kill twice: process 2575533, process 2517203.

Googeling this problem gives me nothing, and i'm not sure, 
is it caused of pg or linux kernel.

br
Kaido






Вложения

Re: OOM killer and rocky linux

От
kaido vaikla
Дата:
Yep, i know it. But my question was more like 
why (during OOM) 
pg on o/s Red Hat Enterprise Linux Server release 7.9 (Maipo)
does "Instance is reinitializing" 
but
pg on o/s Rocky Linux release 9.2 (Blue Onyx)
got "Instance received fast shutdown request. "

br
Kaido

On Thu, 9 Nov 2023 at 15:04, Matti Linnanvuori <matti.linnanvuori@portalify.com> wrote:

Hello!

Disabling overcommit is recommended.

Best regards

kaido vaikla <kaido.vaikla@gmail.com> kirjoitti 9.11.2023 kello 14.57:

Hi, 

The problem is probably somwhere between pg and linux. 
Seems like OOM killer on rocky linux behaves in a different way
than on RHEL. 

For OOM killer invoking I used:
set work_mem=50GB;
select * from generate_series(1,100000000) order by random();

pg runs as systemd service.

Example 1:
==========
pg 15.2
o/s Red Hat Enterprise Linux Server release 7.9 (Maipo)

pg log:

2023-07-05 12:56:28.848 EEST [:::119279:postmaster] LOG:  server process (PID 120401) was terminated by signal 9: Killed
2023-07-05 12:56:28.848 EEST [:::119279:postmaster] LOG:  terminating any other active server processes
2023-07-05 12:56:28.850 EEST [:::119279:postmaster] LOG:  all server processes terminated; reinitializing
2023-07-05 12:56:28.885 EEST [:::120421:startup] LOG:  database system was interrupted; last known up at 2023-07-05 12:55:34 EEST
2023-07-05 12:56:28.899 EEST [:::120421:startup] LOG:  database system was not properly shut down; automatic recovery in progress
2023-07-05 12:56:28.904 EEST [:::120421:startup] LOG:  redo starts at 1/D0002E8
2023-07-05 12:56:28.904 EEST [:::120421:startup] LOG:  invalid record length at 1/D000320: wanted 24, got 0
2023-07-05 12:56:28.904 EEST [:::120421:startup] LOG:  redo done at 1/D0002E8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-07-05 12:56:28.912 EEST [:::120422:checkpointer] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-07-05 12:56:28.918 EEST [:::120422:checkpointer] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.002 s, total=0.013 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB


Instance is reinitializing

Example 2:
==========
pg 15.2
o/s Rocky Linux release 9.2 (Blue Onyx)

pg log and systemd log in chronological order (juuni == 06 ):

juuni 30 14:19:11.491833 pgdb-forecast kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/postgresql-15.service,task=postmaster,pid=2575533,uid=26
juuni 30 14:19:11.491839 pgdb-forecast kernel: Out of memory: Killed process 2575533 (postmaster) total-vm:9716200kB, anon-rss:5113080kB, file-rss:0kB, shmem-rss:16828kB, UID:26 pgtables:10256kB oom_score_adj:0
juuni 30 14:19:11.491846 pgdb-forecast kernel: oom_reaper: reaped process 2575533 (postmaster), now anon-rss:0kB, file-rss:0kB, shmem-rss:16828kB
juuni 30 14:19:11.478410 pgdb-forecast systemd[1]: postgresql-15.service: A process of this unit has been killed by the OOM killer.
2023-06-30 14:19:11.505 EEST [:::2517202:postmaster] LOG:  received fast shutdown request
2023-06-30 14:19:11.518 EEST [:::2517202:postmaster] LOG:  aborting any active transactions
2023-06-30 14:19:11.564 EEST [:::2517202:postmaster] LOG:  server process (PID 2575533) was terminated by signal 9: Killed
2023-06-30 14:19:11.564 EEST [:::2517202:postmaster] LOG:  terminating any other active server processes
2023-06-30 14:19:11.633 EEST [:::2517202:postmaster] LOG:  abnormal database system shutdown
2023-06-30 14:19:11.939 EEST [:::2517202:postmaster] LOG:  database system is shut down
juuni 30 14:19:11.940216 pgdb-forecast systemd[1]: postgresql-15.service: Main process exited, code=exited, status=1/FAILURE
juuni 30 14:19:11.940255 pgdb-forecast systemd[1]: postgresql-15.service: Killing process 2517203 (postmaster) with signal SIGKILL.
juuni 30 14:19:11.940700 pgdb-forecast systemd[1]: postgresql-15.service: Failed with result 'oom-kill'.
juuni 30 14:19:11.940884 pgdb-forecast systemd[1]: postgresql-15.service: Consumed 46min 16.612s CPU time.


Instance received fast shutdown request. 
And kernel sends kill twice: process 2575533, process 2517203.

Googeling this problem gives me nothing, and i'm not sure, 
is it caused of pg or linux kernel.

br
Kaido






Вложения