I have a problem with one of my production Postgres servers. The server works fine almost all the time, but occasionally, without any increase in the number of transactions or statements, the machine gets stuck: all cores go to 100% and load rises to 160%. While this is happening, connecting to the database is difficult, and even when a connection succeeds, simple queries take several seconds instead of milliseconds. Sometimes the problem goes away on its own after a while (e.g. 35 min); other times we have to restart Postgres, Linux, or even KVM (the virtualization host).
My hardware
56 cores (Intel Core Processor (Skylake, IBRS))
400 GB RAM
RAID10 with about 40k IOPS
OS
CentOS Linux release 7.7.1908
kernel 3.10.0-1062.18.1.el7.x86_64
Database size 100 GB (fits entirely in memory :) )
server_version 10.12
effective_cache_size 192000 MB
maintenance_work_mem 2048 MB
max_connections 150
shared_buffers 64000 MB
work_mem 96 MB
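
For reference, here is a rough sketch of the memory budget these settings imply, to show that the configuration should be far from exhausting RAM. It only adds shared_buffers to work_mem for every allowed connection. The multiplier argument is a hypothetical knob I added, since a single query may use work_mem more than once (once per sort or hash node), and 1 GB is assumed to be 1024 MB.

```python
# Settings copied from the list above (all values in MB).
SHARED_BUFFERS_MB = 64000
WORK_MEM_MB = 96
MAX_CONNECTIONS = 150
TOTAL_RAM_MB = 400 * 1024  # 400 GB, assuming 1 GB = 1024 MB


def memory_budget_mb(work_mem_per_connection=1):
    """Rough estimate: shared_buffers plus work_mem for every connection.

    work_mem_per_connection is a hypothetical multiplier, because one
    query may allocate work_mem several times (per sort or hash node).
    """
    return SHARED_BUFFERS_MB + MAX_CONNECTIONS * WORK_MEM_MB * work_mem_per_connection


if __name__ == "__main__":
    for multiplier in (1, 4):
        used = memory_budget_mb(multiplier)
        print(f"x{multiplier}: {used} MB ({used / TOTAL_RAM_MB:.0%} of RAM)")
```

Even with every connection using work_mem four times over, this estimate stays well below the 400 GB in the machine, so I do not think the stalls are simple memory exhaustion caused by these settings.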