Re: hash_search_with_hash_value is high in "perf top" on a replica
От | Srinath Reddy |
---|---|
Тема | Re: hash_search_with_hash_value is high in "perf top" on a replica |
Дата | |
Msg-id | CAFC+b6pQEipF7ptzTJ+=QiMPo4fYTJQ4SceR4K2GhaYjy9tXRg@mail.gmail.com обсуждение исходный текст |
Ответ на | hash_search_with_hash_value is high in "perf top" on a replica (Dmitry Koterov <dmitry.koterov@gmail.com>) |
Список | pgsql-hackers |
On Fri, Jan 31, 2025 at 5:00 PM Dmitry Koterov <dmitry.koterov@gmail.com> wrote:
Hi.Debugging some replication lag on a replica when the master node experiences heavy writes.PG "startup recovering" eats up a lot of CPU (like 65 %user and 30 %sys), which is a little surprising (what is it doing with all those CPU cycles? it looked like WAL replay should be more IO bound than CPU bound?).Running "perf top -p <pid>", it shows this:Samples: 1M of event 'cycles:P', 4000 Hz, Event count (approx.): 18178814660 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
16.63% postgres [.] hash_search_with_hash_value
5.38% postgres [.] __aarch64_ldset4_sync
4.42% postgres [.] __aarch64_cas4_acq_rel
3.42% postgres [.] XLogReadBufferExtended
2.35% libc.so.6 [.] 0x0000000000097f4c
2.04% postgres [.] pg_comp_crc32c_armv8
1.77% [kernel] [k] mutex_lock
1.63% postgres [.] XLogPrefetcherReadRecord
1.56% postgres [.] DecodeXLogRecord
1.55% postgres [.] LWLockAcquireThis is more surprising: hash_search_with_hash_value is a hash table lookup function, is it expected that it is #1 in the profiler? (Called mostly from PrefetchSharedBuffer*.)
Configuration:- PG17
- ZFS, compression is off, empty database (with compression on, most of "startup recovering" CPU was spent in ZFS guts doing compression/decompression according to the profiler)- full_page_writes=off, recovery_prefetch=on (ZFS supports it, I tested with a small C program), wal_decode_buffer_size=2048kB (it doesn't seem to affect anything though).
- shared_buffers = 25% of RAM
- testing with a giant COPY command basically
My main question is about hash_search_with_hash_value CPU usage. With both recovery_prefetch=on and off, it is the topmost function in "perf top". I see no IO bottleneck on the machine, it's only "startup recovering" maxing out one CPU core.
Maybe it's a red herring though, but it looks pretty suspicious.
Hi,
i think high CPU usage make sense,as hash_search_with_hash_value is called every time when ever startup process in replica reads a wal and tries to redo it ,as it goes through readbuffer_common to check if the page which the wal effects and wants to redo is in buffer pool or not using BufferAlloc->BufTableLookup>hash_search_with_hash_value.
Regards,
Srinath Reddy Sadipiralla,
EDB: https://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: