Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections
От | Justin Pryzby |
---|---|
Тема | Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections |
Дата | |
Msg-id | 20220707185806.GC13040@telsasoft.com обсуждение исходный текст |
Ответ на | [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections (Michael Zhilin <m.zhilin@postgrespro.ru>) |
Ответы |
Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections
|
Список | pgsql-hackers |
On Thu, Jun 30, 2022 at 11:05:28PM +0300, Michael Zhilin wrote: > I would like to submit patch for column wait_event of view pg_stat_activity. > Please find it attached. Thanks. I had this thread on my list to look at later but then, by chance, I got an erroneous nagios notification about a longrunning transaction, which seems to have been caused by this bug. You can reproduce the problem by running a couple handsful of these: psql -h /tmp postgres -Atc "SELECT * FROM pg_stat_activity WHERE state='active' AND wait_event='ClientRead' LIMIT 1"& > As result backend wait events may be changed for quick queries. Yep. I realize now that I've been seeing this at one of our customers for a pkey lookup query. We don't use high max_connections, but if you have a lot of connections, you might be more likely to hit this. I agree that this is a bug, since it can (and did) cause false positives in a monitoring system. > As well, patch changes way to allocate memory for local structure. Before it > estimates maximum size of required memory and allocate it at once. It could > result into allocation of dozens/hundreds of megabytes for nothing. Now it > allocates memory by chunks to reduce overall amount of allocated memory and > reduce time for allocation. I suggest to present this as two patches: a 0001 patch to fix the bug, and proposed for backpatch, and an 0002 patch for master to improve memory usage. As attached. Actually, once 0001 is resolved, it may be good to start a separate thread for 0002. I plan to add to the next CF. Did you really experience latency before the patch ? It seems like max_connections=1000 would only allocate a bit more than 1MB. For me, max_connections=10000 makes pg_stat_activity 10% slower for me than 100 (11.8s vs 13.2s). With the patch, 10k is only about ~3% slower than 100. So there is an improvement, but people here may not be enthusiastic excited about improving performance for huge values of max_connections. time for a in `seq 1 99`; do psql -h /tmp postgres -Atc "SELECT FROM pg_stat_activity -- WHERE state='active' AND wait_event='ClientRead'LIMIT 1"; done Is there a reason why you made separate allocations for appname, hostname, activity, ssl, gss (I know that's how it's done originally, too) ? Couldn't it be a single allocation ? You could do a single (re)allocation when an active backend is found if the amount of free space is less than 2*NAMEDATALEN+track_activity_query_size+sizeof(PgBackendSSLStatus)+sizeof(PgBackendGSSStatus) I have a patch for that, but I'll share it after 0001 is resolved. Why did you change backendStatus to a pointer ? Is it to minimize the size of the structure to allow for huge values of max_connections ? Note that there were warnings from your 0002: backend_status.c:723:21: warning: ‘localactivity_thr’ may be used uninitialized in this function [-Wmaybe-uninitialized] Thanks, -- Justin
Вложения
В списке pgsql-hackers по дате отправления: