Discussion: Re: Limit length of queries in pg_stat_statements extension
Hi,

On Thu, Jan 16, 2025 at 10:19:49AM +0800, 赵庭海(庭章) wrote:
> Hi all,
>
> Recently, I have noticed a potential problem in the pg_stat_statements
> extension. When garbage collection is triggered within this extension, and
> there is a significant amount of data to be written to the
> pgss_query_texts.stat file (for example, when tracking very long SQL
> queries), the corresponding backend process holds the pgss->lock for an
> extended period. This causes all other backend processes to wait for this
> lock in the extension's hooks during all SQL executions, and these
> processes do not respond to interrupts during that time.
>
> To temporarily address this issue, I have written a patch that introduces a
> parameter to control the maximum length of tracked SQL queries.

I don't think that it would move the needle much. Deallocating entries is
very expensive, even when the query text file isn't being cleaned up, as it
needs to sort all entries by usage to remove the least recently used ones,
all while holding an exclusive pgss->lock. The real solution is probably to
rely on the new pluggable statistics architecture rather than using the hash
table / query text file.

> It seems like this is not an ideal solution; at the very least, I think it
> is necessary to log messages at the log level before and after the garbage
> collection process. This would help us diagnose similar issues in the
> future. I spent a considerable amount of time investigating this issue due
> to the lack of relevant logs.
>
> I believe adding these logs would be beneficial for troubleshooting.
> Thanks for your attention to this issue.

Isn't the pg_stat_statements_info.dealloc counter enough to figure out the
root issue?
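For instance (a minimal sketch, assuming PostgreSQL 14 or later, where the
pg_stat_statements_info view exists), something like this shows how often
deallocation happens and how full the hash table currently is:

```sql
-- Eviction counter and last reset time; a steadily growing dealloc value
-- means entries are being evicted, which is exactly when the expensive
-- work under pgss->lock happens.
SELECT dealloc, stats_reset FROM pg_stat_statements_info;

-- How many entries are stored compared to pg_stat_statements.max.
SELECT count(*) AS stored_entries,
       current_setting('pg_stat_statements.max') AS max_entries
FROM pg_stat_statements;
```

If dealloc keeps climbing between checks, eviction is happening constantly,
and the lock contention will be there with or without a query text file gc.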
Hi,

Thanks for your answer.

> I don't think that it would move the needle much. Deallocating entries is
> very expensive, even when the query text file isn't being cleaned up, as it
> needs to sort all entries by usage to remove the least recently used ones,
> all while holding an exclusive pgss->lock. The real solution is probably to
> rely on the new pluggable statistics architecture rather than using the
> hash table / query text file.

I'm sorry I left out some details earlier. I found that the backend process
doing the garbage collection spent a long time in the while loop of
gc_qtexts(). The main backtrace is below.

```
#0  0x00007fc528d6aba0 in __write_nocancel () from /lib64/libc.so.6
#1  0x00007fc528cf52f3 in _IO_new_file_write () from /lib64/libc.so.6
#2  0x00007fc528cf5b90 in __GI__IO_file_xsputn () from /lib64/libc.so.6
#3  0x00007fc528cea7e2 in fwrite () from /lib64/libc.so.6
#4  0x00007fc529199dd5 in gc_qtexts () at pg_stat_statements.c:2380
#5  pgss_store
#6  0x00007fc52919a2b8 in pgss_post_parse_analyze (query=0x1e9aed8, pstate=0x178a220) at pg_stat_statements.c:900
```

So I think the main reason for the long lock hold is that the I/O operation
takes a long time because of these very long queries.

In my production environment, pg_stat_statements.max is set to 1000. I found
that when this problem occurred, the gc took more than 20 seconds. If I
limit the length of a single SQL statement to 8 kB, it takes only 1.79
seconds.

> Isn't the pg_stat_statements_info.dealloc counter enough to figure out the
> root issue?

In my opinion, pg_stat_statements_info.dealloc doesn't reflect how long the
garbage collection takes. Earlier, when I was checking the logs for the
abnormal periods, there was only some slow-parse logging like the one below.

> duration: 20834 ms  parse S0_1: …...

Best regards,
Tinghai Zhao
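PS: for reference, this (just a sketch against the standard
pg_stat_statements view) is roughly how to check whether a few very long
statements dominate the stored query texts:

```sql
-- The longest stored query texts; a handful of huge statements is what
-- makes rewriting pgss_query_texts.stat in gc_qtexts() slow.
SELECT queryid, length(query) AS query_text_chars, calls
FROM pg_stat_statements
ORDER BY length(query) DESC
LIMIT 10;

-- Total amount of query text kept, which is roughly what the gc has to
-- rewrite to the file.
SELECT sum(length(query)) AS total_query_text_chars
FROM pg_stat_statements;
```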
On Sun, Jan 19, 2025 at 03:32:19AM +0800, 赵庭海(庭章) wrote:
>
> I'm sorry I left out some details earlier. I found that the backend process
> doing the garbage collection spent a long time in the while loop of
> gc_qtexts(). The main backtrace is below.
>
> ```
> #0  0x00007fc528d6aba0 in __write_nocancel () from /lib64/libc.so.6
> #1  0x00007fc528cf52f3 in _IO_new_file_write () from /lib64/libc.so.6
> #2  0x00007fc528cf5b90 in __GI__IO_file_xsputn () from /lib64/libc.so.6
> #3  0x00007fc528cea7e2 in fwrite () from /lib64/libc.so.6
> #4  0x00007fc529199dd5 in gc_qtexts () at pg_stat_statements.c:2380
> #5  pgss_store
> #6  0x00007fc52919a2b8 in pgss_post_parse_analyze (query=0x1e9aed8, pstate=0x178a220) at pg_stat_statements.c:900
> ```
>
> So I think the main reason for the long lock hold is that the I/O operation
> takes a long time because of these very long queries.
>
> In my production environment, pg_stat_statements.max is set to 1000. I
> found that when this problem occurred, the gc took more than 20 seconds. If
> I limit the length of a single SQL statement to 8 kB, it takes only 1.79
> seconds.

As I mentioned earlier, entry eviction that doesn't lead to a query text gc
is also expensive, and it is more frequent. Truncating the query text makes
pg_stat_statements almost useless for multiple usages (e.g. if you need to
run the query again to test a hypothetical index or similar).

In your case, almost 2s of total freeze still sounds like something that
wouldn't be acceptable. Either you have some OLTP system and that's many
orders of magnitude more than a query, or it's some kind of OLAP system and
then the difference between 2s and 10s is a very narrow window for it to be
a game changer.

Why exactly do you have pg_stat_statements.max set to a value that low? Have
you tried to tune it and/or estimate how many different entries you would
need to store to avoid too frequent eviction? In general, if your workload
leads to something that will never fit in a finite number of entries (use of
temporary tables is a perfect example), the overhead is so high, gc or not,
that your only option is to get rid of pg_stat_statements.

> > Isn't the pg_stat_statements_info.dealloc counter enough to figure out
> > the root issue?
>
> In my opinion, pg_stat_statements_info.dealloc doesn't reflect how long the
> garbage collection takes. Earlier, when I was checking the logs for the
> abnormal periods, there was only some slow-parse logging like the one
> below.

Having this counter continuously increasing tells you that you will almost
certainly have lock contention. You should also see pg_stat_statements in
the wait events. And again, the gc just makes things worse; eviction itself
will kill your performance almost as much.
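To make those last two points concrete (a sketch only; 10000 below is just
an illustrative value, and pg_stat_statements.max only takes effect after a
server restart), the contention shows up as backends waiting on the
pg_stat_statements LWLock, and raising the entry limit is the usual first
step to make eviction less frequent:

```sql
-- Backends currently waiting on the pg_stat_statements LWLock; this is the
-- wait_event name reported in pg_stat_activity for pgss->lock contention.
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
  AND wait_event = 'pg_stat_statements';

-- Raise the entry limit so eviction (and hence query text gc) happens less
-- often; this requires a server restart to take effect.
ALTER SYSTEM SET pg_stat_statements.max = 10000;
```

Whether that is enough depends on how many distinct query ids the workload
generates; if that number is effectively unbounded, no value of
pg_stat_statements.max will help.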