Discussion: pg_stat_statements: faster search by queryid
Hi hackers,

Aleksandra Bondar and I are proposing the following patch for pg_stat_statements.

The idea.
---------
Currently, to get statistics on a specific query, you execute

    SELECT * FROM pg_stat_statements WHERE queryid = specific_queryid;

This takes a long time because the pg_stat_statements() function first
forms tuples for all the statistics it has, and only then are they
filtered by the WHERE clause. If we provided a function like
pg_stat_statements_by_queryid(queryid bigint), which filtered by queryid
while scanning pgss_hash and returned only the statistics with the
specified queryid, it would be much faster.

We can also easily add filtering by userid and dbid, which leads to a
function like pg_stat_statements_filtered(queryid bigint, userid Oid,
dbid Oid). If a parameter is not specified, it defaults to 0, which means
no filtering on that parameter, much like pg_stat_statements_reset()
chooses which statistics to clear. With no parameters specified,
pg_stat_statements_filtered() returns everything pg_stat_statements()
would return. That led me to the idea of extending the
pg_stat_statements() function instead of adding a new one: calling
pg_stat_statements() the old way produces the same results, and
specifying the new parameters produces filtered results.

The patch.
----------
The attached patch implements the extended pg_stat_statements() function.
I can always rewrite the patch to add a new function and leave
pg_stat_statements() as it is, though, if you think a separate function
for filtering is better. So far we've only written the code and want your
opinion on it. If you like the idea, we'll also provide tests and docs.
Any suggestions are welcome.

Benchmarking.
-------------
We prepared a simple test case to show the performance improvement.
Download the attached script pg_stat_statements_prepare.sql and run the
following in psql:

    CREATE EXTENSION pg_stat_statements;

    -- Fill in pg_stat_statements statistics
    \i /path/to/pg_stat_statements_prepare.sql

    -- Get a random query ID
    SELECT queryid AS rand_queryid
    FROM pg_stat_statements
    WHERE queryid IS NOT NULL
    ORDER BY random()
    LIMIT 1 \gset

    -- Turn on time measurement
    \timing

    -- Get statistics the old way
    SELECT * FROM pg_stat_statements WHERE queryid = :rand_queryid;

    -- Get statistics the new way
    SELECT * FROM pg_stat_statements(true, queryid => :rand_queryid);

On my machine the new way is at least two times faster. I also compared
the timing of the old way on master with and without the patch; the
difference is within the standard deviation.

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
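In spirit, the proposed filtering is a skip inside
pg_stat_statements_internal()'s existing scan of pgss_hash, applied
before any tuple is built. The following is a minimal sketch only,
assuming the 0-means-no-filter convention described above; variable and
field names follow pg_stat_statements.c, and the attached patch is the
authoritative implementation:

```
/*
 * Sketch: filtering inside pg_stat_statements_internal()'s scan.
 * A zero key value means "don't filter on this key", mirroring
 * pg_stat_statements_reset().  Names follow pg_stat_statements.c.
 */
HASH_SEQ_STATUS hash_seq;
pgssEntry  *entry;

hash_seq_init(&hash_seq, pgss_hash);
while ((entry = hash_seq_search(&hash_seq)) != NULL)
{
    if (queryid != 0 && entry->key.queryid != (uint64) queryid)
        continue;               /* skip before any tuple is formed */
    if (OidIsValid(userid) && entry->key.userid != userid)
        continue;
    if (OidIsValid(dbid) && entry->key.dbid != dbid)
        continue;

    /* ... form and emit the tuple for this entry as before ... */
}
```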
Hi,

Thanks for raising this.

> This takes a long time because the pg_stat_statements() function first
> forms tuples for all the statistics it has, and only then are they
> filtered by the WHERE clause.

I am curious about the specific use case where this becomes a concern.
How often are you querying pg_stat_statements for individual entries?

My initial thought is that this patch does not remove the issue of
loading the entire query text (just to return one or a few entries).
I suspect that repeatedly loading the query text into memory for lookups
could be a bigger problem. One option would be to pass showtext = false
for such lookups, but that makes the function more cumbersome to use and
understand.

I do think having the ability to look up a specific entry based on a
key (that is, hash_search instead of hash_seq_search) would be useful.
But we also need to consider how to handle query text lookups so that we
are not forced to load the entire text file into memory.

For what it is worth, I have been thinking about what it would take to
move query texts into shared memory, which could make this type of
filtering more practical.

Just my 2c.

--
Sami Imseih
Amazon Web Services (AWS)
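To illustrate the distinction being drawn between the two access
patterns, here is a sketch (not code from the patch): pgss_hash is keyed
by (userid, dbid, queryid, toplevel), so a direct probe replaces the full
scan but needs every key field filled in:

```
/*
 * Sketch: hash_seq_search() vs. hash_search().  Names follow
 * pg_stat_statements.c.
 */
HASH_SEQ_STATUS hash_seq;
pgssHashKey key;
pgssEntry  *entry;

/* Full scan: visits every entry, O(n) in the number of statements. */
hash_seq_init(&hash_seq, pgss_hash);
while ((entry = hash_seq_search(&hash_seq)) != NULL)
{
    /* inspect entry */
}

/* Keyed lookup: a single probe, but the complete key is required. */
memset(&key, 0, sizeof(key));
key.userid = userid;
key.dbid = dbid;
key.queryid = queryid;
key.toplevel = true;        /* part of the key too; see entry_reset() */
entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
```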
Thank you for your feedback.

> My initial thought is that this patch does not remove the issue of
> loading the entire query text (just to return one or a few entries).

This patch is not intended to address the issue of loading the file with
query texts. We only try to avoid forming and handling tuples that we
don't really need. Forming a tuple in pg_stat_statements_internal()
includes calling CStringGetTextDatum(), which allocates memory and
copies the query text. Avoiding that for queries we don't need already
makes a big difference.

> I do think having the ability to look up a specific entry based on a
> key (that is, hash_search instead of hash_seq_search) would be useful.

That's a great idea, thanks! I'm going to try that and include it in the
next version of the patch if I succeed.

> For what it is worth, I have been thinking about what it would take to
> move query texts into shared memory, which could make this type of
> filtering more practical.

As far as I can tell, pg_stat_statements stores query texts in an
external file to avoid problems with long query texts. Here is a quote
from the docs:

> The representative query texts are kept in an external disk file, and
> do not consume shared memory. Therefore, even very lengthy query texts
> can be stored successfully.

One idea would be to cache some of the texts (the short ones) in shared
memory. That would require adding a GUC that limits query text length,
or perhaps one that limits the total amount of memory used for caching.
Then, for an individual query, pg_stat_statements would first look up
the text in the cache and only go to disk if it's not found. I'm not
sure I want to do that, though...

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
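For context on the per-row cost being avoided, this is simplified from
the existing pg_stat_statements_internal() code, with error handling and
encoding conversion omitted: each emitted row fetches its text out of
the loaded buffer and copies it into a freshly allocated text Datum.

```
/*
 * Simplified from pg_stat_statements_internal(): per-row query text
 * handling when showtext = true.  Error handling and encoding
 * conversion are omitted.
 */
char *qstr = qtext_fetch(entry->query_offset, entry->query_len,
                         qbuffer, qbuffer_size);

if (qstr)
    values[i++] = CStringGetTextDatum(qstr);    /* palloc + copy per row */
```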
> > My initial thought is that this patch does not remove the issue of
> > loading the entire query text (just to return one or a few entries).
>
> This patch is not intended to address the issue of loading the file with
> query texts. We only try to avoid forming and handling tuples that we
> don't really need. Forming a tuple in pg_stat_statements_internal()
> includes calling CStringGetTextDatum(), which allocates memory and
> copies the query text. Avoiding that for queries we don't need already
> makes a big difference.

Yes, but my point is that if someone repeatedly looks up
pg_stat_statements with filters, they will end up loading the query
texts multiple times.

For example:

```
select * from pg_stat_statements where queryid in (10000, 20000, 30000);
```

will load the query text file only once to retrieve these 3 query IDs.

If I instead do this with the proposed patch:

```
select * from pg_stat_statements(true, queryid => 10000);
select * from pg_stat_statements(true, queryid => 20000);
select * from pg_stat_statements(true, queryid => 30000);
```

or

```
select * from pg_stat_activity a, pg_stat_statements(true, queryid => a.query_id);
```

I will have to load the query text file into memory on every invocation
of pg_stat_statements.

> > For what it is worth, I have been thinking about what it would take to
> > move query texts into shared memory, which could make this type of
> > filtering more practical.
>
> As far as I can tell, pg_stat_statements stores query texts in an
> external file to avoid problems with long query texts. Here is a quote
> from the docs:
>
> > The representative query texts are kept in an external disk file, and
> > do not consume shared memory. Therefore, even very lengthy query texts
> > can be stored successfully.

pg_stat_statements_internal() must load the file when showtext = true,
which is the default:

```
qbuffer = qtext_load_file(&qbuffer_size);
```

--
Sami Imseih
Amazon Web Services (AWS)
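For reference, here is roughly what that call costs on every invocation.
This is heavily simplified from qtext_load_file() in pg_stat_statements.c
(error handling omitted): the entire external text file is read into a
freshly allocated buffer each time.

```
/*
 * Heavily simplified from qtext_load_file() in pg_stat_statements.c:
 * each call opens the external text file and reads all of it into a
 * malloc'd buffer.  Error handling omitted.
 */
static char *
qtext_load_file(Size *buffer_size)
{
    char       *buf;
    int         fd;
    struct stat stat;

    fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDONLY | PG_BINARY);
    fstat(fd, &stat);
    buf = (char *) malloc(stat.st_size);
    read(fd, buf, stat.st_size);            /* one full read per call */
    CloseTransientFile(fd);

    *buffer_size = stat.st_size;
    return buf;
}
```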
>> I do think having the ability to look up a specific entry based on a
>> key (that is, hash_search instead of hash_seq_search) would be useful.
>
> That's a great idea, thanks! I'm going to try that and include it in the
> next version of the patch if I succeed.

Here is the second version of the patch, as promised. I used hash_search
in the case where all three key arguments are provided, the same way it
is done in the entry_reset() function.

The diff doesn't look very pleasant. Basically, I just moved the code
that forms one tuple into a new pg_stat_statements_handle_entry()
function so it can be used in pg_stat_statements_internal().

I also had second thoughts about adding a new struct just to pass the
three key arguments as one filter to the internal function. In the v2
patch I pass the arguments as they are. I'm not sure which option is
better. In any case, it should be done the same way in both entry_reset()
and pg_stat_statements_internal(), so if you think adding a struct
pgssFilter was a good idea, I'll rewrite the patch to use it in both
pg_stat_statements_internal() and entry_reset().

On Thu, Sep 18, 2025 at 6:33 PM Sami Imseih <samimseih@gmail.com> wrote:
>
> Yes, but my point is that if someone repeatedly looks up
> pg_stat_statements with filters, they will end up loading the query
> texts multiple times.
>
> For example:
>
> ```
> select * from pg_stat_statements where queryid in (10000, 20000, 30000);
> ```
>
> will load the query text file only once to retrieve these 3 query IDs.
>
> If I instead do this with the proposed patch:
>
> ```
> select * from pg_stat_statements(true, queryid => 10000);
> select * from pg_stat_statements(true, queryid => 20000);
> select * from pg_stat_statements(true, queryid => 30000);
> ```
>
> or
>
> ```
> select * from pg_stat_activity a, pg_stat_statements(true, queryid => a.query_id);
> ```
>
> I will have to load the query text file into memory on every invocation
> of pg_stat_statements.

You are right. At some point, if information about multiple queries is
needed, a single SELECT from pg_stat_statements followed by filtering
will be more efficient than calling pg_stat_statements() with filters
multiple times. That's something that should be documented.

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
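To make the v2 shape concrete, here is a sketch of the structure
described above. The helper name pg_stat_statements_handle_entry() comes
from the mail, but its signature, the entry_matches_filter() helper, and
the surrounding control flow are illustrative guesses; refer to the
attached v2 patch for the real code.

```
/*
 * Sketch of the v2 structure; see the attached patch for the real code.
 * pg_stat_statements_handle_entry() is the helper mentioned above (its
 * signature here is a guess); entry_matches_filter() is hypothetical.
 */
if (queryid != 0 && OidIsValid(userid) && OidIsValid(dbid))
{
    /* Fast path: all three keys given, probe pgss_hash directly. */
    pgssHashKey key;
    pgssEntry  *entry;

    memset(&key, 0, sizeof(key));
    key.userid = userid;
    key.dbid = dbid;
    key.queryid = (uint64) queryid;

    /* toplevel is part of the key: probe both values, as entry_reset() does */
    for (int i = 0; i < 2; i++)
    {
        key.toplevel = (i == 1);
        entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
        if (entry)
            pg_stat_statements_handle_entry(rsinfo, entry, showtext,
                                            qbuffer, qbuffer_size);
    }
}
else
{
    /* Otherwise, fall back to the sequential scan with per-key filters. */
    HASH_SEQ_STATUS hash_seq;
    pgssEntry  *entry;

    hash_seq_init(&hash_seq, pgss_hash);
    while ((entry = hash_seq_search(&hash_seq)) != NULL)
    {
        if (entry_matches_filter(entry, queryid, userid, dbid))
            pg_stat_statements_handle_entry(rsinfo, entry, showtext,
                                            qbuffer, qbuffer_size);
    }
}
```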