pg_stat_statements: faster search by queryid

Поиск

Список

Период

Сортировка

От	Karina Litskevich
Тема	pg_stat_statements: faster search by queryid
Дата	17 сентября 11:45:29
Msg-id	CACiT8iaL-x=KtE6L1EFfS0jwxdwFQ8=uQjut1Q0o=5xDJdb-uQ@mail.gmail.com обсуждение исходный текст
Ответы	Re: pg_stat_statements: faster search by queryid
Список	pgsql-hackers

Дерево обсуждения

Hi hackers,

Aleksandra Bondar and I are proposing the following patch for
pg_stat_statements.


The idea.
---------

Currently, to get statistics on a specific query, you should execute

SELECT * FROM pg_stat_statements WHERE queryid = specific_queryid;

This takes a long time because the pg_stat_statements() function forms
tuples for all statistics it has first, and then they are filtered by
the WHERE clause. If we provide a function like
pg_stat_statements_by_queryid(queryid bigint), which would filter
statistics by queryid while scanning pgss_hash and return only
statistics with the specified queryid, that would be much faster.

We can also easily add filtration by userid and dbid, which would lead
us to a function like
pg_stat_statements_filtered(queryid bigint, userid Oid, dbid Oid). In
case some parameter is not specified, its default value is 0, and it
means that no filtration is needed on this parameter. Kind of like
pg_stat_statements_reset() chooses what statistics should be cleaned.
If no parameter is specified, pg_stat_statements_filtered() should
return all statistics that pg_stat_statements() would return.

This led me to the idea that we should rather extend the
pg_stat_statements() function than add a new function. The old way to
call pg_stat_statements() will produce the same results, and
specifying new parameters will produce filtered results.


The patch.
----------

The extended pg_stat_statements() function idea is implemented in the
patch attached. I can always rewrite the patch to add a new function
and leave pg_stat_statements() as it is, though, if you think it's
better to have a separate function for filtering.

We've only written the code so far and want to get your opinion on
that. If you like the idea, we'll also provide tests and docs. Any
suggestions are welcome.


Benchmarking.
-------------

We prepared a simple test case here to show performance improvement.
Download the attached script pg_stat_statements_prepare.sql and run
the following in psql.

CREATE EXTENSION pg_stat_statements;
-- Fill in pg_stat_statements statistics
\i /path/to/pgpro_stats_prepare_script.sql
-- Get random query ID
SELECT queryid AS rand_queryid
FROM pg_stat_statements
WHERE queryid IS NOT NULL
ORDER BY random()
LIMIT 1 \gset
-- Turn on time measuring
\timing
-- Get statistics in the old way
SELECT * FROM pg_stat_statements WHERE queryid = :rand_queryid;
-- Get statistics in the new way
SELECT * FROM pg_stat_statements(true, queryid => :rand_queryid);

I'm getting that the new way is at least two times faster on my
machine. I also compared the time for the old way on master with
and without the patch. I get that the difference is within standard
deviation.


Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/

Вложения

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

pg_stat_statements: faster search by queryid

Вложения