Обсуждение: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure

Поиск
Список
Период
Сортировка

BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17668
Logged by:          Marcus Kempe
Email address:      marcus.kempe@gmail.com
PostgreSQL version: 14.5
Operating system:   Linux
Description:

Consider the following case: 

postgres=# create function test(x bigint) returns void language sql as $$
select x ;$$;
CREATE FUNCTION
postgres=# select test(1);
 test 
------
 
(1 row)

postgres=# select calls,userid,dbid,queryid,query from
public.pg_stat_statements where query like 'select test%';
 calls | userid | dbid  |       queryid        |      query      
-------+--------+-------+----------------------+-----------------
     1 |     10 | 13757 | -8729651821477628722 | select test($1)
(1 row)

postgres=# select test(12345678910); -- larger that INTMAX.
 test 
------
 
(1 row)

postgres=# select calls,userid,dbid,queryid,query from
public.pg_stat_statements where query like 'select test%';
 calls | userid | dbid  |       queryid        |      query      
-------+--------+-------+----------------------+-----------------
     1 |     10 | 13757 | -8729651821477628722 | select test($1)
     1 |     10 | 13757 |  1452143938866253601 | select test($1)
(2 rows)

So given the above 2 calls for the same function, there are 2 different
queryIds generated, seemingly based on the fact that in the first case the
parameter can be cast to integer, and in the second case it can not.

I would have expected both calls to be normalized and fingerprinted to one
and the same queryId.

This behavior was similar even before queryjumble.c was introduced in PG14.


Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure

От
Julien Rouhaud
Дата:
Hi,

On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
>
> postgres=# select calls,userid,dbid,queryid,query from
> public.pg_stat_statements where query like 'select test%';
>  calls | userid | dbid  |       queryid        |      query
> -------+--------+-------+----------------------+-----------------
>      1 |     10 | 13757 | -8729651821477628722 | select test($1)
>      1 |     10 | 13757 |  1452143938866253601 | select test($1)
> (2 rows)
>
> So given the above 2 calls for the same function, there are 2 different
> queryIds generated, seemingly based on the fact that in the first case the
> parameter can be cast to integer, and in the second case it can not.
>
> I would have expected both calls to be normalized and fingerprinted to one
> and the same queryId.

Agreed, and that's actually a known problem that is currently being worked on.
You can look at
https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com
for details about the discussion and in-progress fix.



Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure

От
Tom Lane
Дата:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
>> I would have expected both calls to be normalized and fingerprinted to one
>> and the same queryId.

> Agreed, and that's actually a known problem that is currently being worked on.
> You can look at
> https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com
> for details about the discussion and in-progress fix.

Um ... that seems unrelated.  AFAICS the OP is complaining about
the fact that '42'::int4::int8 is not identical to '42'::int8.
Well, they're not.  I seriously doubt that we would or should
consider trying to get queryjumble to mask that.

            regards, tom lane



Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure

От
Julien Rouhaud
Дата:
On Thu, Oct 27, 2022 at 02:28:17PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
> >> I would have expected both calls to be normalized and fingerprinted to one
> >> and the same queryId.
> 
> > Agreed, and that's actually a known problem that is currently being worked on.
> > You can look at
> > https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com
> > for details about the discussion and in-progress fix.
> 
> Um ... that seems unrelated.  AFAICS the OP is complaining about
> the fact that '42'::int4::int8 is not identical to '42'::int8.
> Well, they're not.  I seriously doubt that we would or should
> consider trying to get queryjumble to mask that.

Oh wow I totally misread the bug report, sorry about that.  A good night sleep
and a coffee later I see what it's about, and I agree that we shouldn't try to
mask that even if it might seem surprising.



Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure

От
Marcus Kempe
Дата:
Appreciate the clarifications. The only thing I would then add is that if we agree that it's unintuitive, this could be better reflected in the documentation. 
Currently the only mention is the following (pgstatstatements.html):
"queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings."

Thanks and best regards, 
Marcus Kempe


On Fri, Oct 28, 2022 at 3:28 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Oct 27, 2022 at 02:28:17PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
> >> I would have expected both calls to be normalized and fingerprinted to one
> >> and the same queryId.
>
> > Agreed, and that's actually a known problem that is currently being worked on.
> > You can look at
> > https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com
> > for details about the discussion and in-progress fix.
>
> Um ... that seems unrelated.  AFAICS the OP is complaining about
> the fact that '42'::int4::int8 is not identical to '42'::int8.
> Well, they're not.  I seriously doubt that we would or should
> consider trying to get queryjumble to mask that.

Oh wow I totally misread the bug report, sorry about that.  A good night sleep
and a coffee later I see what it's about, and I agree that we shouldn't try to
mask that even if it might seem surprising.

Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure

От
Alexey Ermakov
Дата:
Hello, I've seen possibly related case where that might be a problem.

PostgreSQL 12.8, pg_stat_statements.max=10000. There are queries like these:
insert into tablename (col1, ..., col28) values ($1, ..., $28), ... 
($2605, ... , $2660) on conflict do nothing.

Problem is that not only we might insert different number of rows but 
there might be many duplicates of the same exact query in 
pg_stat_statements (userid and dbid are the same in all rows):

select md5(query), count(*), pg_size_pretty(sum(length(query))) from 
pg_stat_statements
where query like 'insert into tablename%' group by 1 order by 2 desc 
limit 10;

    md5    | count | pg_size_pretty
----------+-------+----------------
  170...84 |  1283 | 2679 kB
  906...53 |   897 | 1724 kB
  55e...78 |   676 | 12 MB
  ea4...ef |   642 | 11 MB
  944...0f |   629 | 11 MB
  04e...ec |   530 | 9138 kB
  572...27 |   476 | 8575 kB
  5ea...3c |   430 | 7331 kB
  be1...6c |   324 | 5704 kB
  f69...47 |   313 | 550 kB

There are 8 bigint columns in the table, I suspect that we might insert 
into some of them values both greater and less than INT_MAX.
In result pg_stat_statements buffer is full of such queries (they take 
9k rows out of 10k right now, 10h since pg_stat_statements_reset call), 
currently total sum of query lengths is 117MB (97% of total sum).

 From monitoring perspective that might be a problem since calculating 
metrics from pg_stat_statements became more expensive and we might lose 
information about some queries when there is not enough space for them.

But I suppose such cases are pretty rare and some of them could be fixed 
with COPY queries instead of inserts.

--
Thanks,
Alexey Ermakov


On 2022-10-28 00:28, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
>> On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
>>> I would have expected both calls to be normalized and fingerprinted to one
>>> and the same queryId.
>> Agreed, and that's actually a known problem that is currently being worked on.
>> You can look at
>> https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com
>> for details about the discussion and in-progress fix.
> Um ... that seems unrelated.  AFAICS the OP is complaining about
> the fact that '42'::int4::int8 is not identical to '42'::int8.
> Well, they're not.  I seriously doubt that we would or should
> consider trying to get queryjumble to mask that.
>
>             regards, tom lane
>
>