Обсуждение: Jumble the CALL command in pg_stat_statements

Поиск
Список
Период
Сортировка

Jumble the CALL command in pg_stat_statements

От
"Imseih (AWS), Sami"
Дата:

Hi,

 

The proposal by Bertrand in CC to jumble CALL and SET in [1] was

rejected at the time for a more robust solution to jumble DDL.

 

Michael also in CC made this possible with commit 3db72ebcbe.

 

The attached patch takes advantage of the jumbling infrastructure

added in the above mentioned commit and jumbles the CALL statement

in pg_stat_statements.

 

The patch also modifies existing test cases for CALL handling in pg_stat_statements

and adds additional tests which prove that a CALL to an overloaded procedure

will generate a different query_id.

 

As far as the SET command mentioned in [1] is concerned, it is a bit more complex

as it requires us to deal with A_Constants which is not very straightforward. We can surely

deal with SET currently by applying custom query jumbling logic to VariableSetStmt,

but this can be dealt with in a separate discussion.

 

 

Regards,

 

Sami Imseih

Amazon Web Services (AWS)

 

[1] https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com

Вложения

Re: Jumble the CALL command in pg_stat_statements

От
Michael Paquier
Дата:
On Wed, Sep 13, 2023 at 12:48:48AM +0000, Imseih (AWS), Sami wrote:
> The patch also modifies existing test cases for CALL handling in pg_stat_statements
> and adds additional tests which prove that a CALL to an overloaded procedure
> will generate a different query_id.

+CALL overload(1);
+CALL overload('A');
[...]
+     1 |    0 | CALL overload($1)
+     1 |    0 | CALL overload($1)

That's not surprising to me.  We've historically relied on the
function OID in the jumbling of a FuncExpr, so I'm OK with that.  This
may look a bit surprising though if you have a schema that enforces
the same function name for several data types.  Having a DEFAULT does
this:
CREATE OR REPLACE PROCEDURE overload(i text, j bool DEFAULT true) AS
$$ DECLARE
  r text;
BEGIN
  SELECT i::text INTO r;
END; $$ LANGUAGE plpgsql;

Then with these three, and a jumbling based on the OID gives:
+CALL overload(1);
+CALL overload('A');
+CALL overload('A', false);
[...]
-     1 |    0 | CALL overload($1)
+     2 |    0 | CALL overload($1)

Still this grouping is much better than having thousands of entries
with different values.  I am not sure if we should bother improving
that more than what you suggest that, especially as FuncExpr->args can
itself include Const nodes as far as I recall.

> As far as the SET command mentioned in [1] is concerned, it is a bit more complex
> as it requires us to deal with A_Constants which is not very straightforward. We can surely
> deal with SET currently by applying custom query jumbling logic to VariableSetStmt,
> but this can be dealt with in a separate discussion.

As VariableSetStmt is the top-most node structure for SET/RESET
commands, using a custom implementation may be wise in this case,
particularly for the args made of A_Const.  I don't really want to go
down to the internals of A_Const outside its internal implementation,
as these can be used for some queries where there are multiple
keywords separated by whitespaces for one single A_Const, like
isolation level values in transaction commands.  This would lead to
appending the dollar-based variables in weird ways for some patterns.
Cough.

    /* transformed output-argument expressions */
-   List       *outargs pg_node_attr(query_jumble_ignore);
+   List       *outargs;

This choice is a bit surprising.  How does it influence the jumbling?
For example, if I add a query_jumble_ignore to it, the regression
tests of pg_stat_statements still pass.  This is going to require more
test coverage to prove that this addition is useful.
--
Michael

Вложения

Re: Jumble the CALL command in pg_stat_statements

От
"Imseih (AWS), Sami"
Дата:
> Still this grouping is much better than having thousands of entries
> with different values. I am not sure if we should bother improving
> that more than what you suggest that, especially as FuncExpr->args can
> itself include Const nodes as far as I recall.

I agree.

> As far as the SET command mentioned in [1] is concerned, it is a bit more complex
> as it requires us to deal with A_Constants which is not very straightforward. We can surely
> deal with SET currently by applying custom query jumbling logic to VariableSetStmt,
> but this can be dealt with in a separate discussion.

> As VariableSetStmt is the top-most node structure for SET/RESET
> commands, using a custom implementation may be wise in this case,

I do have a patch for this with test cases, 0001-v1-Jumble-the-SET-command.patch
If you feel this needs a separate discussion I can start one.

In the patch, the custom _jumbleVariableSetStmt jumbles
 the kind, name, is_local and number of arguments ( in case of a list ) 
and tracks the locations for normalization.

> This choice is a bit surprising. How does it influence the jumbling?
> For example, if I add a query_jumble_ignore to it, the regression
> tests of pg_stat_statements still pass. This is going to require more
> test coverage to prove that this addition is useful.

CALL with OUT or INOUT args is a bit strange, because
as the doc [1] mentions "Arguments must be supplied for all procedure parameters 
that lack defaults, including OUT parameters. However, arguments 
matching OUT parameters are not evaluated, so it's customary
to just write NULL for them."

so for pgss, passing a NULL or some other value into OUT/INOUT args should 
be normalized like IN args.

0001-v2-Jumble-the-CALL-command-in-pg_stat_statements.patch adds
these test cases.


Regards,

Sami Imseih
Amazon Web Services (AWS)

[1] https://www.postgresql.org/docs/current/sql-call.html


Вложения

Re: Jumble the CALL command in pg_stat_statements

От
Michael Paquier
Дата:
On Wed, Sep 13, 2023 at 11:09:19PM +0000, Imseih (AWS), Sami wrote:
> I do have a patch for this with test cases, 0001-v1-Jumble-the-SET-command.patch
> If you feel this needs a separate discussion I can start one.

Agreed tha tthis should have its own thread with a proper subject.

> In the patch, the custom _jumbleVariableSetStmt jumbles
>  the kind, name, is_local and number of arguments ( in case of a list )
> and tracks the locations for normalization.

There is much more going on here, like FunctionSetResetClause, or
AlterSystemStmt with its generic_reset.

+        foreach    (l, expr->args)
+        {
+            A_Const *ac = (A_Const *) lfirst(l);
+
+            if(ac->type != T_String)
+                RecordConstLocation(jstate, ac->location);
+        }

Even this part, I am not sure if it is always correct.  Couldn't we
have cases where String's A_Const had better be recorded as const?

> CALL with OUT or INOUT args is a bit strange, because
> as the doc [1] mentions "Arguments must be supplied for all procedure parameters
> that lack defaults, including OUT parameters. However, arguments
> matching OUT parameters are not evaluated, so it's customary
> to just write NULL for them."
>
> so for pgss, passing a NULL or some other value into OUT/INOUT args should
> be normalized like IN args.

I've been studying this one, and I can see why you're right here.
This feels much more natural to include.  The INOUT parameters get
registered twice at the same position, and the duplicates are
discarded by pg_stat_statements, which is OK.  The patch is straight
for the CALL part, so I have applied it.
--
Michael

Вложения