Обсуждение: Jumble the CALL command in pg_stat_statements
Hi,
The proposal by Bertrand in CC to jumble CALL and SET in [1] was
rejected at the time for a more robust solution to jumble DDL.
Michael also in CC made this possible with commit 3db72ebcbe.
The attached patch takes advantage of the jumbling infrastructure
added in the above mentioned commit and jumbles the CALL statement
in pg_stat_statements.
The patch also modifies existing test cases for CALL handling in pg_stat_statements
and adds additional tests which prove that a CALL to an overloaded procedure
will generate a different query_id.
As far as the SET command mentioned in [1] is concerned, it is a bit more complex
as it requires us to deal with A_Constants which is not very straightforward. We can surely
deal with SET currently by applying custom query jumbling logic to VariableSetStmt,
but this can be dealt with in a separate discussion.
Regards,
Sami Imseih
Amazon Web Services (AWS)
[1] https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com
Вложения
On Wed, Sep 13, 2023 at 12:48:48AM +0000, Imseih (AWS), Sami wrote: > The patch also modifies existing test cases for CALL handling in pg_stat_statements > and adds additional tests which prove that a CALL to an overloaded procedure > will generate a different query_id. +CALL overload(1); +CALL overload('A'); [...] + 1 | 0 | CALL overload($1) + 1 | 0 | CALL overload($1) That's not surprising to me. We've historically relied on the function OID in the jumbling of a FuncExpr, so I'm OK with that. This may look a bit surprising though if you have a schema that enforces the same function name for several data types. Having a DEFAULT does this: CREATE OR REPLACE PROCEDURE overload(i text, j bool DEFAULT true) AS $$ DECLARE r text; BEGIN SELECT i::text INTO r; END; $$ LANGUAGE plpgsql; Then with these three, and a jumbling based on the OID gives: +CALL overload(1); +CALL overload('A'); +CALL overload('A', false); [...] - 1 | 0 | CALL overload($1) + 2 | 0 | CALL overload($1) Still this grouping is much better than having thousands of entries with different values. I am not sure if we should bother improving that more than what you suggest that, especially as FuncExpr->args can itself include Const nodes as far as I recall. > As far as the SET command mentioned in [1] is concerned, it is a bit more complex > as it requires us to deal with A_Constants which is not very straightforward. We can surely > deal with SET currently by applying custom query jumbling logic to VariableSetStmt, > but this can be dealt with in a separate discussion. As VariableSetStmt is the top-most node structure for SET/RESET commands, using a custom implementation may be wise in this case, particularly for the args made of A_Const. I don't really want to go down to the internals of A_Const outside its internal implementation, as these can be used for some queries where there are multiple keywords separated by whitespaces for one single A_Const, like isolation level values in transaction commands. This would lead to appending the dollar-based variables in weird ways for some patterns. Cough. /* transformed output-argument expressions */ - List *outargs pg_node_attr(query_jumble_ignore); + List *outargs; This choice is a bit surprising. How does it influence the jumbling? For example, if I add a query_jumble_ignore to it, the regression tests of pg_stat_statements still pass. This is going to require more test coverage to prove that this addition is useful. -- Michael
Вложения
> Still this grouping is much better than having thousands of entries > with different values. I am not sure if we should bother improving > that more than what you suggest that, especially as FuncExpr->args can > itself include Const nodes as far as I recall. I agree. > As far as the SET command mentioned in [1] is concerned, it is a bit more complex > as it requires us to deal with A_Constants which is not very straightforward. We can surely > deal with SET currently by applying custom query jumbling logic to VariableSetStmt, > but this can be dealt with in a separate discussion. > As VariableSetStmt is the top-most node structure for SET/RESET > commands, using a custom implementation may be wise in this case, I do have a patch for this with test cases, 0001-v1-Jumble-the-SET-command.patch If you feel this needs a separate discussion I can start one. In the patch, the custom _jumbleVariableSetStmt jumbles the kind, name, is_local and number of arguments ( in case of a list ) and tracks the locations for normalization. > This choice is a bit surprising. How does it influence the jumbling? > For example, if I add a query_jumble_ignore to it, the regression > tests of pg_stat_statements still pass. This is going to require more > test coverage to prove that this addition is useful. CALL with OUT or INOUT args is a bit strange, because as the doc [1] mentions "Arguments must be supplied for all procedure parameters that lack defaults, including OUT parameters. However, arguments matching OUT parameters are not evaluated, so it's customary to just write NULL for them." so for pgss, passing a NULL or some other value into OUT/INOUT args should be normalized like IN args. 0001-v2-Jumble-the-CALL-command-in-pg_stat_statements.patch adds these test cases. Regards, Sami Imseih Amazon Web Services (AWS) [1] https://www.postgresql.org/docs/current/sql-call.html
Вложения
On Wed, Sep 13, 2023 at 11:09:19PM +0000, Imseih (AWS), Sami wrote: > I do have a patch for this with test cases, 0001-v1-Jumble-the-SET-command.patch > If you feel this needs a separate discussion I can start one. Agreed tha tthis should have its own thread with a proper subject. > In the patch, the custom _jumbleVariableSetStmt jumbles > the kind, name, is_local and number of arguments ( in case of a list ) > and tracks the locations for normalization. There is much more going on here, like FunctionSetResetClause, or AlterSystemStmt with its generic_reset. + foreach (l, expr->args) + { + A_Const *ac = (A_Const *) lfirst(l); + + if(ac->type != T_String) + RecordConstLocation(jstate, ac->location); + } Even this part, I am not sure if it is always correct. Couldn't we have cases where String's A_Const had better be recorded as const? > CALL with OUT or INOUT args is a bit strange, because > as the doc [1] mentions "Arguments must be supplied for all procedure parameters > that lack defaults, including OUT parameters. However, arguments > matching OUT parameters are not evaluated, so it's customary > to just write NULL for them." > > so for pgss, passing a NULL or some other value into OUT/INOUT args should > be normalized like IN args. I've been studying this one, and I can see why you're right here. This feels much more natural to include. The INOUT parameters get registered twice at the same position, and the duplicates are discarded by pg_stat_statements, which is OK. The patch is straight for the CALL part, so I have applied it. -- Michael