Обсуждение: Bug in pg_stat_statements
Hi hackers. In PG18 gthere is bug either in pg_stat_statements.c, either in queryjumblefuncs.c. Repro (you need Postgres build with asserts and pg_stat_statements included in shared_preload_librarties: create function f(a integer[], out x integer, out y integer) returns record as $$ begin x = a[1]; y = a[2]; end; $$ language plpgsql; create table t(x integer); select ((f(array[1,2]))).* from t; The problems is caused by ((...)).* which cause to include the same array literal twice in JumbleQuery. But pg_stat_stataement assumes that any occurrence of constant is unique and it cause assertion failure here: /* Copy next chunk (what precedes the next constant) */ len_to_wrt = off - last_off; len_to_wrt -= last_tok_len; Assert(len_to_wrt >= 0); So we should either exclude duplicates in RecordConstLocation (queryjumblefuncs.c) either in generate_normalized_query in pg_stat_statements.c What is considered to be more correct?
> In PG18 gthere is bug either in pg_stat_statements.c, either in
> queryjumblefuncs.c.
>
> Repro (you need Postgres build with asserts and pg_stat_statements
> included in shared_preload_librarties:
>
> create function f(a integer[], out x integer, out y integer) returns
> record as $$ begin
> x = a[1];
> y = a[2];
> end;
> $$ language plpgsql;
>
> create table t(x integer);
> select ((f(array[1,2]))).* from t;
>
> The problems is caused by ((...)).* which cause to include the same
> array literal twice in JumbleQuery.
> But pg_stat_stataement assumes that any occurrence of constant is unique
> and it cause assertion failure here:
>
> /* Copy next chunk (what precedes the next constant) */
> len_to_wrt = off - last_off;
> len_to_wrt -= last_tok_len;
> Assert(len_to_wrt >= 0);
>
> So we should either exclude duplicates in RecordConstLocation
> (queryjumblefuncs.c)
You are correct in the analysis that this is because we are
recording the same constant multiple times, this causes the
len_to_write to be 0 on the second occurance of the duplicated
location. I confirmed that this was introduced in the constant list
squashing per 0f65f3eec.
The same location being recorded twice will occur because of
row expansion, therefore the same location will be jumbled twice,
as observed in the explain output.
```
postgres=# explain verbose select ((f(array[1,2]))).* from t;
QUERY PLAN
----------------------------------------------------------------
Seq Scan on public.t (cost=0.00..1310.50 rows=2550 width=8)
Output: (f('{1,2}'::integer[])).x, (f('{1,2}'::integer[])).y
Query Identifier: 5305868219688930530
(3 rows)
```
Before 0f65f3eec, this was not an issue, because we recorded
all the locations.
I believe the correct answer is to fix this in queryjumblefuncs.c
and more specifically inside _jumbleElements. We should not
record the same location twice. We can do this by tracking the
start location of the last recorded constant location, and we
can skip recording it if we see it again. See the attached
which also includes test cases.
Also, inside the loop of generate_normalized_query in
pg_stat_statements we are asserting len_to_wrt >= 0, so we
are OK with memcpy of 0 bytes. This is valid, but does
not seem correct.
```
Assert(len_to_wrt > 0);
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
```
Should we continue the loop and skip the memcpy
if (len_to_wrt == 0) instead?
--
Sami Imseih
Amazon Web Services (AWS)
Вложения
> On Thu, Oct 23, 2025 at 07:49:55PM -0500, Sami Imseih wrote:
> I believe the correct answer is to fix this in queryjumblefuncs.c
> and more specifically inside _jumbleElements. We should not
> record the same location twice. We can do this by tracking the
> start location of the last recorded constant location, and we
> can skip recording it if we see it again. See the attached
> which also includes test cases.
I think there is another option. Before squashing implementation, duplicated
constants were dealt with in fill_in_constant_lengths -- after processing their
length was -1 and such constants were skipped further down the line. After
0f65f3eec, if such a constant is supposed to be squashed we now ignore it, but
set the length anyway in RecordConstLocation:
if (locs[i].squashed)
continue; /* squashable list, ignore */
if (loc <= last_loc)
continue; /* Duplicate constant, ignore */
This sort of short cut the verification for duplicated constants. Having said
that it seems that another solution would be to check for duplicated constants
in fill_in_constant_lengths just before skipping squashed constants and reset
the length if needed.
> This sort of short cut the verification for duplicated constants. Having said > that it seems that another solution would be to check for duplicated constants > in fill_in_constant_lengths just before skipping squashed constants and reset > the length if needed. Hi, yes, I was not convinced my initial idea is good, and I ended up finding another case that still breaks: ``` SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).*; ``` Here we have multiple squashable lists, and simply checking that the location we are trying to record != the last location does not work. Also, checking that the location we are trying to record is ahead of the last location also does not work because the locations of squashable lists in the predicate will be recorded first. So, we can't rely on the location we want to record being higher than previous locations. For example: ``` SELECT (ROW(ARRAY[1, 2], ARRAY[1, 2, 3])).* FROM t WHERE x IN (1, 2, 3); ``` > Having said > that it seems that another solution would be to check for duplicated constants > in fill_in_constant_length Yes, I started thinking along these lines as well, where we check for duplicates in fill_in_constant_length; or after jumbling, we de-duplicate locations if we have a squashable list, which is what I have attached with updated test cases. This means we do need to scan the locations one extra time during jumbling, but I don't see that as a problem. Maybe there is another better way? -- Sami Imseih Amazon Web Services (AWS)
Вложения
> On Fri, Oct 24, 2025 at 09:17:22AM -0500, Sami Imseih wrote:
> > Having said
> > that it seems that another solution would be to check for duplicated constants
> > in fill_in_constant_length
>
> Yes, I started thinking along these lines as well, where we check for
> duplicates
> in fill_in_constant_length; or after jumbling, we de-duplicate locations if we
> have a squashable list, which is what I have attached with updated test cases.
>
> This means we do need to scan the locations one extra time during jumbling,
> but I don't see that as a problem. Maybe there is another better way?
Why? What I hand in mind is something like this, after a quick test it seems to
be able to address both the original case and the one you've discovered.
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f2187167..f17a2b79 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -3008,11 +3008,17 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
Assert(loc >= 0);
- if (locs[i].squashed)
- continue; /* squashable list, ignore */
-
if (loc <= last_loc)
+ {
+ locs[i].length = -1;
continue; /* Duplicate constant, ignore */
+ }
+
+ if (locs[i].squashed)
+ {
+ last_loc = loc;
+ continue; /* squashable list, ignore */
+ }
/* Lex tokens until we find the desired constant */
for (;;)
> > > Having said
> > > that it seems that another solution would be to check for duplicated constants
> > > in fill_in_constant_length
> >
> > Yes, I started thinking along these lines as well, where we check for
> > duplicates
> > in fill_in_constant_length; or after jumbling, we de-duplicate locations if we
> > have a squashable list, which is what I have attached with updated test cases.
> >
> > This means we do need to scan the locations one extra time during jumbling,
> > but I don't see that as a problem. Maybe there is another better way?
>
> Why? What I hand in mind is something like this, after a quick test it seems to
> be able to address both the original case and the one you've discovered.
ahh, what you have works because clocations is sorted by location before
the loop starts in `fill_in_constant_lengths` , so comparing locations
makes sense in this case. I am OK with this.
```
/*
* Sort the records by location so that we can process them in order while
* scanning the query text.
*/
if (jstate->clocations_count > 1)
qsort(jstate->clocations, jstate->clocations_count,
sizeof(LocationLen), comp_location);
```
I was really hoping that the fix could be inside of query jumbling, as I think
pg_stat_statements or any consumers of query jumbling don't need to
care more about squashing ( or other internals of jumbling ).
So now I also wonder if we should also move:
```
static char *generate_normalized_query(JumbleState *jstate, const char *query,
int query_loc, int *query_len_p);
static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
int query_loc);
```
from pg_stat_statements.c to queryjumblefuncs.c?
--
Sami Imseih
Amazon Web Services (AWS)
> > > > Having said > > > > that it seems that another solution would be to check for duplicated constants > > > > in fill_in_constant_length > > > > > > Yes, I started thinking along these lines as well, where we check for > > > duplicates > > > in fill_in_constant_length; or after jumbling, we de-duplicate locations if we > > > have a squashable list, which is what I have attached with updated test cases. > > > > > > This means we do need to scan the locations one extra time during jumbling, > > > but I don't see that as a problem. Maybe there is another better way? > > > > Why? What I hand in mind is something like this, after a quick test it seems to > > be able to address both the original case and the one you've discovered. > > ahh, what you have works because clocations is sorted by location before > the loop starts in `fill_in_constant_lengths` , so comparing locations > makes sense in this case. I am OK with this. v3-0001 does what Dimiti suggests with some slight tweaks: 1/ Moving "last_loc = loc;" earlier in the loop 2/ removing a comment for fill_in_constant_lengths that was an oversight from 0f65f3eec, as we no longer assume that the constant location is -1 when entering that routine. Also added the test cases that were discovered, besides the one from the original report. > I was really hoping that the fix could be inside of query jumbling, as I think > pg_stat_statements or any consumers of query jumbling don't need to > care more about squashing ( or other internals of jumbling ). > So now I also wonder if we should also move: > > ``` > static char *generate_normalized_query(JumbleState *jstate, const char *query, > int query_loc, int *query_len_p); > > static void fill_in_constant_lengths(JumbleState *jstate, const char *query, > int query_loc); > ``` > > from pg_stat_statements.c to queryjumblefuncs.c? > > v3-0002 moved both generate_normalized_query and fill_in_constant_lengths to queryjumblefuncs.c, as these routines deal with the internal of jumbling and should not be a concern of pg_stat_statements. I think this is a good cleanup, but others may have different opinions on this. -- Sami Imseih Amazon Web Services (AWS)
Вложения
v4 corrects some code comments. Also, I created a CF entry [0]. [0] https://commitfest.postgresql.org/patch/6167/
Вложения
> On Fri, Oct 24, 2025 at 07:04:59PM -0500, Sami Imseih wrote: > v4 corrects some code comments. The fix in the first patch looks good, thanks. Not so sure about the refactoring in the second patch -- generate_normalized_query is being used only in one place, in pg_stat_statements, moving it out somewhere else without having other clients seems to be questionable to me. P.S. Adding Álvaro as a commiter of the affected feature, maybe he will help us to apply the fix.
Thanks for reviewing! >Not so sure about the > refactoring in the second patch -- generate_normalized_query is being > used only in one place, in pg_stat_statements, moving it out somewhere > else without having other clients seems to be questionable to me. The excellent question raised earlier is whether the fix should be applied in pg_stat_statements.c or queryjumblefuncs.c. To me, this suggests that pg_stat_statements, as an extension, is dealing with code it should not be responsible for. generate_normalized_query could be used by other extensions that have normalization needs; it’s generic and needs to handle squashing, which is a jumble specific detail. Therefore, I think both fill_in_constant_lengths and generate_normalized_query should be moved. We can certainly start a new thread (0002) if more discussion is needed. -- Sami Imseih Amazon Web Services (AWS)
On Mon, Oct 27, 2025 at 02:34:50PM -0500, Sami Imseih wrote: > The excellent question raised earlier is whether the fix should be > applied in pg_stat_statements.c or queryjumblefuncs.c. To me, this > suggests that pg_stat_statements, as an extension, is dealing with code > it should not be responsible for. > > generate_normalized_query could be used by other extensions that have > normalization needs; it’s generic and needs to handle squashing, which > is a jumble specific detail. FWIW, the line defining the separation between pg_stat_statements.c and queryjumblefuncs.c should rely on a pretty simple concept: JumbleState can be written only by queryjumblefuncs.c and src/backend/nodes/, and should remain in an untouched constant state when looked at by any external module loaded, included PGSS, because it could be shared across more than one paths. That's just to say that I can get behind the argument you are presenting in patch 0002, to move the updates of JumbleState.clocations[N].length to happen inside the query jumbling code. This relies on a query string anyway, which would be the same for all the modules interested in interacting in a planner, analyze or execution hook. Now, I don't think that 0002 goes far enough: could it be possible to do things so as we enforce a policy where modules cannot touch a JumbleState at all? That would mean painting more consts to prevent manipulations of the Jumble state. > Therefore, I think both fill_in_constant_lengths and > generate_normalized_query should be moved. We can certainly start a new > thread (0002) if more discussion is needed. Yeah, that should be a separate discussion. Let's focus on the bug at hand, first, then consider improvements that could be done moving forward. -- Michael
Вложения
On 2025-Oct-28, Michael Paquier wrote: > FWIW, the line defining the separation between pg_stat_statements.c > and queryjumblefuncs.c should rely on a pretty simple concept: > JumbleState can be written only by queryjumblefuncs.c and > src/backend/nodes/, and should remain in an untouched constant state > when looked at by any external module loaded, included PGSS, because > it could be shared across more than one paths. I can get behind this as a principle. > That's just to say that I can get behind the argument you are > presenting in patch 0002, to move the updates of > JumbleState.clocations[N].length to happen inside the query jumbling > code. This relies on a query string anyway, which would be the same > for all the modules interested in interacting in a planner, analyze or > execution hook. Hmm, yeah, now that you mention this, it seems rather odd that pgss_store() is messing with (modifying) the jstate it is given. If we had multiple modules using a planner_hook to interact with the query representation, what each module sees would be different depending on which hook is called first, which sounds wrong. Maybe, as you say, we do need to consider this a design flaw that should be fixed in a more principled fashion ... and it's pretty hard to see that we could backpatch any such fixes. It's a pretty old issue though (you probably notice I hesitate to call it a bug.) So I agree with you: we should fix the specific bug first across branches, and the reworking of the jumbling framework should be done afterwards in master only. I'm going to get the first patch pushed, after looking at it more carefully. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Update: super-fast reaction on the Postgres bugs mailing list. The report was acknowledged [...], and a fix is under discussion. The wonders of open-source !" https://twitter.com/gunnarmorling/status/1596080409259003906
On 2025-Oct-26, Dmitry Dolgov wrote: > > On Fri, Oct 24, 2025 at 07:04:59PM -0500, Sami Imseih wrote: > > v4 corrects some code comments. > > The fix in the first patch looks good, thanks. Yeah, I think this general idea is sensible. However, I think we should take it one step further and just remove last_loc entirely. I think this makes the code a bit clearer. How about the attached? Regarding 0002, as I said on my reply to Michael I think this is a good idea on principle, but I suggest to discuss that in a separate thread. I would seek routine names that match the current ones in queryjumble.h a little better though. > P.S. Adding Álvaro as a commiter of the affected feature, maybe he will > help us to apply the fix. Thanks for doing that, I'd have not noticed the thread otherwise. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Pero la cosa no es muy grave ..." (le petit Nicolas -- René Goscinny)
Вложения
>
> > > On Fri, Oct 24, 2025 at 07:04:59PM -0500, Sami Imseih wrote:
> > > v4 corrects some code comments.
> >
> > The fix in the first patch looks good, thanks.
>
> Yeah, I think this general idea is sensible. However, I think we should
> take it one step further and just remove last_loc entirely. I think
> this makes the code a bit clearer. How about the attached?
getting rid of last_loc makes sense because the list is sorted makes
sense. I like this, definitely cleaner.
One minor comment is to change is to remove the "let's save it" but
in the comments, as we are no longer saving a last_loc.
/*
- * We have a valid location for a constant that's not
a dupe, let's
- * save it. Lex tokens until we find the desired constant.
+ * We have a valid location for a constant that's not
a dupe. We Lex
+ * tokens until we find the desired constant.
*/
--
Sami
On Tue, Oct 28, 2025 at 12:31:23PM +0200, Alvaro Herrera wrote: > Hmm, yeah, now that you mention this, it seems rather odd that > pgss_store() is messing with (modifying) the jstate it is given. If we > had multiple modules using a planner_hook to interact with the query > representation, what each module sees would be different depending on > which hook is called first, which sounds wrong. Maybe, as you say, we > do need to consider this a design flaw that should be fixed in a more > principled fashion ... and it's pretty hard to see that we could > backpatch any such fixes. It's a pretty old issue though (you probably > notice I hesitate to call it a bug.) It's hard to call that a bug when nothing in core is broken directly because of it, we are just assuming that it impacts other extensions. Now we have always been kind of bad in terms of defining what loaded extensions should or not should be able to do, also depending on the order their hooks are loaded. > So I agree with you: we should fix > the specific bug first across branches, and the reworking of the > jumbling framework should be done afterwards in master only. Agreedn. > I'm going to get the first patch pushed, after looking at it more > carefully. Cool, thanks! -- Michael
Вложения
On 2025-Oct-28, Sami Imseih wrote: > getting rid of last_loc makes sense because the list is sorted. I like > this, definitely cleaner. > > One minor comment is to change is to remove the "let's save it" but > in the comments, as we are no longer saving a last_loc. Ah, yeah, that's a good change. I have pushed it now. Thanks! -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Entristecido, Wutra (canción de Las Barreras) echa a Freyr a rodar y a nosotros al mar"
> > One minor comment is to change is to remove the "let's save it" but > > in the comments, as we are no longer saving a last_loc. > > Ah, yeah, that's a good change. I have pushed it now. Thanks for pushing! I will take 0002 on in a separate thread as there seems to be agreement on doing something about it. -- Sami Imseih Amazon Web Services (AWS)