Re: Generating code for query jumbling through gen_node_support.pl
От | Michael Paquier |
---|---|
Тема | Re: Generating code for query jumbling through gen_node_support.pl |
Дата | |
Msg-id | Y8eZ9/3STyhF57RV@paquier.xyz обсуждение исходный текст |
Ответ на | Re: Generating code for query jumbling through gen_node_support.pl (Michael Paquier <michael@paquier.xyz>) |
Ответы |
Re: Generating code for query jumbling through gen_node_support.pl
|
Список | pgsql-hackers |
On Tue, Jan 17, 2023 at 04:52:28PM +0900, Michael Paquier wrote: > On Tue, Jan 17, 2023 at 08:43:44AM +0100, Peter Eisentraut wrote: >> Ok, I understand now, and I agree with this approach over the opposite. I >> was confused because the snippet you showed above used "jumble_ignore", but >> your patch is correct as it uses "jumble_location". > > Okay. I'll refresh the patch set so as we have only "jumble_ignore", > then, like v1, with preparatory patches for what you mentioned and > anything that comes into mind. This is done as of the patch series v3 attached: - 0001 reformats all the comments of the nodes. - 0002 moves the current files for query jumble as of queryjumble.c -> queryjumblefuncs.c and utils/queryjumble.h -> nodes/queryjumble.h. - 0003 is the core feature, where I have done a second pass over the nodes to make sure that things map with HEAD, incorporating the extra docs coming from v2, adding a bit more. >> That said, the term "jumble" is really weird, because in the sense that we >> are using it here it means, approximately, "to mix together", "to unify". >> So what we are doing with the Const nodes is really to *not* jumble the >> location, but for all other node types we are jumbling the location. At >> least that is my understanding. > > I am quite familiar with this term, FWIW. That's what we've inherited > from the days where this has been introduced in pg_stat_statements. I have renamed the node attributes to query_jumble_ignore and no_query_jumble at the end, after considering Peter's point that only "jumble" could be fuzzy here. The file names are changed in consequence. While doing all that, I have done some micro-benchmarking of JumbleQuery(), making it loop 50M times on my laptop each time a query ID is computed (hideous hack with a loop in queryjumble.c): - For non-utility queries, aka queries that go through JumbleQueryInternal(), I am measuring a repeatable ~10% improvement with the generated code over HEAD, which is kind of nice. I have tested a few DMLs and simple SELECTs, still it looks like a trend. - For utility queries, the new code is competing against hash_any_extended() with the query string, which is going to be hard to beat.. I am measuring what looks like a 5 times slowdown, at least, and more depending on the depth of the query tree. That's not surprising. On a 50M loop, this comes down to compare a computation of 100ns to 5ns for a 20-time slowdown for example, still this could justify the addition of a GUC to control whether utility queries have their query ID compiled depending on their nodes or their string hash, as this could become noticeable in OLTP workloads with loads of short statements and BEGIN/COMMIT queries? Thoughts or comments? -- Michael
Вложения
В списке pgsql-hackers по дате отправления: