Обсуждение: Plan caching and serialization for reuse across executions
Hi hackers, I am currently exploring whether it is possible to cache a plan generated by the PostgreSQL planner and reuse it across laterexecutions. I understand that there are existing mechanisms like PREPARE/EXECUTE and CachedPlan in plancache.c, but these are tied toprepared statements and session-level usage. My interest is more on the research side: - Is there any supported (or experimental) way to serialize a Plan or PlannedStmt structure and reload it in a future execution,bypassing the planner? - If not, would it make sense to extend the existing CachedPlan mechanism to support serialization/deserialization, or arethere known architectural reasons why this approach is discouraged? - Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very helpful. The motivation is research-oriented: I want to experiment with plan persistence as a way to reduce planning overhead andstudy trade-offs of reusing cached plans. Thanks a lot for your guidance! Best regards, Xuan Chen
Hi! On Sun, 17 Aug 2025 at 23:34, Xuan Chen <cx0222@vip.qq.com> wrote: > > - Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very helpful. Here is extension implementing saving and restoring plans[0] [0]https://github.com/postgrespro/sr_plan -- Best regards, Kirill Reshke
On 18/8/2025 13:50, Kirill Reshke wrote: > Hi! > > On Sun, 17 Aug 2025 at 23:34, Xuan Chen <cx0222@vip.qq.com> wrote: >> >> - Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very helpful. > > Here is extension implementing saving and restoring plans[0] > > > [0]https://github.com/postgrespro/sr_plan > I wouldn't recommend this obsolete version of the extension. Since the QueryId was introduced, we have redesigned it extensively, with the basic idea that QueryId serves as a 'query plan class' and that matching parse trees provide a proof of matching incoming queries with their corresponding [parameterised] plans. -- regards, Andrei Lepikhov
On 16/8/2025 14:32, Xuan Chen wrote: > Hi hackers, > > I am currently exploring whether it is possible to cache a plan generated by the PostgreSQL planner and reuse it acrosslater executions. > > I understand that there are existing mechanisms like PREPARE/EXECUTE and CachedPlan in plancache.c, but these are tiedto prepared statements and session-level usage. My interest is more on the research side: > > - Is there any supported (or experimental) way to serialize a Plan or PlannedStmt structure and reload it in a future execution,bypassing the planner? > - If not, would it make sense to extend the existing CachedPlan mechanism to support serialization/deserialization, orare there known architectural reasons why this approach is discouraged? > - Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very helpful. > > The motivation is research-oriented: I want to experiment with plan persistence as a way to reduce planning overhead andstudy trade-offs of reusing cached plans. > > Thanks a lot for your guidance! Yes, it is possible. See [1] to find out how it is implemented in the Postgres architecture. We implemented 'freezing' of a statement with parameterisation defined by the user. Each incoming query (constant or parameterised) matches the QueryId. Then, with matching incoming and the stored parse tree, it finds the corresponding plan and proves that it may be used for the query (remember, rewriting rules, indexes, and other factors may change). Such a 'frozen' plan is serialised into the shared memory and is lazily transferred to all backends and used across all instances. There are lots of issues that still exist for me after the finish of this project. So, you have a whole room of problems that you can address in your research ;). [1] https://github.com/danolivo/conf/blob/main/2023-PGDay-Israel/sr-plan.pdf -- regards, Andrei Lepikhov
Andrei Lepikhov <lepihov@gmail.com> writes: > I wouldn't recommend this obsolete version of the extension. Since the > QueryId was introduced, we have redesigned it extensively, with the > basic idea that QueryId serves as a 'query plan class' and that matching > parse trees provide a proof of matching incoming queries with their > corresponding [parameterised] plans. That's fairly scary, considering all the work that's been done to intentionally fuzz different queries together to make them look like "the same query" to pg_stat_statements. While that's fine for pg_stat_statements, it makes it really questionable to assume that QueryId is a sufficient proof of semantic equivalence. (I recall having objected to the whole concept of moving queryjumble.c into core on the grounds that it could not serve multiple masters. I think this proves my point...) regards, tom lane
Hi, On Mon, Aug 18, 2025 at 10:18:34AM -0400, Tom Lane wrote: > Andrei Lepikhov <lepihov@gmail.com> writes: > > I wouldn't recommend this obsolete version of the extension. Since the > > QueryId was introduced, we have redesigned it extensively, with the > > basic idea that QueryId serves as a 'query plan class' and that matching > > parse trees provide a proof of matching incoming queries with their > > corresponding [parameterised] plans. > > That's fairly scary, considering all the work that's been done to > intentionally fuzz different queries together to make them look > like "the same query" to pg_stat_statements. While that's fine > for pg_stat_statements, it makes it really questionable to assume > that QueryId is a sufficient proof of semantic equivalence. > > (I recall having objected to the whole concept of moving queryjumble.c > into core on the grounds that it could not serve multiple masters. > I think this proves my point...) I don't think it really proves your point, the queryid is only a hash and collisions are possible. So even if the core jumbling didn't fuzz different queries together you would still need a secondary check to ensure correctness, and IIUC Andrei said that they use the actual parsetree for that. I wrote a prototype of extension to implement a shared plan cache some years ago and I also did something similar. Even if the queryid is not (and cannot be) perfect it's still highly useful in many situations.
On 18/8/2025 16:18, Tom Lane wrote: > Andrei Lepikhov <lepihov@gmail.com> writes: >> I wouldn't recommend this obsolete version of the extension. Since the >> QueryId was introduced, we have redesigned it extensively, with the >> basic idea that QueryId serves as a 'query plan class' and that matching >> parse trees provide a proof of matching incoming queries with their >> corresponding [parameterised] plans. > > That's fairly scary, considering all the work that's been done to > intentionally fuzz different queries together to make them look > like "the same query" to pg_stat_statements. While that's fine > for pg_stat_statements, it makes it really questionable to assume > that QueryId is a sufficient proof of semantic equivalence. > > (I recall having objected to the whole concept of moving queryjumble.c > into core on the grounds that it could not serve multiple masters. > I think this proves my point...) I recall that this feature is on the "we never want it" list. It was designed using a core patch and includes a custom, more selective query ID. That's why I recently tried to discuss the extensibility of query IDs in several threads. As mentioned earlier, the query ID is utilised to narrow the search space, resulting in no more than two to three additional costly checks, according to my research. This allows the feature to be both practical and efficient. The accuracy of the proof is determined by a parse tree match, along with a few additional checks on the existence of indexes and stuff like that. I have never considered contributing any part of this to the core, just as the pg_hint_plan does. -- regards, Andrei Lepikhov