Обсуждение: Plan caching and serialization for reuse across executions

Поиск
Список
Период
Сортировка

Plan caching and serialization for reuse across executions

От
"Xuan Chen"
Дата:
Hi hackers,

I am currently exploring whether it is possible to cache a plan generated by the PostgreSQL planner and reuse it across
laterexecutions. 

I understand that there are existing mechanisms like PREPARE/EXECUTE and CachedPlan in plancache.c, but these are tied
toprepared statements and session-level usage. My interest is more on the research side:  

- Is there any supported (or experimental) way to serialize a Plan or PlannedStmt structure and reload it in a future
execution,bypassing the planner? 
- If not, would it make sense to extend the existing CachedPlan mechanism to support serialization/deserialization, or
arethere known architectural reasons why this approach is discouraged? 
- Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very
helpful.

The motivation is research-oriented: I want to experiment with plan persistence as a way to reduce planning overhead
andstudy trade-offs of reusing cached plans. 

Thanks a lot for your guidance!

Best regards,
Xuan Chen

Re: Plan caching and serialization for reuse across executions

От
Kirill Reshke
Дата:
Hi!

On Sun, 17 Aug 2025 at 23:34, Xuan Chen <cx0222@vip.qq.com> wrote:
>
> - Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very
helpful.

Here is extension implementing saving and restoring plans[0]


[0]https://github.com/postgrespro/sr_plan

-- 
Best regards,
Kirill Reshke



Re: Plan caching and serialization for reuse across executions

От
Andrei Lepikhov
Дата:
On 18/8/2025 13:50, Kirill Reshke wrote:
> Hi!
> 
> On Sun, 17 Aug 2025 at 23:34, Xuan Chen <cx0222@vip.qq.com> wrote:
>>
>> - Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very
helpful.
> 
> Here is extension implementing saving and restoring plans[0]
> 
> 
> [0]https://github.com/postgrespro/sr_plan
> 
I wouldn't recommend this obsolete version of the extension. Since the 
QueryId was introduced, we have redesigned it extensively, with the 
basic idea that QueryId serves as a 'query plan class' and that matching 
parse trees provide a proof of matching incoming queries with their 
corresponding [parameterised] plans.

-- 
regards, Andrei Lepikhov



Re: Plan caching and serialization for reuse across executions

От
Andrei Lepikhov
Дата:
On 16/8/2025 14:32, Xuan Chen wrote:
> Hi hackers,
> 
> I am currently exploring whether it is possible to cache a plan generated by the PostgreSQL planner and reuse it
acrosslater executions.
 
> 
> I understand that there are existing mechanisms like PREPARE/EXECUTE and CachedPlan in plancache.c, but these are
tiedto prepared statements and session-level usage. My interest is more on the research side:
 
> 
> - Is there any supported (or experimental) way to serialize a Plan or PlannedStmt structure and reload it in a future
execution,bypassing the planner?
 
> - If not, would it make sense to extend the existing CachedPlan mechanism to support serialization/deserialization,
orare there known architectural reasons why this approach is discouraged?
 
> - Any relevant references in the source code (planner, plancache.c, executor) that I should study would be very
helpful.
> 
> The motivation is research-oriented: I want to experiment with plan persistence as a way to reduce planning overhead
andstudy trade-offs of reusing cached plans.
 
> 
> Thanks a lot for your guidance!
Yes, it is possible. See [1] to find out how it is implemented in the 
Postgres architecture.
We implemented 'freezing' of a statement with parameterisation defined 
by the user. Each incoming query (constant or parameterised) matches the 
QueryId. Then, with matching incoming and the stored parse tree, it 
finds the corresponding plan and proves that it may be used for the 
query (remember, rewriting rules, indexes, and other factors may change).
Such a 'frozen' plan is serialised into the shared memory and is lazily 
transferred to all backends and used across all instances.
There are lots of issues that still exist for me after the finish of 
this project. So, you have a whole room of problems that you can address 
in your research ;).

[1] https://github.com/danolivo/conf/blob/main/2023-PGDay-Israel/sr-plan.pdf

-- 
regards, Andrei Lepikhov



Re: Plan caching and serialization for reuse across executions

От
Tom Lane
Дата:
Andrei Lepikhov <lepihov@gmail.com> writes:
> I wouldn't recommend this obsolete version of the extension. Since the 
> QueryId was introduced, we have redesigned it extensively, with the 
> basic idea that QueryId serves as a 'query plan class' and that matching 
> parse trees provide a proof of matching incoming queries with their 
> corresponding [parameterised] plans.

That's fairly scary, considering all the work that's been done to
intentionally fuzz different queries together to make them look
like "the same query" to pg_stat_statements.  While that's fine
for pg_stat_statements, it makes it really questionable to assume
that QueryId is a sufficient proof of semantic equivalence.

(I recall having objected to the whole concept of moving queryjumble.c
into core on the grounds that it could not serve multiple masters.
I think this proves my point...)

            regards, tom lane



Re: Plan caching and serialization for reuse across executions

От
Julien Rouhaud
Дата:
Hi,

On Mon, Aug 18, 2025 at 10:18:34AM -0400, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
> > I wouldn't recommend this obsolete version of the extension. Since the
> > QueryId was introduced, we have redesigned it extensively, with the
> > basic idea that QueryId serves as a 'query plan class' and that matching
> > parse trees provide a proof of matching incoming queries with their
> > corresponding [parameterised] plans.
>
> That's fairly scary, considering all the work that's been done to
> intentionally fuzz different queries together to make them look
> like "the same query" to pg_stat_statements.  While that's fine
> for pg_stat_statements, it makes it really questionable to assume
> that QueryId is a sufficient proof of semantic equivalence.
>
> (I recall having objected to the whole concept of moving queryjumble.c
> into core on the grounds that it could not serve multiple masters.
> I think this proves my point...)

I don't think it really proves your point, the queryid is only a hash and
collisions are possible.  So even if the core jumbling didn't fuzz different
queries together you would still need a secondary check to ensure correctness,
and IIUC Andrei said that they use the actual parsetree for that.  I wrote a
prototype of extension to implement a shared plan cache some years ago and I
also did something similar.  Even if the queryid is not (and cannot be) perfect
it's still highly useful in many situations.



Re: Plan caching and serialization for reuse across executions

От
Andrei Lepikhov
Дата:
On 18/8/2025 16:18, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
>> I wouldn't recommend this obsolete version of the extension. Since the
>> QueryId was introduced, we have redesigned it extensively, with the
>> basic idea that QueryId serves as a 'query plan class' and that matching
>> parse trees provide a proof of matching incoming queries with their
>> corresponding [parameterised] plans.
> 
> That's fairly scary, considering all the work that's been done to
> intentionally fuzz different queries together to make them look
> like "the same query" to pg_stat_statements.  While that's fine
> for pg_stat_statements, it makes it really questionable to assume
> that QueryId is a sufficient proof of semantic equivalence.
> 
> (I recall having objected to the whole concept of moving queryjumble.c
> into core on the grounds that it could not serve multiple masters.
> I think this proves my point...)
I recall that this feature is on the "we never want it" list. It was 
designed using a core patch and includes a custom, more selective query 
ID. That's why I recently tried to discuss the extensibility of query 
IDs in several threads.
As mentioned earlier, the query ID is utilised to narrow the search 
space, resulting in no more than two to three additional costly checks, 
according to my research. This allows the feature to be both practical 
and efficient. The accuracy of the proof is determined by a parse tree 
match, along with a few additional checks on the existence of indexes 
and stuff like that.
I have never considered contributing any part of this to the core, just 
as the pg_hint_plan does.

-- 
regards, Andrei Lepikhov