Re: generic plans and "initial" pruning
| From | Amit Langote |
|---|---|
| Subject | Re: generic plans and "initial" pruning |
| Date | |
| Msg-id | CA+HiwqFpEHBjosRackQhm6yKKnHgqm8Ewkn=qsctT1N0PqVSrg@mail.gmail.com |
| In reply to | Re: generic plans and "initial" pruning (Amit Langote <amitlangote09@gmail.com>) |
| List | pgsql-hackers |
On Wed, Nov 12, 2025 at 11:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
> The key idea is to avoid taking unnecessary locks when reusing a
> cached plan. To achieve that, we need to perform initial partition
> pruning during cached plan reuse in plancache.c so that only surviving
> partitions are locked. This requires some plumbing to reuse the result
> of this "early" pruning during executor startup, because repeating the
> pruning logic would be both inefficient and potentially inconsistent
> -- what if you get different results the second time? (I don't have
> proof that this can happen, but some earlier emails mention the
> theoretical risk, so better to be safe.)
>
> So this patch introduces ExecutorPrep(), which allows executor
> metadata such as initial pruning results (valid subplan indexes) and
> full unpruned_relids to be computed ahead of execution and reused
> later by ExecutorStart() and during QueryDesc setup in parallel
> workers using the results shared by the leader. The parallel query bit
> was discussed previously at [1], though I didn’t have a solution I
> liked then.
> ...
> The patch set is structured as follows:
>
> * Refactor partition pruning initialization (0001): separates the
> setup of the pruning state from its execution by introducing
> ExecCreatePartitionPruneStates(). This makes the pruning logic easier
> to reuse and adds flexibility to do only the setup but skip pruning in
> some cases.
>
> * Introduce ExecutorPrep infrastructure (0002): adds ExecutorPrep()
> and ExecPrep as a formal way to perform executor setup ahead of
> execution. This enables caching or transferring pruning results and
> other metadata without triggering execution. ExecutorStart() can now
> consume precomputed prep state from the EState created during
> ExecutorPrep(). ExecPrepCleanup() handles cleanup when the plan is
> invalidated during prep and so not executed; otherwise the state is
> cleaned up in the regular ExecutorEnd() path.

In the v1 patch, I had not made ExecutorStart() call ExecutorPrep() to do the prep work (creating the EState, setting up es_relations, checking permissions) when the QueryDesc did not carry the results of ExecutorPrep() from some earlier stage. Instead, InitPlan() would detect that prep was absent and perform the missing setup itself. On second thought, it is cleaner for ExecutorStart() to detect the absence of prep and call ExecutorPrep() directly, matching how prep would be created when coming from plancache et al. v2 changes the patch to do that (sketched below).

> * Enable pruning-aware locking in cached / generic plan reuse (0004):
> extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
> on each PlannedStmt in the CachedPlan, locking only surviving
> partitions. Adds CachedPlanPrepData to pass this through plan cache
> APIs and down to execution via QueryDesc. Also reinstates the
> firstResultRel locking rule added in 28317de72 but later lost due to
> the revert of the earlier pruning patch, to ensure correctness when
> all target partitions are pruned.

Looking at the changes to executor/function.c, I also noticed that I had mistakenly allocated the ExecutorPrep state in SQLFunctionCache.fcontext, whereas the correct context for execution-related state is SQLFunctionCache.subcontext. In the updated patch, I've made postquel_start() reparent the prep EState's es_query_cxt from fcontext to subcontext. I also did not have a test case that exercised cached plan reuse for SQL functions, so I added one.
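To make the v2 ExecutorStart() shape concrete, the control flow is roughly as below. This is a simplified sketch rather than the patch itself; the prep-state field name (queryDesc->prep here) and the exact ExecutorPrep() arguments are placeholders:

    void
    ExecutorStart(QueryDesc *queryDesc, int eflags)
    {
        /*
         * Sketch: if no earlier stage (plancache.c during cached-plan
         * reuse, or QueryDesc setup in a parallel worker from the
         * leader's shared results) attached prep state to the QueryDesc,
         * do the prep work here, the same way those callers would have.
         */
        if (queryDesc->prep == NULL)
            ExecutorPrep(queryDesc, eflags);

        /*
         * Startup below consumes the EState and pruning results (valid
         * subplan indexes, unpruned_relids) captured in the prep state,
         * rather than recomputing them in InitPlan().
         */
        /* ... proceed with startup using the prep state ... */
    }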
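As for the function.c fix, the reparenting boils down to one call in postquel_start(), along these lines (variable names are illustrative; MemoryContextSetParent() is the stock memory-context API):

    /*
     * The prep EState's memory was parented to the long-lived
     * SQLFunctionCache.fcontext when ExecutorPrep() ran; move it under
     * the per-execution subcontext so it is released on the normal
     * execution-cleanup path.
     */
    MemoryContextSetParent(estate->es_query_cxt, fcache->subcontext);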
I split function.c's GetCachedPlan() + CachedPlanPrepData plumbing into a new patch 0005 so that it can be reviewed separately, since it is the only non-mechanical call-site change.

> Benchmark results:
>
> echo "plan_cache_mode = force_generic_plan" >> $PGDATA/postgresql.conf
> for p in 32 64 128 256 512 1024; do pgbench -i --partitions=$p > /dev/null 2>&1; echo -ne "$p\t"; pgbench -n -S -T10 -Mprepared | grep tps; done
>
> Master
>
> 32 tps = 23841.822407 (without initial connection time)
> 64 tps = 21578.619816 (without initial connection time)
> 128 tps = 18090.500707 (without initial connection time)
> 256 tps = 14152.248201 (without initial connection time)
> 512 tps = 9432.708423 (without initial connection time)
> 1024 tps = 5873.696475 (without initial connection time)
>
> Patched
>
> 32 tps = 24724.245798 (without initial connection time)
> 64 tps = 24858.206407 (without initial connection time)
> 128 tps = 24652.655269 (without initial connection time)
> 256 tps = 23656.756615 (without initial connection time)
> 512 tps = 22299.865769 (without initial connection time)
> 1024 tps = 21911.704317 (without initial connection time)

I re-ran the benchmark to include the 0-partition case and partition counts beyond 1024:

echo "plan_cache_mode = force_generic_plan" >> $PGDATA/postgresql.conf
for p in 0 8 16 32 64 128 256 512 1024 2048 4096; do pgbench -i --partitions=$p > /dev/null 2>&1; echo -ne "$p\t"; pgbench -n -S -T10 -Mprepared | grep tps; done

Master

0 tps = 23869.906654 (without initial connection time)
8 tps = 22682.498914 (without initial connection time)
16 tps = 22714.445711 (without initial connection time)
32 tps = 21653.589371 (without initial connection time)
64 tps = 20571.267545 (without initial connection time)
128 tps = 17138.088269 (without initial connection time)
256 tps = 13027.168426 (without initial connection time)
512 tps = 8689.486966 (without initial connection time)
1024 tps = 5450.525617 (without initial connection time)
2048 tps = 3034.383108 (without initial connection time)
4096 tps = 1560.110609 (without initial connection time)

Patched

0 tps = 23600.068719 (without initial connection time)
8 tps = 22548.439906 (without initial connection time)
16 tps = 22807.337363 (without initial connection time)
32 tps = 22837.789996 (without initial connection time)
64 tps = 22915.846820 (without initial connection time)
128 tps = 22958.472655 (without initial connection time)
256 tps = 22432.432730 (without initial connection time)
512 tps = 20327.618690 (without initial connection time)
1024 tps = 20554.932475 (without initial connection time)
2048 tps = 19947.061061 (without initial connection time)
4096 tps = 17294.369829 (without initial connection time)

In tabular format (+ve pct_change means patched is better):

partitions    master      patched    pct_change
------------------------------------------------
         0    23869.91    23600.07       -1.1%
         8    22682.50    22548.44       -0.6%
        16    22714.45    22807.34       +0.4%
        32    21653.59    22837.79       +5.5%
        64    20571.27    22915.85      +11.4%
       128    17138.09    22958.47      +34.0%
       256    13027.17    22432.43      +72.2%
       512     8689.49    20327.62     +133.9%
      1024     5450.53    20554.93     +277.1%
      2048     3034.38    19947.06     +557.4%
      4096     1560.11    17294.37    +1008.5%
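For reference, pct_change in these tables is computed as (patched - master) / master * 100; at 1024 partitions, for example, that is (20554.93 - 5450.53) / 5450.53 * 100 = +277.1%.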
echo "plan_cache_mode = force_custom_plan" >> $PGDATA/postgresql.conf for p in 0 8 16 32 64 128 256 512 1024 2048 4096; do pgbench -i --partitions=$p > /dev/null 2>&1; echo -ne "$p\t"; pgbench -n -S -T10 -Mprepared | grep tps; done Master pgbench -n -S -T10 -Mprepared | grep tps; done 0 tps = 22346.419557 (without initial connection time) 8 tps = 20959.115560 (without initial connection time) 16 tps = 21390.573290 (without initial connection time) 32 tps = 21358.292393 (without initial connection time) 64 tps = 21288.742635 (without initial connection time) 128 tps = 21167.721447 (without initial connection time) 256 tps = 21256.618661 (without initial connection time) 512 tps = 19401.261197 (without initial connection time) 1024 tps = 19169.135145 (without initial connection time) 2048 tps = 19504.102179 (without initial connection time) 4096 tps = 18880.855783 (without initial connection time) Patched 0 tps = 22852.634752 (without initial connection time) 8 tps = 21596.432690 (without initial connection time) 16 tps = 21428.779996 (without initial connection time) 32 tps = 20629.225272 (without initial connection time) 64 tps = 21301.644733 (without initial connection time) 128 tps = 21098.543942 (without initial connection time) 256 tps = 21394.364662 (without initial connection time) 512 tps = 19475.152170 (without initial connection time) 1024 tps = 19585.768438 (without initial connection time) 2048 tps = 19810.211969 (without initial connection time) 4096 tps = 19160.981608 (without initial connection time) In tabular format: partitions master patched pct_change ---------------------------------------------------- 0 22346.42 22852.63 +2.3% 8 20959.12 21596.43 +3.0% 16 21390.57 21428.78 +0.2% 32 21358.29 20629.23 -3.4% 64 21288.74 21301.64 +0.1% 128 21167.72 21098.54 -0.3% 256 21256.62 21394.36 +0.6% 512 19401.26 19475.15 +0.4% 1024 19169.14 19585.77 +2.2% 2048 19504.10 19810.21 +1.6% 4096 18880.86 19160.98 +1.5% Numbers look within noise range as expected. -- Thanks, Amit Langote
Attachments
- v2-0001-Refactor-partition-pruning-initialization-for-cla.patch
- v2-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch
- v2-0003-Reuse-partition-pruning-results-in-parallel-worke.patch
- v2-0004-Use-pruning-aware-locking-in-cached-plans.patch
- v2-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch