Discussion: Qual push down to table AM
Hi,

Please find attached a patch set proposal intended to implement WHERE clause (qual) push down to the underlying table AM during table/sequential scan execution.

The primary goal of this project is to convert quals to ScanKeys and pass them to the table AMs (a rough illustration of this conversion is sketched after this message). Table AMs are then allowed to apply early tuple filtering during table (sequential) scans. Applying filtering at the table storage level is necessary for non-row-oriented table storage such as columnar storage. Index-organized tables are another kind of table storage that would need qual push down.

AFAIK, CustomScan is currently the only way to get a table scan with quals pushed down, and each table AM must implement its own mechanism. IMHO, having this feature available in core would help the development of new table AMs. Regarding heap, performance testing (detailed at the end of this message) shows between 45% and 60% improvement in seq scan execution time when only one tuple is returned from the table.

Only a few expressions are supported: OpExpr (<key> <operator> <value>), ScalarArrayOpExpr (<key> <operator> ANY|ALL(ARRAY[...])), and NullTest. Row comparison is not yet supported, as this part is still not clear to me. On the right-hand side of the expression, we support: constant, variable, function call, and subquery (InitPlan only).

In terms of security, we check that the function behind the operator is not user-defined: only functions from the catalog are supported. We also check that the function is "leakproof".

Pushing down quals does not guarantee to the executor that the tuples returned during a table scan satisfy a qual, as we don't know whether the table AM (potentially implemented via an extension) has applied tuple filtering. To ensure the WHERE clause is answered correctly, pushed-down quals are executed twice per returned tuple: once by the table AM, and once by the executor. This produces a performance regression (15-17%) when almost the entire table is returned (see perf. test results at the end of the message). This could be optimized by flagging the tuples filtered by the table AM; that way we could avoid re-executing the pushed-down quals.

Details about the patch files:

v1-0001-Pass-the-number-of-ScanKeys-to-scan_rescan.patch: This patch adds the number of ScanKeys as a new argument to scan_rescan(). The number of ScanKeys was only passed to the table AM via scan_begin(), but not via scan_rescan().

v1-0002-Simple-quals-push-down-to-table-AMs.patch: Core of the feature, this patch adds qual push down support for OpExpr expressions.

v1-0003-Add-the-table-reloption-quals_push_down.patch: Adds a new reloption, quals_push_down, used to enable/disable qual push down for a table. Disabled by default.

v1-0004-Add-tests-for-quals-push-down-to-table-AM.patch: Regression tests.

v1-0005-Push-down-IN-NOT-IN-array-quals-to-table-AMs.patch: ScalarArrayOpExpr support.

v1-0006-Push-down-IS-IS-NOT-NULL-quals-to-table-AMs.patch: NullTest support.
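[Editor's note: for readers less familiar with the ScanKey machinery, here is a minimal, illustrative sketch of the conversion referenced above: turning a simple OpExpr qual such as "i = 50000" into a ScanKey and handing it to the table AM through table_beginscan(). This is not code from the patches; the attribute number, strategy and operator function are hard-coded here, whereas the patches derive them from the planned quals.]

#include "postgres.h"

#include "access/skey.h"
#include "access/stratnum.h"
#include "access/tableam.h"
#include "utils/fmgroids.h"

/*
 * Illustration only: build a ScanKey equivalent to "i = 50000" on
 * attribute 1 and start a sequential scan with it, so that the table AM
 * can filter tuples early instead of returning every tuple to the
 * executor.
 */
static TableScanDesc
begin_filtered_scan(Relation rel, Snapshot snapshot)
{
    ScanKeyData key[1];

    ScanKeyInit(&key[0],
                1,                      /* attno of column "i" */
                BTEqualStrategyNumber,  /* equality comparison */
                F_INT4EQ,               /* int4eq(), a leakproof catalog function */
                Int32GetDatum(50000));  /* constant from the qual */

    /* the AM receives nkeys = 1 and may skip tuples that fail the key */
    return table_beginscan(rel, snapshot, 1, key);
}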
Performance testing

Head:
  CREATE TABLE t (i INTEGER);

Patch:
  CREATE TABLE t (i INTEGER) WITH (quals_push_down = on);

n=1M:
  INSERT INTO t SELECT generate_series(1, 1000000);
  VACUUM t;

n=10M:
  TRUNCATE t;
  INSERT INTO t SELECT generate_series(1, 10000000);
  VACUUM t;

n=100M:
  TRUNCATE t;
  INSERT INTO t SELECT generate_series(1, 100000000);
  VACUUM t;

Case #1: SELECT COUNT(*) FROM t WHERE i = 50000;

        |      n=1M       |       n=10M       |       n=100M
        +--------+--------+---------+---------+----------+---------
        | Head   | Patch  | Head    | Patch   | Head     | Patch
--------+--------+--------+---------+---------+----------+---------
Test #1 | 38.903 | 21.308 | 365.707 | 155.429 | 3939.937 | 1564.182
Test #2 | 39.239 | 21.271 | 364.206 | 153.127 | 3872.370 | 1527.988
Test #3 | 39.015 | 21.958 | 365.434 | 154.498 | 3812.382 | 1525.535
--------+--------+--------+---------+---------+----------+---------
Average | 39.052 | 21.512 | 365.116 | 154.351 | 3874.896 | 1539.235
Std dev |  0.171 |  0.386 |   0.800 |   1.158 |   63.815 |   21.640
--------+--------+--------+---------+---------+----------+---------
Gain    |     44.91%      |      57.73%       |      60.28%

Case #2: SELECT COUNT(*) FROM t WHERE i >= 2;

        |      n=1M       |       n=10M       |       n=100M
        +--------+--------+---------+---------+----------+---------
        | Head   | Patch  | Head    | Patch   | Head     | Patch
--------+--------+--------+---------+---------+----------+---------
Test #1 | 68.422 | 81.233 | 674.397 | 778.427 | 6845.165 | 8071.627
Test #2 | 69.237 | 80.868 | 682.976 | 774.417 | 6533.091 | 7668.477
Test #3 | 69.579 | 80.418 | 676.072 | 791.465 | 6917.964 | 7916.182
--------+--------+--------+---------+---------+----------+---------
Average | 69.079 | 80.840 | 677.815 | 781.436 | 6765.407 | 7885.429
Std dev |  0.594 |  0.408 |   4.547 |   8.914 |  204.457 |  203.327
--------+--------+--------+---------+---------+----------+---------
Gain    |    -17.02%      |     -15.29%       |     -16.56%

Thoughts?

Best regards,

--
Julien Tachoires
Attachments
- v1-0001-Pass-the-number-of-ScanKeys-to-scan_rescan.patch
- v1-0002-Simple-quals-push-down-to-table-AMs.patch
- v1-0003-Add-the-table-reloption-quals_push_down.patch
- v1-0004-Add-tests-for-quals-push-down-to-table-AM.patch
- v1-0005-Push-down-IN-NOT-IN-array-quals-to-table-AMs.patch
- v1-0006-Push-down-IS-IS-NOT-NULL-quals-to-table-AMs.patch
On 2025-08-27 22:27:37 +0200, Julien Tachoires wrote:
> [...]
>
> AFAIK, CustomScan is currently the only way to get a table scan with
> quals pushed down, and each table AM must implement its own mechanism.
> IMHO, having this feature available in core would help the development
> of new table AMs. Regarding heap, performance testing (detailed at the
> end of this message) shows between 45% and 60% improvement in seq scan
> execution time when only one tuple is returned from the table.

One problem with doing that in the case of heapam is that you're evaluating scan keys with the buffer lock held, with basically arbitrary expressions being evaluated. That's an easy path to undetected deadlocks. You'd have to redesign the relevant mechanism to filter outside of the lock...

Greetings,

Andres Freund
On Thu, 28 Aug 2025 at 01:27, Julien Tachoires <julien@tachoires.me> wrote:
>
> Hi,
>
> Please find attached a patch set proposal intended to implement WHERE
> clause (qual) push down to the underlying table AM during
> table/sequential scan execution.
>
> The primary goal of this project is to convert quals to ScanKeys and
> pass them to the table AMs. Table AMs are then allowed to apply early
> tuple filtering during table (sequential) scans. Applying filtering at
> the table storage level is necessary for non-row-oriented table storage
> such as columnar storage. Index-organized tables are another kind of
> table storage that would need qual push down.
>
> [...]

Hi!

I have also always wondered whether something like qual pushing could be implemented in Postgres. It is indeed very beneficial for column-based processing in MPP databases, Greenplum and Cloudberry to name a few. I did my own micro-research a while ago (while working on some Cloudberry features), so here are my thoughts on the subject.

What this patch set is doing is passing ScanKeys directly to the table AM somewhat blindly, which speeds up the execution phase. While I do not have strong objections against this approach, I suspect it breaks some layers of abstraction and some *silent* (or maybe documented) agreements about what the responsibilities of TableAM functions are. So, passing ScanKeys directly to the TAM is used on HEAD for catalog access only; correct me if I'm wrong. For all other kinds of relations each query is planned, which includes:

(1) building data access paths through the various data access methods (indexes);
(2) deciding, for each qual, which indexes can be used to satisfy it;
(3) using the cost model to pick the best options.

None of this can be done with your approach, can it?

The cost model can give hints to the optimizer that this TAM will process some qual much faster than any by-index access. A smart cost model/optimizer can realise that selecting only a few of all the attributes from a column-oriented relation plus filtering using SIMD etc. can be really cheap.
So maybe the right shape for this patch would be something that can choose between a seqscan and an indexscan at plan time?

--
Best regards,
Kirill Reshke
Hi,

On Wed, Aug 27, 2025 at 05:50:01PM -0400, Andres Freund wrote:
> On 2025-08-27 22:27:37 +0200, Julien Tachoires wrote:
> > [...]
>
> One problem with doing that in the case of heapam is that you're
> evaluating scan keys with the buffer lock held, with basically arbitrary
> expressions being evaluated. That's an easy path to undetected
> deadlocks. You'd have to redesign the relevant mechanism to filter
> outside of the lock...

Thank you for this quick feedback.

One potential approach to solve this in heapgettup() would be the following (a rough sketch follows after this message):

1. take the buffer lock
2. get the tuple from the buffer
3. if the tuple is not visible, move to the next tuple, back to 2.
4. release the buffer lock
5. if the tuple does not satisfy the scan keys, take the buffer lock, move to the next tuple, back to 2.
6. return the tuple

Do you see something fundamentally wrong here?

In practice, I might be wrong, but I think this problem affects heapgettup() only; heapgettup_pagemode() does not hold the buffer lock when HeapKeyTest() is called. To reach the problematic part we need two conditions: a non-NULL ScanKey array, and not being in page-at-a-time mode (pagemode). The only table_beginscan_something() function able to meet these conditions before calling the scan_begin() API is table_beginscan_sampling(), where the caller can choose whether to use pagemode. The only place where table_beginscan_sampling() is called is in tablesample_init(), and in that case the ScanKey array is NULL, so we cannot reach the problematic part of heapgettup(). There is only one other case where we disable pagemode in heapam: when the snapshot is non-MVCC.

Do you know of any other code path/scenario I missed that could lead to the problematic part being reached?

Best regards,

--
Julien Tachoires
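[Editor's note: to make the proposed ordering concrete, here is a rough, illustrative sketch of the loop restructuring described in steps 1-6 above. It is not actual heapam.c code; fetch_next_tuple_on_page() and keys_match() are hypothetical helpers standing in for the existing page/tuple iteration and HeapKeyTest() machinery. The point is only that the scan key functions run after the buffer content lock has been released.]

#include "postgres.h"

#include "access/heapam.h"
#include "access/skey.h"
#include "storage/bufmgr.h"
#include "utils/snapshot.h"

/* hypothetical helpers, standing in for the existing heapam machinery */
static bool fetch_next_tuple_on_page(HeapScanDesc scan, HeapTuple tuple);
static bool keys_match(HeapTuple tuple, int nkeys, ScanKey key);

/*
 * Sketch of steps 1-6 above: returns true once "tuple" points at a visible
 * tuple that satisfies the pushed-down keys, false when the page is
 * exhausted.  The keys are evaluated with the buffer lock released.
 */
static bool
heapgettup_filtered_sketch(HeapScanDesc scan, Snapshot snapshot,
                           ScanKey key, int nkeys,
                           HeapTuple tuple, Buffer buffer)
{
    LockBuffer(buffer, BUFFER_LOCK_SHARE);              /* step 1 */

    for (;;)
    {
        if (!fetch_next_tuple_on_page(scan, tuple))     /* step 2 */
        {
            LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
            return false;                               /* page exhausted */
        }

        if (!HeapTupleSatisfiesVisibility(tuple, snapshot, buffer))
            continue;                                   /* step 3: next tuple */

        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);         /* step 4 */

        /* step 5: evaluate the pushed-down keys without holding the lock */
        if (key != NULL && !keys_match(tuple, nkeys, key))
        {
            LockBuffer(buffer, BUFFER_LOCK_SHARE);      /* re-take the lock */
            continue;                                   /* back to step 2 */
        }

        return true;                                    /* step 6 */
    }
}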
Hi,

On Thu, Aug 28, 2025 at 02:57:02AM +0500, Kirill Reshke wrote:
> [...]
>
> So maybe the right shape for this patch would be something that can
> choose between a seqscan and an indexscan at plan time?

Thank you for your quick feedback.

Correct, this patch does not add or reduce any cost when some quals are planned to be pushed down. I agree with you that it would be nice (necessary?) to have this. I think the table AM API should provide, via new APIs, cost estimation for a table scan that takes into account the cost of evaluating the quals, if any.

Best regards,

--
Julien Tachoires
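[Editor's note: purely to illustrate that last point (nothing like this exists in the current table AM API or in the posted patches), such a cost-estimation hook in TableAmRoutine could take roughly the following shape, letting the planner ask the AM how expensive a scan with a given set of pushed-down quals would be.]

#include "postgres.h"

#include "nodes/pathnodes.h"
#include "nodes/pg_list.h"
#include "utils/rel.h"

/*
 * Hypothetical callback, NOT part of the table AM API today: given the
 * quals the planner would like to push down, the AM reports startup and
 * total cost estimates for the scan, so that the planner can compare a
 * filtered AM scan fairly against index paths.
 */
typedef void (*scan_cost_estimate_function) (PlannerInfo *root,
                                             RelOptInfo *baserel,
                                             Relation rel,
                                             List *pushed_quals,
                                             Cost *startup_cost,
                                             Cost *total_cost);

[If an AM does not provide such a callback, the planner would presumably fall back to today's generic seq scan costing, leaving existing AMs unaffected.]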