Обсуждение: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)
Since e2debb643, we've had the ability to determine if a column is NULLable as early as during constant folding. This seems like a good time to consider converting COUNT(not_null_col) into COUNT(*), which is faster and may result in far fewer columns being deformed from the tuple. To make this work, I invented "SupportRequestSimplifyAggref", which is similar to the existing SupportRequestSimplify, which is for FuncExprs. Aggregates use Aggrefs, so we need something else. It's easy to see that count(*) is faster. Here's a quick test in an unpatched master: create table t (a int, b int, c int, d int, e int, f int, g int, h int not null); insert into t (h) select 1 from generate_Series(1,1000000); vacuum freeze t; master: select count(h) from t; Time: 16.442 ms Time: 16.255 ms Time: 16.322 ms master: select count(*) from t; Time: 12.203 ms Time: 11.402 ms Time: 12.054 ms (+37%) With the patch applied, both queries will perform the same. It may be possible to apply transformations to other aggregate functions too, but I don't want to discuss that here. I mostly want to determine if the infrastructure is ok and do the count(*) one because it seems like the most likely one to be useful. One thing I wasn't too sure about was if we should make it possible for the support function to return something that's not an Aggref. In theory, something like COUNT(NULL) could just return '0'::bigint. While that does seem an optimisation that wouldn't be applied very often, I have opted to leave it so that such an optimisation *could* be done by the support function. I also happen to test that that doesn't entirely break the query, as ordinarily it would if we didn't have Query.hasAggs (It's not too dissimilar to removing unused columns from a subquery) Should we do this? David
Вложения
It may be possible to apply transformations to other aggregate
functions too, but I don't want to discuss that here. I mostly want to
determine if the infrastructure is ok and do the count(*) one because
it seems like the most likely one to be useful.
One thing I wasn't too sure about was if we should make it possible
for the support function to return something that's not an Aggref. In
theory, something like COUNT(NULL) could just return '0'::bigint.
While that does seem an optimisation that wouldn't be applied very
often, I have opted to leave it so that such an optimisation *could*
be done by the support function. I also happen to test that that
doesn't entirely break the query, as ordinarily it would if we didn't
have Query.hasAggs (It's not too dissimilar to removing unused columns
from a subquery)
Should we do this?
David
+1
Automatic query improvements are a Good Thing. We're never going to educate everyone that COUNT(1) is an anti-pattern, so it's easier to make it not an anti-pattern.
I'm in favor of the COUNT(NULL) optimization as well, as one of my favorite programming tropes is "There is nothing faster than nothing".
The check seems lightweight enough to me. Applies clean and tests pass. Test coverage seems to cover all the cases.
On Thu, 30 Oct 2025 at 16:40, Corey Huinker <corey.huinker@gmail.com> wrote: > I'm in favor of the COUNT(NULL) optimization as well, as one of my favorite programming tropes is "There is nothing fasterthan nothing". I think it would be much more interesting to do that if it could be detected in cases like: SELECT COUNT(col) FROM table WHERE col IS NULL; which might be a more realistic thing if the query without the WHERE clause was part of a VIEW. However, we don't currently have any infrastructure to detect when a column *is* NULL. There's only the opposite with expr_is_nonnullable() or var_is_nonnullable(). This does make me wonder if constant-folding is too early to do this transformation, as it currently happens before add_base_clause_to_rel() therefore we can't really transform cases such as: SELECT count(nullable_col) FROM t WHERE nullable_col IS NOT NULL; There might be a better spot in planning to do this at a point after add_base_clause_to_rel() is called. It just has to happen before the search for Aggrefs with the same aggtransfn in preprocess_aggref() as it's too late to swap aggregate functions around when they've already been grouped together with other aggs that can share the same transition state. I'm just subtly aware about Tom's complaints with the restriction_is_always_true() code as he thought it should go into the constant folding code, where it mostly now is, per Richard's work to put it there. > The check seems lightweight enough to me. Applies clean and tests pass. Test coverage seems to cover all the cases. Thanks for having a look and testing. I've attached a very slightly revised version of the patch. I became aware of a function named get_func_support(), which can be used rather than fetching the pg_proc tuple from SysCache, which I was doing in v1. No other changes. David
Вложения
which might be a more realistic thing if the query without the WHERE
clause was part of a VIEW. However, we don't currently have any
infrastructure to detect when a column *is* NULL. There's only the
opposite with expr_is_nonnullable() or var_is_nonnullable().
But we'd still catch NULL constants, yes?
I've attached a very slightly revised version of the patch. I became
aware of a function named get_func_support(), which can be used rather
than fetching the pg_proc tuple from SysCache, which I was doing in
v1. No other changes.
David
The change is a big win for clarity. Applies clean. Passes.
Re: Have the planner convert COUNT(1) / COUNT(not_null_col) to COUNT(*)
От
"Matheus Alcantara"
Дата:
Hi, thanks for working on this! On Thu Oct 30, 2025 at 1:20 AM -03, David Rowley wrote: > I've attached a very slightly revised version of the patch. I became > aware of a function named get_func_support(), which can be used rather > than fetching the pg_proc tuple from SysCache, which I was doing in > v1. No other changes. > I looked the code and it seems to be in a good shape, but I tried to apply the v2 on top of e7ccb247b38 in master to run some tests and a rebase is necessary. git am v2-0001-Have-the-planner-replace-COUNT-ANY-with-COUNT-whe.patch Applying: Have the planner replace COUNT(ANY) with COUNT(*), when possible error: patch failed: contrib/postgres_fdw/expected/postgres_fdw.out:2975 error: contrib/postgres_fdw/expected/postgres_fdw.out: patch does not apply error: patch failed: src/test/regress/expected/aggregates.out:1219 error: src/test/regress/expected/aggregates.out: patch does not apply Patch failed at 0001 Have the planner replace COUNT(ANY) with COUNT(*), when possible -- Matheus Alcantara
On Tue, 4 Nov 2025 at 09:38, Matheus Alcantara <matheusssilv97@gmail.com> wrote: > I looked the code and it seems to be in a good shape, but I tried to > apply the v2 on top of e7ccb247b38 in master to run some tests and a > rebase is necessary. Are you sure you've not got something else in your branch? It applies ok here, and the CFbot isn't complaining either. CFBot's is based on cf8be0225, which is 2 commits before the one you're trying, but src/test/regress/expected/aggregates.out hasn't been changed since 2025-10-07. David
On Sat, 1 Nov 2025 at 14:19, Corey Huinker <corey.huinker@gmail.com> wrote: >> >> which might be a more realistic thing if the query without the WHERE >> clause was part of a VIEW. However, we don't currently have any >> infrastructure to detect when a column *is* NULL. There's only the >> opposite with expr_is_nonnullable() or var_is_nonnullable(). > > But we'd still catch NULL constants, yes? Yes. It could. I've left that part of the patch #ifdef'd out. I wasn't planning on using it. I just left it there as an example for if someone wanted to test it. > The change is a big win for clarity. Applies clean. Passes. Thanks for checking. David