It seems that we need to improve the estimate of distinct values in estimate_num_groups() when taking the selectivity of restrictions into account.
In 84f9a35e3 we switched to a new formula for this estimation. But that does not apply to the case here: for an appendrel, set_append_rel_size() always sets the "raw tuples" count equal to "rows", which makes estimate_num_groups() skip the adjustment of the estimate using the new formula.
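To make the skipped adjustment concrete, here is a minimal sketch of its shape as I read it: n * (1 - ((N - p) / N)^(N / n)), applied only when rows < tuples. The function name adjust_distinct is hypothetical, and the clamping that the real code performs around this step is left out.

```python
import math


def adjust_distinct(reldistinct: float, tuples: float, rows: float) -> float:
    """Scale a distinct-value estimate by the restriction selectivity.

    reldistinct: n, distinct values estimated for the whole relation
    tuples:      N, raw tuple count before restrictions
    rows:        p, estimated rows surviving the restrictions
    """
    # The adjustment only runs when the restrictions filter something out
    # and there is a nonzero distinct count to scale.
    if reldistinct > 0 and rows < tuples:
        reldistinct *= 1 - math.pow((tuples - rows) / tuples,
                                    tuples / reldistinct)
    return reldistinct


if __name__ == "__main__":
    # Plain-relation-like case: 10% of 1,000,000 tuples survive.
    print(adjust_distinct(100000, 1000000, 100000))
    # Appendrel-like case today: tuples == rows, so nothing is adjusted.
    print(adjust_distinct(100000, 100000, 100000))
```

With tuples equal to rows, the guard is false and the unadjusted distinct count is returned, which is the behavior described above for appendrels.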
I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'. Why don't we set it to the accumulated estimate of tuples from each live child, as in the attached patch? I believe this aligns more closely with reality.
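As a rough illustration of the idea (not the attached patch itself), the accumulation could be modeled like this. ChildRel, its fields, and append_rel_size are hypothetical names for this sketch only, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ChildRel:
    tuples: float       # raw tuple count before restrictions
    rows: float         # estimated rows after restrictions
    live: bool = True   # False for children that were pruned away


def append_rel_size(children: Iterable[ChildRel]) -> Tuple[float, float]:
    """Return (tuples, rows) for the parent, summed over live children."""
    live = [c for c in children if c.live]
    parent_tuples = sum(c.tuples for c in live)
    parent_rows = sum(c.rows for c in live)
    return parent_tuples, parent_rows
```

For two live children with 500000 tuples and 50000 rows each, this gives a parent with 1000000 tuples and 100000 rows, so rows < tuples holds for the parent and a selectivity-based adjustment is no longer skipped.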
This would also allow us to adjust the estimate for the number of distinct values in estimate_num_groups() for appendrels using the new formula introduced in 84f9a35e3. In my experiments, this improves the estimate for appendrels. For instance,
create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, (100000 * random())::int, random() from generate_series(1,1000000) i;
analyze t;

explain analyze select b from t where c < 0.1 group by b;
With the patch, the estimate for the number of distinct 'b' values is more accurate.
BTW, this patch does not change any existing regression test results. I attempted to devise a regression test that shows how this change can improve query plans, but failed. Should I try harder to find such a test case?