Thread: Poor row estimates from planner, `most_common_elems` statistic sometimes missing for a text[] column
Hello all,
We're intermittently seeing very poor performance of a query, when the planner occasionally chooses a bad plan. We're using Postgres 16.9.
One suspicious factor in the EXPLAIN ANALYZE output is a badly wrong estimate of the number of rows returned from a text[] column queried with '&&'.
After playing around with a simple recreate (details below), it seems ANALYZE of the table is affected by the number of rows in the table. The `most_common_elems` statistic is [null] when the table has 15,873 or more rows at the time it is analyzed. With fewer rows it is collected correctly.
Is there any good explanation for this behaviour? Preferably we'd like some way to get proper `most_common_elems` statistics collected in our production database, in the hope that this influences the planner to always choose a good plan.
In our production system there are ~150,000 rows in a table with a `text[]` column, where each row's array contains a single ~19-character string that is unique within the table. The full query joins against a couple more tables, and there is a GIN index on the text[] column. If necessary, I can get into details of the real system, but hope the simple recreate will be sufficient to understand the problem:
CREATE TABLE IF NOT EXISTS public.test(
id SERIAL PRIMARY KEY,
tags text[]
);
INSERT INTO public.test (tags)
SELECT ARRAY[TO_CHAR(n,'fm00000000')] FROM ( SELECT generate_series(1,15_873) AS n );
ANALYZE public.test;
SELECT * FROM pg_stat_user_tables WHERE relname = 'test';
EXPLAIN (ANALYZE,BUFFERS,VERBOSE)
SELECT * FROM test WHERE tags && ARRAY['00000002'];
Results
-------
table with 15_000 rows has most_common_elems after ANALYZE (most_common_elem_freqs : 6.666667e-05)
table with 15_872 rows has most_common_elems after ANALYZE (most_common_elem_freqs : 6.300403e-05)
table with 15_873 rows has [null] most_common_elems after ANALYZE
table with 100_000 rows has [null] most_common_elems after ANALYZE
Query plans estimate 1 row when `most_common_elems` statistics are available, or fall back to the hardcoded default of 1/200 of the estimated table size when most_common_elems is null.
Here 79 rows are estimated, because the table contained 15,873 rows and the stats weren't available.
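For reference, that 79-row figure is just the default selectivity applied to the table size. A quick way to see both the missing statistic and the arithmetic (a sketch against the reproducer table above, assuming it has been ANALYZEd):

SELECT most_common_elems IS NULL AS mce_missing, most_common_elem_freqs
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'test' AND attname = 'tags';

-- Default estimate when no MCELEM stats exist: 1/200 of the row count.
SELECT round(15873 / 200.0) AS default_row_estimate;   -- 79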
Query plan
-----------
Seq Scan on public.test (cost=0.00..463.41 rows=79 width=37) (actual time=9.934..17.190 rows=1 loops=1)
Output: id, tags
Filter: (test.tags && '{00000002}'::text[])
Rows Removed by Filter: 15872
Buffers: shared hit=268
Planning:
Buffers: shared hit=75
Planning Time: 2.060 ms
Execution Time: 17.205 ms
Full version
------------
"PostgreSQL 16.9 (Debian 16.9-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit"
Regards,
Mark Frost
IBM
On 6/5/25 17:42, Mark Frost wrote:
> Is there any good explanation for this behaviour? Preferably we'd like
> some way for proper `most_common_elems` statistics to be collected in
> our production database, in the hope that influences a good query plan
> to always be selected.

most_common_elems has a limited size, and if all the elements have the
same freq, there's nothing we can do.

You could do:

alter table test alter column tags set statistics X;

However, X is capped at 10000, which means that the size of
most_common_elems will be less than 100k, and it would probably be
stupid to go beyond that anyway.

It seems that postgres lacks some kind of "n_distinct_elems" for that
kind of case, but let's wait and see what the statistics gurus think.
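A concrete sketch of that suggestion against the reproducer table (the default per-column statistics target is 100; the 10000 cap is as noted above, and whether a higher target actually rescues most_common_elems depends on how many distinct elements ANALYZE has to track):

ALTER TABLE public.test ALTER COLUMN tags SET STATISTICS 10000;
ANALYZE public.test;
SELECT most_common_elems IS NOT NULL AS has_mce_stats
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'test' AND attname = 'tags';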
Mark Frost <FROSTMAR@uk.ibm.com> writes:
> We're seeing intermittently very poor performance of a query, when
> occasionally a poor query plan is chosen. We're using Postgres 16.9.
> One suspicious factor when looking at the EXPLAIN ANALYZE output, is a
> very wrong estimated number of rows to be returned from a text[] column
> queried with '&&'.
> After playing around with a simple recreate (details below), it seems
> ANALYZE of the table is affected by the number of rows in the table.
> Statistic `most_common_elems` is [null] when there's over 15,873 rows in
> the table when analyzed. With fewer rows it's analyzed correctly.

Thanks for the report.  Poking through the code, it seems like there
are two distinct problems here:

1. ANALYZE uses a "lossy counting" algorithm (dating to commit
0e5e167aa) to estimate the frequencies of array element values.
The part of that that seems to be going off the rails is this
selection of a cutoff frequency below which element values will be
dropped:

        /*
         * Construct an array of the interesting hashtable items, that is,
         * those meeting the cutoff frequency (s - epsilon)*N.  Also identify
         * the minimum and maximum frequencies among these items.
         *
         * Since epsilon = s/10 and bucket_width = 1/epsilon, the cutoff
         * frequency is 9*N / bucket_width.
         */
        cutoff_freq = 9 * element_no / bucket_width;

The first thing I find suspicious here is that the calculation is
based on element_no (the total number of array elements processed)
and not nonnull_cnt (the maximum possible frequency).  Is that really
right?  It wouldn't change the results in your reproducer with just
one element per array, but it seems bogus.

More relevant to your immediate problem, this creates a behavioral
cliff at the point where cutoff_freq rises from 0 to 1, which with
the default attstattarget turns out to be, you guessed it, 15873
elements.  In your example, all the element values have frequency 1,
so that switches us from being willing to record all the values to
being willing to record none of them.  That doesn't feel right
either.  By analogy to our treatment of regular MCVs, it seems like
we should never be willing to store values that we didn't see at
least twice.

Of course, doing that would make this example worse, because then
we'd not store any "most common" elements at smaller rowcounts
either.  Which brings us to the other major problem this reveals:

2. In array_selfuncs.c, we fall back to default selectivity estimates
if there's no MCELEM statistics field.  What we should be doing, if
there are other stats for the column but not MCELEM, is realizing
that compute_array_stats did not find any elements that are common
enough to be worth recording.  Then we'd use some much
lower-than-default estimate for the selectivity.

I don't have any immediate patch to offer, but clearly this area
could use another look.  compute_array_stats seems to have borrowed
the lossy-counting algorithm from ts_typanalyze, so we'd better take
a look at that too.

			regards, tom lane
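To make that cliff concrete: with the default statistics target of 100, my reading of array_typanalyze.c (the num_mcelem and bucket_width formulas below are taken from the code, not from the message above, so treat them as an assumption) gives num_mcelem = 1000 and bucket_width = 1000 * 1000 / 7 = 142857, so cutoff_freq first reaches 1 at element number 15873:

-- Integer division, mirroring the C expression cutoff_freq = 9 * element_no / bucket_width
SELECT n AS element_no, (9 * n) / 142857 AS cutoff_freq
FROM (VALUES (15000), (15872), (15873), (100000)) AS v(n);
-- Expected: 0, 0, 1, 6 -- single-occurrence elements stop qualifying at 15873.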
I wrote:
> The part of that that seems to be going off the rails is
> this selection of a cutoff frequency below which element values
> will be dropped:
>         cutoff_freq = 9 * element_no / bucket_width;
> The first thing I find suspicious here is that the calculation is
> based on element_no (the total number of array elements processed)
> and not nonnull_cnt (the maximum possible frequency).  Is that
> really right?

I did some more digging and found that that calculation was
introduced (in the older tsvector code) in bc0f08092, which traces
to this discussion:

https://www.postgresql.org/message-id/flat/4BF4357E.6000505%40krogh.cc

So the use of element_no is correct, because what we need to consider
here is the total number of values fed to the LC algorithm.

Also, my thought that maybe we should reject entries with f < 2 is
bogus, because at the end of the algorithm f is not necessarily the
true count of occurrences of the value: some early occurrences could
have been forgotten via pruning.

The "behavioral cliff" is annoying but I'm not sure there is much to
be done about it: having a single (still-remembered) occurrence gets
less and less significant as the total input size increases, so
sooner or later you are going to hit a point where such values should
be thrown away.

So at this point I'm thinking that there is nothing wrong with
ANALYZE's algorithm, although I now see that there are some relevant
comments in ts_typanalyze.c that probably ought to be transposed into
array_typanalyze.c.

The idea of treating lack of MCELEM differently from complete lack
of stats still seems to have merit, though.

			regards, tom lane
On 6/5/25 23:52, Tom Lane wrote:
> The idea of treating lack of MCELEM differently from complete
> lack of stats still seems to have merit, though.

Couldn't we count / estimate the number of distinct two-by-two
elements, and use that instead of the default selectivity estimate?
> On 6/5/25 17:42, Mark Frost wrote:
> > Is there any good explanation for this behaviour? Preferably we’d like
> > some way for proper `most_common_elems` statistics to be collected in
> > our production database, in the hope that influences a good query plan
> > to always be selected.
> most_common_elems has a limited size, and if all the elements have the
> same freq, there's nothing we can do.
> You could do: alter table test alter column tags set statistics X;
> However, X is capped at 10000…
Actually *any* most_common_elems stats would be fine, because the reasoning is:
- If the searched element is in most_common_elems, we know its frequency
- If it's not, it's less frequent than the least frequent entry in most_common_elems
So in our case, where every row's element is unique, we'd only actually need the stats to record a single most_common_elems entry (if only it would record one)
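To put numbers on that reasoning using the 15,872-row case above (where most_common_elem_freqs was 6.300403e-05): a searched tag that is recorded gets that frequency, and an unrecorded tag can be assumed to be at most as frequent, so either way the estimate comes out to about one row. A back-of-envelope check:

-- 1/15872 ≈ 6.3e-05; applied to the table size that is ~1 row either way.
SELECT round(15872 * 6.300403e-05) AS estimated_rows;   -- 1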
Mark Frost <FROSTMAR@uk.ibm.com> writes:
> Actually *any* most_common_elems stats would be fine, because the reasoning is:
> * If the searched element is in most_common_elems we know its frequency
> * If it's not, it's less frequent than the least most_common_elems
> So in our case when every row is unique, we'd only actually need stats to
> record a single most_common_elems (if only it would record one)

Well, we don't have a most common element in this scenario --- the
whole point is that the occurrence counts resulting from the lossy
counting algorithm are too low to be trustworthy.  However, what we
do have is the cutoff frequency, and it seems to me that we could use
that as the estimate of the maximum frequency of the non-MCEs.

Attached is an extremely crude prototype patch for that.  What it
does is to create an MCELEM stats entry with zero MCEs, but
containing min and max frequencies equal to the cutoff frequency
(plus the nulls frequency, which we know accurately in any case).

Mark, this fixes your example case, but I wonder if it fixes your
original problem --- are you in a position to test it?

Assuming we like this direction, the main thing that makes this a
hack not a polished patch is that I had to strongarm the code into
storing a zero-length values array.  What update_attstats would
normally do is leave the values column of the MCELEM stats slot NULL,
which then causes get_attstatsslot to throw a that-shouldn't-be-null
error.  An alternative answer is to change get_attstatsslot to allow
a null, but I'm not sure that that's any cleaner.  Either way it
seems like there's a hazard of breaking some code that isn't
expecting the case.

An alternative that feels cleaner but a good bit more invasive is to
get rid of the convention of storing the min/max/nulls frequencies as
extra entries in the MCELEM numbers entry --- which surely is a hack
too --- and put them into some new slot type.  I'm not sure if that's
enough nicer to be worth the conversion pain.

Thoughts?

			regards, tom lane

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 4fffb76e557..c312de7d593 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1733,7 +1733,10 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
     i = Anum_pg_statistic_stavalues1 - 1;
     for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
     {
-        if (stats->numvalues[k] > 0)
+        /* XXX ugly hack to allow zero-length MCELEM values arrays */
+        if (stats->numvalues[k] > 0 ||
+            (stats->numvalues[k] == 0 &&
+             stats->stakind[k] == STATISTIC_KIND_MCELEM))
         {
             ArrayType  *arry;

diff --git a/src/backend/utils/adt/array_typanalyze.c b/src/backend/utils/adt/array_typanalyze.c
index 6f61629b977..9c6dd2852ab 100644
--- a/src/backend/utils/adt/array_typanalyze.c
+++ b/src/backend/utils/adt/array_typanalyze.c
@@ -497,6 +497,15 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
      * If we obtained more elements than we really want, get rid of those
      * with least frequencies.  The easiest way is to qsort the array into
      * descending frequency order and truncate the array.
+     *
+     * On the other extreme, we might have found no elements with
+     * frequency above the cutoff.  Then there's nothing to store so far
+     * as element values go, but we still want to record the cutoff
+     * frequency, since array_selfuncs.c can use that as an upper bound on
+     * the frequency of non-MCVs.  (Note: what we store is the minimum
+     * frequency that would have been accepted as a valid MCE.  In
+     * particular, this ensures we don't create a bogus stats entry with
+     * frequency zero.)
      */
     if (num_mcelem < track_len)
     {
@@ -506,10 +515,14 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
         minfreq = sort_table[num_mcelem - 1]->frequency;
     }
     else
+    {
         num_mcelem = track_len;
+        if (track_len == 0)
+            minfreq = maxfreq = cutoff_freq + 1;
+    }

     /* Generate MCELEM slot entry */
-    if (num_mcelem > 0)
+    if (num_mcelem >= 0)
     {
         MemoryContext old_context;
         Datum      *mcelem_values;
I wrote:
> Well, we don't have a most common element in this scenario --- the
> whole point is that the occurrence counts resulting from the lossy
> counting algorithm are too low to be trustworthy.  However, what we
> do have is the cutoff frequency, and it seems to me that we could use
> that as the estimate of the maximum frequency of the non-MCEs.

Here's a less crude patch for that.  The array_typanalyze.c changes
are the same as before, but I reconsidered what to do about this
stumbling block:

> Assuming we like this direction, the main thing that makes this a hack
> not a polished patch is that I had to strongarm the code into storing
> a zero-length values array.  What update_attstats would normally do
> is leave the values column of the MCELEM stats slot NULL, which then
> causes get_attstatsslot to throw a that-shouldn't-be-null error.
> An alternative answer is to change get_attstatsslot to allow a null,
> but I'm not sure that that's any cleaner.  Either way it seems like
> there's a hazard of breaking some code that isn't expecting the case.

I concluded that letting get_attstatsslot accept nulls is too risky;
there is probably code that assumes that get_attstatsslot would've
rejected that.  Instead, this version changes update_attstats' rule
for when to store an array rather than a null.  Now it will do so
if the passed-in stavalues pointer isn't null, even if numvalues
is zero.  I think that this doesn't change the outcome for any
existing analyze routines because they wouldn't pass a data pointer
if they have no values; and even if it does, storing an empty array
not a null seems like it should be pretty harmless.

> An alternative that feels cleaner but a good bit more invasive
> is to get rid of the convention of storing the min/max/nulls
> frequencies as extra entries in the MCELEM numbers entry ---
> which surely is a hack too --- and put them into some new slot
> type.  I'm not sure if that's enough nicer to be worth the
> conversion pain.

A year ago I would have seriously considered doing it that way.
But now that we have code to dump-n-restore stats, that code would
have to be adjusted to convert the old representation.  It's not
worth it for this case.

Hence, v1 attached, now with a commit message.

			regards, tom lane

From ac1f22903594ce3613d615fe5fba09c77f216769 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 18 Jul 2025 17:43:21 -0400
Subject: [PATCH v1] Track the maximum possible frequency of non-MCE array
 elements.

The lossy-counting algorithm that ANALYZE uses to identify
most-common array elements may decide that none of the values are
common enough to be called MCEs.  In such cases we stored nothing in
the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in
array_selfuncs.c falling back to default estimates.  However, we do
in fact have valuable information to convey: we know that none of the
array elements are as common as the cutoff frequency that lossy
counting reported.  If we report that as the minimum MCE frequency,
array_selfuncs.c will arrive at significantly-better-than-default
estimates.

So we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines.  A small obstacle is that
update_attstats() has traditionally stored a null, not an empty
array, when passed zero "values" for a slot.  That gives rise to an
MCELEM entry that get_attstatsslot() will spit up on.  The least
risky solution seems to be to adjust update_attstats() so that it
will emit a non-null (but possibly empty) array when the passed
stavalues array pointer isn't NULL, rather than conditioning that on
numvalues > 0.  In other existing cases I don't believe that that
changes anything.  For consistency, handle the stanumbers array the
same way.

Reported-by: Mark Frost <FROSTMAR@uk.ibm.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
---
 src/backend/commands/analyze.c           |  7 +++----
 src/backend/utils/adt/array_typanalyze.c | 15 ++++++++++++++-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7111d5d5334..a47e23eaa8c 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1712,10 +1712,9 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
     i = Anum_pg_statistic_stanumbers1 - 1;
     for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
     {
-        int         nnum = stats->numnumbers[k];
-
-        if (nnum > 0)
+        if (stats->stanumbers[k] != NULL)
         {
+            int         nnum = stats->numnumbers[k];
             Datum      *numdatums = (Datum *) palloc(nnum * sizeof(Datum));
             ArrayType  *arry;

@@ -1733,7 +1732,7 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
     i = Anum_pg_statistic_stavalues1 - 1;
     for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
     {
-        if (stats->numvalues[k] > 0)
+        if (stats->stavalues[k] != NULL)
         {
             ArrayType  *arry;

diff --git a/src/backend/utils/adt/array_typanalyze.c b/src/backend/utils/adt/array_typanalyze.c
index 6f61629b977..346c190eaf3 100644
--- a/src/backend/utils/adt/array_typanalyze.c
+++ b/src/backend/utils/adt/array_typanalyze.c
@@ -497,6 +497,15 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
      * If we obtained more elements than we really want, get rid of those
      * with least frequencies.  The easiest way is to qsort the array into
      * descending frequency order and truncate the array.
+     *
+     * On the other extreme, we might have found no elements with
+     * frequency above the cutoff.  Then there's nothing to store so far
+     * as element values go, but we still want to record the cutoff
+     * frequency, since array_selfuncs.c can use that as an upper bound on
+     * the frequency of non-MCEs.  (Note: what we store is the minimum
+     * frequency that would have been accepted as a valid MCE.  In
+     * particular, this ensures we don't create a bogus stats entry with
+     * frequency zero.)
      */
     if (num_mcelem < track_len)
     {
@@ -506,10 +515,14 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
         minfreq = sort_table[num_mcelem - 1]->frequency;
     }
     else
+    {
         num_mcelem = track_len;
+        if (track_len == 0)
+            minfreq = maxfreq = cutoff_freq + 1;
+    }

     /* Generate MCELEM slot entry */
-    if (num_mcelem > 0)
+    if (num_mcelem >= 0)
     {
         MemoryContext old_context;
         Datum      *mcelem_values;
--
2.43.5
CREATE TABLE public.test(
id integer,
tags text[]
);
INSERT INTO public.test
SELECT n, ARRAY['common', TO_CHAR(n,'fm00000000')] FROM ( SELECT generate_series(1,7_937) AS n );
ANALYZE public.test;
SELECT most_common_elems, most_common_elem_freqs FROM pg_stats WHERE schemaname = 'public' AND tablename = 'test' AND attname = 'tags';
EXPLAIN (ANALYZE,BUFFERS,VERBOSE)
SELECT * FROM test WHERE tags && ARRAY['common'];
EXPLAIN (ANALYZE,BUFFERS,VERBOSE)
SELECT * FROM test WHERE tags && ARRAY['00000001'];
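If I'm reading the thread right, the point of this variant (assumed to be the case quoted in the next message) is that 'common' appears in every row and so is recorded as an MCE, while the per-row unique tags are not: 7,937 rows times two elements per array is 15,874 elements, past the point where single-occurrence elements survive the cutoff. The estimate for a unique tag then comes from the default selectivity rather than from the data; a back-of-envelope sketch under those assumptions:

-- Unique tags fall back to the 1/200 default despite most_common_elems
-- being non-empty, so roughly 40 rows are estimated where 1 is actual.
SELECT round(7937 / 200.0) AS default_estimate_for_unique_tag,
       1                   AS actual_rows_for_unique_tag;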
Matt Long <matt@mattlong.org> writes:
> Not to let perfect be the enemy of better, but we're facing a variant of
> this issue that would not be addressed by the proposed patch.
> ...
> In this case, the effects of the proposed patch are not applied since the
> most_common_elems array is not empty. I'm not a statistician, so maybe this
> wouldn't be valid, but it seems like using the highest frequency of an
> element that did not qualify for the mce list instead of the 0.5% default
> frequency could be an elegant, but more invasive solution.

Yeah, I think you are quite right: we can apply this idea not only
when the MCE list is empty, but whenever we didn't have to truncate
the MCE list.  In that case we know there are no additional element
values that exceed the cutoff frequency, and that's what the
selectivity functions want to know.

Nosing around in the code that uses STATISTIC_KIND_MCELEM entries,
I spotted two additional issues that the attached v2 patch addresses:

* ts_typanalyze/ts_selfuncs have code essentially identical to the
array case, and should receive the same treatment.

* The selectivity functions believe that the upper bound on the
frequency of non-MCEs is minfreq / 2, not the stored minfreq.
This seems like complete brain fade: there could easily be elements
with frequency just less than minfreq, and probably are if the data
distribution follows Zipf's law.  I did not dig into the git history,
but I wonder if the divide-by-two business predates the introduction
of the lossy-counting algorithm, and if so whether it was less insane
with the original collection algorithm.  In any case, this patch
removes the divisions by 2, and makes some nearby cosmetic
improvements.

Many thanks for the suggestion!

			regards, tom lane

From b2c31c0d31c1edb931d4613b9706efe7ce04edbf Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon, 8 Sep 2025 19:19:00 -0400
Subject: [PATCH v2] Track the maximum possible frequency of non-MCE array
 elements.

The lossy-counting algorithm that ANALYZE uses to identify
most-common array elements has a notion of cutoff frequency: elements
with frequency greater than that are guaranteed to be collected,
elements with smaller frequencies are not.  In cases where we find
fewer MCEs than the stats target would permit us to store, the cutoff
frequency provides valuable additional information, to wit that there
are no non-MCEs with frequency greater than that.  What the
selectivity estimation functions actually use the "minfreq" entry for
is as a ceiling on the possible frequency of non-MCEs, so using the
cutoff rather than the lowest stored MCE frequency provides a tighter
bound and more accurate estimates.

Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want.  (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the minimum
stored frequency in that case.)

Notably, this works even when none of the values are common enough to
be called MCEs.  In such cases we stored nothing in the
STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates.  So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines.

A small obstacle is that update_attstats() has traditionally stored a
null, not an empty array, when passed zero "values" for a slot.  That
gives rise to an MCELEM entry that get_attstatsslot() will spit up
on.  The least risky solution seems to be to adjust update_attstats()
so that it will emit a non-null (but possibly empty) array when the
passed stavalues array pointer isn't NULL, rather than conditioning
that on numvalues > 0.  In other existing cases I don't believe that
that changes anything.  For consistency, handle the stanumbers array
the same way.

In addition to adjusting the typanalyze routines that collect
STATISTIC_KIND_MCELEM data, adjust the routines that use the data to
assume that minfreq is the ceiling on the frequency of non-MCE
values.  Previously they assumed that the ceiling is minfreq / 2,
which seems obviously wrong.  (Perhaps it was correct with the
original implementation of MCE collection, but surely it is not with
the lossy-counting algorithm.)

Thanks to Matt Long for the suggestion that we could apply this idea
even when there are more than zero MCEs.

Reported-by: Mark Frost <FROSTMAR@uk.ibm.com>
Reported-by: Matt Long <matt@mattlong.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
---
 contrib/intarray/_int_selfuncs.c         |  8 +++----
 src/backend/commands/analyze.c           |  7 +++---
 src/backend/tsearch/ts_selfuncs.c        | 10 ++++-----
 src/backend/tsearch/ts_typanalyze.c      | 27 +++++++++++++++++++-----
 src/backend/utils/adt/array_selfuncs.c   | 14 ++++++------
 src/backend/utils/adt/array_typanalyze.c | 27 +++++++++++++++++++-----
 src/include/catalog/pg_statistic.h       |  4 ++++
 7 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/contrib/intarray/_int_selfuncs.c b/contrib/intarray/_int_selfuncs.c
index 9bf64486242..c983de998dc 100644
--- a/contrib/intarray/_int_selfuncs.c
+++ b/contrib/intarray/_int_selfuncs.c
@@ -210,8 +210,8 @@ _int_matchsel(PG_FUNCTION_ARGS)
      */
     if (sslot.nnumbers == sslot.nvalues + 3)
     {
-        /* Grab the lowest frequency. */
-        minfreq = sslot.numbers[sslot.nnumbers - (sslot.nnumbers - sslot.nvalues)];
+        /* Grab the minimal MCE frequency. */
+        minfreq = sslot.numbers[sslot.nvalues];

         mcelems = sslot.values;
         mcefreqs = sslot.numbers;
@@ -270,9 +270,9 @@ int_query_opr_selec(ITEM *item, Datum *mcelems, float4 *mcefreqs,
         {
             /*
              * The element is not in MCELEM.  Punt, but assume that the
-             * selectivity cannot be more than minfreq / 2.
+             * selectivity cannot be more than minfreq.
              */
-            selec = Min(DEFAULT_EQ_SEL, minfreq / 2);
+            selec = Min(DEFAULT_EQ_SEL, minfreq);
         }
     }
     else if (item->type == OPR)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8ea2913d906..12b4f3fd36e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1711,10 +1711,9 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
     i = Anum_pg_statistic_stanumbers1 - 1;
     for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
     {
-        int         nnum = stats->numnumbers[k];
-
-        if (nnum > 0)
+        if (stats->stanumbers[k] != NULL)
         {
+            int         nnum = stats->numnumbers[k];
             Datum      *numdatums = (Datum *) palloc(nnum * sizeof(Datum));
             ArrayType  *arry;

@@ -1732,7 +1731,7 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
     i = Anum_pg_statistic_stavalues1 - 1;
     for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
     {
-        if (stats->numvalues[k] > 0)
+        if (stats->stavalues[k] != NULL)
         {
             ArrayType  *arry;

diff --git a/src/backend/tsearch/ts_selfuncs.c b/src/backend/tsearch/ts_selfuncs.c
index 453a5e5c2ea..f108ab85f27 100644
--- a/src/backend/tsearch/ts_selfuncs.c
+++ b/src/backend/tsearch/ts_selfuncs.c
@@ -239,8 +239,8 @@ mcelem_tsquery_selec(TSQuery query, Datum *mcelem, int nmcelem,
     }

     /*
-     * Grab the lowest frequency. compute_tsvector_stats() stored it for us in
-     * the one before the last cell of the Numbers array. See ts_typanalyze.c
+     * Grab the lowest MCE frequency.  compute_tsvector_stats() stored it for
+     * us in the one before the last cell of the Numbers array.
      */
     minfreq = numbers[nnumbers - 2];

@@ -348,7 +348,7 @@ tsquery_opr_selec(QueryItem *item, char *operand,
              * preserves the property that "word:*" should be estimated to
              * match at least as many rows as "word" would be.
              */
-            selec = Max(Min(DEFAULT_TS_MATCH_SEL, minfreq / 2), selec);
+            selec = Max(Min(DEFAULT_TS_MATCH_SEL, minfreq), selec);
         }
         else
         {
@@ -375,9 +375,9 @@ tsquery_opr_selec(QueryItem *item, char *operand,
         {
             /*
              * The element is not in MCELEM.  Punt, but assume that the
-             * selectivity cannot be more than minfreq / 2.
+             * selectivity cannot be more than minfreq.
              */
-            selec = Min(DEFAULT_TS_MATCH_SEL, minfreq / 2);
+            selec = Min(DEFAULT_TS_MATCH_SEL, minfreq);
         }
     }
 }
diff --git a/src/backend/tsearch/ts_typanalyze.c b/src/backend/tsearch/ts_typanalyze.c
index c5a71331ce8..c8dbea96c22 100644
--- a/src/backend/tsearch/ts_typanalyze.c
+++ b/src/backend/tsearch/ts_typanalyze.c
@@ -312,7 +312,7 @@ compute_tsvector_stats(VacAttrStats *stats,
     /*
      * Construct an array of the interesting hashtable items, that is,
      * those meeting the cutoff frequency (s - epsilon)*N.  Also identify
-     * the minimum and maximum frequencies among these items.
+     * the maximum frequency among these items.
      *
      * Since epsilon = s/10 and bucket_width = 1/epsilon, the cutoff
      * frequency is 9*N / bucket_width.
@@ -324,14 +324,12 @@ compute_tsvector_stats(VacAttrStats *stats,
     hash_seq_init(&scan_status, lexemes_tab);
     track_len = 0;
-    minfreq = lexeme_no;
     maxfreq = 0;
     while ((item = (TrackItem *) hash_seq_search(&scan_status)) != NULL)
     {
         if (item->frequency > cutoff_freq)
         {
             sort_table[track_len++] = item;
-            minfreq = Min(minfreq, item->frequency);
             maxfreq = Max(maxfreq, item->frequency);
         }
     }
@@ -346,19 +344,38 @@ compute_tsvector_stats(VacAttrStats *stats,
      * If we obtained more lexemes than we really want, get rid of those
      * with least frequencies.  The easiest way is to qsort the array into
      * descending frequency order and truncate the array.
+     *
+     * If we did not find more elements than we want, then it is safe to
+     * assume that the stored MCE array will contain every element with
+     * frequency above the cutoff.  In that case, rather than storing the
+     * smallest frequency we are keeping, we want to store the minimum
+     * frequency that would have been accepted as a valid MCE.  The
+     * selectivity functions can assume that that is an upper bound on the
+     * frequency of elements not present in the array.
+     *
+     * If we found no candidate MCEs at all, we still want to record the
+     * cutoff frequency, since it's still valid to assume that no element
+     * has frequency more than that.
      */
     if (num_mcelem < track_len)
     {
         qsort_interruptible(sort_table, track_len, sizeof(TrackItem *),
                             trackitem_compare_frequencies_desc, NULL);
-        /* reset minfreq to the smallest frequency we're keeping */
+        /* set minfreq to the smallest frequency we're keeping */
         minfreq = sort_table[num_mcelem - 1]->frequency;
     }
     else
+    {
         num_mcelem = track_len;
+        /* set minfreq to the minimum frequency above the cutoff */
+        minfreq = cutoff_freq + 1;
+        /* ensure maxfreq is nonzero, too */
+        if (track_len == 0)
+            maxfreq = minfreq;
+    }

     /* Generate MCELEM slot entry */
-    if (num_mcelem > 0)
+    if (num_mcelem >= 0)
     {
         MemoryContext old_context;
         Datum      *mcelem_values;
diff --git a/src/backend/utils/adt/array_selfuncs.c b/src/backend/utils/adt/array_selfuncs.c
index a69a84c2aee..9a9936e5eda 100644
--- a/src/backend/utils/adt/array_selfuncs.c
+++ b/src/backend/utils/adt/array_selfuncs.c
@@ -544,13 +544,13 @@ mcelem_array_contain_overlap_selec(Datum *mcelem, int nmcelem,

     if (numbers)
     {
-        /* Grab the lowest observed frequency */
+        /* Grab the minimal MCE frequency */
         minfreq = numbers[nmcelem];
     }
     else
     {
         /* Without statistics make some default assumptions */
-        minfreq = 2 * (float4) DEFAULT_CONTAIN_SEL;
+        minfreq = (float4) DEFAULT_CONTAIN_SEL;
     }

     /* Decide whether it is faster to use binary search or not. */
@@ -622,9 +622,9 @@ mcelem_array_contain_overlap_selec(Datum *mcelem, int nmcelem,
         {
             /*
              * The element is not in MCELEM.  Punt, but assume that the
-             * selectivity cannot be more than minfreq / 2.
+             * selectivity cannot be more than minfreq.
              */
-            elem_selec = Min(DEFAULT_CONTAIN_SEL, minfreq / 2);
+            elem_selec = Min(DEFAULT_CONTAIN_SEL, minfreq);
         }

         /*
@@ -728,7 +728,7 @@ mcelem_array_contained_selec(Datum *mcelem, int nmcelem,

     /*
      * Grab some of the summary statistics that compute_array_stats() stores:
-     * lowest frequency, frequency of null elements, and average distinct
+     * lowest MCE frequency, frequency of null elements, and average distinct
      * element count.
      */
     minfreq = numbers[nmcelem];
@@ -803,10 +803,10 @@ mcelem_array_contained_selec(Datum *mcelem, int nmcelem,
         {
             /*
              * The element is not in MCELEM.  Punt, but assume that the
-             * selectivity cannot be more than minfreq / 2.
+             * selectivity cannot be more than minfreq.
              */
             elem_selec[unique_nitems] = Min(DEFAULT_CONTAIN_SEL,
-                                            minfreq / 2);
+                                            minfreq);
         }

         unique_nitems++;
diff --git a/src/backend/utils/adt/array_typanalyze.c b/src/backend/utils/adt/array_typanalyze.c
index 6f61629b977..560b27f3ca7 100644
--- a/src/backend/utils/adt/array_typanalyze.c
+++ b/src/backend/utils/adt/array_typanalyze.c
@@ -461,7 +461,7 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
     /*
      * Construct an array of the interesting hashtable items, that is,
      * those meeting the cutoff frequency (s - epsilon)*N.  Also identify
-     * the minimum and maximum frequencies among these items.
+     * the maximum frequency among these items.
      *
      * Since epsilon = s/10 and bucket_width = 1/epsilon, the cutoff
      * frequency is 9*N / bucket_width.
@@ -473,14 +473,12 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
     hash_seq_init(&scan_status, elements_tab);
     track_len = 0;
-    minfreq = element_no;
     maxfreq = 0;
     while ((item = (TrackItem *) hash_seq_search(&scan_status)) != NULL)
     {
         if (item->frequency > cutoff_freq)
         {
             sort_table[track_len++] = item;
-            minfreq = Min(minfreq, item->frequency);
             maxfreq = Max(maxfreq, item->frequency);
         }
     }
@@ -497,19 +495,38 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
      * If we obtained more elements than we really want, get rid of those
      * with least frequencies.  The easiest way is to qsort the array into
      * descending frequency order and truncate the array.
+     *
+     * If we did not find more elements than we want, then it is safe to
+     * assume that the stored MCE array will contain every element with
+     * frequency above the cutoff.  In that case, rather than storing the
+     * smallest frequency we are keeping, we want to store the minimum
+     * frequency that would have been accepted as a valid MCE.  The
+     * selectivity functions can assume that that is an upper bound on the
+     * frequency of elements not present in the array.
+     *
+     * If we found no candidate MCEs at all, we still want to record the
+     * cutoff frequency, since it's still valid to assume that no element
+     * has frequency more than that.
      */
     if (num_mcelem < track_len)
     {
         qsort_interruptible(sort_table, track_len, sizeof(TrackItem *),
                             trackitem_compare_frequencies_desc, NULL);
-        /* reset minfreq to the smallest frequency we're keeping */
+        /* set minfreq to the smallest frequency we're keeping */
         minfreq = sort_table[num_mcelem - 1]->frequency;
     }
     else
+    {
         num_mcelem = track_len;
+        /* set minfreq to the minimum frequency above the cutoff */
+        minfreq = cutoff_freq + 1;
+        /* ensure maxfreq is nonzero, too */
+        if (track_len == 0)
+            maxfreq = minfreq;
+    }

     /* Generate MCELEM slot entry */
-    if (num_mcelem > 0)
+    if (num_mcelem >= 0)
     {
         MemoryContext old_context;
         Datum      *mcelem_values;
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 4216e27a8a4..444dc27dcad 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -240,6 +240,10 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * the fraction of non-null rows that contain at least one null element).  If
  * this member is omitted, the column is presumed to contain no null elements.
  *
+ * Starting in v19, the first extra member can be smaller than the smallest
+ * frequency of any stored MCE, indicating that it's known that no element
+ * not present in the MCE array has frequency greater than that value.
+ *
  * Note: in current usage for tsvector columns, the stavalues elements are of
  * type text, even though their representation within tsvector is not
  * exactly text.
--
2.43.7
I wrote:
> * The selectivity functions believe that the upper bound on the
> frequency of non-MCEs is minfreq / 2, not the stored minfreq.
> This seems like complete brain fade: there could easily be
> elements with frequency just less than minfreq, and probably are
> if the data distribution follows Zipf's law.  I did not dig into
> the git history, but I wonder if the divide-by-two business
> predates the introduction of the lossy-counting algorithm, and
> if so whether it was less insane with the original collection
> algorithm.  In any case, this patch removes the divisions by 2,
> and makes some nearby cosmetic improvements.

In the light of morning, I started to have second thoughts about
this aspect of the patch.  I checked the git history this time,
and found that the oldest instance of "minfreq / 2" dates to
4e57668da which originated from this discussion:

https://www.postgresql.org/message-id/flat/488DAEB8.3000402%40students.mimuw.edu.pl

It's already coded like that in Jan's initial patch, and there
was no discussion in the thread, so not a lot to be gleaned:

+    /*
+     * The element is not in MCELEM.  Punt, but assure that the
+     * selectivity cannot be more than minfreq / 2.
+     */
+    return (Selectivity) Min(DEFAULT_TS_SEL, minfreq / 2);

But looking at this, I think the problem is really that the comment
fails to describe his thought process.  We know that the frequency
of this not-in-the-MCE-list value cannot be more than minfreq, but
we have no idea how much less it is.  I think the idea of the
division might have been to assume that an "average" non-MCE value
would have a frequency about half that of the lowest MCE.  It does
seem reasonable to use some number lower than the cutoff, although
I dunno if 0.5 is exactly the right factor.

So now I'm wondering if we should leave the divisions alone and
instead fix the comments to explain why they are there.  Bolstering
this is that on the two test cases you guys submitted, the patch
with the divisions gets a spot-on estimate (1 row) while removing
the divisions yields an estimate of 2 rows.  I don't put a lot of
weight on that, since these are toy examples.  But I wonder if you
guys could test the patch on some of your real-world cases and see
if it looks like we should keep the divisions.

			regards, tom lane
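For the all-unique reproducer upthread, the arithmetic behind those two numbers appears to be (my back-of-envelope reading, not stated in the message itself): the patched ANALYZE stores a minimum MCE frequency of (cutoff_freq + 1) / 15873 = 2 / 15873, so applying that bound to the table size gives about 2 rows without the division by 2 and about 1 row with it:

SELECT round(15873 * (2.0 / 15873))     AS estimate_without_division,  -- 2
       round(15873 * (2.0 / 15873) / 2) AS estimate_with_division;     -- 1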
Matt Long <matt@mattlong.org> writes:
> I finally got around to testing your patch on a realistic data set. In
> short, the patch worked beautifully even with the division by 2 removed. In
> case it's helpful, the full write up of my investigation can be found at
> https://gist.github.com/mattlong/0617bec6e1cf5bc6b70c6c2951901df7

Thank you!  I'm now inclined to keep the divisions by 2 (with comment fixes).

			regards, tom lane