Discussion: Less selective index chosen unexpectedly

Less selective index chosen unexpectedly

From:
James Coleman
Date:
Note: I've chosen to send this initially to -bugs because it seems to me to be just over the fine line between "this could be enhanced" and "this is always the wrong choice/unexpected behavior", but I'd like feedback on that, and we can move the discussion to -hackers if people think that'd be a better fit.

We stumbled across an interesting planner outcome in one of our production environments on a fairly new database.

This database has a table that contains a date (not timestamp) column whose value is effectively the current date on insert (it's not exactly this, but sufficiently so for a test case). That column is indexed by itself. Other sets of columns are also indexed (some partial).                                         

As new rows are inserted, queries that filter on the current insert date value (at least while the database is still "small" relative to what it will eventually grow to) result in a query plan that uses the date index rather than the more specific indexes.

Analyzing resolves the issue when it arises.

At this point in the story it seems like an open-and-shut "you just need to analyze more often/change the auto analyze threshold" case, but I think there's perhaps a deeper planner design question.                                                                                                                      
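
For reference, the per-table knobs involved look something like this (a sketch only; "items" is the simplified repro table introduced below, and the numbers are arbitrary):

-- Make autoanalyze kick in after fewer row changes on this one table.
ALTER TABLE items SET (
    autovacuum_analyze_scale_factor = 0.01,  -- default 0.1 (10% of rows)
    autovacuum_analyze_threshold    = 1000   -- default 50
);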

Specifically we have a table (simplified repro case):
create table items(d date, t text, fk integer);
create index on items(d);
create index on items(t, fk, d);

For a query like:
select * from items where d = '2021-05-18' and fk = 1 and t = 'type0' limit 1;

It's possible to get either an index scan on items_d_idx with a filter on "fk" and "t" or an index scan on items_t_fk_d_idx without the need for a filter. Even if both plans estimate a low cost and a single row, it seems to me that the scan on the index containing more columns (and no filter) ought to be pretty strongly preferred unless the cost or estimated rows is dramatically higher. I assume (but haven't verified with a debugger) that what's happening here is at least partially related to fuzzy cost comparison on paths.
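
One way to look at both candidate plans side by side (a sketch, assuming the default index names items_d_idx and items_t_fk_d_idx generated by the statements above) is to hide the smaller index inside a transaction:

BEGIN;
EXPLAIN select * from items where d = '2021-05-18' and fk = 1 and t = 'type0' limit 1;
-- Temporarily hide the d-only index (takes an ACCESS EXCLUSIVE lock, so test systems only).
DROP INDEX items_d_idx;
EXPLAIN select * from items where d = '2021-05-18' and fk = 1 and t = 'type0' limit 1;
ROLLBACK;  -- the index comes back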

The attached test case demonstrates the full problem at least some of the time--I've not been able to make it deterministic, but I'd say it shows the wrong plan choice (resulting in ~5k unnecessarily processed and filtered rows) roughly 2/3 of the time on my laptop against PG11 (what we're running on in production). Against ~master I was able to reproduce the wrong plan choice, but not the large number of filtered rows, until I modified the second INSERT's case statement to use "n.i > 5000" instead of "n.i < 25000" -- I assume this is due to some combination of index deduplication/suffix truncation. Interestingly, with those changes it seems to be more deterministic against ~master than the original repro case against 11.

Thoughts?
James Coleman
Attachments

Re: Less selective index chosen unexpectedly

From:
Peter Geoghegan
Date:
On Tue, May 18, 2021 at 1:36 PM James Coleman <jtc331@gmail.com> wrote:
> The attached test case demonstrates the full problem at least some of the time--I've not been able to make it
> deterministic, but I'd say it shows the wrong plan choice (resulting in ~5k unnecessarily processed and filtered rows)
> roughly 2/3 of the time on my laptop against PG11 (what we're running on in production). Against ~master I was able to
> reproduce the wrong plan choice, but not the large number of filtered rows, until I modified the second INSERT's case
> statement to use "n.i > 5000" instead of "n.i < 25000" -- I assume this is due to some combination of index
> deduplication/suffix truncation. Interestingly, with those changes it seems to be more deterministic against ~master
> than the original repro case against 11.

That's expected with 12 because the old "getting tired" behavior to
deal with pages full of duplicates in !heapkeyspace indexes adds
non-determinism (to every index tuple insertion involving an incoming
item that already has many duplicates in the index). I actually relied
on the determinism when I wrote my own index space utilization test
suite for the Postgres 12 and Postgres 13 work. With a little care you
can expect to get exactly the same index with a test case involving
serial inserts.

--
Peter Geoghegan



Re: Less selective index chosen unexpectedly

From:
Tom Lane
Date:
James Coleman <jtc331@gmail.com> writes:
> Specifically we have a table (simplified repro case):
> create table items(d date, t text, fk integer);
> create index on items(d);
> create index on items(t, fk, d);

> For a query like:
> select * from items where d = '2021-05-18' and fk = 1 and t = 'type0' limit
> 1;

> It's possible to get either an index scan on items_d_idx with a filter on
> "fk" and "t" or an index scan on items_t_fk_d_idx without the need for a
> filter. Even if both plans estimate a low cost and a single row, it seems
> to be that the scan on the index containing more columns (and no filter)
> ought to be pretty strongly preferred unless the cost or estimate rows is
> dramatically higher.

Actually not.  The multi-column index is going to be physically larger,
which means that the estimated cost to descend into it is going to be
larger just on the grounds of more I/O.  The extra comparisons to
include the additional columns in that search aren't free either.
Since you've specified LIMIT 1, you've also given up much of any cost
advantage that might accrue to scanning items after the first match.
Hence, the only way that the multi-column index is going to win out is
if the extra filter conditions are estimated to be selective enough
(relative to the condition on "d") that we have to scan multiple
entries in the d-only index before getting the first match.

Experimenting by adding

explain select * from items where d = '2021-05-18' limit 1;

(to see what the estimated selectivity of just that condition is)
at each step of your script, I see that in the trouble cases,
the "d" condition is by itself estimated to match only one row.
If that were accurate, the planner would be quite right to pick
the smaller index.
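
The stored statistics behind that estimate can also be inspected directly; a sketch, using the table and column names from the repro:

SELECT n_distinct, null_frac, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'items' AND attname = 'd';

If the newest date value hasn't made it into those entries yet, the planner has little basis for treating it as anything but rare.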

The only thing I see that's really going wrong here is marginally
inaccurate stats, especially right after a big insertion that's
not reflected into the stats yet.  I'm not sure there's much to
improve there.  You could increase the stats target some more,
though of course that just pushes out the size of table where
the issue will appear.
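
Per column, that looks roughly like this (a sketch; the target value is arbitrary):

ALTER TABLE items ALTER COLUMN d SET STATISTICS 1000;  -- default is 100
ANALYZE items;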

            regards, tom lane



Re: Less selective index chosen unexpectedly

From:
Alvaro Herrera
Date:
On 2021-May-18, Tom Lane wrote:

> The only thing I see that's really going wrong here is marginally
> inaccurate stats, especially right after a big insertion that's
> not reflected into the stats yet.  I'm not sure there's much to
> improve there.  You could increase the stats target some more,
> though of course that just pushes out the size of table where
> the issue will appear.

I think the real winner would be a mechanism to incrementally analyze
tables, so that it updates the existing stats by sampling only blocks
that have new data, and "somehow" merge that into the existing
statistics.  You could have such a process run much more frequently than
standard analyze, because the cost is [supposed to be] smaller.

Of course, the big problem with this idea is how would you merge/update
the stats at all in the first place.

... ah, it appears there have been attempts at this already:
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63.5414&rep=rep1&type=pdf

-- 
Álvaro Herrera       Valdivia, Chile
"Ed is the standard text editor."
      http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3



Re: Less selective index chosen unexpectedly

From:
Peter Geoghegan
Date:
On Tue, May 18, 2021 at 2:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I think the real winner would be a mechanism to incrementally analyze
> tables, so that it updates the existing stats by sampling only blocks
> that have new data, and "somehow" merge that into the existing
> statistics.  You could have such a process run much more frequently than
> standard analyze, because the cost is [supposed to be] smaller.

I wonder if there is a more general design that incorporates changes
over time. That is, a design that has ANALYZE store old statistics for
a table in order to give the system (and the DBA) a sense of how
things change over time. This could enable autoanalyze behavior that
more or less settles on an optimal frequency between ANALYZE
operations -- frequent enough to get stable statistics with some
margin of error, but not too frequent.

I also wonder if this general approach could enable a strategy that
uses something like Bayesian inference to detect bad statistics,
and/or to dampen the consequences of bad estimates.

-- 
Peter Geoghegan



Re: Less selective index chosen unexpectedly

From:
James Coleman
Date:
On Tue, May 18, 2021 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > Specifically we have a table (simplified repro case):
> > create table items(d date, t text, fk integer);
> > create index on items(d);
> > create index on items(t, fk, d);
>
> > For a query like:
> > select * from items where d = '2021-05-18' and fk = 1 and t = 'type0' limit
> > 1;
>
> > It's possible to get either an index scan on items_d_idx with a filter on
> > "fk" and "t" or an index scan on items_t_fk_d_idx without the need for a
> > filter. Even if both plans estimate a low cost and a single row, it seems
> > to be that the scan on the index containing more columns (and no filter)
> > ought to be pretty strongly preferred unless the cost or estimate rows is
> > dramatically higher.
>
> Actually not.  The multi-column index is going to be physically larger,
> which means that the estimated cost to descend into it is going to be
> larger just on the grounds of more I/O.  The extra comparisons to
> include the additional columns in that search aren't free either.
> Since you've specified LIMIT 1, you've also given up much of any cost
> advantage that might accrue to scanning items after the first match.
> Hence, the only way that the multi-column index is going to win out is
> if the extra filter conditions are estimated to be selective enough
> (relative to the condition on "d") that we have to scan multiple
> entries in the d-only index before getting the first match.
>
> Experimenting by adding
>
> explain select * from items where d = '2021-05-18' limit 1;
>
> (to see what the estimated selectivity of just that condition is)
> at each step of your script, I see that in the trouble cases,
> the "d" condition is by itself estimated to match only one row.
> If that were accurate, the planner would be quite right to pick
> the smaller index.
>
> The only thing I see that's really going wrong here is marginally
> inaccurate stats, especially right after a big insertion that's
> not reflected into the stats yet.  I'm not sure there's much to
> improve there.  You could increase the stats target some more,
> though of course that just pushes out the size of table where
> the issue will appear.

There are two axes that matter here in costing (I should have
communicated this better in my original email): 1.) correctness of
selectivity and 2.) the relative risk of getting that selectivity
wrong.

What I'm interested in here is that, at least in this case (and I'm
theorizing it's more generalizable, but that's why I wanted to get
discussion on it) the risk of getting selectivity wrong is quite high.
And it seems to me that a more specific index (though larger and
costlier to descend) is generally lower risk, albeit higher cost in
best cases for the less specific index. In this particular example it
wouldn't even matter which permutation of column ordering you have in
the more specific index -- it's always guaranteed to return the first
row that's found by descending the tree (excluding vacuumable rows).
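
For illustration (these aren't part of the original test case), the same holds for any permutation of the three-column index:

create index on items(t, fk, d);
create index on items(fk, d, t);
create index on items(d, t, fk);

With all three columns constrained by equality, the descent lands directly on the matching tuples regardless of column order.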

James Coleman



Re: Less selective index chosen unexpectedly

From:
David Rowley
Date:
On Thu, 20 May 2021 at 02:54, James Coleman <jtc331@gmail.com> wrote:
> What I'm interested in here is that, at least in this case (and I'm
> theorizing it's more generalizable, but that's why I wanted to get
> discussion on it) the risk of getting selectivity wrong is quite high.
> And it seems to me that a more specific index (though larger and
> costlier to descend) is generally lower risk, albeit higher cost in
> best cases for the less specific index. In this particular example it
> wouldn't even matter which permutation of column ordering you have in
> the more specific index -- it's always guaranteed to return the first
> row that's found by descending the tree (excluding vacuumable rows).

Unfortunately we don't currently include risk in our cost model.
Today's planner is fairly happy to dance blindfolded at the top of
tall cliffs. Unfortunately we fall off them a little too often, such
as in cases like you describe. It would be good to one day give the
planner some proper lessons in safety.

I have wondered whether we should add some sort of risk
factor to our cost model. I mentioned something in [1] about it,
but I don't really have a great idea yet as to how the "risk" cost
value would be calculated. Calculating that could be hard. It might
make sense to bump up the risk factor for some paths, but as we go
through and add a risk level to more paths, then what we set the risk
value to is suddenly much more important.  It might be easy to say
that the risk value should be set to what the total_cost would have to
be multiplied by to get the cost for the worst possible case. However,
if the worst possible case is exceedingly unlikely, then maybe that's
not a great choice.

The planner does often take plenty of other risks. The situation you
describe here is just one of many.  Another example is that if we do a
join on say 3 or more conditions that those conditions could be
correlated. We currently always assume they're independent and just
multiply individual selectivities. We could end up thinking that the
join will produce far fewer rows than it actually will.  Problems can
arise in subsequent joins. We may end up doing a subsequent join using
a nested loop thinking that only 1 row will be produced from the first
join.  That is likely to turn bad if the innermost join instead
produces millions of rows.  Unfortunately extended statistics don't
yet help us with join selectivities.
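
(For the single-table flavor of that problem -- correlated columns in one relation's WHERE clause, as in the repro upthread -- extended statistics can at least be declared today; a sketch, with a made-up statistics name:)

CREATE STATISTICS items_t_fk_d_deps (dependencies) ON t, fk, d FROM items;
ANALYZE items;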

We also take risks with LIMIT clauses as we assume the non-filtered
rows will be completely evenly distributed through the result.  That
ends badly when the rows we want are right at the end.
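
A hypothetical illustration of that last point (none of these names come from the thread), where the only matching rows sit at the far end of the order the planner walks:

create table t_limit as
  select i as id, (i > 990000) as flag from generate_series(1, 1000000) n(i);
create index on t_limit(id);
analyze t_limit;
-- The planner may expect to find a flag = true row almost immediately while
-- walking t_limit_id_idx in id order, but it actually has to filter out ~99%
-- of the table before the first match.
explain select * from t_limit where flag order by id limit 1;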

David

[1] https://www.postgresql.org/message-id/CAApHDvqBoYU8aES4a0t-J15wk1wPMFJDHcyafyfHj7JqJ+u9wg@mail.gmail.com



Re: Less selective index chosen unexpectedly

From:
Arne Roland
Date:

David Rowley <dgrowleyml@gmail.com> wrote:
> The planner does often take plenty of other risks. The situation you
> describe here is just one of many.  Another example is that if we do a
> join on say 3 or more conditions that those conditions could be
> correlated. We currently always assume they're independent and just
> multiply individual selectivities. We could end up thinking that the
> join will produce far fewer rows than it actually will.  Problems can
> arise in subsequent joins. We may end up doing a subsequent join using
> a nested loop thinking that only 1 row will be produced from the first
> join.  That is likely to turn bad if the innermost join instead
> produces millions of rows.

The n_distinct case is another especially hard one. I go around and set those values by hand all the time (a sketch of what that looks like follows below). Or how about the gazillion fallback cases where the planner gives up on estimating any selectivity and just pulls up another constant?
We are still living in a world where even the simplest function can cause one of these "I don't know the selectivity... let's say X0%" cases.
And I am still surprised how well these broad estimates tend to work.
While we have progressed a lot on that front, and I am grateful for all the folks who have worked on the planner, the general problem is way too hard.
Getting to a point where I, as a human, cannot easily find a case where I can say "Ugh, that cliff is dangerous, don't go there..." seems impossible to me.
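
Setting those by hand looks roughly like this (a sketch, reusing the repro's table; the value is whatever you happen to know about the data):

ALTER TABLE items ALTER COLUMN fk SET (n_distinct = 2000);
ANALYZE items;  -- the override is picked up at the next ANALYZE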

The underlying problem here is more severe. We make systematic errors, and the more complex the query gets, the more likely it is that we will find a cliff to jump off.
It's not just that we stumble upon one; we actively search for it. Think about predicate pushdown: that stuff can alter the estimated rows of a subquery.
From an optimizer's standpoint that's a very scary thought. It tells me how fragile my models are.

To be honest with you here, the simple single-relation index case doesn't interest me.
I have confirmed again and again that the planner is better at picking the correct index than I am. It wins in more than 80% of the cases.
We just always notice the one case where it spectacularly doesn't.
But if benchmarking has taught me anything, it's how much better on average the optimizer is at choosing a good index than I am.
And as we get more indexes, the speed disparity tends to get bigger. It's not actively searching for a cliff. It might be that I am just terrible at choosing from 50+ indexes. But I have learned that I am not that great at estimating whether using the more specific index is indeed the less risky thing to do. Spoiler alert: reading the more specific and rarely used index from disk is an actual risk for some workloads.
That's different from join orders, where the planner is trying hard to shoot itself in the foot, or the LIMIT cases where it lacks better statistics/knowledge of the problem.

I am at a loss what to do about this. I suspect there isn’t one solution.
I dislike the idea of a risk score though, in part because of the sheer insane amount of complexity it would add to the beast the planner already is.
But I also suspect there is no way to reach a sensible consensus on what is indeed risky, because it depends on the kinds of queries and workloads you have.

Peter Geoghegan <pg@bowt.ie> wrote:
> I wonder if there is a more general design that incorporates changes
> over time. That is, a design that has ANALYZE store old statistics for
> a table in order to give the system (and the DBA) a sense of how
> things change over time. This could enable autoanalyze behavior that
> more or less settles on an optimal frequency between ANALYZE
> operations -- frequent enough to get stable statistics with some
> margin of error, but not too frequent.

I feel like the current design mainly views ANALYZE as a little addition to autovacuum. And FWIW, syncing them up seems worthwhile to me in most cases.
I don't want to think too much about combining different parts into an incremental statistic. Even for the MCV lists that seems impossible to do.

I find that idea interesting, though, in particular because it shows potential to get rid of a lot of ANALYZE scripts I have running at timed intervals, some of which definitely run more frequently than necessary.
Do you have a particular idea in mind for predicting future changes? Forecasting problems are impossible to solve reliably.
If it breaks in some weird high-velocity cases, we could always allow for a configuration option to switch back to the old behavior.
Every change could ruin somebody's day. But beating the simple heuristic of y% of rows in most cases doesn't sound like rocket science. We can always drop in more complex AI models once we have figured out what breaks with some test databases.
What are your thoughts on this?

Regards
Arne