Re: More stable query plans via more predictable column statistics

From: Joel Jacobson
Subject: Re: More stable query plans via more predictable column statistics
Date:
Msg-id: CAASwCXdpk5648Jo-QCmnya42x8a-4zm6+qm6=pZguxOTKJDsvw@mail.gmail.com
In response to: Re: More stable query plans via more predictable column statistics  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Responses: Re: More stable query plans via more predictable column statistics  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
List: pgsql-hackers
On Wed, Mar 9, 2016 at 1:25 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> Thank you for spending your time to run these :-)

n/p, it took like 30 seconds :-)

> I don't want to be asking for too much here, but is there a chance you could
> try the effects of the proposed patch on an offline copy of your database?

Yes, I think that should be possible.

> Do you envision or maybe have experienced problems with query plans
> referring to the columns that are near the top of the above hist_ratio
> report?  In other words: what are the practical implications for you with
> the values being duplicated rather badly throughout the histogram like in
> the example you shown?

I don't know much about the internals of the query planner;
I've just read "57.1. Row Estimation Examples" to get a basic understanding.

If I understand it correctly: if histogram_bounds contains a lot of
duplicated values, then the row estimates will be inaccurate, which in
turn can trick the query planner into choosing a sub-optimal plan?
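
For reference, the duplicated bounds can be seen directly in pg_stats;
something like this (table and column names made up for illustration)
shows the MCV list and histogram the planner works from:

-- Show the statistics the planner bases its estimates on:
SELECT most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'somebigtable' AND attname = 'col2';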

We've had some problems lately with the query planner. Or rather, we've
always had them but never noticed or cared about them; now, during peak
times, we've had short periods where we haven't been able to fully keep
up with the traffic.

I tracked down the most self_time-consuming functions and quickly saw
how to optimize them. Many of them were of the form:

SELECT .. FROM SomeBigTable
WHERE Col1 = [some dynamic value]
  AND Col2 = [some constant value]
  AND Col3 = [some other constant value]

The number of rows matching the WHERE clause was very small, a perfect
match for a partial index:

CREATE INDEX .. ON SomeBigTable USING btree (Col1)
WHERE Col2 = [some constant value]
  AND Col3 = [some other constant value];
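
To make the pattern concrete, here is a self-contained sketch with
made-up table and column names (not our real schema):

-- Made-up table standing in for SomeBigTable:
CREATE TABLE somebigtable (
    id     bigserial PRIMARY KEY,
    userid bigint NOT NULL,
    state  int    NOT NULL,
    kind   int    NOT NULL
);

-- Partial index covering only the tiny fraction of rows the query touches:
CREATE INDEX somebigtable_userid_partial_idx
    ON somebigtable USING btree (userid)
    WHERE state = 2 AND kind = 5;

-- A query shaped like the ones I was optimizing:
SELECT *
FROM somebigtable
WHERE userid = 12345 AND state = 2 AND kind = 5;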

Even though the new partial index matched the query perfectly, the
query planner didn't want to use it. Instead it continued to use some
other sub-optimal index.

The only way to force it to use the correct index was the "+0" trick,
which I recently learned from one of my colleagues:

SELECT .. FROM SomeBigTable
WHERE Col1 = [some dynamic value]
  AND Col2+0 = [some constant value]
  AND Col3+0 = [some other constant value]

CREATE INDEX .. ON SomeBigTable USING btree (Col1)
WHERE Col2+0 = [some constant value]
  AND Col3+0 = [some other constant value];
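
Continuing the made-up example from above, the "+0" version looks
roughly like this (again, just a sketch, not our real schema):

-- Partial index whose predicate uses the +0 expressions:
CREATE INDEX somebigtable_userid_partial_plus0_idx
    ON somebigtable USING btree (userid)
    WHERE state + 0 = 2 AND kind + 0 = 5;

-- Matching query; only this index's predicate matches, since the planner
-- treats state + 0 and kind + 0 as expressions it can't use other
-- indexes or column statistics for:
SELECT *
FROM somebigtable
WHERE userid = 12345 AND state + 0 = 2 AND kind + 0 = 5;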

By adding +0 to the columns, the query planner is, as I understand it,
strongly motivated to use the correct index, since otherwise it would
have to do a seq scan of the entire big table, which would be very
costly.

I'm glad the trick worked; the system is fast again now.

We're still on 9.1, so maybe these problems will go away once we upgrade to 9.5.

I don't know if these problems I described can be fixed by your patch,
but I wanted to share this story since I know our systems (Trustly's
and Zalando's) are quite similar in design,
so maybe you have experienced something similar.

(Side note: my biggest wish would be some way to specify explicitly,
per top-level function, a list of indexes the query planner is allowed,
or NOT allowed, to consider.)


