Обсуждение: [BUGS] BUG #14729: Between operator is slow when same value used for low andhigh margin

Поиск
Список
Период
Сортировка

[BUGS] BUG #14729: Between operator is slow when same value used for low andhigh margin

От
pavel.tavoda@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      14729
Logged by:          Pavel Tavoda
Email address:      pavel.tavoda@gmail.com
PostgreSQL version: 9.4.10
Operating system:   Linux
Description:

When using JOIN with BETWEEN with same value for low and high marging query
is taking more than 40 times longer. Amount of result rows is aroung
20.000.

SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor between '2017-05-30' AND '2017-05-31';
Time: 0.473

SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor between '2017-05-31' AND '2017-05-31';
Time: 19.172

SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor = '2017-05-31';
Time: 0.467



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14729: Between operator is slow when same value usedfor low and high margin

От
Pavel Stehule
Дата:
Hi

2017-07-03 12:09 GMT+02:00 <pavel.tavoda@gmail.com>:
The following bug has been logged on the website:

Bug reference:      14729
Logged by:          Pavel Tavoda
Email address:      pavel.tavoda@gmail.com
PostgreSQL version: 9.4.10
Operating system:   Linux
Description:

When using JOIN with BETWEEN with same value for low and high marging query
is taking more than 40 times longer. Amount of result rows is aroung
20.000.

SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor between '2017-05-30' AND '2017-05-31';
Time: 0.473

SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor between '2017-05-31' AND '2017-05-31';
Time: 19.172

SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor = '2017-05-31';
Time: 0.467

Please show result of EXPLAIN ANALYZE your query

Regards

Pavel 



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14729: Between operator is slow when same value usedfor low and high margin

От
Pavel Tavoda
Дата:
Hi Pavel, results down in text.

On Mon, Jul 3, 2017 at 4:22 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2017-07-03 12:09 GMT+02:00 <pavel.tavoda@gmail.com>:
SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor between '2017-05-30' AND '2017-05-31';
Time: 0.473
Aggregate  (cost=110320.00..110320.01 rows=1 width=0) (actual time=520.304..520.304 rows=1 loops=1)
  ->  Hash Join  (cost=426.27..110244.02 rows=30393 width=0) (actual time=5.852..509.223 rows=40844 loops=1)
        Hash Cond: (cp.contract = co.id)
        ->  Seq Scan on contractportfolio cp  (cost=0.00..109171.90 rows=30393 width=8) (actual time=0.007..471.669 rows=40844 loops=1)
              Filter: ((validfor >= '2017-05-30'::date) AND (validfor <= '2017-05-31'::date))
              Rows Removed by Filter: 2946433
        ->  Hash  (cost=336.12..336.12 rows=7212 width=8) (actual time=5.833..5.833 rows=7198 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 197kB
              ->  Seq Scan on contract co  (cost=0.00..336.12 rows=7212 width=8) (actual time=0.006..2.856 rows=7198 loops=1)
Planning time: 0.333 ms
Execution time: 520.343 ms

 
 
SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor between '2017-05-31' AND '2017-05-31';
Time: 19.172
Aggregate  (cost=109598.17..109598.18 rows=1 width=0) (actual time=87769.576..87769.577 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..109598.16 rows=1 width=0) (actual time=1.949..87759.415 rows=20426 loops=1)
        Join Filter: (cp.contract = co.id)
        Rows Removed by Join Filter: 147005922
        ->  Seq Scan on contractportfolio cp  (cost=0.00..109171.90 rows=1 width=8) (actual time=0.009..486.918 rows=20426 loops=1)
              Filter: ((validfor >= '2017-05-31'::date) AND (validfor <= '2017-05-31'::date))
              Rows Removed by Filter: 2966851
        ->  Seq Scan on contract co  (cost=0.00..336.12 rows=7212 width=8) (actual time=0.001..2.046 rows=7198 loops=20426)
Planning time: 0.319 ms
Execution time: 87769.621 ms

STUNNING!!!!
 

SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor = '2017-05-31';
Time: 0.467
Aggregate  (cost=102402.86..102402.87 rows=1 width=0) (actual time=492.645..492.646 rows=1 loops=1)
  ->  Hash Join  (cost=426.27..102352.37 rows=20197 width=0) (actual time=5.873..486.873 rows=20426 loops=1)
        Hash Cond: (cp.contract = co.id)
        ->  Seq Scan on contractportfolio cp  (cost=0.00..101496.91 rows=20197 width=8) (actual time=0.009..463.063 rows=20426 loops=1)
              Filter: (validfor = '2017-05-31'::date)
              Rows Removed by Filter: 2966851
        ->  Hash  (cost=336.12..336.12 rows=7212 width=8) (actual time=5.856..5.856 rows=7198 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 197kB
              ->  Seq Scan on contract co  (cost=0.00..336.12 rows=7212 width=8) (actual time=0.005..2.835 rows=7198 loops=1)
Planning time: 0.325 ms
Execution time: 492.686 ms

Re: [BUGS] BUG #14729: Between operator is slow when same value usedfor low and high margin

От
Pavel Stehule
Дата:


2017-07-03 16:39 GMT+02:00 Pavel Tavoda <pavel.tavoda@gmail.com>:
Hi Pavel, results down in text.

On Mon, Jul 3, 2017 at 4:22 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2017-07-03 12:09 GMT+02:00 <pavel.tavoda@gmail.com>:
SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor between '2017-05-30' AND '2017-05-31';
Time: 0.473
Aggregate  (cost=110320.00..110320.01 rows=1 width=0) (actual time=520.304..520.304 rows=1 loops=1)
  ->  Hash Join  (cost=426.27..110244.02 rows=30393 width=0) (actual time=5.852..509.223 rows=40844 loops=1)
        Hash Cond: (cp.contract = co.id)
        ->  Seq Scan on contractportfolio cp  (cost=0.00..109171.90 rows=30393 width=8) (actual time=0.007..471.669 rows=40844 loops=1)
              Filter: ((validfor >= '2017-05-30'::date) AND (validfor <= '2017-05-31'::date))
              Rows Removed by Filter: 2946433
        ->  Hash  (cost=336.12..336.12 rows=7212 width=8) (actual time=5.833..5.833 rows=7198 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 197kB
              ->  Seq Scan on contract co  (cost=0.00..336.12 rows=7212 width=8) (actual time=0.006..2.856 rows=7198 loops=1)
Planning time: 0.333 ms
Execution time: 520.343 ms

 
 
SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor between '2017-05-31' AND '2017-05-31';
Time: 19.172
Aggregate  (cost=109598.17..109598.18 rows=1 width=0) (actual time=87769.576..87769.577 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..109598.16 rows=1 width=0) (actual time=1.949..87759.415 rows=20426 loops=1)
        Join Filter: (cp.contract = co.id)
        Rows Removed by Join Filter: 147005922
        ->  Seq Scan on contractportfolio cp  (cost=0.00..109171.90 rows=1 width=8) (actual time=0.009..486.918 rows=20426 loops=1)
              Filter: ((validfor >= '2017-05-31'::date) AND (validfor <= '2017-05-31'::date))
              Rows Removed by Filter: 2966851
        ->  Seq Scan on contract co  (cost=0.00..336.12 rows=7212 width=8) (actual time=0.001..2.046 rows=7198 loops=20426)
Planning time: 0.319 ms
Execution time: 87769.621 ms

STUNNING!!!!
 

SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract =
co.id WHERE validfor = '2017-05-31';
Time: 0.467
Aggregate  (cost=102402.86..102402.87 rows=1 width=0) (actual time=492.645..492.646 rows=1 loops=1)
  ->  Hash Join  (cost=426.27..102352.37 rows=20197 width=0) (actual time=5.873..486.873 rows=20426 loops=1)
        Hash Cond: (cp.contract = co.id)
        ->  Seq Scan on contractportfolio cp  (cost=0.00..101496.91 rows=20197 width=8) (actual time=0.009..463.063 rows=20426 loops=1)
              Filter: (validfor = '2017-05-31'::date)
              Rows Removed by Filter: 2966851
        ->  Hash  (cost=336.12..336.12 rows=7212 width=8) (actual time=5.856..5.856 rows=7198 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 197kB
              ->  Seq Scan on contract co  (cost=0.00..336.12 rows=7212 width=8) (actual time=0.005..2.835 rows=7198 loops=1)
Planning time: 0.325 ms
Execution time: 492.686 ms

It looks like unanalyzed data - the system thinking so there are not any data. Try to run ANALYZE more frequently or after any significant change of table

Regards

Pavel

Pavel Tavoda <pavel.tavoda@gmail.com> writes:
>         ->  Seq Scan on contractportfolio cp  (cost=0.00..109171.90 rows=1 width=8) (actual time=0.009..486.918
rows=20426loops=1) 
>               Filter: ((validfor >= '2017-05-31'::date) AND (validfor <= '2017-05-31'::date))

> STUNNING!!!!

Yup, it's certainly that factor-of-20K rowcount misestimate that is
killing you here.  Given that this estimate isn't too bad:

>         ->  Seq Scan on contractportfolio cp  (cost=0.00..101496.91 rows=20197 width=8) (actual time=0.009..463.063
rows=20426loops=1) 
>               Filter: (validfor = '2017-05-31'::date)

I do not think your problem is one of out-of-date statistics.  Rather,
the issue is just that we're bad at narrow range estimates.  I did
some work on that today[1] but it won't show up in a released PG version
before next year.  In the meantime, I believe that the existing code would
arrive at a plausible answer if the value being checked were present in
the column's pg_stats.most_common_vals list.  Maybe you could fix this
by increasing the statistics target for the column or the whole table
(and re-analyzing it, of course).
        regards, tom lane

[1] https://www.postgresql.org/message-id/12232.1499140410%40sss.pgh.pa.us


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14729: Between operator is slow when same value usedfor low and high margin

От
Pavel Tavoda
Дата:
Hello, after 'VACUUM FULL VERBOSE ANALYZE' of both tables still very strange results:

VACUUM FULL VERBOSE ANALYZE contractportfolio;
VACUUM FULL VERBOSE ANALYZE contract;

EXPLAIN ANALYZE SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract = co.id WHERE validfor = '2017-05-31';
Aggregate  (cost=87255.63..87255.64 rows=1 width=0) (actual time=488.634..488.635 rows=1 loops=1)
  ->  Hash Join  (cost=424.96..87208.66 rows=18789 width=0) (actual time=6.160..482.780 rows=20426 loops=1)
        Hash Cond: (cp.contract = co.id)
        ->  Seq Scan on contractportfolio cp  (cost=0.00..86384.44 rows=18789 width=8) (actual time=0.025..456.431 rows=20426 loops=1)
              Filter: (validfor = '2017-05-31'::date)
              Rows Removed by Filter: 2966971
        ->  Hash  (cost=334.98..334.98 rows=7198 width=8) (actual time=6.125..6.125 rows=7198 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 197kB
              ->  Seq Scan on contract co  (cost=0.00..334.98 rows=7198 width=8) (actual time=0.003..2.931 rows=7198 loops=1)
Planning time: 0.329 ms
Execution time: 488.674 ms

EXPLAIN ANALYZE SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract = co.id WHERE validfor between '2017-05-30' AND '2017-05-31';
Aggregate  (cost=94304.35..94304.36 rows=1 width=0) (actual time=528.667..528.667 rows=1 loops=1)
  ->  Hash Join  (cost=424.96..94301.58 rows=1106 width=0) (actual time=6.198..517.042 rows=40844 loops=1)
        Hash Cond: (cp.contract = co.id)
        ->  Seq Scan on contractportfolio cp  (cost=0.00..93853.12 rows=1106 width=8) (actual time=0.022..474.856 rows=40844 loops=1)
              Filter: ((validfor >= '2017-05-30'::date) AND (validfor <= '2017-05-31'::date))
              Rows Removed by Filter: 2946553
        ->  Hash  (cost=334.98..334.98 rows=7198 width=8) (actual time=6.166..6.166 rows=7198 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 197kB
              ->  Seq Scan on contract co  (cost=0.00..334.98 rows=7198 width=8) (actual time=0.004..2.960 rows=7198 loops=1)
Planning time: 0.340 ms
Execution time: 528.708 ms

EXPLAIN ANALYZE SELECT count(*) FROM contractportfolio cp JOIN contract co ON cp.contract = co.id WHERE validfor between '2017-05-31' AND '2017-05-31'
Aggregate  (cost=94278.08..94278.09 rows=1 width=0) (actual time=93832.610..93832.610 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..94278.08 rows=1 width=0) (actual time=2.042..93822.069 rows=20426 loops=1)
        Join Filter: (cp.contract = co.id)
        Rows Removed by Join Filter: 147005922
        ->  Seq Scan on contractportfolio cp  (cost=0.00..93853.12 rows=1 width=8) (actual time=0.023..487.343 rows=20426 loops=1)
              Filter: ((validfor >= '2017-05-31'::date) AND (validfor <= '2017-05-31'::date))
              Rows Removed by Filter: 2966971
        ->  Seq Scan on contract co  (cost=0.00..334.98 rows=7198 width=8) (actual time=0.001..2.165 rows=7198 loops=20426)
Planning time: 0.333 ms
Execution time: 93832.656 ms

Same problem... here is something wrong with optimization.
The problem is shown when both dates are same it doesn't matter which date. Same result for 2017-05-12.

Regards

/Pavel


On Jul 4, 2017 07:15, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Pavel Tavoda <pavel.tavoda@gmail.com> writes:
>         ->  Seq Scan on contractportfolio cp  (cost=0.00..109171.90 rows=1 width=8) (actual time=0.009..486.918 rows=20426 loops=1)
>               Filter: ((validfor >= '2017-05-31'::date) AND (validfor <= '2017-05-31'::date))

> STUNNING!!!!

Yup, it's certainly that factor-of-20K rowcount misestimate that is
killing you here.  Given that this estimate isn't too bad:

>         ->  Seq Scan on contractportfolio cp  (cost=0.00..101496.91 rows=20197 width=8) (actual time=0.009..463.063 rows=20426 loops=1)
>               Filter: (validfor = '2017-05-31'::date)

I do not think your problem is one of out-of-date statistics.  Rather,
the issue is just that we're bad at narrow range estimates.  I did
some work on that today[1] but it won't show up in a released PG version
before next year.  In the meantime, I believe that the existing code would
arrive at a plausible answer if the value being checked were present in
the column's pg_stats.most_common_vals list.  Maybe you could fix this
by increasing the statistics target for the column or the whole table
(and re-analyzing it, of course).

                        regards, tom lane

[1] https://www.postgresql.org/message-id/12232.1499140410%40sss.pgh.pa.us