Re: BUG #13908: Query returns too few rows

Поиск
Список
Период
Сортировка
От Seth P
Тема Re: BUG #13908: Query returns too few rows
Дата
Msg-id SN1PR18MB0399675E22D0346F11C74DEC8BD00@SN1PR18MB0399.namprd18.prod.outlook.com
обсуждение исходный текст
Ответ на Re: BUG #13908: Query returns too few rows  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-bugs
Below are the EXPLAIN ANALYZE results. I will try to reproduce the problem =
with isolated/toy data, but that may take a while.


Query A (results believed to be incorrect)

"Aggregate  (cost=3D1821781.18..1821781.19 rows=3D1 width=3D0) (actual time=
=3D15887.259..15887.259 rows=3D1 loops=3D1)"
"  ->  Hash Join  (cost=3D1811365.12..1821096.53 rows=3D273861 width=3D0) (=
actual time=3D15562.422..15871.122 rows=3D415983 loops=3D1)"
"        Hash Cond: ((uuu.barrid)::text =3D (rrr.barrid)::text)"
"        ->  Bitmap Heap Scan on temp_universe_instruments uuu  (cost=3D231=
.25..932.60 rows=3D8108 width=3D8) (actual time=3D0.445..1.281 rows=3D7993 =
loops=3D1)"
"              Recheck Cond: (universe_hash =3D '5188205190738336870'::bigi=
nt)"
"              Heap Blocks: exact=3D44"
"              ->  Bitmap Index Scan on pk_temp_universe_instruments  (cost=
=3D0.00..229.23 rows=3D8108 width=3D0) (actual time=3D0.436..0.436 rows=3D7=
993 loops=3D1)"
"                    Index Cond: (universe_hash =3D '5188205190738336870'::=
bigint)"
"        ->  Hash  (cost=3D1796474.21..1796474.21 rows=3D893492 width=3D8) =
(actual time=3D15556.367..15556.367 rows=3D1275138 loops=3D1)"
"              Buckets: 131072 (originally 131072)  Batches: 32 (originally=
 16)  Memory Usage: 3073kB"
"              ->  Hash Join  (cost=3D2341.15..1796474.21 rows=3D893492 wid=
th=3D8) (actual time=3D5761.544..15363.787 rows=3D1275138 loops=3D1)"
"                    Hash Cond: (rrr.barra_file_idx =3D fff.idx)"
"                    ->  Seq Scan on models_direct_row_asset_data rrr  (cos=
t=3D0.00..1518763.74 rows=3D71049174 width=3D12) (actual time=3D0.002..6316=
.855 rows=3D71098547 loops=3D1)"
"                    ->  Hash  (cost=3D2329.00..2329.00 rows=3D972 width=3D=
4) (actual time=3D0.613..0.613 rows=3D964 loops=3D1)"
"                          Buckets: 1024  Batches: 1  Memory Usage: 42kB"
"                          ->  Bitmap Heap Scan on models_direct_file fff  =
(cost=3D35.33..2329.00 rows=3D972 width=3D4) (actual time=3D0.092..0.516 ro=
ws=3D964 loops=3D1)"
"                                Recheck Cond: ((file_name_date >=3D '2005-=
03-01'::date) AND (file_name_date <=3D '2005-07-30'::date))"
"                                Filter: ((file_name_model_ver)::text =3D '=
100'::text)"
"                                Rows Removed by Filter: 540"
"                                Heap Blocks: exact=3D66"
"                                ->  Bitmap Index Scan on ix_models_direct_=
file_file_name_date  (cost=3D0.00..35.08 rows=3D1479 width=3D0) (actual tim=
e=3D0.082..0.082 rows=3D1504 loops=3D1)"
"                                      Index Cond: ((file_name_date >=3D '2=
005-03-01'::date) AND (file_name_date <=3D '2005-07-30'::date))"
"Planning time: 0.866 ms"
"Execution time: 15887.534 ms"


Query A-D

"Aggregate  (cost=3D1886439.23..1886439.24 rows=3D1 width=3D0) (actual time=
=3D15927.999..15927.999 rows=3D1 loops=3D1)"
"  ->  Unique  (cost=3D1875484.79..1883015.97 rows=3D273861 width=3D64) (ac=
tual time=3D15745.902..15911.374 rows=3D416075 loops=3D1)"
"        ->  Sort  (cost=3D1875484.79..1876169.44 rows=3D273861 width=3D64)=
 (actual time=3D15745.901..15794.280 rows=3D416075 loops=3D1)"
"              Sort Key: rrr.idx, rrr.row_number, rrr.barrid, rrr.yield_pct=
, rrr.total_risk_pct, rrr.spec_risk_pct, rrr.hist_beta, rrr.pred_beta, rrr.=
data_date, rrr.barra_file_idx"
"              Sort Method: external sort  Disk: 30080kB"
"              ->  Hash Join  (cost=3D3375.10..1840453.92 rows=3D273861 wid=
th=3D64) (actual time=3D5718.845..15502.041 rows=3D416075 loops=3D1)"
"                    Hash Cond: ((rrr.barrid)::text =3D (uuu.barrid)::text)=
"
"                    ->  Hash Join  (cost=3D2341.15..1796474.21 rows=3D8934=
92 width=3D64) (actual time=3D5716.653..15310.931 rows=3D1275138 loops=3D1)=
"
"                          Hash Cond: (rrr.barra_file_idx =3D fff.idx)"
"                          ->  Seq Scan on models_direct_row_asset_data rrr=
  (cost=3D0.00..1518763.74 rows=3D71049174 width=3D64) (actual time=3D0.002=
..6262.125 rows=3D71098547 loops=3D1)"
"                          ->  Hash  (cost=3D2329.00..2329.00 rows=3D972 wi=
dth=3D4) (actual time=3D0.630..0.630 rows=3D964 loops=3D1)"
"                                Buckets: 1024  Batches: 1  Memory Usage: 4=
2kB"
"                                ->  Bitmap Heap Scan on models_direct_file=
 fff  (cost=3D35.33..2329.00 rows=3D972 width=3D4) (actual time=3D0.101..0.=
535 rows=3D964 loops=3D1)"
"                                      Recheck Cond: ((file_name_date >=3D =
'2005-03-01'::date) AND (file_name_date <=3D '2005-07-30'::date))"
"                                      Filter: ((file_name_model_ver)::text=
 =3D '100'::text)"
"                                      Rows Removed by Filter: 540"
"                                      Heap Blocks: exact=3D66"
"                                      ->  Bitmap Index Scan on ix_models_d=
irect_file_file_name_date  (cost=3D0.00..35.08 rows=3D1479 width=3D0) (actu=
al time=3D0.092..0.092 rows=3D1504 loops=3D1)"
"                                            Index Cond: ((file_name_date >=
=3D '2005-03-01'::date) AND (file_name_date <=3D '2005-07-30'::date))"
"                    ->  Hash  (cost=3D932.60..932.60 rows=3D8108 width=3D8=
) (actual time=3D2.085..2.085 rows=3D7993 loops=3D1)"
"                          Buckets: 8192  Batches: 1  Memory Usage: 377kB"
"                          ->  Bitmap Heap Scan on temp_universe_instrument=
s uuu  (cost=3D231.25..932.60 rows=3D8108 width=3D8) (actual time=3D0.463..=
1.255 rows=3D7993 loops=3D1)"
"                                Recheck Cond: (universe_hash =3D '51882051=
90738336870'::bigint)"
"                                Heap Blocks: exact=3D44"
"                                ->  Bitmap Index Scan on pk_temp_universe_=
instruments  (cost=3D0.00..229.23 rows=3D8108 width=3D0) (actual time=3D0.4=
50..0.450 rows=3D7993 loops=3D1)"
"                                      Index Cond: (universe_hash =3D '5188=
205190738336870'::bigint)"
"Planning time: 0.889 ms"
"Execution time: 15932.735 ms"

Query B

"Aggregate  (cost=3D1809368.84..1809368.85 rows=3D1 width=3D0) (actual time=
=3D15031.329..15031.330 rows=3D1 loops=3D1)"
"  ->  Hash Join  (cost=3D2484.83..1809086.39 rows=3D112981 width=3D0) (act=
ual time=3D5653.925..15024.207 rows=3D168886 loops=3D1)"
"        Hash Cond: ((rrr.barrid)::text =3D (uuu.barrid)::text)"
"        ->  Hash Join  (cost=3D1450.88..1790335.13 rows=3D368611 width=3D8=
) (actual time=3D5651.724..14946.845 rows=3D533241 loops=3D1)"
"              Hash Cond: (rrr.barra_file_idx =3D fff.idx)"
"              ->  Seq Scan on models_direct_row_asset_data rrr  (cost=3D0.=
00..1518763.74 rows=3D71049174 width=3D12) (actual time=3D0.002..6188.050 r=
ows=3D71098547 loops=3D1)"
"              ->  Hash  (cost=3D1445.87..1445.87 rows=3D401 width=3D4) (ac=
tual time=3D0.261..0.261 rows=3D391 loops=3D1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 22kB"
"                    ->  Bitmap Heap Scan on models_direct_file fff  (cost=
=3D14.49..1445.87 rows=3D401 width=3D4) (actual time=3D0.043..0.218 rows=3D=
391 loops=3D1)"
"                          Recheck Cond: ((file_name_date >=3D '2005-03-01'=
::date) AND (file_name_date <=3D '2005-04-30'::date))"
"                          Filter: ((file_name_model_ver)::text =3D '100'::=
text)"
"                          Rows Removed by Filter: 219"
"                          Heap Blocks: exact=3D34"
"                          ->  Bitmap Index Scan on ix_models_direct_file_f=
ile_name_date  (cost=3D0.00..14.39 rows=3D610 width=3D0) (actual time=3D0.0=
36..0.036 rows=3D610 loops=3D1)"
"                                Index Cond: ((file_name_date >=3D '2005-03=
-01'::date) AND (file_name_date <=3D '2005-04-30'::date))"
"        ->  Hash  (cost=3D932.60..932.60 rows=3D8108 width=3D8) (actual ti=
me=3D2.108..2.108 rows=3D7993 loops=3D1)"
"              Buckets: 8192  Batches: 1  Memory Usage: 377kB"
"              ->  Bitmap Heap Scan on temp_universe_instruments uuu  (cost=
=3D231.25..932.60 rows=3D8108 width=3D8) (actual time=3D0.460..1.280 rows=
=3D7993 loops=3D1)"
"                    Recheck Cond: (universe_hash =3D '5188205190738336870'=
::bigint)"
"                    Heap Blocks: exact=3D44"
"                    ->  Bitmap Index Scan on pk_temp_universe_instruments =
 (cost=3D0.00..229.23 rows=3D8108 width=3D0) (actual time=3D0.452..0.452 ro=
ws=3D7993 loops=3D1)"
"                          Index Cond: (universe_hash =3D '5188205190738336=
870'::bigint)"
"Planning time: 0.850 ms"
"Execution time: 15031.386 ms"


Query B-D

"Aggregate  (cost=3D1827336.97..1827336.98 rows=3D1 width=3D0) (actual time=
=3D15193.952..15193.952 rows=3D1 loops=3D1)"
"  ->  Unique  (cost=3D1822817.73..1825924.71 rows=3D112981 width=3D64) (ac=
tual time=3D15122.057..15187.357 rows=3D168886 loops=3D1)"
"        ->  Sort  (cost=3D1822817.73..1823100.18 rows=3D112981 width=3D64)=
 (actual time=3D15122.056..15141.047 rows=3D168886 loops=3D1)"
"              Sort Key: rrr.idx, rrr.row_number, rrr.barrid, rrr.yield_pct=
, rrr.total_risk_pct, rrr.spec_risk_pct, rrr.hist_beta, rrr.pred_beta, rrr.=
data_date, rrr.barra_file_idx"
"              Sort Method: external sort  Disk: 12208kB"
"              ->  Hash Join  (cost=3D2484.83..1809086.39 rows=3D112981 wid=
th=3D64) (actual time=3D5655.552..15024.729 rows=3D168886 loops=3D1)"
"                    Hash Cond: ((rrr.barrid)::text =3D (uuu.barrid)::text)=
"
"                    ->  Hash Join  (cost=3D1450.88..1790335.13 rows=3D3686=
11 width=3D64) (actual time=3D5653.397..14944.115 rows=3D533241 loops=3D1)"
"                          Hash Cond: (rrr.barra_file_idx =3D fff.idx)"
"                          ->  Seq Scan on models_direct_row_asset_data rrr=
  (cost=3D0.00..1518763.74 rows=3D71049174 width=3D64) (actual time=3D0.002=
..6210.401 rows=3D71098547 loops=3D1)"
"                          ->  Hash  (cost=3D1445.87..1445.87 rows=3D401 wi=
dth=3D4) (actual time=3D0.268..0.268 rows=3D391 loops=3D1)"
"                                Buckets: 1024  Batches: 1  Memory Usage: 2=
2kB"
"                                ->  Bitmap Heap Scan on models_direct_file=
 fff  (cost=3D14.49..1445.87 rows=3D401 width=3D4) (actual time=3D0.053..0.=
236 rows=3D391 loops=3D1)"
"                                      Recheck Cond: ((file_name_date >=3D =
'2005-03-01'::date) AND (file_name_date <=3D '2005-04-30'::date))"
"                                      Filter: ((file_name_model_ver)::text=
 =3D '100'::text)"
"                                      Rows Removed by Filter: 219"
"                                      Heap Blocks: exact=3D34"
"                                      ->  Bitmap Index Scan on ix_models_d=
irect_file_file_name_date  (cost=3D0.00..14.39 rows=3D610 width=3D0) (actua=
l time=3D0.047..0.047 rows=3D610 loops=3D1)"
"                                            Index Cond: ((file_name_date >=
=3D '2005-03-01'::date) AND (file_name_date <=3D '2005-04-30'::date))"
"                    ->  Hash  (cost=3D932.60..932.60 rows=3D8108 width=3D8=
) (actual time=3D2.050..2.050 rows=3D7993 loops=3D1)"
"                          Buckets: 8192  Batches: 1  Memory Usage: 377kB"
"                          ->  Bitmap Heap Scan on temp_universe_instrument=
s uuu  (cost=3D231.25..932.60 rows=3D8108 width=3D8) (actual time=3D0.467..=
1.239 rows=3D7993 loops=3D1)"
"                                Recheck Cond: (universe_hash =3D '51882051=
90738336870'::bigint)"
"                                Heap Blocks: exact=3D44"
"                                ->  Bitmap Index Scan on pk_temp_universe_=
instruments  (cost=3D0.00..229.23 rows=3D8108 width=3D0) (actual time=3D0.4=
57..0.457 rows=3D7993 loops=3D1)"
"                                      Index Cond: (universe_hash =3D '5188=
205190738336870'::bigint)"
"Planning time: 0.883 ms"
"Execution time: 15197.640 ms"


Query C

"Aggregate  (cost=3D1818525.28..1818525.29 rows=3D1 width=3D0) (actual time=
=3D15181.269..15181.269 rows=3D1 loops=3D1)"
"  ->  Hash Join  (cost=3D2826.27..1818128.02 rows=3D158907 width=3D0) (act=
ual time=3D5814.662..15170.804 rows=3D247189 loops=3D1)"
"        Hash Cond: ((rrr.barrid)::text =3D (uuu.barrid)::text)"
"        ->  Hash Join  (cost=3D1792.32..1792174.92 rows=3D518446 width=3D8=
) (actual time=3D5812.495..15061.248 rows=3D741897 loops=3D1)"
"              Hash Cond: (rrr.barra_file_idx =3D fff.idx)"
"              ->  Seq Scan on models_direct_row_asset_data rrr  (cost=3D0.=
00..1518763.74 rows=3D71049174 width=3D12) (actual time=3D0.002..6233.423 r=
ows=3D71098547 loops=3D1)"
"              ->  Hash  (cost=3D1785.27..1785.27 rows=3D564 width=3D4) (ac=
tual time=3D0.367..0.367 rows=3D573 loops=3D1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 29kB"
"                    ->  Bitmap Heap Scan on models_direct_file fff  (cost=
=3D21.01..1785.27 rows=3D564 width=3D4) (actual time=3D0.058..0.305 rows=3D=
573 loops=3D1)"
"                          Recheck Cond: ((file_name_date >=3D '2005-05-01'=
::date) AND (file_name_date <=3D '2005-07-30'::date))"
"                          Filter: ((file_name_model_ver)::text =3D '100'::=
text)"
"                          Rows Removed by Filter: 321"
"                          Heap Blocks: exact=3D43"
"                          ->  Bitmap Index Scan on ix_models_direct_file_f=
ile_name_date  (cost=3D0.00..20.87 rows=3D858 width=3D0) (actual time=3D0.0=
51..0.051 rows=3D894 loops=3D1)"
"                                Index Cond: ((file_name_date >=3D '2005-05=
-01'::date) AND (file_name_date <=3D '2005-07-30'::date))"
"        ->  Hash  (cost=3D932.60..932.60 rows=3D8108 width=3D8) (actual ti=
me=3D2.066..2.066 rows=3D7993 loops=3D1)"
"              Buckets: 8192  Batches: 1  Memory Usage: 377kB"
"              ->  Bitmap Heap Scan on temp_universe_instruments uuu  (cost=
=3D231.25..932.60 rows=3D8108 width=3D8) (actual time=3D0.463..1.228 rows=
=3D7993 loops=3D1)"
"                    Recheck Cond: (universe_hash =3D '5188205190738336870'=
::bigint)"
"                    Heap Blocks: exact=3D44"
"                    ->  Bitmap Index Scan on pk_temp_universe_instruments =
 (cost=3D0.00..229.23 rows=3D8108 width=3D0) (actual time=3D0.454..0.454 ro=
ws=3D7993 loops=3D1)"
"                          Index Cond: (universe_hash =3D '5188205190738336=
870'::bigint)"
"Planning time: 0.866 ms"
"Execution time: 15181.330 ms"


Query C-D

"Aggregate  (cost=3D1844190.13..1844190.14 rows=3D1 width=3D0) (actual time=
=3D15495.232..15495.233 rows=3D1 loops=3D1)"
"  ->  Unique  (cost=3D1837833.85..1842203.79 rows=3D158907 width=3D64) (ac=
tual time=3D15388.762..15485.467 rows=3D247189 loops=3D1)"
"        ->  Sort  (cost=3D1837833.85..1838231.12 rows=3D158907 width=3D64)=
 (actual time=3D15388.761..15416.951 rows=3D247189 loops=3D1)"
"              Sort Key: rrr.idx, rrr.row_number, rrr.barrid, rrr.yield_pct=
, rrr.total_risk_pct, rrr.spec_risk_pct, rrr.hist_beta, rrr.pred_beta, rrr.=
data_date, rrr.barra_file_idx"
"              Sort Method: external sort  Disk: 17880kB"
"              ->  Hash Join  (cost=3D2826.27..1818128.02 rows=3D158907 wid=
th=3D64) (actual time=3D5832.453..15240.599 rows=3D247189 loops=3D1)"
"                    Hash Cond: ((rrr.barrid)::text =3D (uuu.barrid)::text)=
"
"                    ->  Hash Join  (cost=3D1792.32..1792174.92 rows=3D5184=
46 width=3D64) (actual time=3D5830.312..15121.828 rows=3D741897 loops=3D1)"
"                          Hash Cond: (rrr.barra_file_idx =3D fff.idx)"
"                          ->  Seq Scan on models_direct_row_asset_data rrr=
  (cost=3D0.00..1518763.74 rows=3D71049174 width=3D64) (actual time=3D0.002=
..6244.816 rows=3D71098547 loops=3D1)"
"                          ->  Hash  (cost=3D1785.27..1785.27 rows=3D564 wi=
dth=3D4) (actual time=3D0.360..0.360 rows=3D573 loops=3D1)"
"                                Buckets: 1024  Batches: 1  Memory Usage: 2=
9kB"
"                                ->  Bitmap Heap Scan on models_direct_file=
 fff  (cost=3D21.01..1785.27 rows=3D564 width=3D4) (actual time=3D0.055..0.=
310 rows=3D573 loops=3D1)"
"                                      Recheck Cond: ((file_name_date >=3D =
'2005-05-01'::date) AND (file_name_date <=3D '2005-07-30'::date))"
"                                      Filter: ((file_name_model_ver)::text=
 =3D '100'::text)"
"                                      Rows Removed by Filter: 321"
"                                      Heap Blocks: exact=3D43"
"                                      ->  Bitmap Index Scan on ix_models_d=
irect_file_file_name_date  (cost=3D0.00..20.87 rows=3D858 width=3D0) (actua=
l time=3D0.048..0.048 rows=3D894 loops=3D1)"
"                                            Index Cond: ((file_name_date >=
=3D '2005-05-01'::date) AND (file_name_date <=3D '2005-07-30'::date))"
"                    ->  Hash  (cost=3D932.60..932.60 rows=3D8108 width=3D8=
) (actual time=3D2.043..2.043 rows=3D7993 loops=3D1)"
"                          Buckets: 8192  Batches: 1  Memory Usage: 377kB"
"                          ->  Bitmap Heap Scan on temp_universe_instrument=
s uuu  (cost=3D231.25..932.60 rows=3D8108 width=3D8) (actual time=3D0.466..=
1.240 rows=3D7993 loops=3D1)"
"                                Recheck Cond: (universe_hash =3D '51882051=
90738336870'::bigint)"
"                                Heap Blocks: exact=3D44"
"                                ->  Bitmap Index Scan on pk_temp_universe_=
instruments  (cost=3D0.00..229.23 rows=3D8108 width=3D0) (actual time=3D0.4=
56..0.456 rows=3D7993 loops=3D1)"
"                                      Index Cond: (universe_hash =3D '5188=
205190738336870'::bigint)"
"Planning time: 0.876 ms"
"Execution time: 15499.128 ms"



________________________________
From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Tuesday, February 2, 2016 6:27 PM
To: Tom Lane
Cc: seth-p@outlook.com; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #13908: Query returns too few rows

On Tue, Feb 2, 2016 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us<mailto:tgl@sss.=
pgh.pa.us>> wrote:
seth-p@outlook.com<mailto:seth-p@outlook.com> writes:
> Query (A-D) (with DISTINCT) should not return more rows than query (A) (t=
he
> identical query without DISTINCT), so clearly something is wrong there.

That does seem fishy, but unless you can provide a self-contained test
case, it's unlikely that we are going to be able to magically locate
the problem.  I'd suggest seeing if you can reproduce the issue with
some obfuscated or randomly-generated data.

?While Tom is correct I'd like to make a couple of points...

It apparently isn't the DISTINCT query that is increasing the count of rows=
 but rather than the non-DISTINCT version fails to return/count as many as =
are actually present - but only when dealing with the entire range...

?Lacking a reproducible test case you really need to at least supply an EXP=
LAIN ANALYZE so that actual row counts at each node can be observed.

The note about the apparent extra HASH first made me think that there must =
be some kind of hash collision involved in the data - apparently one that o=
ccurs between data points in B and C but not within B or within C...but I f=
ear this might be a red herring.  But if it is a collision then the odds of=
 random data exhibiting the problem are quite slim...

David J.

В списке pgsql-bugs по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: BUG #13908: Query returns too few rows
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: BUG #13908: Query returns too few rows