I have found a nasty bug when using parallel sequential scans with an =
exists clause on postgresql 9.6.1. I have found that the rows returned =
using parallel sequential scan plans are incorrect (though I haven=E2=80=99=
t dug sufficiently to know in what ways). See below for an example of =
the issue.
denver=3D# set max_parallel_workers_per_gather =3D 0;
SET
denver=3D# select count(*)
denver-# from users u=20
denver-# join address a on (a.users_id =3D u.id)=20
denver-# where exists (select 1 from address where users_id =3D u.id)
count =20
---------
9486910
(1 row)
QUERY PLAN =
=20
=
--------------------------------------------------------------------------=
-------------------------------------
Aggregate (cost=3D2117655.96..2117655.97 rows=3D1 width=3D8)
-> Merge Join (cost=3D1257.34..2094242.48 rows=3D9365393 width=3D0)
Merge Cond: (u.id =3D a.users_id)
-> Merge Semi Join (cost=3D495.43..1607025.52 rows=3D2824687 =
width=3D8)
Merge Cond: (u.id =3D address.users_id)
-> Index Only Scan using users_pk on users u =
(cost=3D0.43..1018930.31 rows=3D11648927 width=3D4)
-> Index Only Scan using address_idx01 on address =
(cost=3D0.43..456495.80 rows=3D9365393 width=3D4)
-> Index Only Scan using address_idx01 on address a =
(cost=3D0.43..456495.80 rows=3D9365393 width=3D4)
(8 rows)
denver=3D# set max_parallel_workers_per_gather =3D 1;
SET
denver=3D# select count(*)
denver-# from users u=20
denver-# join address a on (a.users_id =3D u.id)=20
denver-# where exists (select 1 from address where users_id =3D u.id);
count =20
----------
29288954
(1 row)
QUERY PLAN =
=20
=
--------------------------------------------------------------------------=
-------------------------------
Aggregate (cost=3D1889898.47..1889898.48 rows=3D1 width=3D8)
-> Hash Join (cost=3D1401575.70..1866484.99 rows=3D9365393 width=3D0)=
Hash Cond: (a.users_id =3D u.id)
-> Seq Scan on address a (cost=3D0.00..299463.93 rows=3D9365393=
width=3D4)
-> Hash (cost=3D1355233.12..1355233.12 rows=3D2824687 =
width=3D8)
-> Gather (cost=3D571820.86..1355233.12 rows=3D2824687 =
width=3D8)
Workers Planned: 1
-> Hash Join (cost=3D570820.86..1071764.42 =
rows=3D2824687 width=3D8)
Hash Cond: (address.users_id =3D u.id)
-> Parallel Seq Scan on address =
(cost=3D0.00..260900.55 rows=3D5509055 width=3D4)
-> Hash (cost=3D379705.27..379705.27 =
rows=3D11648927 width=3D4)
-> Seq Scan on users u =
(cost=3D0.00..379705.27 rows=3D11648927 width=3D4)
(12 rows)