RE: BUG #16031: Group by returns duplicate groups
| От | David Raymond |
|---|---|
| Тема | RE: BUG #16031: Group by returns duplicate groups |
| Дата | |
| Msg-id | AM6PR07MB6022C91A384768FAB540B4C0879C0@AM6PR07MB6022.eurprd07.prod.outlook.com обсуждение исходный текст |
| Ответ на | Re: BUG #16031: Group by returns duplicate groups (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
| Ответы |
Re: BUG #16031: Group by returns duplicate groups
Re: BUG #16031: Group by returns duplicate groups |
| Список | pgsql-bugs |
As an update, I've currently got a dump that consistently shows weirdness when loaded. It's just the "name" field, has
1.3million records, is 15 MB zipped, and has things garbled enough that I don't mind sending it.
How small does it need to be before it's good to send to someone?
Output after re-loading the table:
testing=> \d+ weird_grouping
Table "name_stuff.weird_grouping"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------------------------+-----------+----------+---------+----------+--------------+-------------
name | character varying(254) | | not null | | extended | |
testing=> select count(*), count(distinct name) from weird_grouping;
count | count
-----------+-----------
1,297,265 | 1,176,103
(1 row)
Time: 7616.186 ms (00:07.616)
testing=> explain (analyze, verbose, costs, buffers, timing, summary) select count(*), count(distinct name) from
weird_grouping;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=10187.10..10187.11 rows=1 width=16) (actual time=7805.463..7805.463 rows=1 loops=1)
Output: count(*), count(DISTINCT name)
Buffers: shared hit=2080 read=6236, temp read=5484 written=5506
-> Seq Scan on name_stuff.weird_grouping (cost=0.00..9563.40 rows=124740 width=516) (actual time=0.031..128.203
rows=1297265loops=1)
Output: name
Buffers: shared hit=2080 read=6236
Planning Time: 0.027 ms
Execution Time: 7805.483 ms
(8 rows)
Time: 7805.822 ms (00:07.806)
testing=> with foo as (select name from weird_grouping group by name) select name from foo group by name having
count(*)> 1;
name
------
(0 rows)
Time: 1950.401 ms (00:01.950)
testing=> explain (analyze, verbose, costs, buffers, timing, summary) with foo as (select name from weird_grouping
groupby name) select name from foo group by name having count(*) > 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=9882.25..9884.75 rows=67 width=516) (actual time=1466.738..1466.738 rows=0 loops=1)
Output: foo.name
Group Key: foo.name
Filter: (count(*) > 1)
Rows Removed by Filter: 1176101
Buffers: shared hit=2144 read=6172, temp written=4533
CTE foo
-> HashAggregate (cost=9875.25..9877.25 rows=200 width=516) (actual time=494.343..734.005 rows=1176101 loops=1)
Output: weird_grouping.name
Group Key: weird_grouping.name
Buffers: shared hit=2144 read=6172
-> Seq Scan on name_stuff.weird_grouping (cost=0.00..9563.40 rows=124740 width=516) (actual
time=0.283..120.898rows=1297265 loops=1)
Output: weird_grouping.name
Buffers: shared hit=2144 read=6172
-> CTE Scan on foo (cost=0.00..4.00 rows=200 width=516) (actual time=494.346..1035.185 rows=1176101 loops=1)
Output: foo.name
Buffers: shared hit=2144 read=6172, temp written=4533
Planning Time: 0.464 ms
Execution Time: 2230.238 ms
(19 rows)
Time: 2231.291 ms (00:02.231)
testing=> analyze verbose weird_grouping;
INFO: analyzing "name_stuff.weird_grouping"
INFO: "weird_grouping": scanned 8316 of 8316 pages, containing 1297265 live rows and 0 dead rows; 30000 rows in
sample,1297265 estimated total rows
ANALYZE
Time: 474.963 ms
testing=> select count(*), count(distinct name) from weird_grouping;
count | count
-----------+-----------
1,297,265 | 1,176,103
(1 row)
Time: 7449.983 ms (00:07.450)
testing=> explain (analyze, verbose, costs, buffers, timing, summary) select count(*), count(distinct name) from
weird_grouping;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=27774.98..27774.99 rows=1 width=16) (actual time=7754.578..7754.578 rows=1 loops=1)
Output: count(*), count(DISTINCT name)
Buffers: shared hit=2240 read=6076, temp read=5484 written=5506
-> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.036..102.144
rows=1297265loops=1)
Output: name
Buffers: shared hit=2240 read=6076
Planning Time: 0.031 ms
Execution Time: 7754.598 ms
(8 rows)
Time: 7754.902 ms (00:07.755)
testing=> with foo as (select name from weird_grouping group by name) select name from foo group by name having
count(*)> 1;
name
-------
DCT
DELTA
(2 rows)
Time: 9561.382 ms (00:09.561)
testing=> explain (analyze, verbose, costs, buffers, timing, summary) with foo as (select name from weird_grouping
groupby name) select name from foo group by name having count(*) > 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=209783.95..209786.45 rows=67 width=516) (actual time=8591.210..8686.132 rows=2 loops=1)
Output: foo.name
Group Key: foo.name
Filter: (count(*) > 1)
Rows Removed by Filter: 1176099
Buffers: shared hit=2304 read=6012, temp read=7800 written=12363
CTE foo
-> Group (cost=179613.72..186100.05 rows=947356 width=20) (actual time=6416.900..7842.634 rows=1176103 loops=1)
Output: weird_grouping.name
Group Key: weird_grouping.name
Buffers: shared hit=2304 read=6012, temp read=7800 written=7830
-> Sort (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=6416.899..7665.158 rows=1297265
loops=1)
Output: weird_grouping.name
Sort Key: weird_grouping.name
Sort Method: external merge Disk: 39048kB
Buffers: shared hit=2304 read=6012, temp read=7800 written=7830
-> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual
time=0.294..137.486rows=1297265 loops=1)
Output: weird_grouping.name
Buffers: shared hit=2304 read=6012
-> CTE Scan on foo (cost=0.00..18947.12 rows=947356 width=516) (actual time=6416.902..8105.771 rows=1176103
loops=1)
Output: foo.name
Buffers: shared hit=2304 read=6012, temp read=7800 written=12363
Planning Time: 0.258 ms
Execution Time: 10305.891 ms
(24 rows)
Time: 10306.990 ms (00:10.307)
testing=>
-----Original Message-----
From: Andrew Gierth <andrew@tao11.riddles.org.uk>
Sent: Tuesday, October 1, 2019 3:27 PM
To: David Raymond <David.Raymond@tomtom.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #16031: Group by returns duplicate groups
>>>>> "David" == David Raymond <David.Raymond@tomtom.com> writes:
David> I checked for other places where there was weirdness going on
David> and there are more.
You could try something like,
select name, lname
from (select name, lag(name) over (order by name) as lname
from big_table) s
where name < lname;
That should show all cases where the sort order was inconsistent with
the < operator (which obviously should never happen).
However, there's not much more I can do to help with this, since I don't
use Windows myself and have no useful access to any Windows system. You
might try and cut down the data to the smallest set that shows
inconsistencies using the above; particularly relevant is whether the
problem only shows up for external merge sorts or whether it happens for
in-memory sorts too.
--
Andrew (irc:RhodiumToad)
В списке pgsql-bugs по дате отправления: