Обсуждение: DELETE query and indexes again

Поиск
Список
Период
Сортировка

DELETE query and indexes again

От
Wells Oliver
Дата:
This play_statistics table has 96,577,179 rows. Its PK is gid integer, guid uuid, cat text, name text, target integer.

Doing a EXPLAIN ANALYZE DELETE WHERE gid IN (SELECT DISTINCT gid FROM temptable) where there is one row in the temptable, I get this:

Delete on play_statistics (cost=1984.14..8037586.36 rows=3734075 width=34) (actual time=407093.859..407093.859 rows=0 loops=1)
-> Hash Join (cost=1984.14..8037586.36 rows=3734075 width=34) (actual time=407093.857..407093.857 rows=0 loops=1)
Hash Cond: (play_statistics.gid = "ANY_subquery".gid)
-> Seq Scan on play_statistics (cost=0.00..7781818.16 rows=96600516 width=10) (actual time=201.521..396052.101 rows=96556116 loops=1)
-> Hash (cost=1981.64..1981.64 rows=200 width=32) (actual time=13.020..13.020 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Subquery Scan on "ANY_subquery" (cost=1977.64..1981.64 rows=200 width=32) (actual time=13.012..13.013 rows=1 loops=1)
-> HashAggregate (cost=1977.64..1979.64 rows=200 width=4) (actual time=13.001..13.002 rows=1 loops=1)
Group Key: stats.gid
-> Seq Scan on stats (cost=0.00..1924.91 rows=21091 width=4) (actual time=0.028..9.244 rows=21091 loops=1)
Planning Time: 0.657 ms
JIT:
Functions: 15
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.816 ms, Inlining 53.472 ms, Optimization 89.014 ms, Emission 58.759 ms, Total 203.060 ms
Execution Time: 407112.908 ms
(16 rows)


It takes 7 minutes to delete 21063 rows for one given gid value, which seems excessive given I think it should be using the PK index.

Anything I'm  missing here?

--
Wells Oliver
wells.oliver@gmail.com

Re: DELETE query and indexes again

От
Jeff Janes
Дата:


On Thu, Jul 23, 2020 at 5:11 PM Wells Oliver <wells.oliver@gmail.com> wrote:
This play_statistics table has 96,577,179 rows. Its PK is gid integer, guid uuid, cat text, name text, target integer.

Doing a EXPLAIN ANALYZE DELETE WHERE gid IN (SELECT DISTINCT gid FROM temptable) where there is one row in the temptable, I get this:

Delete on play_statistics (cost=1984.14..8037586.36 rows=3734075 width=34) (actual time=407093.859..407093.859 rows=0 loops=1)
-> Hash Join (cost=1984.14..8037586.36 rows=3734075 width=34) (actual time=407093.857..407093.857 rows=0 loops=1)
Hash Cond: (play_statistics.gid = "ANY_subquery".gid)
-> Seq Scan on play_statistics (cost=0.00..7781818.16 rows=96600516 width=10) (actual time=201.521..396052.101 rows=96556116 loops=1)
-> Hash (cost=1981.64..1981.64 rows=200 width=32) (actual time=13.020..13.020 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Subquery Scan on "ANY_subquery" (cost=1977.64..1981.64 rows=200 width=32) (actual time=13.012..13.013 rows=1 loops=1)
-> HashAggregate (cost=1977.64..1979.64 rows=200 width=4) (actual time=13.001..13.002 rows=1 loops=1)
Group Key: stats.gid
-> Seq Scan on stats (cost=0.00..1924.91 rows=21091 width=4) (actual time=0.028..9.244 rows=21091 loops=1)
Planning Time: 0.657 ms
JIT:
Functions: 15
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.816 ms, Inlining 53.472 ms, Optimization 89.014 ms, Emission 58.759 ms, Total 203.060 ms
Execution Time: 407112.908 ms
(16 rows)


It takes 7 minutes to delete 21063 rows for one given gid value, which seems excessive given I think it should be using the PK index.

Anything I'm  missing here?


Your command text is over "temptable", but your plan is over "stats".  Is "temptable" a view?  

Cheers,

Jeff

Re: DELETE query and indexes again

От
Wells Oliver
Дата:
I should have included this: when I use the first two cols from the PK (gid, guid) it uses the index. When I use the first col only (gid), it does not. Here's the index delete.

 Delete on play_statistics ps  (cost=0.69..472003.08 rows=28920 width=12) (actual time=3534.409..3534.409 rows=0 loops=1)
   ->  Nested Loop  (cost=0.69..472003.08 rows=28920 width=12) (actual time=16.703..1857.140 rows=2269013 loops=1)
         ->  Seq Scan on stats s  (cost=0.00..2253.24 rows=54624 width=26) (actual time=16.565..60.615 rows=21063 loops=1)
         ->  Index Scan using play_statistics_pkey on play_statistics ps  (cost=0.69..8.59 rows=1 width=26) (actual time=0.008..0.065 rows=108 loops=21063)
               Index Cond: ((gid = s.gid) AND (guid = s.guid))
 Planning Time: 1.494 ms
 JIT:
   Functions: 9
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 2.500 ms, Inlining 0.000 ms, Optimization 0.873 ms, Emission 15.309 ms, Total 18.682 ms
 Execution Time: 3569.225 ms
(11 rows)




On Thu, Jul 23, 2020 at 2:11 PM Wells Oliver <wells.oliver@gmail.com> wrote:
This play_statistics table has 96,577,179 rows. Its PK is gid integer, guid uuid, cat text, name text, target integer.

Doing a EXPLAIN ANALYZE DELETE WHERE gid IN (SELECT DISTINCT gid FROM temptable) where there is one row in the temptable, I get this:

Delete on play_statistics (cost=1984.14..8037586.36 rows=3734075 width=34) (actual time=407093.859..407093.859 rows=0 loops=1)
-> Hash Join (cost=1984.14..8037586.36 rows=3734075 width=34) (actual time=407093.857..407093.857 rows=0 loops=1)
Hash Cond: (play_statistics.gid = "ANY_subquery".gid)
-> Seq Scan on play_statistics (cost=0.00..7781818.16 rows=96600516 width=10) (actual time=201.521..396052.101 rows=96556116 loops=1)
-> Hash (cost=1981.64..1981.64 rows=200 width=32) (actual time=13.020..13.020 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Subquery Scan on "ANY_subquery" (cost=1977.64..1981.64 rows=200 width=32) (actual time=13.012..13.013 rows=1 loops=1)
-> HashAggregate (cost=1977.64..1979.64 rows=200 width=4) (actual time=13.001..13.002 rows=1 loops=1)
Group Key: stats.gid
-> Seq Scan on stats (cost=0.00..1924.91 rows=21091 width=4) (actual time=0.028..9.244 rows=21091 loops=1)
Planning Time: 0.657 ms
JIT:
Functions: 15
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.816 ms, Inlining 53.472 ms, Optimization 89.014 ms, Emission 58.759 ms, Total 203.060 ms
Execution Time: 407112.908 ms
(16 rows)


It takes 7 minutes to delete 21063 rows for one given gid value, which seems excessive given I think it should be using the PK index.

Anything I'm  missing here?

--
Wells Oliver
wells.oliver@gmail.com


--

Re: DELETE query and indexes again

От
Wells Oliver
Дата:
Apologies: stats is temptable in this query.

On Thu, Jul 23, 2020 at 2:47 PM Jeff Janes <jeff.janes@gmail.com> wrote:


On Thu, Jul 23, 2020 at 5:11 PM Wells Oliver <wells.oliver@gmail.com> wrote:
This play_statistics table has 96,577,179 rows. Its PK is gid integer, guid uuid, cat text, name text, target integer.

Doing a EXPLAIN ANALYZE DELETE WHERE gid IN (SELECT DISTINCT gid FROM temptable) where there is one row in the temptable, I get this:

Delete on play_statistics (cost=1984.14..8037586.36 rows=3734075 width=34) (actual time=407093.859..407093.859 rows=0 loops=1)
-> Hash Join (cost=1984.14..8037586.36 rows=3734075 width=34) (actual time=407093.857..407093.857 rows=0 loops=1)
Hash Cond: (play_statistics.gid = "ANY_subquery".gid)
-> Seq Scan on play_statistics (cost=0.00..7781818.16 rows=96600516 width=10) (actual time=201.521..396052.101 rows=96556116 loops=1)
-> Hash (cost=1981.64..1981.64 rows=200 width=32) (actual time=13.020..13.020 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Subquery Scan on "ANY_subquery" (cost=1977.64..1981.64 rows=200 width=32) (actual time=13.012..13.013 rows=1 loops=1)
-> HashAggregate (cost=1977.64..1979.64 rows=200 width=4) (actual time=13.001..13.002 rows=1 loops=1)
Group Key: stats.gid
-> Seq Scan on stats (cost=0.00..1924.91 rows=21091 width=4) (actual time=0.028..9.244 rows=21091 loops=1)
Planning Time: 0.657 ms
JIT:
Functions: 15
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.816 ms, Inlining 53.472 ms, Optimization 89.014 ms, Emission 58.759 ms, Total 203.060 ms
Execution Time: 407112.908 ms
(16 rows)


It takes 7 minutes to delete 21063 rows for one given gid value, which seems excessive given I think it should be using the PK index.

Anything I'm  missing here?


Your command text is over "temptable", but your plan is over "stats".  Is "temptable" a view?  

Cheers,

Jeff


--

Re: DELETE query and indexes again

От
Wells Oliver
Дата:
So, uses the index:

explain analyze delete from statcast.play_statistics as ps
using stats as s
where ps.gid = s.gid
and ps.guid = s.guid;

It does this:

->  Index Scan using play_statistics_pkey on play_statistics ps  (cost=0.69..8.59 rows=1 width=26) (actual time=0.008..0.065 rows=108 loops=21063)
         Index Cond: ((gid = s.gid) AND (guid = s.guid))

Does not use the index:

explain analyze delete from statcast.play_statistics where gid in (select distinct gid from stats);

It does this:
-> Seq Scan on play_statistics (cost=0.00..7781818.16 rows=96600516 width=10) (actual time=201.521..396052.101 rows=96556116 loops=1)