Re: Long Running Update - My Solution
От | Harry Mantheakis |
---|---|
Тема | Re: Long Running Update - My Solution |
Дата | |
Msg-id | 4E09A389.80604@riskcontrollimited.com обсуждение исходный текст |
Ответ на | Re: Long Running Update - My Solution ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Список | pgsql-performance |
Hello Kevin > If you use EXPLAIN with both statements... Yes, the plans are indeed very different. Here is the statement, set to update up to 100,000 records, which took about 5 seconds to complete: UPDATE table_A SET field_1 = table_B.field_1 , field_2 = table_B.field_2 FROM table_B WHERE table_B.tb_id >= 0 AND table_B.tb_id <= 100000 AND table_B.tb_id = table_A.ta_id ; The query plan for the above is: Nested Loop (cost=0.00..2127044.47 rows=73620 width=63) -> Index Scan using table_B_pkey on table_B (cost=0.00..151830.75 rows=73620 width=20) Index Cond: ((tb_id >= 0) AND (tb_id <= 100000)) -> Index Scan using table_A_pkey on table_A (cost=0.00..26.82 rows=1 width=47) Index Cond: (table_A.ta_id = table_B.tb_id) Now, if I change the first AND clause to update 1M records, as follows: table_B.id <= 1000000 I get the following - quite different - query plan: Hash Join (cost=537057.49..8041177.88 rows=852150 width=63) Hash Cond: (table_A.ta_id = table_B.tb_id) -> Seq Scan on table_A (cost=0.00..3294347.71 rows=145561171 width=47) -> Hash (cost=521411.62..521411.62 rows=852150 width=20) -> Bitmap Heap Scan on table_B (cost=22454.78..521411.62 rows=852150 width=20) Recheck Cond: ((tb_id >= 0) AND (tb_id <= 1000000)) -> Bitmap Index Scan on table_B_pkey (cost=0.00..22241.74 rows=852150 width=0) Index Cond: ((tb_id >= 0) AND (tb_id <= 1000000)) Note: When I tried updating 1M records, the command was still running after 25 minutes before I killed it. The sequential scan in the later plan looks expensive, and (I think) supports what others have since mentioned, namely that when the optimizer moves to using sequential scans (working off the disk) things get a lot slower. For me, the penny has finally dropped on why I should use EXPLAIN for bulk operations. Thanks too, to Greg Smith, Robert Klemme and Thomas for all the feedback. Kind regards Harry Mantheakis London, UK
В списке pgsql-performance по дате отправления: