Re: Speed up UPDATE query?
От | Scott Marlowe |
---|---|
Тема | Re: Speed up UPDATE query? |
Дата | |
Msg-id | dcc563d10910311708s281a81deg81a3968cbc4073ab@mail.gmail.com обсуждение исходный текст |
Ответ на | Speed up UPDATE query? (Lee Hachadoorian <lee.hachadoorian@gmail.com>) |
Список | pgsql-sql |
On Thu, Oct 29, 2009 at 11:50 AM, Lee Hachadoorian <lee.hachadoorian@gmail.com> wrote: > I'm trying to update several tables (all child tables of the same > parent), and as the number of records increases, the length of time it > takes to run the update is shooting up exponentially. I have imported > the new data to an import table, and then join the import table to the > update table. The update statement looks like: > > UPDATE > household_2000 h > SET > hhincome = new_hhincome > FROM ( > SELECT > serial, new_hhincome > FROM > import > WHERE > year = 2000 > ) r > WHERE > h.serial = r.serial > > household_2000 is a child table of a household table that, as you > might guess, only contains records from the year 2000. I am putting a > year = 2000 restriction on the import table and then linking on the > unique identifier. > > For different child tables, this is how long the update takes to run > (numbers are approximate): > > Records Cost (via EXPLAIN) Actual time > 460,000 300,000 23 seconds > 510,000 320,000 26 seconds > 1.2 million 670,000 3:16 > 1.3 million 820,000 3:25 > 6.2 million 2.7 million ~2.5 hours > > So, the cost estimate given by EXPLAIN seems to be roughly > proportional to the number of records in the dataset, but the actual > time it takes to run seems to increase faster than the cost, even for > the small and medium tables, and shoots through the roof for the large > tables. Since I need to run this on additional child tables that are > larger (the largest is 14 million records), I want to know what I can > do to speed up the query. > > Here's the EXPLAIN for the query. Note that the query plan is the same > for the small, medium, and large tables. > > Hash Join (cost=1268532.36..2379787.06 rows=5465837 width=1128) > Hash Cond: (import_6_17_rev_hh.serial = h.serial) > -> Bitmap Heap Scan on import_6_17_rev_hh > (cost=126551.72..308495.69 rows=5465837 width=8) > Recheck Cond: (year = 1990) > -> Bitmap Index Scan on import_6_17_rev_hh_pkey > (cost=0.00..125185.26 rows=5465837 width=0) > Index Cond: (year = 1990) > -> Hash (cost=295596.06..295596.06 rows=5527406 width=1124) > -> Seq Scan on household_1990 h (cost=0.00..295596.06 > rows=5527406 width=1124) Any chance of getting the output of explain analyze for a fast and a slow run of this query?
В списке pgsql-sql по дате отправления: