Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
From | Heikki Linnakangas
---|---
Subject | Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
Date |
Msg-id | 47D5150C.8090802@enterprisedb.com
In reply to | Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit (Craig Ringer <craig@postnewspapers.com.au>)
Responses | Re: Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit
List | pgsql-performance
Craig Ringer wrote:
> I'm encountering an odd issue with a bulk import query using PostgreSQL
> 8.3. After a 400,000 row import into a just-truncated table `booking', a
> sequential scan run on the table in the same transaction is incredibly
> slow, taking ~ 166738.047 ms. After a:
> `COMMIT; BEGIN;'
> the same query runs in 712.615 ms, with almost all the time difference
> being in the sequential scan of the `booking' table [schema at end of post].
>
> The table is populated by a complex pl/pgsql function that draws from
> several other tables to convert data from another app's format.

You must have an exception handler block in that pl/pgsql function. Entering such a block implicitly creates a new subtransaction on each invocation, so you end up with hundreds of thousands of committed subtransactions.

For each row in the seq scan, the list of subtransactions is scanned to see whether the transaction that inserted the row is part of the current top-level transaction. That's fine for a handful of subtransactions, but it gets really slow with large numbers of them, as you've seen. It's an O(n^2) operation, where n is the number of rows inserted, so you'll be in even more trouble if the number of rows increases.

As a work-around, avoid using exception handlers, or process more than one row per function invocation. Or COMMIT the transaction, as you did.

For 8.4, it would be nice to improve that. I tested this on my laptop with a similarly-sized table, inserting each row in a pl/pgsql function with an exception handler, and I got very similar run times. According to oprofile, all the time is spent in TransactionIdIsInProgress. I think it would be pretty straightforward to store the committed subtransaction ids in a sorted array instead of a linked list and binary search it, or to use a hash table. That should eliminate this problem, though there are still other places where a large number of subtransactions will hurt performance.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
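To make the pattern concrete, here is a minimal PL/pgSQL sketch of both shapes. The table, column, and function names are invented for illustration and are not from Craig's actual schema. The first function uses a per-row exception handler and therefore creates one subtransaction per inserted row; the second wraps the whole loop in a single handler, leaving only one subtransaction for the entire import.

    -- Slow pattern (hypothetical names): calling this once per row enters the
    -- BEGIN ... EXCEPTION block 400,000 times, one subtransaction per row.
    CREATE OR REPLACE FUNCTION import_one_row(p_id integer, p_ref text)
    RETURNS void AS $$
    BEGIN
        BEGIN                        -- entering the block starts a subtransaction
            INSERT INTO booking (id, ref) VALUES (p_id, p_ref);
        EXCEPTION WHEN others THEN
            RAISE NOTICE 'skipping row %: %', p_id, SQLERRM;
        END;                         -- on success the subtransaction commits
    END;
    $$ LANGUAGE plpgsql;

    -- Work-around sketch: one handler around the whole batch, so the import
    -- leaves only a single subtransaction behind.  Trade-off: any error
    -- aborts the whole batch instead of just the offending row.
    CREATE OR REPLACE FUNCTION import_all_rows()
    RETURNS void AS $$
    DECLARE
        src record;
    BEGIN
        BEGIN                        -- one subtransaction for everything
            FOR src IN SELECT id, ref FROM staging LOOP
                INSERT INTO booking (id, ref) VALUES (src.id, src.ref);
            END LOOP;
        EXCEPTION WHEN others THEN
            RAISE NOTICE 'import aborted: %', SQLERRM;
        END;
    END;
    $$ LANGUAGE plpgsql;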