Inserting heap tuples in bulk in COPY
От | Heikki Linnakangas |
---|---|
Тема | Inserting heap tuples in bulk in COPY |
Дата | |
Msg-id | 4E457C22.9010708@enterprisedb.com обсуждение исходный текст |
Ответы |
Re: Inserting heap tuples in bulk in COPY
Re: Inserting heap tuples in bulk in COPY Re: Inserting heap tuples in bulk in COPY Re: Inserting heap tuples in bulk in COPY Re: Inserting heap tuples in bulk in COPY Re: Inserting heap tuples in bulk in COPY |
Список | pgsql-hackers |
COPY is slow. Let's make it faster. One obvious optimization is to insert heap tuples in bigger chunks, instead of calling heap_insert() separately for every tuple. That saves the overhead of pinning and locking the buffer for every tuple, and you only need to write one WAL record for all the tuples written to the same page, instead of one for each tuple. Attached is a WIP patch to do that. It adds a new function, heap_multi_insert, which does the same thing as heap_insert, but works in bulk. It takes an array of tuples as argument, and tries to cram as many of them into the chosen targe page as it can, and only writes a single WAL record of the operation. This gives a significant speedup to COPY, particularly for narrow tables, with small tuples. Grouping multiple tuples into one WAL record reduces the WAL volume significantly, and the time spent in writing that WAL. The reduced overhead of repeatedly locking the buffer is also most noticeable on narrow tables. On wider tables, the effects are smaller. See copytest-results.txt, containing test results with three tables of different widths. The scripts used to get those numbers are also attached. Triggers complicate this. I believe it is only safe to group tuples together like this if the table has no triggers. A BEFORE ROW trigger might run a SELECT on the table being copied to, and check if some of the tuples we're about to insert exist. If we run BEFORE ROW triggers for a bunch of tuples first, and only then insert them, none of the trigger invocations will see the other rows as inserted yet. Similarly, if we run AFTER ROW triggers after inserting a bunch of tuples, the trigger for each of the insertions would see all the inserted rows. So at least for now, the patch simply falls back to inserting one row at a time if there are any triggers on the table. The patch is WIP, mainly because I didn't write the WAL replay routines yet, but please let me know if you see any issues. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: