Re: COPY enhancements

From: Greg Smith
Subject: Re: COPY enhancements
Date:
Msg-id: alpine.GSO.2.01.0910081310300.25300@westnet.com
In reply to: Re: COPY enhancements  (Rod Taylor <rod.taylor@gmail.com>)
List: pgsql-hackers
On Thu, 8 Oct 2009, Rod Taylor wrote:

> 1) Having copy remember which specific line caused the error. So it can 
> replay lines 1 through 487 in a subtransaction since it knows those are 
> successful. Run 488 in its own subtransaction. Run 489 through ... in a 
> new subtransaction.

This is the standard technique used in other bulk loaders I'm aware of.
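
As a minimal sketch of that split, in Python: here subtransaction() is 
just a savepoint wrapper over a DB-API style connection, while 
insert_rows(), log_rejected(), and the InsertError exception (which 
carries the failing row's index, the capability (1) asks COPY for) are 
hypothetical stand-ins for the real loader machinery.

from contextlib import contextmanager

class InsertError(Exception):
    # Hypothetical: reports which row of the batch failed, which is
    # what (1) assumes COPY can remember.
    def __init__(self, row_index):
        super().__init__("row %d failed" % row_index)
        self.row_index = row_index

@contextmanager
def subtransaction(conn):
    # A subtransaction is just a savepoint: RELEASE keeps its work in
    # the surrounding transaction, ROLLBACK TO undoes only this block.
    cur = conn.cursor()
    cur.execute("SAVEPOINT batch")
    try:
        yield
        cur.execute("RELEASE SAVEPOINT batch")
    except Exception:
        cur.execute("ROLLBACK TO SAVEPOINT batch")
        raise

def split_failed_batch(conn, rows, bad_index, insert_rows, log_rejected):
    # Rows before the bad one are known good, so replay them together.
    with subtransaction(conn):
        insert_rows(conn, rows[:bad_index])
    # Run the bad row alone; its failure rolls back nothing else.
    try:
        with subtransaction(conn):
            insert_rows(conn, [rows[bad_index]])
    except InsertError as err:
        log_rejected(rows[bad_index], err)
    # The caller retries the remainder in a fresh subtransaction.
    return rows[bad_index + 1:]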

> 2) Increasing the number of records per subtransaction if data is clean. 
> It wouldn't take long until you were inserting millions of records per 
> subtransaction for a large data set.

You can make it adaptive in both directions with some boundaries.  Start 
batching at 1024 rows, double the batch size every time there's a clean 
commit, and halve it every time there's an error, bounding it to the 
range [1,1048576].  That's close to optimal behavior here when combined 
with the targeted retry described in (1).
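
Sketched in the same hypothetical terms as above (reusing 
subtransaction(), InsertError, and split_failed_batch() from the 
previous snippet, with the constants suggested here):

MIN_BATCH, MAX_BATCH = 1, 1048576

def adaptive_load(conn, row_iter, insert_rows, log_rejected):
    batch_size = 1024            # starting batch size suggested above
    batch = []
    for row in row_iter:
        batch.append(row)
        if len(batch) == batch_size:
            batch_size = flush(conn, batch, batch_size,
                               insert_rows, log_rejected)
            batch = []
    if batch:
        flush(conn, batch, batch_size, insert_rows, log_rejected)

def flush(conn, batch, batch_size, insert_rows, log_rejected):
    # Insert one batch, adapting the size as described; returns the
    # batch size to use next time.
    while batch:
        try:
            with subtransaction(conn):
                insert_rows(conn, batch)
            # Clean commit: double the batch size, up to the cap.
            return min(batch_size * 2, MAX_BATCH)
        except InsertError as err:
            # Error: halve the batch size, apply the targeted retry
            # from (1), then continue with whatever is left.
            batch_size = max(batch_size // 2, MIN_BATCH)
            batch = split_failed_batch(conn, batch, err.row_index,
                                       insert_rows, log_rejected)
    return batch_size

Note the sketch only copes with bad-data failures; the commit-failure 
cases mentioned below are exactly what it punts on.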

The retry scheduling and batch sizing are the trivial, well understood 
parts here.  Actually getting all this to play nicely with transactions 
and commit failures (rather than just bad-data failures) is what's 
difficult.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

