Re: VLDB Features
От | Dimitri Fontaine |
---|---|
Тема | Re: VLDB Features |
Дата | |
Msg-id | 200712121119.05192.dfontaine@hi-media.com обсуждение исходный текст |
Ответ на | Re: VLDB Features (Josh Berkus <josh@agliodbs.com>) |
Список | pgsql-hackers |
Hi, Le mercredi 12 décembre 2007, Josh Berkus a écrit : > > I'm curious what you feel is missing that pgloader doesn't fill that > > requirement: http://pgfoundry.org/projects/pgloader/ > > Because pgloader is implemented in middleware, it carries a very high > overhead if you have bad rows. As little as 1% bad rows will slow down > loading by 20% due to retries. Not that much, in fact, I'd say. pgloader allows its user to configure how large a COPY buffer to use (global parameter as of now, could easily be a per-section configuration knob, just didn't see any need for this yet). It's the 'copy_every' parameter as seen on the man page here: http://pgloader.projects.postgresql.org/#toc4 pgloader will obviously prepare a in-memory buffer of copy_every tuples to give to COPY, and in case of error will cut it and retry. Classic dichotomy approach, from initial implementation by Jan Wieck. So you can easily balance the error recovery costs against the COPY bulk size. Note also that the overall loading time with pgloader is not scaling the same as the COPY buffer size, the optimal choice depends on the dataset --- and the data massaging pgloader has to make on it ---, and I've experienced best results with 10000 and 15000 tuples buffers so far. FYI, now the pgloader topic is on the table, the next items I think I'm gonna develop for it are configurable behavior on errors tuples (load to another table when pk error, e.g.), and some limited ddl-partioning support. I'm playing with the idea for pgloader to be able to read some partitioning schemes (parsing CHECK constraint on inherited tables) and load directly into the right partitions. That would of course be done only when configured this way, and if constraints are misread it would only result in a lot more rejected rows than expected, and you still can retry using your insert trigger instead of pgloader buggy smartness. Comments welcome, regards, -- dim
В списке pgsql-hackers по дате отправления: