Re: COPY fast parse patch
From: Andrew Dunstan
Subject: Re: COPY fast parse patch
Date:
Msg-id: 30750.203.26.206.129.1117693241.squirrel@www.dunslane.net
In reply to: Re: COPY fast parse patch ("Luke Lonergan" <llonergan@greenplum.com>)
Replies: Re: COPY fast parse patch
List: pgsql-patches
Luke Lonergan said:
> Andrew,
>
>> I will be the first to admit that there are probably some very good
>> possibilities for optimisation of this code. My impression though has
>> been that in almost all cases it's fast enough anyway. I know that on
>> some very modest hardware I have managed to load a 6m row TPC
>> line-items table in just a few minutes. Before we start getting too
>> hung up, I'd be interested to know just how much data people want to
>> load and how fast they want it to be. If people have massive data
>> loads that take hours, days or weeks then it's obviously worth
>> improving if we can. I'm curious to know what size datasets people are
>> really handling this way.
>
> x0+ GB files are common in data warehousing. The issue is often "can
> we load our data within the time allotted for the batch window",
> usually a matter of an hour or two.
>
> Assuming that TPC lineitem is 140 bytes/row, 6M rows in 3 minutes is
> 4.7 MB/s. To load a 10GB file at that rate takes about 2/3 hour. If one
> were to restore a 300GB database, it would take 18 hours. Maintenance
> operations are impractical after a few hours; 18 is a non-starter.
>
> In practice, we're usually replacing an Oracle system with PostgreSQL,
> and the load speed difference between the two is currently embarrassing
> and makes the work impractical.

OK ... that seems fair enough. The next question is: where does the data
being loaded come from? pg_dump? How does load speed compare with using
COPY's binary mode?

cheers

andrew
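As a rough sanity check of the figures quoted above, here is a minimal sketch of the arithmetic, assuming (as stated in the thread) roughly 140 bytes/row and 6M rows loaded in about 3 minutes:

    # Back-of-the-envelope check of the load-rate arithmetic quoted above.
    # Assumed inputs (taken from the thread): ~140 bytes/row, 6M rows in ~3 minutes.
    row_bytes = 140
    rows = 6_000_000
    seconds = 3 * 60

    rate_mb_s = row_bytes * rows / seconds / 1e6       # ~4.7 MB/s
    hours_10gb = 10e9 / (rate_mb_s * 1e6) / 3600       # ~0.6 h (about 2/3 hour)
    hours_300gb = 300e9 / (rate_mb_s * 1e6) / 3600     # ~17.9 h (about 18 hours)

    print(f"{rate_mb_s:.1f} MB/s; 10GB: {hours_10gb:.2f} h; 300GB: {hours_300gb:.1f} h")

The printed values match the 4.7 MB/s, 2/3 hour, and 18 hour figures quoted in the message.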