Re: Loading 500m json files to database
From | Reid Thompson
---|---
Subject | Re: Loading 500m json files to database
Date | |
Msg-id | 157720d0e07960130e2eef7282a3118f071e7f29.camel@omnicell.com
In reply to | Loading 500m json files to database (pinker <pinker@onet.eu>)
List | pgsql-general
On Mon, 2020-03-23 at 03:24 -0700, pinker wrote:
> [EXTERNAL SOURCE]
>
> Hi, do you have maybe idea how to make loading process faster?
>
> I have 500 millions of json files (1 json per file) that I need to load to db.
> My test set is "only" 1 million files.
>
> What I came up with now is:
>
> time for i in datafiles/*; do
>   psql -c "\copy json_parts(json_data) FROM $i"&
> done
>
> which is the fastest so far. But it's not what i expect. Loading 1m of data
> takes me ~3h so loading 500 times more is just unacceptable.
>
> some facts:
> * the target db is on cloud so there is no option to do tricks like turning
>   fsync off
> * version postgres 11
> * i can spin up huge postgres instance if necessary in terms of cpu/ram
> * i tried already hash partitioning (to write to 10 different tables instead
>   of 1)
>
> Any ideas?

https://www.gnu.org/software/parallel/
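For example, a rough, untested sketch of how parallel could drive the same \copy, batching many files per psql invocation instead of forking one psql per file. The 8 jobs and 1000 files per batch are illustrative numbers to tune, and it assumes each file holds its JSON document on a single line and that psql can connect via the usual environment variables (PGHOST, PGDATABASE, etc.):

# feed file names NUL-delimited to parallel; each job gets up to 1000
# names appended to its command line, cats them into one COPY stream,
# and at most 8 such batches run concurrently
find datafiles -type f -print0 |
  parallel --null -j 8 -n 1000 \
    bash -c 'cat "$@" | psql -c "\copy json_parts(json_data) FROM STDIN"' _

Compared with backgrounding one psql per file, this keeps the number of concurrent connections bounded at the job count and amortizes the connection and COPY setup cost over a thousand rows at a time.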