Re: Loading 500m json files to database
| From | Andrei Zhidenkov |
|---|---|
| Subject | Re: Loading 500m json files to database |
| Date | |
| Msg-id | 1468923B-049B-4EE6-A2BD-79650CC04149@n26.com |
| In response to | Re: Loading 500m json files to database (Ertan Küçükoğlu <ertan.kucukoglu@1nar.com.tr>) |
| Responses | Re: Loading 500m json files to database, Re: Loading 500m json files to database |
| List | pgsql-general |
Try to write a stored procedure (probably pl/python) that will accept an array of JSON objects so it will be possible to load data in chunks (by 100-1000 files), which should be faster (a sketch of such a procedure follows the quoted thread below).

> On 23. Mar 2020, at 12:49, Ertan Küçükoğlu <ertan.kucukoglu@1nar.com.tr> wrote:
>
>
>> On 23 Mar 2020, at 13:20, pinker <pinker@onet.eu> wrote:
>>
>> Hi, do you have maybe idea how to make loading process faster?
>>
>> I have 500 millions of json files (1 json per file) that I need to load to
>> db.
>> My test set is "only" 1 million files.
>>
>> What I came up with now is:
>>
>> time for i in datafiles/*; do
>>   psql -c "\copy json_parts(json_data) FROM $i"&
>> done
>>
>> which is the fastest so far. But it's not what i expect. Loading 1m of data
>> takes me ~3h so loading 500 times more is just unacceptable.
>>
>> some facts:
>> * the target db is on cloud so there is no option to do tricks like turning
>>   fsync off
>> * version postgres 11
>> * i can spin up huge postgres instance if necessary in terms of cpu/ram
>> * i tried already hash partitioning (to write to 10 different tables instead
>>   of 1)
>>
>>
>> Any ideas?
>
> Hello,
>
> I may not be knowledge enough to answer your question.
>
> However, if possible, you may think of using a local physical computer to do all uploading and after do backup/restore on cloud system.
>
> Compressed backup will be far less internet traffic compared to direct data inserts.
>
> Moreover you can do additional tricks as you mentioned.
>
> Thanks & regards,
> Ertan
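
For reference, a minimal sketch of the chunked-loading idea suggested above, assuming the `json_parts(json_data)` table from the quoted thread, a `jsonb` column, and an instance where the `plpython3u` extension can be installed; the function name `load_json_chunk` is illustrative and not part of the original thread:

```sql
-- Assumed: json_parts has a jsonb column json_data; adjust the cast below
-- if the column is json or text. plpython3u must be installable on the server.
CREATE EXTENSION IF NOT EXISTS plpython3u;

-- Illustrative name: accepts one chunk of JSON documents as text[] and loads
-- the whole chunk with a single prepared INSERT instead of one \copy per file.
CREATE OR REPLACE FUNCTION load_json_chunk(docs text[])
RETURNS integer
LANGUAGE plpython3u
AS $$
    # Prepare one INSERT that unnests the array server-side, so each call
    # issues a single statement for the 100-1000 documents in the chunk.
    plan = plpy.prepare(
        "INSERT INTO json_parts (json_data) SELECT d::jsonb FROM unnest($1) AS d",
        ["text[]"])
    plpy.execute(plan, [docs])
    return len(docs)
$$;

-- Example call with a two-element chunk (values illustrative); a client script
-- would read 100-1000 files, build one array, and issue one call per chunk.
SELECT load_json_chunk(ARRAY['{"id": 1}', '{"id": 2}']::text[]);
```

The same effect can be had without PL/Python (a plain SQL function over `unnest`, or a client-side script that batches files into one INSERT or COPY per chunk); the point of the suggestion is to amortize per-file connection and transaction overhead across many documents.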