Re: Loading 500m json files to database
From | Adrian Klaver |
---|---|
Subject | Re: Loading 500m json files to database |
Date | |
Msg-id | bfe3f14a-37b7-2047-20bf-8ece013ae470@aklaver.com |
In reply to | Loading 500m json files to database (pinker <pinker@onet.eu>) |
Responses | Re: Loading 500m json files to database |
List | pgsql-general |
On 3/23/20 3:24 AM, pinker wrote:
> Hi, do you have maybe idea how to make loading process faster?
>
> I have 500 millions of json files (1 json per file) that I need to load to
> db.
> My test set is "only" 1 million files.
>
> What I came up with now is:
>
> time for i in datafiles/*; do
>   psql -c "\copy json_parts(json_data) FROM $i"&
> done
>
> which is the fastest so far. But it's not what i expect. Loading 1m of data
> takes me ~3h so loading 500 times more is just unacceptable.

Aggregating the JSON files as others have suggested would help greatly.

Knowing what is happening in json_parts() might help folks provide further
tips.

> some facts:
> * the target db is on cloud so there is no option to do tricks like turning
>   fsync off
> * version postgres 11
> * i can spin up huge postgres instance if necessary in terms of cpu/ram
> * i tried already hash partitioning (to write to 10 different tables instead
>   of 1)
>
> Any ideas?
>
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

--
Adrian Klaver
adrian.klaver@aklaver.com
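
A minimal sketch of the aggregation idea, assuming each file holds a single
JSON document on one line, ends with a newline, and is already safe for
COPY's text format (the same assumptions the per-file \copy loop above
makes). The json_parts(json_data) target is taken from the quoted loop; the
100000-file batch size is only an illustration:

# One psql process and one COPY per batch of files, instead of one per file.
# find/xargs stream the file names; each batch is concatenated and fed to a
# single "\copy ... FROM STDIN" over one connection and one transaction.
find datafiles -type f -print0 |
  xargs -0 -n 100000 sh -c '
    cat "$@" | psql -q -c "\copy json_parts(json_data) FROM STDIN"
  ' _

Larger batches mean fewer connections and transactions; how far to push the
batch size depends on row width and how much work per transaction the server
handles comfortably.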