Re: Loading 500m json files to database
| From | Rob Sargent |
|---|---|
| Subject | Re: Loading 500m json files to database |
| Date | |
| Msg-id | 3A9D9AC0-AD61-45C6-87AD-BD8C9E255A37@gmail.com |
| In reply to | Re: Loading 500m json files to database ("David G. Johnston" <david.g.johnston@gmail.com>) |
| Responses | Re: Loading 500m json files to database |
| List | pgsql-general |
Can one put 550M files in a single directory? I thought it topped out at 16M or so.

On Mar 23, 2020, at 7:11 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Mon, Mar 23, 2020 at 3:24 AM pinker <pinker@onet.eu> wrote:

```bash
time for i in datafiles/*; do
  psql -c "\copy json_parts(json_data) FROM $i"&
done
```

Don't know whether this is faster, but it does avoid spinning up a connection multiple times.

```bash
# bash, linux
function append_each_split_file_to_etl_load_script() {
    for filetoload in ./*; do
        ronumber="$(basename "$filetoload")"
        # only process files since subdirs can be present
        if [[ -f "$filetoload" ]]; then
            echo ""
            # emits a line of the form: \set invoice `cat '<file>'`
            echo "\set invoice"' `cat '"'""$filetoload""'"'`'
            # emits one VALUES row of the form: , ('<file basename>',:'invoice')
            echo ", ('$ronumber',:'invoice')"
        fi >> "$PSQLSCRIPT"
    done
    echo "" >> "$PSQLSCRIPT"
    echo ";" >> "$PSQLSCRIPT"
    echo "" >> "$PSQLSCRIPT"
}
```

There is a bit of other related code that is needed (for my specific usage), but this is the core of it. Use psql variables to capture the contents of each file into a variable and then just perform a normal insert (specifically, a VALUES (...), (...) variant). Since you can intermix psql and SQL, you basically output a bloody long script - one that has memory issues at scale, but you can divide and conquer - and then run "psql --file bloody_long_script_part_1_of_100000.psql".

David J.
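For illustration, here is a rough sketch of what one chunk of the generated script might look like. The table and column names, the file paths, and the placeholder first row are assumptions, since the "other related code" that emits the INSERT header is not shown in the post:

```
-- Hypothetical shape of one generated $PSQLSCRIPT chunk (names are illustrative).
-- The INSERT header and the placeholder first row are assumed to come from the
-- surrounding code that is not shown above, so that every generated row can
-- safely begin with a comma.
INSERT INTO json_parts (ronumber, json_data) VALUES
    ('placeholder', '{}')

\set invoice `cat '/datafiles/10001.json'`
, ('10001', :'invoice')

\set invoice `cat '/datafiles/10002.json'`
, ('10002', :'invoice')
;
```

Each `\set` reads the next file's contents via the backtick shell call, and `:'invoice'` interpolates that value as a quoted SQL literal on the very next line as psql reads it, so the whole chunk accumulates into one multi-row INSERT that is sent when psql reaches the terminating `;`.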
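As for the divide-and-conquer step, a minimal driver along these lines could split the file list into chunks, build one script per chunk, and run each with `psql --file`. This is only a sketch: `build_load_script` is a hypothetical wrapper around the function above plus the INSERT header and footer, and GNU coreutils is assumed.

```bash
#!/usr/bin/env bash
# Sketch only: chunk the file names, generate one load script per chunk,
# and run each chunk in its own psql session.
set -euo pipefail

workdir=$(mktemp -d)
# One file name per line; assumes names contain no newlines.
find datafiles -maxdepth 1 -type f -name '*.json' > "$workdir/all_files"
split -l 100000 -d -a 4 "$workdir/all_files" "$workdir/chunk_"

for chunk in "$workdir"/chunk_*; do
    PSQLSCRIPT="$chunk.psql"
    # build_load_script is hypothetical: it would emit the INSERT header,
    # append the per-file \set/VALUES rows for this chunk's files, and
    # close the statement with ";".
    build_load_script "$chunk" "$PSQLSCRIPT"
    psql --file "$PSQLSCRIPT"
done
```

Chunks could also be run in parallel across several psql sessions if the server can keep up, which is roughly what the original `\copy ... &` loop was attempting.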