Re: How to load data from CSV into a table that has array types in its columns?
От | Adrian Klaver |
---|---|
Тема | Re: How to load data from CSV into a table that has array types in its columns? |
Дата | |
Msg-id | 8830ee61-cf09-7ba6-0055-efe7a910ac59@aklaver.com обсуждение исходный текст |
Ответ на | How to load data from CSV into a table that has array types in its columns? (Siddharth Jain <siddhsql@gmail.com>) |
Список | pgsql-general |
On 10/26/22 17:30, Siddharth Jain wrote: > Hello, > > Given a long list like this: > > 1,2 > 2,4 --> 2 appears once > 7,9 > 8,9 > 5,3 > 2,5 --> note 2 appears twice > > I want to load it into this table: > > create table tbl ( > id integer primary key, > fids integer[] > ) > > so we will have 2 -> [4,5] where 2 is id and [4,5] are the fids > > My actual dataset is very large and has 100M rows in it. How can it be > efficiently loaded into postgres? > > I think I have to use a program for this and am trying to use the pg > library that comes with Node.js. I am reading the data in batches of 1M > or 100k rows for example. I have created a dictionary in Node.js where I > am storing the hashmap. The part where I am stuck is how to generate the > SQL command? > > 1. I want to make one call to the server for the batch of 1M rows, not > 1M calls > 2. Some keys in the hashmap might already exist in the database. For > these keys we want to append to the array > 3. Some keys will not exist and for these we want to insert new rows > > Can someone help me please? I am using Postgres for the first time. The best way to deal with importing large batches of data is to use COPY https://www.postgresql.org/docs/current/sql-copy.html But that will not play well with modifying the data as you input it. What I can see doing is: 1) COPY the data into a staging table: create staging_tbl (id integer, fid integer) 2) Then using SQL statements to move the data to the final table. As example of one possibility, using ON CONFLICT from here: https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT insert into table tbl select id, array[fid] from staging_table on conflict(id) DO UPDATE SET fids = array_append(fids, excluded.fid); I would test with a smaller example data set to vetify. > > -- Adrian Klaver adrian.klaver@aklaver.com
В списке pgsql-general по дате отправления: