Обсуждение: Fastest import?
Greetings, What is the fastest way to import data from an ASCII file? I have ~17,000 records in a .DBF file that I need to get into my pg6.4.2 database. I have written a program that reads the .DBF file, massages the data and writes it out to a text file, one record per line. I initially had my program issue the insert commands instead of writing them to a text file, but the import took about 2 hours! We are only talking about 4Meg worth of data here, it should not take that long should it? I was hoping that once in a text file that psql < text.file would be faster, but it is not. Is there any way to speed this up? I was also looking at the copy command and maybe using the binary mode, but writing out binary data is not my idea of a good time and I'm on a deadline here. Is there a program out there that might take a text file of insert statements and convert them to a binary format that copy can use? Any insight would be greatly appreciated. Thanks, Matthew Hagerty
> What is the fastest way to import data from an ASCII file? I have ~17,000 > records in a .DBF file that I need to get into my pg6.4.2 database. I have > written a program that reads the .DBF file, massages the data and writes it > out to a text file, one record per line. Funny, I just reinvented the wheel too on Friday. > I initially had my program issue the insert commands instead of writing > them to a text file, but the import took about 2 hours! Did you surround the inserts with "BEGIN WORK" and "COMMIT"? That is rumoured to help - I only had 6000 rows, so it wasn't a problem. Re the text file, are you using "INSERT" or better "COPY"? Cheers, Patrick
On Sat, 10 Apr 1999, Matthew Hagerty wrote:
> Greetings,
>
> What is the fastest way to import data from an ASCII file? I have ~17,000
> records in a .DBF file that I need to get into my pg6.4.2 database. I have
> written a program that reads the .DBF file, massages the data and writes it
> out to a text file, one record per line.
>
> I initially had my program issue the insert commands instead of writing
> them to a text file, but the import took about 2 hours! We are only
> talking about 4Meg worth of data here, it should not take that long should
> it? I was hoping that once in a text file that psql < text.file would be
> faster, but it is not. Is there any way to speed this up?
>
> I was also looking at the copy command and maybe using the binary mode, but
> writing out binary data is not my idea of a good time and I'm on a deadline
> here. Is there a program out there that might take a text file of insert
> statements and convert them to a binary format that copy can use?
>
> Any insight would be greatly appreciated.
Why use binary? Just get your program to output Tab delimited ASCII
data, and use ASCII COPY.
Tip: Don't create your indices before the copy. It will load faster
without them. Once the data's in there, then create them.
Peter
--
Peter T Mount peter@retep.org.uk
Main Homepage: http://www.retep.org.uk
PostgreSQL JDBC Faq: http://www.retep.org.uk/postgres
Java PDF Generator: http://www.retep.org.uk/pdf
Matthew Hagerty wrote:
>
> Greetings,
>
> What is the fastest way to import data from an ASCII file? I have ~17,000
> records in a .DBF file that I need to get into my pg6.4.2 database. I have
> written a program that reads the .DBF file, massages the data and writes it
> out to a text file, one record per line.
I do that straight under linux by reading the .dbf file with a small
C-program which writes the data out in tab-separated records. Those
can be read with the copy from command:
/* __________ db2u __________
convert dbase file
Usage:
db2u < dbase-file.dbf > unix_database
or
db2u dbase-file.dbf > unix_database
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXFIELDS 100 /* maximum nr of fields */
void main(int argc, char *argv[])
{ char *p,*q,*r;
typedef char string11[12];
FILE *in;
int i,j,n,field_pos[MAXFIELDS+1],nfield=0,pos,lrec;
char *record,field_type[MAXFIELDS];
string11 field_name[MAXFIELDS];
struct
{ char version; /* dBase version */
char datum[3]; /* date of last update YYMMDD */
long nrec; /* nr of records in th database file */
short nhead; /* nr of bytes in the header */
short lrec; /* nr of bytes in the record */
char dum1[2];
char transact; /* flag for incomplete transaction */
char encrypt; /* encryption flag */
char lan[12]; /* reserved for network applications */
char mdx; /* flag for presence of mdx-file */
char dum2[3];
} header;
struct
{ char name[11]; /* fieldname zero filled */
char type; /* field type (C, D, F, L, M, N) */
char dum1[4];
unsigned char length; /* field length in binary */
char ndec; /* field decimal count in binary */
char dum2[2];
char workid; /* work area ID */
char dum3[11];
} field;
switch(argc)
{ case 1: in=stdin;
break;
case 2: if ((in = fopen(argv[1],"rb")) == NULL)
{ fprintf(stderr,"Input file %s not found\n",argv[1]);
exit(1);
}
break;
default: fprintf(stderr,"Usage: %s dbase_file > unix_database\n",argv[0]);
fprintf(stderr,"Or: cat dbase_file | %s > unix_database\n",argv[0]);
exit(1);
}
fread(&header,32,1,in);
n=header.lrec;
pos=1;
while (n>1)
{ if (nfield==MAXFIELDS)
{ fprintf(stderr,"Too many fields\n");
exit(1);
}
fread(&field,32,1,in);
strcpy(field_name[nfield],field.name);
field_type[nfield]=field.type;
field_pos[nfield++]=pos;
pos+=(int)field.length;
n-=(int)field.length;
}
for (i=0;i<nfield;i++) printf("!%d-%s%c",i+1,field_name[i],i==nfield-1?'\n':'\t');
for (i=0;i<nfield;i++) printf( "!%s%c", field_name[i],i==nfield-1?'\n':'\t');
field_pos[nfield]=pos; /* one extra for length calculation of last field */
fread(&i,1,1,in); /*read field terminator */
if (i!=0xD)
{ fprintf(stderr,"Illegal field terminator; more than %d fields?\n",MAXFIELDS);
exit(1);
}
lrec=header.lrec;
record=malloc(lrec);
for (i=0;i<header.nrec;i++)
{ if (fread(record,lrec,1,in)!=1)
{ fprintf(stderr,"Error reading record %d\n",i+1);
exit(1);
}
if (*record!='*') /* if not a deleted record */
for (j=0;j<nfield;j++)
{
p=record+field_pos[j]; /* first char of the field */
q=record+field_pos[j+1]-1; /* last char of the field */
while (*p==' ' && p<=q) p++; /* remove leading spaces */
while (*q==' ' && p<=q) *(q--)='\0'; /* remove trailing spaces */
for (r=p;r<=q;r++) putchar(*r);
putchar(j==nfield-1?'\n':'\t');
}
}
}
--
Hartelijke groet, Wybo Dekker
___________________Servalys Analytical Chemistry Services__________________
wybo@servalys.hobby.nl | Deilsedijk 60 | tel +31-345-652164
www.hobby.nl/~servalys | 4158 CH Deil, The Netherlands | fax
+31-345-652383
AppGEN works nicely. Peter T Mount wrote: > On Sat, 10 Apr 1999, Matthew Hagerty wrote: > > > Greetings, > > > > What is the fastest way to import data from an ASCII file? I have ~17,000 > > records in a .DBF file that I need to get into my pg6.4.2 database. I have > > written a program that reads the .DBF file, massages the data and writes it > > out to a text file, one record per line. > > > > I initially had my program issue the insert commands instead of writing > > them to a text file, but the import took about 2 hours! We are only > > talking about 4Meg worth of data here, it should not take that long should > > it? I was hoping that once in a text file that psql < text.file would be > > faster, but it is not. Is there any way to speed this up? > > > > I was also looking at the copy command and maybe using the binary mode, but > > writing out binary data is not my idea of a good time and I'm on a deadline > > here. Is there a program out there that might take a text file of insert > > statements and convert them to a binary format that copy can use? > > > > Any insight would be greatly appreciated. > > Why use binary? Just get your program to output Tab delimited ASCII > data, and use ASCII COPY. > > Tip: Don't create your indices before the copy. It will load faster > without them. Once the data's in there, then create them. > > Peter > > -- > Peter T Mount peter@retep.org.uk > Main Homepage: http://www.retep.org.uk > PostgreSQL JDBC Faq: http://www.retep.org.uk/postgres > Java PDF Generator: http://www.retep.org.uk/pdf
Man libpq++ says it has 3 sample c++ programs at the end, by on my system they're not there. Does anybody know how I can get these sample files? My c++ skills are well, extremely limited, and examples would really help. Hugh Lawson Greensboro, North Carolina hglawson@nr.infi.net