Re: Time consuming process ...3 million records please help
От | Sean Davis |
---|---|
Тема | Re: Time consuming process ...3 million records please help |
Дата | |
Msg-id | 44289FEB.207@mail.nih.gov обсуждение исходный текст |
Ответ на | Re: Time consuming process ...3 million records please help (Srinivas Iyyer <srini_iyyer_bio@yahoo.com>) |
Ответы |
Re: Non-Unique intems
|
Список | pgsql-novice |
Srinivas Iyyer wrote: > Hi Sean, > thanks for your reply. in my temp table, > temp_refseq_id column cannot be indexed due to > duplicate values. will that effect drastically. > I havent tested it yet. on top of that i killed 8 hrs > process running previously on tab_update() function. > :-) Sri, Columns with duplicate values can be indexed; only if you declare an index to be "unique" will you then have a problem with inserting duplicates. You just need to do: 1. Create your temp table structure. 2. Copy in your data. 3. Create an index for all fields that show up in a "where" or join clause (THIS IS NECESSARY FOR ANY RDBMS TO WORK EFFICIENTLY!!!) create index temp_refseq_id_index on TABLENAME(temp_refseq_id); 4. Vacuum analyze the table(s) after copying and creating the index (thereby telling postgres what is in the table) vacuum analyze TABLENAME 5. Do something along the lines of: insert into .... select from .... That should do it, approximately. If you still have problems, then you will need to be more specific about what the problems are, I think. As an aside, I have similar data stored in a postgres database, but in some places on the order of several 10's of millions of records; postgres is not limiting in this respect. Sean > thanks > sri > > >> Hi, Sri. >> >> I would do this differently. You don't need the >> function I don't >> think. I would do something like this (untested): >> >> insert into spontanno (seq_id, spontanno_imageid, >> spontanno_genbankacc) >> select seqdump.seq_id, referrer_keys.temp_imageid, >> referrer_keys.temp.genbankacc from >> referrer_keys, seqdump >> where >> referrer_keys.temp_refseq_id=seqdump.seq_acc; >> >> Make sure there is an index on seq_acc and >> temp_refseq_id. This should >> be done in seconds to minutes, not hours. >> >> Sean > > - > > > > <snip> > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com
В списке pgsql-novice по дате отправления: