Re: Capacity questions
От | Bill House |
---|---|
Тема | Re: Capacity questions |
Дата | |
Msg-id | 4FDFE2B2.5050405@house-grp.net обсуждение исходный текст |
Ответ на | Re: Capacity questions (Federico Di Gregorio <fog@dndg.it>) |
Список | psycopg |
On 06/18/2012 05:04 AM, Federico Di Gregorio wrote: > On 17/06/12 17:32, Bill House wrote: >> I have a written a script which does what I want it to do on a table >> with 999 records. >> >> I was wondering if there is anything I need to consider if I run this >> script on a table with 1.1 million records? >> >> The purpose of the script is to flag records representing files if they >> are redundant (duplicate md5sum). >> >> Later this table will be used to move or remove the files represented by >> the flagged records. >> >> The script performs the following actions >> >> 1 resets the del column to false for all records >> >> 2 creates a temporary table consisting of aggregate records of >> identical md5sums which count more than 1. >> >> 3 iterates down the cursor and with this information >> >> 4 updates the del column to True on all records in a batch of >> identical md5sums except for one. > If your recno column is a primary key, you can do this with a single SQL > UPDATE, without the initial update, temporary table or psycopg (with the > right indices this is blazing fast even on a gazzilion rows): > > UPDATE files_test SET del = T.n > 1 > FROM (SELECT recno, > row_number() OVER (PARTITION BY md5sum) AS n > FROM files_test) T > WHERE files_test.recno = T.recno; > > Hope this helps, > > federico (that *loves* UPDATE/FROM and window functions) > Thanks for the input all. Part of the purpose of this little project is for me to learn about both Python and Postgresql; I know very little about either. That is what brings me to psycopg2. I have read in the documents it is recommend that a programmer avoiding parameter substitution by any means other than the one supplied by psycopg2. I understand the reasons and will comply if I create code for public access. I appreciate the opportunity to study the alternative more efficient approach offered above. As I said, I am new to SQL and have much to learn. Thanks again, Bill
В списке psycopg по дате отправления: