Re: Capacity questions
От | Federico Di Gregorio |
---|---|
Тема | Re: Capacity questions |
Дата | |
Msg-id | 4FDEFD13.2040206@dndg.it обсуждение исходный текст |
Ответ на | Capacity questions (Bill House <wch-tech@house-grp.net>) |
Ответы |
Re: Capacity questions
|
Список | psycopg |
On 17/06/12 17:32, Bill House wrote: > I have a written a script which does what I want it to do on a table > with 999 records. > > I was wondering if there is anything I need to consider if I run this > script on a table with 1.1 million records? > > The purpose of the script is to flag records representing files if they > are redundant (duplicate md5sum). > > Later this table will be used to move or remove the files represented by > the flagged records. > > The script performs the following actions > > 1 resets the del column to false for all records > > 2 creates a temporary table consisting of aggregate records of > identical md5sums which count more than 1. > > 3 iterates down the cursor and with this information > > 4 updates the del column to True on all records in a batch of > identical md5sums except for one. If your recno column is a primary key, you can do this with a single SQL UPDATE, without the initial update, temporary table or psycopg (with the right indices this is blazing fast even on a gazzilion rows): UPDATE files_test SET del = T.n > 1 FROM (SELECT recno, row_number() OVER (PARTITION BY md5sum) AS n FROM files_test) T WHERE files_test.recno = T.recno; Hope this helps, federico (that *loves* UPDATE/FROM and window functions) -- Federico Di Gregorio federico.digregorio@dndg.it Studio Associato Di Nunzio e Di Gregorio http://dndg.it - Ma cos'ha il tuo pesce rosso, l'orchite? - Si, ha un occhio solo, la voce roca e mangia gli altri pesci.
В списке psycopg по дате отправления: