Re: Import large data set into a table and resolve duplicates?
From | Eugene Dzhurinsky
Subject | Re: Import large data set into a table and resolve duplicates?
Date |
Msg-id | 20150215173644.GB4901@devbox
In response to | Re: Import large data set into a table and resolve duplicates? (John McKown <john.archie.mckown@gmail.com>)
Responses | Re: Import large data set into a table and resolve duplicates?
List | pgsql-general
On Sun, Feb 15, 2015 at 10:00:50AM -0600, John McKown wrote:

> UPDATE patch_data SET already_exists=((SELECT TRUE FROM dictionary WHERE
> dictionary.series = patch_data.series));

Since "dictionary" already has an index on "series", it seems that
"patch_data" doesn't need any index here.

> At this point, the table patch_data has been updated such that if the
> series data in it already exists, the "already_exists" column is now TRUE
> instead of the initial FALSE. This means that we need to insert all the
> series data in "patch_data" which does not exist in "dictionary" (i.e.
> "already_exists" is FALSE in "patch_data") into "dictionary".
>
> INSERT INTO dictionary(series) SELECT series FROM patch_data WHERE
> already_exists = FALSE;

At this point "patch_data" needs to get a partial index on
"already_exists = false", which seems to be cheap.

> UPDATE patch_data SET id=((SELECT id FROM dictionary WHERE
> dictionary.series = patch_data.series));

No index is needed here except the existing one on "dictionary".

That looks really promising, thank you John! I need only one index on the
"patch_data" table, and I will re-use the existing index on "dictionary".

Thanks again!

--
Eugene Dzhurinsky
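For reference, the three-step flow discussed above can be sketched end to end. This is a minimal illustration using Python's sqlite3 (the thread targets PostgreSQL; table and column names come from the quoted messages, and the sample data is made up). It uses EXISTS in step 1 so non-matching rows get FALSE rather than NULL, a small tweak to the quoted statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dictionary (id INTEGER PRIMARY KEY, series TEXT UNIQUE)")
cur.execute("CREATE TABLE patch_data (series TEXT, id INTEGER, "
            "already_exists BOOLEAN DEFAULT 0)")

# Pre-existing dictionary rows, plus a new batch containing one duplicate.
cur.executemany("INSERT INTO dictionary(series) VALUES (?)", [("a",), ("b",)])
cur.executemany("INSERT INTO patch_data(series) VALUES (?)", [("b",), ("c",)])

# Step 1: mark rows whose series is already in the dictionary.
cur.execute("""
    UPDATE patch_data SET already_exists = EXISTS
        (SELECT 1 FROM dictionary WHERE dictionary.series = patch_data.series)
""")

# Partial index covering only the rows still to be inserted (cheap, as noted).
cur.execute("CREATE INDEX idx_new ON patch_data(series) WHERE already_exists = 0")

# Step 2: insert only the series not yet present in the dictionary.
cur.execute("""
    INSERT INTO dictionary(series)
    SELECT series FROM patch_data WHERE already_exists = 0
""")

# Step 3: back-fill the generated dictionary ids into patch_data.
cur.execute("""
    UPDATE patch_data SET id =
        (SELECT id FROM dictionary WHERE dictionary.series = patch_data.series)
""")

print(cur.execute("SELECT series, id FROM patch_data ORDER BY series").fetchall())
```

The same statements run unchanged on PostgreSQL apart from the connection setup; there the partial index would be written with `WHERE already_exists = false`.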