Re: Bulkloading using COPY - ignore duplicates?
От | Lee Kindness |
---|---|
Тема | Re: Bulkloading using COPY - ignore duplicates? |
Дата | |
Msg-id | 15391.26134.348061.302157@elsick.csl.co.uk обсуждение исходный текст |
Ответ на | Re: Bulkloading using COPY - ignore duplicates? (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Bulkloading using COPY - ignore duplicates?
|
Список | pgsql-hackers |
Tom Lane writes:> Lee Kindness <lkindness@csl.co.uk> writes:> > You're right - I was meaning 'SELECT DISTINCT ON ()'. HoweverI'm only> > using it as an example of where the database is choosing (be it> > randomly) the data to discarded.> Nota good example to support your argument. The entire point of> DISTINCT ON (imho) is that the rows that are kept or discardedare> *not* random, but can be selected by the user by specifying additional> sort columns. DISTINCT ON would bepretty useless if it weren't for> that flexibility. The corresponding concept in COPY will need to> provide flexible meansfor deciding which row to keep and which to> drop, else it'll be pretty useless. At which point it becomes quicker to resort to INSERT... Here's the crux question - how can I get management to go with PostgreSQL when a core operation (import of data into a transient database) is at least 6 times slower than the current version? With a lot of work investigating the incoming data, the number of incoming duplicates has been massively reduced by fixing/tackling at source. However rouge values do still crop up (the data originates from a real-time system with multiple hardware inputs from multiple hardware vendors) and when they do (even just 1) the performance dies. Add to this terrabytes of legacy data... While you may see the option of ignoring duplicates in COPY as 'pretty useless', it obviously has its place/use otherwise every other database system wouldn't have support for it! (not that following the pack is always a good idea) In an ideal world 'COPY FROM' would only be used with data output by 'COPY TO' and it would be nice and sanitised. However in some fields this often is not a possibility due to performance constraints! Best regards, -- Lee Kindness, Senior Software Engineer, Concept Systems Limited.http://services.csl.co.uk/ http://www.csl.co.uk/
В списке pgsql-hackers по дате отправления: