Re: efficient data reduction (and deduping)
From | Claudio Freire |
---|---|
Subject | Re: efficient data reduction (and deduping) |
Date | |
Msg-id | CAGTBQpY-e-TTnd-+7wWeKZ8ecYjdjqeRt9LRN15toxG0To_o4g@mail.gmail.com |
In reply to | Re: efficient data reduction (and deduping) (Claudio Freire <klaussfreire@gmail.com>) |
List | pgsql-performance |
On Thu, Mar 1, 2012 at 4:39 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> Interesting solution. If I'm not mistaken, this does solve the problem of
>> having two entries for the same user at the exact same time (which violates
>> my pk constraint) but it does so by leaving both of them out (since there is
>> no au1.hr_timestamp > au2.hr_timestamp in that case). Is that right?
>
> Yes, but it would have to be the same *exact* time (not same hour).
>
> You can use more fields to disambiguate too, i.e.:
>
> au1.hr_timestamp > au2.hr_timestamp or (au1.hr_timestamp =
> au2.hr_timestamp and au1.some_other_field > au2.some_other_field)
>
> If you have a sequential id to use in disambiguation, it would be best.

Sorry for double posting - but you can also *generate* such an identifier:

create sequence temp_seq;

with identified_au as (
    select nextval('temp_seq') as id, *
    from hourly_activity
)
INSERT INTO hourly_activity
SELECT ... everything from au1 ...
FROM identified_au au1
LEFT JOIN identified_au au2
    ON au2.user_id = au1.user_id
    AND date_trunc('hour', au2.hr_timestamp) = date_trunc('hour', au1.hr_timestamp)
    AND (au2.hr_timestamp < au1.hr_timestamp
         OR (au2.hr_timestamp = au1.hr_timestamp AND au2.id < au1.id))
WHERE au2.user_id is null;

Should work if you have 9.x
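For anyone who wants to try the anti-join without a PostgreSQL server at hand, here is a minimal sketch of the same idea run through Python's sqlite3 module. It is an illustration under assumptions, not the query from the thread: `raw_activity` is a hypothetical source table, SQLite's built-in `rowid` stands in for the `temp_seq` sequence, and `strftime('%Y-%m-%d %H', ...)` stands in for `date_trunc('hour', ...)`. Timestamps are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3


def dedupe_hourly(rows):
    """Return the earliest row per (user_id, hour) from (user_id, timestamp) pairs.

    Exact-timestamp ties are broken by a generated id (SQLite's rowid here,
    standing in for the sequence used in the PostgreSQL version).
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical source table; the thread's real schema is not shown.
    conn.execute("CREATE TABLE raw_activity (user_id INTEGER, hr_timestamp TEXT)")
    conn.executemany("INSERT INTO raw_activity VALUES (?, ?)", rows)

    query = """
        WITH identified_au AS (
            SELECT rowid AS id,
                   user_id,
                   hr_timestamp,
                   strftime('%Y-%m-%d %H', hr_timestamp) AS hr  -- hour bucket
            FROM raw_activity
        )
        SELECT au1.user_id, au1.hr_timestamp
        FROM identified_au au1
        LEFT JOIN identified_au au2
               ON au2.user_id = au1.user_id
              AND au2.hr = au1.hr
              -- au2 "beats" au1 if it is earlier, or equal with a lower id
              AND (au2.hr_timestamp < au1.hr_timestamp
                   OR (au2.hr_timestamp = au1.hr_timestamp AND au2.id < au1.id))
        WHERE au2.user_id IS NULL  -- keep au1 only when nothing beats it
        ORDER BY au1.user_id, au1.hr_timestamp
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, "2012-03-01 16:05:00"),
        (1, "2012-03-01 16:05:00"),  # exact duplicate of the row above
        (1, "2012-03-01 16:40:00"),  # same hour, later
        (1, "2012-03-01 17:10:00"),  # next hour
        (2, "2012-03-01 16:20:00"),
    ]
    for row in dedupe_hourly(sample):
        print(row)
```

The parentheses around the OR are what keep the tie-break tied to the same user and hour; without them, AND binds tighter than OR and the id comparison would match rows across users.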