hi , i hava a table: CREATE TABLE public.cdr_ama_stat ( id int4 NOT NULL DEFAULT nextval('cdr_ama_stat_id_seq'::regclass), abonado_a varchar(30) NULL, abonado_b varchar(30) NULL, fecha_llamada timestamp NULL, duracion int4 NULL, puerto_a varchar(4) NULL, puerto_b varchar(4) NULL, tipo_llamada char(1) NULL, processed int4 NULL, PRIMARY KEY(id) ) GO CREATE INDEX kpi_fecha_llamada ON public.cdr_ama_stat(fecha_llamada)
there should be unique values for abonado_a, abonado_b, fecha_llamada, duracion in every row, googling around i found how to delete duplicates in postgresonline site , so i run the following query (lets say i want to know how many duplicates exists for 2004-04-18, before delete them):
SELECT * FROM cdr_ama_stat WHERE id NOT IN (SELECT MAX(dt.id) FROM cdr_ama_stat As dt WHERE dt.fecha_llamada BETWEEN '2009-04-18' AND '2009-04-18'::timestamp + INTERVAL '1 day' GROUP BY dt.abonado_a, dt.abonado_b,dt.fecha_llamada,dt.duracion) AND fecha_llamada BETWEEN '2009-04-18' AND '2009-04-18'::timestamp + INTERVAL '1 day'
my problem is that the query take forever, number of rows: kpi=# select count(*) from cdr_ama_stat; count --------- 5908065 (1 row)
this the explain result on the above query:
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using kpi_fecha_llamada on cdr_ama_stat (cost=115713.94..79528582.40 rows=140809 width=50) Index Cond: ((fecha_llamada >= '2009-04-18 00:00:00'::timestamp without time zone) AND (fecha_llamada <= '2009-04-19 00:00:00'::timestamp without time zone)) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=115713.94..116202.56 rows=28162 width=34) -> GroupAggregate (cost=110902.49..115478.78 rows=28162 width=34) -> Sort (cost=110902.49..111606.53 rows=281618 width=34) Sort Key: dt.abonado_a, dt.abonado_b, dt.fecha_llamada, dt.duracion -> Bitmap Heap Scan on cdr_ama_stat dt (cost=8580.03..70970.30 rows=281618 width=34) Recheck Cond: ((fecha_llamada >= '2009-04-18 00:00:00'::timestamp without time zone) AND (fecha_llamada <= '2009-04-19 00:00:00'::timestamp without time zone)) -> Bitmap Index Scan on kpi_fecha_llamada (cost=0.00..8509.62 rows=281618 width=0) Index Cond: ((fecha_llamada >= '2009-04-18 00:00:00'::timestamp without time zone) AND (fecha_llamada <= '2009-04-19 00:00:00'::timestamp without time zone))
am i doing something wrong? I think several minutes should be ok, but not several hours as happens now, i do a bulk load (350k rows aprox) in that table every day, and yes, i did vacuum full analyze the database before posting this results. Server is Dual Xeon 3.0 Ghx, 2Gb RAM