Re: automated row deletion
From | John R Pierce |
---|---|
Subject | Re: automated row deletion |
Date | |
Msg-id | 4ACCC951.80303@hogranch.com |
In reply to | Re: automated row deletion (Dave Huber <DHuber@letourneautechnologies.com>) |
Responses | Re: automated row deletion |
List | pgsql-general |
Dave Huber wrote:
> A colleague gave me the following query to run:
>
> DELETE FROM data_log_20msec_table WHERE (log_id IN (SELECT log_id FROM
> data_log_20msec_table ORDER BY log_id DESC OFFSET 10000000))
>
> ...
>
> This query keeps the most recent 10 million rows and deletes the
> remaining ones. If I call this once a minute, it would be deleting
> 3000 rows each time. Is there a way to optimize this statement?
> Postgres was set up with the default configuration. Is there anything we
> can change in the configuration to make this run more efficiently? The
> table is defined as below:
>
> ...
>
> Is there anything we can do here that can optimize the deletion of rows?

As I previously wrote, I think you'll find row deletes would kill your performance. For time-aged data like that, we use partitioned tables. We typically partition by the week (keeping 6 months of history), but you might end up doing it by N*1000 PK values or some such, so you can use your PK to determine the partition. With a partitioning scheme, it's much faster to add a new partition and drop the oldest at whatever interval you need. See http://www.postgresql.org/docs/current/static/ddl-partitioning.html

Based on the numbers you give above, I think I'd do it by 100000 log_id values, so you'd end up with 101 partition tables, and every half hour or so you'd truncate the oldest partition and start a new one (reusing the previously oldest in a round-robin fashion). TRUNCATE is thousands of times faster than DELETE.
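To make the round-robin idea concrete, here's a minimal sketch of the inheritance-based partitioning the linked docs describe, using the table name from the thread. The child-table names, constraint names, and the 100000-row boundaries are illustrative, not from the original post:

```sql
-- Parent table holds no rows itself; children inherit its columns.
-- (Only log_id shown; the real table has more columns.)
CREATE TABLE data_log_20msec_table (
    log_id  bigint NOT NULL
    -- ... remaining columns from the original definition ...
);

-- One child per 100000 log_id values. The CHECK constraint lets the
-- planner skip irrelevant children when constraint_exclusion is on.
CREATE TABLE data_log_p0 (
    CONSTRAINT p0_range CHECK (log_id >= 0 AND log_id < 100000)
) INHERITS (data_log_20msec_table);

CREATE TABLE data_log_p1 (
    CONSTRAINT p1_range CHECK (log_id >= 100000 AND log_id < 200000)
) INHERITS (data_log_20msec_table);

-- (An INSERT trigger or rule on the parent is also needed to route new
-- rows into the correct child; see the ddl-partitioning docs.)

-- Rotation: instead of DELETE, empty the oldest child and re-aim it at
-- the next range of log_id values. TRUNCATE just reclaims the table's
-- storage, so it is far cheaper than row-by-row deletion.
TRUNCATE TABLE data_log_p0;
ALTER TABLE data_log_p0 DROP CONSTRAINT p0_range;
ALTER TABLE data_log_p0 ADD CONSTRAINT p0_range
    CHECK (log_id >= 10100000 AND log_id < 10200000);
```

With 101 such children and one rotation every half hour or so, the newest ~10 million rows are always retained without ever running a large DELETE.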