Hi,
I would like to gather some statistics from people who have experience
with vacuuming big tables:
I was a little confused about why some vacuums take much longer than
others, so I ran some tests against a test database. All tests were
made with PostgreSQL 8.0.3. In total I made three tests with table
structures and indexes that are problematic in some databases I have
built. I took the structure of the tables, anonymized the columns and
filled the tables with random data generated by a script.
In this test scenario the tables are only filled with data and NO
CHANGES are made to them afterwards, because I just wanted to measure
the time VACUUM needs to scan the table and its indexes.
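(The fill script itself is not shown here; as a rough illustration only,
random rows for a tbl3-like structure could be generated with a statement
like the following - generate_series(), random() and md5() are all
available in 8.0. The value ranges are arbitrary, not the ones actually
used.)

INSERT INTO tbl3 (col1, col2, col3, col4, col5, col6, col7)
SELECT (random() * 1000000)::bigint,   -- arbitrary value ranges,
       (random() * 1000000)::bigint,   -- chosen only for illustration
       (random() * 1000000)::bigint,
       md5(random()::text),
       md5(random()::text),
       (random() * 1000)::integer,
       now() - ((random() * 365)::integer || ' days')::interval
FROM generate_series(1, 100000);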
First table
-----------
CREATE TABLE "public"."tbl1" (
"id" BIGSERIAL,
"col1" BIGINT NOT NULL,
"col2" VARCHAR NOT NULL,
"col3" BIGINT NOT NULL,
"col4" INTEGER NOT NULL,
"col5" VARCHAR NOT NULL,
"col6" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
"col7" "public"."tsvector",
"col8" VARCHAR NOT NULL,
"col9" BOOLEAN NOT NULL,
"col10" BOOLEAN NOT NULL,
"col11" "public"."tsvector",
"col12" INTEGER,
CONSTRAINT "tbl1_pkey" PRIMARY KEY("id")
) WITH OIDS;
CREATE INDEX "idx_tbl1_col1" ON "public"."tbl1"
USING btree ("col1");
CREATE INDEX "idx_tbl1_col11" ON "public"."tbl1"
USING gist ("col11");
CREATE INDEX "idx_tbl1_col8" ON "public"."tbl1"
USING btree ("col8");
CREATE INDEX "idx_tbl1_col7" ON "public"."tbl1"
USING gist ("col7");
CREATE INDEX "idx_tbl1_col3" ON "public"."tbl1"
USING btree ("col3");
=> Filled up with 400,000 rows
LOG: duration: 148187.934 ms statement: VACUUM VERBOSE tbl1;
Second table
------------
CREATE TABLE "public"."tbl2" (
"id" BIGSERIAL,
"col1" BIGINT,
"col2" VARCHAR NOT NULL,
"col3" INTEGER NOT NULL,
"col4" BIGINT NOT NULL,
"col5" TIMESTAMP WITHOUT TIME ZONE,
"col6" BOOLEAN NOT NULL,
"col7" BIGINT,
CONSTRAINT "tbl2_pkey" PRIMARY KEY("id")
) WITH OIDS;
CREATE INDEX "idx_tbl2_col7" ON "public"."tbl2"
USING btree ("col7");
CREATE INDEX "idx_tbl2_col1" ON "public"."tbl2"
USING btree ("col1");
=> Filled up with 6,000,000 rows
LOG: duration: 138934.027 ms statement: VACUUM VERBOSE tbl2;
Third table
-----------
CREATE TABLE "public"."tbl3" (
"id" BIGSERIAL,
"col1" BIGINT NOT NULL,
"col2" BIGINT,
"col3" BIGINT NOT NULL,
"col4" VARCHAR,
"col5" VARCHAR,
"col6" INTEGER,
"col7" TIMESTAMP WITHOUT TIME ZONE,
CONSTRAINT "tbl3_pkey" PRIMARY KEY("id")
) WITH OIDS;
CREATE INDEX "idx_tbl3_col2" ON "public"."tbl3"
USING btree ("col2");
CREATE INDEX "idx_tbl3_col3" ON "public"."tbl3"
USING btree ("col3");
CREATE INDEX "idx_tbl3_col1_col3" ON "public"."tbl3"
USING btree ("col1", "col3");
=> Filled up with 100,000,000 rows
The VACUUM could not be completed - it took longer than 24 h, at which
point I aborted it manually.
Postgresql.conf
---------------
listen_addresses = '*'
shared_buffers = 32768
maintenance_work_mem = 1048576
max_stack_depth = 81920
max_fsm_pages = 30000000
max_fsm_relations = 1000
vacuum_cost_delay = 0
bgwriter_percent = 0
bgwriter_maxpages = 0
fsync = false
enable_hashagg = true
enable_hashjoin = true
enable_indexscan = true
enable_mergejoin = true
enable_nestloop = true
enable_seqscan = true
enable_sort = true
enable_tidscan = true
cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.001
cpu_operator_cost = 0.0025
effective_cache_size = 102400
geqo = false
default_statistics_target = 1
from_collapse_limit = 8
join_collapse_limit = 8
log_destination = 'stderr'
redirect_stderr = true
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = false
log_rotation_age = 1440
client_min_messages = log
log_min_messages = log
log_error_verbosity = default
log_min_duration_statement = 1
log_connections = false
log_disconnections = false
log_line_prefix = ''
stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
Problem
-------
In my opinion the VACUUM of the third table takes far too long. The
table is less complex than the other ones - so, starting from the worst
case that every row takes as long as in the most complex table (tbl1),
I end up with this calculation:
148 seconds for 400,000 rows
= 2,702 rows per second
100,000,000 rows / 2,702 rows per second = 37,009 seconds
(= 616 minutes, = about 10 hours)
But the VACUUM of tbl3 needs more than 24 h to complete, even though
nothing in the table has changed.
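(Since VACUUM reads the heap and every index page by page, its runtime
scales with on-disk pages rather than with the raw row count, so the
per-row extrapolation above is only a lower bound. As a diagnostic
sketch - not part of the original test - the actual sizes can be
compared via the standard pg_class catalog; relpages is refreshed by
VACUUM/ANALYZE, so it reflects the state as of the last run.)

-- Compare on-disk size (in 8 kB pages) of the tables and their indexes
SELECT c.relname, c.relkind, c.relpages, c.reltuples
FROM pg_class c
WHERE c.relname LIKE 'tbl%' OR c.relname LIKE 'idx_tbl%'
ORDER BY c.relpages DESC;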
Hardware + Software configuration
---------------------------------
SCSI RAID5 with usable space of 120GB
2 x Intel(R) Xeon(TM) CPU 3.20GHz
RAM: 4148840 kB (4GB)
Kernel: 2.6.12.2
PGDATA is running on a separate partition
Nothing else is running on this server.
Question
--------
The duration of the VACUUM of tbl1 and tbl2 is okay - so how can the
VACUUM of tbl3 be sped up?
--- Regards, Aldor