Discussion: Do we need vacuuming when tables are regularly dropped?
Hi,

We have a number of automated performance tests (to test our own code)
involving PostgreSQL. Test cases are supposed to drop and recreate
tables each time they run.

The problem is that some of the tests show a linear performance
degradation over time. (We have data going back three months.) We have
established that some element(s) of our test environment must be the
culprit for the degradation. As rebooting the test machine didn't
revert speeds to the baselines recorded three months ago, we have
turned our attention to the database as the only element of the
environment which is persistent across reboots. Recreating the entire
PGSQL cluster did cause speeds to revert to baselines.

I understand that vacuuming solves performance problems related to
"holes" in data files created as a result of tables being updated. Do I
understand correctly that if tables are dropped and recreated at the
beginning of each test case, the holes in the data files are reclaimed,
so there is no need for vacuuming from a performance perspective?

I will double-check whether the problematic test cases do indeed always
drop their tables, but assuming they do, are there any factors in the
database (apart from table updates) that can cause a linear slow-down
with repetitive tasks?

Thanks
Peter
PS: The PGSQL version is 8.2.7. (BTW, which catalog view contains the
back-end version number?)
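For reference, the back-end version can be read from SQL in a few ways,
all of which were available well before 8.2:

    SELECT version();                  -- full version string
    SHOW server_version;               -- just the version number
    SELECT setting FROM pg_settings
     WHERE name = 'server_version';    -- same value via the pg_settings view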
"Peter Kovacs" <maxottovonstirlitz@gmail.com> writes: > We have a number of automated performance tests (to test our own code) > involving PostgreSQL. Test cases are supposed to drop and recreate > tables each time they run. > The problem is that some of the tests show a linear performance > degradation overtime. (We have data for three months back in the > past.) We have established that some element(s) of our test > environment must be the culprit for the degradation. As rebooting the > test machine didn't revert speeds to baselines recorded three months > ago, we have turned our attention to the database as the only element > of the environment which is persistent across reboots. Recreating the > entire PGSQL cluster did cause speeds to revert to baselines. What it sounds like to me is that you're not vacuuming the system catalogs, which are getting bloated with dead rows about all those dropped tables. regards, tom lane
On Mon, Sep 29, 2008 at 2:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What it sounds like to me is that you're not vacuuming the system
> catalogs, which are getting bloated with dead rows about all those
> dropped tables.

Wow, great!

It is not immediately clear from the documentation, but the VACUUM
command deals with the system catalogs as well, correct?

Thanks a lot!
Peter
"Peter Kovacs" <maxottovonstirlitz@gmail.com> writes: > It is not immediately clear from the documentation, but the VACUUM > command also deals with the system catalogs as well, correct? If it's run without any argument by a superuser, then yes. (I think in recent versions we also allow a non-superuser database owner to do this.) regards, tom lane
Thank you!

Peter
>> What it sounds like to me is that you're not vacuuming the system
>> catalogs, which are getting bloated with dead rows about all those
>> dropped tables.
>
> Wow, great!
>
> It is not immediately clear from the documentation, but the VACUUM
> command deals with the system catalogs as well, correct?

To expand on Tom's answer, rows in system tables are created not only
for tables but for each column in the table, rules, indexes, etc. You
can end up with a lot more row creation than you suspect. And temporary
tables bloat the system tables just like regular tables. We discovered
that cron scripts using temporary tables can cause very rapid
system-table bloat.

Cheers,
Steve
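To get a feel for how many catalog rows even one small table generates,
something like this can be run right after creating it (the table name
here is just a made-up example):

    -- one pg_attribute row per column, including the hidden system
    -- columns (ctid, xmin, xmax, ...)
    SELECT count(*) FROM pg_attribute
     WHERE attrelid = 'my_test_table'::regclass;

    -- plus rows in pg_class, pg_type, pg_depend, and one per index
    SELECT count(*) FROM pg_index
     WHERE indrelid = 'my_test_table'::regclass;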
On Mon, Sep 29, 2008 at 11:12 AM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:
> To expand on Tom's answer, rows in system tables are created not only
> for tables but for each column in the table, rules, indexes, etc. You
> can end up with a lot more row creation than you suspect. And
> temporary tables bloat the system tables just like regular tables. We
> discovered that cron scripts using temporary tables can cause very
> rapid system-table bloat.

Also, there was a time when you couldn't do vacuum full on system
tables due to locking issues, and had to take the db down to
single-user mode to do so. Tom, is that still the case?
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > Also, there was a time when you couldn't do vacuum full on system > tables do to locking issues, and had to take the db down to single > user mode to do so. There was a short period when *concurrent* vacuum fulls on just the wrong combinations of system catalogs could deadlock (because they both needed to look up stuff in the other one). AFAIK we fixed that. It's never been the case that it didn't work at all. regards, tom lane
Tom Lane wrote:
> "Scott Marlowe" <scott.marlowe@gmail.com> writes:
>> Also, there was a time when you couldn't do vacuum full on system
>> tables due to locking issues, and had to take the db down to
>> single-user mode to do so.
>
> There was a short period when *concurrent* vacuum fulls on just the
> wrong combinations of system catalogs could deadlock (because they
> both needed to look up stuff in the other one). AFAIK we fixed that.
> It's never been the case that it didn't work at all.

Never personally had trouble with vacuum full or reindex on system
tables. CLUSTER, however, is another story. While I've never run across
anything explicitly documenting that clustering system tables is
forbidden, I've also never used a version of PostgreSQL that allows it
(though I've never tried in single-user mode):

    postgres@[local]=> CLUSTER pg_class USING pg_class_oid_index ;
    ERROR:  "pg_class" is a system catalog

Should the docs
(http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html) be
updated to note this restriction?

Cheers,
Steve
Steve Crawford <scrawford@pinpointresearch.com> writes:
> postgres@[local]=> CLUSTER pg_class USING pg_class_oid_index ;
> ERROR:  "pg_class" is a system catalog

I think the DB is probably protecting you from yourself here ;-). If
memory serves there are some system indexes whose relfilenode numbers
can't change, and pg_class_oid_index is one of them. If the CLUSTER had
gone through you'd have hosed that database irretrievably.

The protection check that is firing here is not so fine-grained as to
know the difference between pg_class and catalogs that this might be
safe for; but it does point up the moral that you need to know exactly
what you're doing if you are going to do DDL stuff on the system
catalogs.

        regards, tom lane
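For the curious, the filenode a catalog or index is currently stored
under can be inspected directly; a small example (note that on much
newer releases some catalogs are "mapped" and report 0 here, with the
real value available from pg_relation_filenode()):

    SELECT relname, oid, relfilenode
      FROM pg_class
     WHERE relname IN ('pg_class', 'pg_class_oid_index');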
Tom Lane wrote:
> Steve Crawford <scrawford@pinpointresearch.com> writes:
>> postgres@[local]=> CLUSTER pg_class USING pg_class_oid_index ;
>> ERROR:  "pg_class" is a system catalog
>
> I think the DB is probably protecting you from yourself here ;-).

And elsewhere. :)

I wasn't advocating for a change of behavior, just the addition of
"Clustering is not permitted on system tables." to the documentation of
the CLUSTER command.

Cheers,
Steve