Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
От | Tomas Vondra |
---|---|
Тема | Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system |
Дата | |
Msg-id | 510F062F.1080001@fuzzy.cz обсуждение исходный текст |
Ответ на | Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system (Jeff Janes <jeff.janes@gmail.com>) |
Ответы |
Re: PATCH: Split stats file per database WAS: autovacuum
stress-testing our system
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system |
Список | pgsql-hackers |
On 3.2.2013 21:54, Jeff Janes wrote: > On Sat, Feb 2, 2013 at 2:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >>> On 3.1.2013 20:33, Magnus Hagander wrote: >>>> >>>> Yeah, +1 for a separate directory not in global. >>> >>> OK, I moved the files from "global/stat" to "stat". >> >> This has a warning: >> >> pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with >> no prototype before its definition >> >> I plan to do some performance testing, but that will take a while so I >> wanted to post this before I get distracted. > > Running "vacuumdb -a" on a cluster with 1000 db with 200 tables (x > serial primary key) in each, I get log messages like this: > > last_statwrite 23682-06-18 22:36:52.960194-07 is later than > collector's time 2013-02-03 12:49:19.700629-08 for db 16387 > > Note the bizarre year in the first time stamp. > > If it matters, I got this after shutting down the cluster, blowing > away $DATA/stat/*, then restarting it and invoking vacuumdb. I somehow expected that hash_search zeroes all the fields of a new entry, but looking at pgstat_get_db_entry that obviously is not the case. So stats_timestamp (which tracks timestamp of the last write for a DB) was random - that's where the bizarre year values came from. I've added a proper initialization (to 0), and now it works as expected. Although the whole sequence of errors I was getting was this: LOG: last_statwrite 11133-08-28 19:22:31.711744+02 is later than collector's time 2013-02-04 00:54:21.113439+01 for db 19093 WARNING: pgstat wait timeout LOG: last_statwrite 39681-12-23 18:48:48.9093+01 is later than collector's time 2013-02-04 00:54:31.424681+01 for db 46494 FATAL: could not find block containing chunk 0x2af4a60 LOG: statistics collector process (PID 10063) exited with exit code 1 WARNING: pgstat wait timeout WARNING: pgstat wait timeout I'm not entirely sure where the FATAL came from, but it seems it was somehow related to the issue - it was quite reproducible, although I don't see how exactly could this happen. There relevant block of code looks like this: char *writetime; char *mytime; /* Copy because timestamptz_to_str returns a static buffer */ writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp)); mytime = pstrdup(timestamptz_to_str(cur_ts)); elog(LOG, "last_statwrite %s is later than collector's time %s for " "db %d", writetime, mytime, dbentry->databaseid); pfree(writetime); pfree(mytime); which seems quite fine to mee. I'm not sure how one of the pfree calls could fail? Anyway, attached is a patch that fixes all three issues, i.e. 1) the un-initialized timestamp 2) the "void" ommited from the signature 3) rename to "pg_stat" instead of just "stat" Tomas
Вложения
В списке pgsql-hackers по дате отправления: