Обсуждение: More vacuum stats

Поиск
Список
Период
Сортировка

More vacuum stats

От
Magnus Hagander
Дата:
I noticed that we were already tracking the information about when an
autovacuum worker was last started in a database, but this information
was not exposed. The attached patch puts this column in
pg_stat_database.

Was there any particular reason why this wasn't exposed before that
I've missed, making this a bad addition? :-)

Oh, and this time, I *have* included updates to the regression tests.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Вложения

Re: More vacuum stats

От
Euler Taveira de Oliveira
Дата:
Magnus Hagander escreveu:
> Was there any particular reason why this wasn't exposed before that
> I've missed, making this a bad addition? :-)
> 
Not that I know of. Good catch. ;)


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: More vacuum stats

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> I noticed that we were already tracking the information about when an
> autovacuum worker was last started in a database, but this information
> was not exposed. The attached patch puts this column in
> pg_stat_database.

> Was there any particular reason why this wasn't exposed before that
> I've missed, making this a bad addition? :-)

I think that's an implementation detail.  If we expose it then we'll
be forced to track it forevermore, regardless of whether the AV launcher
actually needs it in the future.  (In particular, the assumption that
this is tracked per-database and not per-something-else seems like an
artifact of the current AV launching algorithm.)

So I'd like to see a positive argument why this is important for users
to know, rather than merely "we should expose every conceivable detail
by default".  Why wouldn't a user care more about last AV time for a
specific table, which we already do expose?
        regards, tom lane


Re: More vacuum stats

От
Magnus Hagander
Дата:
On Sun, Aug 22, 2010 at 17:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> I noticed that we were already tracking the information about when an
>> autovacuum worker was last started in a database, but this information
>> was not exposed. The attached patch puts this column in
>> pg_stat_database.
>
>> Was there any particular reason why this wasn't exposed before that
>> I've missed, making this a bad addition? :-)
>
> I think that's an implementation detail.  If we expose it then we'll
> be forced to track it forevermore, regardless of whether the AV launcher
> actually needs it in the future.  (In particular, the assumption that
> this is tracked per-database and not per-something-else seems like an
> artifact of the current AV launching algorithm.)

That's a good point. OTOH, if we removed the feature, it seems it
would be reasonable to remove the column from the statistics view as
well. That *could* happen in other stats views as well.

> So I'd like to see a positive argument why this is important for users
> to know, rather than merely "we should expose every conceivable detail
> by default".  Why wouldn't a user care more about last AV time for a
> specific table, which we already do expose?

You need to connect to every database to do that. If you have many
databases, that's a lot of overhead particularly if you're doing tihs
for regular monitoring. Plus, those views will only track when
autovacuum actually *did* something.

Being able to see that autovacuum hasn't even touched a database for
too long would be an early-indicator that you have some issues with
it.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: More vacuum stats

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Aug 22, 2010 at 17:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So I'd like to see a positive argument why this is important for users
>> to know, rather than merely "we should expose every conceivable detail
>> by default". �Why wouldn't a user care more about last AV time for a
>> specific table, which we already do expose?

> You need to connect to every database to do that. If you have many
> databases, that's a lot of overhead particularly if you're doing tihs
> for regular monitoring. Plus, those views will only track when
> autovacuum actually *did* something.

Well, the last-launch-time doesn't prove that autovacuum actually *did*
something ;-).

> Being able to see that autovacuum hasn't even touched a database for
> too long would be an early-indicator that you have some issues with
> it.

With the current AV launch algorithm, unless you have very serious
system-wide issues there will be a worker launched into each database
approximately every autovacuum_naptime seconds.  AFAICS this does not
tell you anything interesting about whether AV is getting its work done.
        regards, tom lane


Re: More vacuum stats

От
Magnus Hagander
Дата:
On Sun, Aug 22, 2010 at 18:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Sun, Aug 22, 2010 at 17:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So I'd like to see a positive argument why this is important for users
>>> to know, rather than merely "we should expose every conceivable detail
>>> by default".  Why wouldn't a user care more about last AV time for a
>>> specific table, which we already do expose?
>
>> You need to connect to every database to do that. If you have many
>> databases, that's a lot of overhead particularly if you're doing tihs
>> for regular monitoring. Plus, those views will only track when
>> autovacuum actually *did* something.
>
> Well, the last-launch-time doesn't prove that autovacuum actually *did*
> something ;-).

Well, it would tell you it considered doing something ;)


>> Being able to see that autovacuum hasn't even touched a database for
>> too long would be an early-indicator that you have some issues with
>> it.
>
> With the current AV launch algorithm, unless you have very serious
> system-wide issues there will be a worker launched into each database
> approximately every autovacuum_naptime seconds.  AFAICS this does not
> tell you anything interesting about whether AV is getting its work done.

Well, if you have all your autovacuum workers tied up with vacuuming
large tables, then it wouldn't AFAIK. I'm not sure if that counts as
your "very serious system-wide issues", but it's certainly a case
that's interesting for the admin to know about.

But thinking more about that, you ca nfigure that out with a SELECT
count(*) FROM pg_stat_activity WHERE current_query LIKE 'autovacuum:
%' if I'm not mistaken.

It can also be used to find out if the launcher is somehoiw stuck, but
that would be a bug and we don't generally put counters in the stats
views to expose possible bugs, only to track interesting statistics.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: More vacuum stats

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Aug 22, 2010 at 18:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> With the current AV launch algorithm, unless you have very serious
>> system-wide issues there will be a worker launched into each database
>> approximately every autovacuum_naptime seconds. �AFAICS this does not
>> tell you anything interesting about whether AV is getting its work done.

> Well, if you have all your autovacuum workers tied up with vacuuming
> large tables, then it wouldn't AFAIK. I'm not sure if that counts as
> your "very serious system-wide issues", but it's certainly a case
> that's interesting for the admin to know about.

> But thinking more about that, you ca nfigure that out with a SELECT
> count(*) FROM pg_stat_activity WHERE current_query LIKE 'autovacuum:
> %' if I'm not mistaken.

> It can also be used to find out if the launcher is somehoiw stuck, but
> that would be a bug and we don't generally put counters in the stats
> views to expose possible bugs, only to track interesting statistics.

Yeah.  Given the current worker-launch algorithm, these times just don't
strike me as all that interesting in practice.  If we were to change to
a different algorithm, it's possible that it'd become worthwhile to
expose them --- but it's equally possible that some other data would be
useful instead.  So my feeling remains that we should leave well enough
alone.
        regards, tom lane


Re: More vacuum stats

От
Greg Smith
Дата:
Tom Lane wrote:
> So I'd like to see a positive argument why this is important for users
> to know, rather than merely "we should expose every conceivable detail
> by default".  Why wouldn't a user care more about last AV time for a
> specific table, which we already do expose?
>   

What I actually want here is for the time that the last table autovacuum 
started, adding to the finish time currently exposed by 
pg_stat_user_tables.  "How long did the last {auto}vacuum on <x> take to 
run?" is a FAQ on busy systems here.  If I could compute that from a 
pair of columns, it's a major step toward answering even more 
interesting questions like "how does this set of cost delay parameters 
turn into an approximate MB/s worth of processing rate on my tables?".  
This is too important of a difficult tuning exercise to leave to log 
scraping forever.

I'd rather have that and look at for "SELECT max(last_autovacuum_start) 
FROM pg_stat_user_tables" to diagnose the sort of problems this patch 
seems to aim at helping.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: More vacuum stats

От
Magnus Hagander
Дата:
On Mon, Aug 23, 2010 at 16:28, Greg Smith <greg@2ndquadrant.com> wrote:
> Tom Lane wrote:
>>
>> So I'd like to see a positive argument why this is important for users
>> to know, rather than merely "we should expose every conceivable detail
>> by default".  Why wouldn't a user care more about last AV time for a
>> specific table, which we already do expose?
>>
>
> What I actually want here is for the time that the last table autovacuum
> started, adding to the finish time currently exposed by pg_stat_user_tables.
>  "How long did the last {auto}vacuum on <x> take to run?" is a FAQ on busy
> systems here.  If I could compute that from a pair of columns, it's a major
> step toward answering even more interesting questions like "how does this
> set of cost delay parameters turn into an approximate MB/s worth of
> processing rate on my tables?".  This is too important of a difficult tuning
> exercise to leave to log scraping forever.

Now, that would be quite useful. That'd require another stats message,
since we don't send anything on autovacuum start, but I don't think
the overhead of that is anything we need to worry about - in
comparison to an actual vacuum...

Do we want that for both vacuum and autovacuum? vacuum and analyze?

We could also store last_autovacuum_vacuum_duration - is that better
or worse than start and end time?


> I'd rather have that and look at for "SELECT max(last_autovacuum_start) FROM
> pg_stat_user_tables" to diagnose the sort of problems this patch seems to
> aim at helping.

Agreed. Consider this patch withdrawn.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: More vacuum stats

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Aug 23, 2010 at 16:28, Greg Smith <greg@2ndquadrant.com> wrote:
>> What I actually want here is for the time that the last table autovacuum
>> started, adding to the finish time currently exposed by pg_stat_user_tables.

> Now, that would be quite useful. That'd require another stats message,
> since we don't send anything on autovacuum start, but I don't think
> the overhead of that is anything we need to worry about - in
> comparison to an actual vacuum...

No, you wouldn't really need an extra message, you could just send both
start and finish times in the completion message.  I'm not sure that
having last start time update before last end time would be a good idea
anyway.

But in any case it's true that an extra message wouldn't be a
significant cost.  What I'd be more concerned about is the stats table
bloat from adding yet another per-table field.  That could be a lot of
space on an installation with lots of tables.

> We could also store last_autovacuum_vacuum_duration - is that better
> or worse than start and end time?

No, I think you want to know the actual time not only the duration.
        regards, tom lane


Re: More vacuum stats

От
Magnus Hagander
Дата:
On Mon, Aug 23, 2010 at 16:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Mon, Aug 23, 2010 at 16:28, Greg Smith <greg@2ndquadrant.com> wrote:
>>> What I actually want here is for the time that the last table autovacuum
>>> started, adding to the finish time currently exposed by pg_stat_user_tables.
>
>> Now, that would be quite useful. That'd require another stats message,
>> since we don't send anything on autovacuum start, but I don't think
>> the overhead of that is anything we need to worry about - in
>> comparison to an actual vacuum...
>
> No, you wouldn't really need an extra message, you could just send both
> start and finish times in the completion message.  I'm not sure that
> having last start time update before last end time would be a good idea
> anyway.

Hmm, good point. We'd just need an extra field in that message.


> But in any case it's true that an extra message wouldn't be a
> significant cost.  What I'd be more concerned about is the stats table
> bloat from adding yet another per-table field.  That could be a lot of
> space on an installation with lots of tables.
>
>> We could also store last_autovacuum_vacuum_duration - is that better
>> or worse than start and end time?
>
> No, I think you want to know the actual time not only the duration.

Well, you could calculate one from the other - especially if one takes
less size, per your comment above.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: More vacuum stats

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Aug 23, 2010 at 16:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Magnus Hagander <magnus@hagander.net> writes:
>>> We could also store last_autovacuum_vacuum_duration - is that better
>>> or worse than start and end time?
>> 
>> No, I think you want to know the actual time not only the duration.

> Well, you could calculate one from the other - especially if one takes
> less size, per your comment above.

With alignment considerations, adding a field is going to cost 8 bytes;
whether it's a timestamp or a duration isn't going to matter.  I'd be
inclined to store the timestamp, it just seems more like the base datum.
        regards, tom lane