Обсуждение: Re: [DEFAULT] Daily digest v1.4346 (20 messages)

Поиск
Список
Период
Сортировка

Re: [DEFAULT] Daily digest v1.4346 (20 messages)

От
Josh Berkus
Дата:
Matthew,

I am replying to the below as a pg_autovacuum user for multiple client
databases.   My thoughts:

> Inability to customize thresholds on a per table basis

This hasn't been a big problem for me.   I would judge that 80% of my clients
would make no use of this feature.

> Inability to set default thresholds on a per database basis

This would be much more useful to us.

> Inability to exclude specific databases / tables from pg_autovacuum
monitoring

Same as above -- exclusion is much more needed than incremental raising/
lowering.   Of course, if one can set levels, one can set them to zero, so
perhaps it is the same thing.

> Inability to schedule vacuums during off-peak times

I don't think that this is the job of pg_autovacuum.   If a database requires
bulk loads and other burst activity, the DBA should schedule manual vacuums
around those and not use pg_autovacuum.

Also, bgwriter and slow vacuum should make this less of an issue for 7.5.

> Lack of integration related to startup and shutdown

Yes, this is a pain, especially from a security standpoint.

> Ignorance of VACUUM and ANALYZE operations performed outside pg_autovacuum
(requires backend integration? or can listen / notify can be used?)

Again, I think this is not crucial, personally.   Nice if there's some easy
way to do it, of course.

> Lack of logging options / syslog integration / log rotation options

Yep, this is a biggie.

Now, let me add my comments as to what my clients have complained about:

-- Lack of integrated security with the Postmaster
-- Inability to detect VACUUMs "backing up" due to too low vacuum mem or too
much activity and warn the DBA
-- Inability to Vacuum in parallel on high-capacity machines
-- No "timeout" for locked vacuums.

> I'm not sure how to address all of these concerns, or that they all should
be addressed right now.  One of my big questions is backend integration.  

I concur with the other commentors; backend integration would be nice if
pg_autovacuum is not to be permanently a seperate script/process.   It would
eliminate several of the above issues.

> Since many people do not like tools that clutter their databases by adding
tables, I think option 1 (adding a pg_autovacuum table to existing databases)
is right out.  

Personally, I like the idea of a pg_autovacuum table, and would supporrt it.
However, I have no strong objections to the other approaches.

> Right now pg_autovacuum has no memory of what was going on the last time it
was run.  So if significant changes have happened while pg_autovacuum is not
running, they will not be counted in the analysis of when to perform a vacuum
or analyze operation which can result in under vacuuming.  So, pg_autovacuum
should occasionally write down it's numbers to the database.  The data  will
be stored in an additional table called table_data

I think we've already had feedback about this.  If it's system information, it
should go in one of the existing tables, or it should be called something
more descriptive than "table_data", and should begin with pg_

Some consideraiton should also be given to the frequency of updating the
persistent data.   I would favor an asynchnous, infrequent updating that
would permit some loss of information over a synchrnous lossless approach.
The latter, while more accurate, would detract from server performance on
high-volume transction databases.

> 3.Single-Pass Mode (External Scheduling):
>
> I have received requests to be able to run pg_autovacuum only on request
(not as a daemon) making only one pass over all the tables (not looping
indefinately).  The advantage being that it will operate more like the
current vacuum command except that it will only vacuum tables that need to be
vacuumed.

I think this is a completely different utility from pg_autovacuum, and this
line of development need not be pursued unless it's easy to do.    I
certainly don't need it ....

> Syslog support.  I'm not sure this is really needed, but a simple patch was
submitted by one user and perhaps that can be reviewed / improved and
applied.
>

I need it, and am glad to hear there is a patch.   Several of my clients use
centralized syslog servers, and do *everything* through syslog.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: [DEFAULT] Daily digest v1.4346 (20 messages)

От
"Matthew T. O'Connor"
Дата:
Josh Berkus wrote:

>>Inability to customize thresholds on a per table basis
>>    
>>
>
>This hasn't been a big problem for me.   I would judge that 80% of my clients 
>would make no use of this feature.
>
>  
>
Ok.

>>Inability to set default thresholds on a per database basis
>>    
>>
>
>This would be much more useful to us.
>
>  
>
So interesting, most uses request the per table  settings, guess there 
is sufficient demand for both.

>>Inability to schedule vacuums during off-peak times
>>    
>>
>
>I don't think that this is the job of pg_autovacuum.   If a database requires 
>bulk loads and other burst activity, the DBA should schedule manual vacuums 
>around those and not use pg_autovacuum.
>
>  
>
You might be missing the point, the advantage of using pg_autovacuum is 
that it wouldn't waste cycles doing vacuums on tables that don't need 
it.  If we have persistent data (saving state information periodically) 
then this is a very easy feature to add.

>>Lack of integration related to startup and shutdown
>>    
>>
>
>Yes, this is a pain, especially from a security standpoint.
>
>  
>
Yes, backend integration will make this go away.

>>Ignorance of VACUUM and ANALYZE operations performed outside pg_autovacuum 
>>    
>>
>(requires backend integration? or can listen / notify can be used?)
>
>Again, I think this is not crucial, personally.   Nice if there's some easy 
>way to do it, of course.
>
>  
>
What I'm thinking is that the VACUUM command could be modified to write 
down some data from the stats system at vacuum time.  Once the VACUUM 
command writes this down for itself then pg_autovacuum just uses that 
number to make its decision.  Again, we are trying to reduce as much as 
possible superfluous vacuums.  If an admin vacuums his whole cluster 
every Sunday night that may prevent lots of  vacuums occurring during 
business hours that effect processing.

>>Lack of logging options / syslog integration / log rotation options
>>    
>>
>
>Yep, this is a biggie.
>
>  
>
Agreed.  This is another issue that could be solved with backend 
integration.

>Now, let me add my comments as to what my clients have complained about:
>
>-- Lack of integrated security with the Postmaster
>-- Inability to detect VACUUMs "backing up" due to too low vacuum mem or too 
>much activity and warn the DBA
>-- Inability to Vacuum in parallel on high-capacity machines
>-- No "timeout" for locked vacuums.
>
>  
>
Backend integration should solve the 1st issue.  Parallel vacuums is 
something that could be worked on at some point.  Would it  make sense 
to incorporate this with tablespaces?  The vacuum daemon would only 
issue one vacuum command per tablespace, but could issue as many 
parallel vacuums as you have independent tablespaces.

I think timeout issue would need to be a part of vacuum proper, and I'm 
not sure about the "backing up" issue.

>>Since many people do not like tools that clutter their databases by adding 
>>    
>>
>tables, I think option 1 (adding a pg_autovacuum table to existing databases) 
>is right out.  
>
>Personally, I like the idea of a pg_autovacuum table, and would supporrt it.   
>However, I have no strong objections to the other approaches.
>
>  
>
I think I was unclear, I agree the creation of a pg_autovacuum system 
table is fine (if we really need it), but my initial post was talking 
about keeping pg_autovacuum as a client app, hence the autovacuum table 
would be added into (clutter up) the users table space, not the systems.

>I think we've already had feedback about this.  If it's system information, it 
>should go in one of the existing tables, or it should be called something 
>more descriptive than "table_data", and should begin with pg_
>
>  
>
I wasn't really suggesting table_data as the real name, but again this 
will be more straightforward once integrated in.

>Some consideraiton should also be given to the frequency of updating the 
>persistent data.   I would favor an asynchnous, infrequent updating that 
>would permit some loss of information over a synchrnous lossless approach.  
>The latter, while more accurate, would detract from server performance on 
>high-volume transction databases.
>
>  
>
Agreed, the performance impact of this should be negligible.

>>3.Single-Pass Mode (External Scheduling):
>>
>>I have received requests to be able to run pg_autovacuum only on request 
>>    
>>
>I think this is a completely different utility from pg_autovacuum, and this 
>line of development need not be pursued unless it's easy to do.    I 
>certainly don't need it ....
>
>  
>
The reason it's similar is that once pg_autovacuum data is persistent, 
it would be trivial to implement this feature, and the data that any 
tool would need to make these decisions is the same as what 
pg_autovacuum is already tracking.

>>Syslog support.  I'm not sure this is really needed, but a simple patch was 
>>    
>>
>I need it, and am glad to hear there is a patch.   Several of my clients use 
>centralized syslog servers, and do *everything* through syslog.
>
>  
>
I think the patch was submitted to either the hackers or patches list.  
If you can't find it, I'll look around and see if I still have a copy.  
The person who submitted said it was simple, but was working for him in 
production.





Re: [DEFAULT] Daily digest v1.4346 (20 messages)

От
"Matthew T. O'Connor"
Дата:
Josh Berkus wrote:

>>Syslog support.  I'm not sure this is really needed, but a simple patch was 
>>    
>>
>submitted by one user and perhaps that can be reviewed / improved and 
>applied.
>  
>
>
>I need it, and am glad to hear there is a patch.   Several of my clients use 
>centralized syslog servers, and do *everything* through syslog.
>
>  
>
FYI,  I just checked, I think it was sent to the hackers lists on Jan 
21, 2004 by Mark Hollow.


Re: pg_autovacuum

От
Josh Berkus
Дата:
Matt,

> So interesting, most uses request the per table  settings, guess there 
> is sufficient demand for both.

The reason for us is that multi-database installations generally have 
significantly different purposes for each database; for example, with a 
"reporting" database on the same server that needs no vacuuming at all.

> You might be missing the point, the advantage of using pg_autovacuum is 
> that it wouldn't waste cycles doing vacuums on tables that don't need 
> it.  If we have persistent data (saving state information periodically) 
> then this is a very easy feature to add.

OK, I can see that.

> What I'm thinking is that the VACUUM command could be modified to write 
> down some data from the stats system at vacuum time.  Once the VACUUM 
> command writes this down for itself then pg_autovacuum just uses that 
> number to make its decision.  Again, we are trying to reduce as much as 
> possible superfluous vacuums.  If an admin vacuums his whole cluster 
> every Sunday night that may prevent lots of  vacuums occurring during 
> business hours that effect processing.

Be nice, yes.    However, my experience is that mixing manual and autovacuums 
is bound to lead to endless support requests, because conflicts *will* arise.   
So in some ways you'd be working to please those who can't be pleased.

> Backend integration should solve the 1st issue.  Parallel vacuums is 
> something that could be worked on at some point.  Would it  make sense 
> to incorporate this with tablespaces?  The vacuum daemon would only 
> issue one vacuum command per tablespace, but could issue as many 
> parallel vacuums as you have independent tablespaces.

Hmmm ... that's an interesting idea.    I'd more been thinking about vacuums 
of small tables, where a high-end server under low load could vacuum several 
tables in parallel, one per CPU.   However, working through tablespaces would 
make a lot of sense.

> I think timeout issue would need to be a part of vacuum proper, and I'm 
> not sure about the "backing up" issue.

Well, we've discussed timeout for vacuum.   

Thing is, autovacuum changes the equation somewhat.   Imagine that the 
transaction rate of your tables accelerates so that autovacuum with a 0.3 
scale setting is triggered every 23 minutes.   But say that it takes 29 
minutes to vacuum through all of your tables ... or even 49 minutes if you 
have "slow vacuum" turned on!

You would get into a cycle where you are running vacuum continuously, all the 
time.  This is a very bad situation and the admin should be warned about it 
via the logs.

Hmmm ... thinkiing about that, are we changing the defaults for threshold and 
scale?  You and I have discussed this, yes?

> The reason it's similar is that once pg_autovacuum data is persistent, 
> it would be trivial to implement this feature, and the data that any 
> tool would need to make these decisions is the same as what 
> pg_autovacuum is already tracking.

Well, if it's easy to do, then go for it.   I can see how some would find it 
useful.   Once it's sufficently bulletproof, it could replace the standard 
VACUUM (whole db).

> I think the patch was submitted to either the hackers or patches list.  
> If you can't find it, I'll look around and see if I still have a copy.  
> The person who submitted said it was simple, but was working for him in 
> production.

Thanks for the forward.

-- 
-Josh BerkusAglio Database SolutionsSan Francisco



Re: pg_autovacuum

От
"Matthew T. O'Connor"
Дата:
On Tuesday 23 March 2004 12:32 am, Josh Berkus wrote:
> Matt,
>
> > What I'm thinking is that the VACUUM command could be modified to write
> > down some data from the stats system at vacuum time.  Once the VACUUM
> > command writes this down for itself then pg_autovacuum just uses that
> > number to make its decision.  Again, we are trying to reduce as much as
> > possible superfluous vacuums.  If an admin vacuums his whole cluster
> > every Sunday night that may prevent lots of  vacuums occurring during
> > business hours that effect processing.
>
> Be nice, yes.    However, my experience is that mixing manual and
> autovacuums is bound to lead to endless support requests, because conflicts
> *will* arise. So in some ways you'd be working to please those who can't be
> pleased.

Could you please expound on this.  What types of conflicts in what situations?  
Maybe I'm missing something, but I really don't see how this could cause any 
problems, and would only help prevent unneeded vacuums since pg_autovacuum 
would always know the stats as of the last vacuum no matter who issued it, 
because the vacuum itself would write it down.

> Hmmm ... that's an interesting idea.    I'd more been thinking about
> vacuums of small tables, where a high-end server under low load could
> vacuum several tables in parallel, one per CPU.   However, working through
> tablespaces would make a lot of sense.

The reason tablespaces might make sense is that vacuum is I/O bound, not CPU 
bound.  So having 4 processors isn't going to make multiple vacuums go any 
faster.  Now you might have enough I/O bandwidth such that multiple 
simultaneous vacuums are actually faster that serialized.  You could probably 
test this yourself, I would be curious to see some data on this.

> You would get into a cycle where you are running vacuum continuously, all
> the time.  This is a very bad situation and the admin should be warned
> about it via the logs.

Finding a way to log this would be good. 

> Hmmm ... thinkiing about that, are we changing the defaults for threshold
> and scale?  You and I have discussed this, yes?

Right now pg_autovacuum doesn't change the thresholds, but it does change the 
sleep time.  Just like the vacuum and analyze command options there is a 
sleep base value and a sleep scaling factor.  The purpose was to make sure 
that pg_autovacuum was never continuously issuing vacuum commands.  I think 
that if pg_autovacuum is vacuuming too often then the admin needs to tune and 
turn it down.


Matthew