Discussion: Details about pg_stat_bgwriter
Hi,

I'm trying to understand the information shown in pg_stat_bgwriter. If I want to identify a possible bottleneck with the bgwriter, what kind of numbers do I need to watch out for? Are the following values OK, or do they indicate that I need to tune my configuration?

checkpoints_timed  : 6099
checkpoints_req    : 291
buffers_checkpoint : 2767641
buffers_clean      : 4522
maxwritten_clean   : 1
buffers_backend    : 609933
buffers_alloc      : 1411947

I have read Greg's description (http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm), but to be honest I don't understand the explanation. He mentions an "LRU background writer", but I'm not sure which column that relates to.

Thanks in advance.
Thomas
Thomas Kellerer wrote:
> If I want to identify a possible bottleneck with the bgwriter, what
> kind of numbers do I need to watch out for?

You can't do much with a single snapshot of pg_stat_bgwriter data. Try saving this instead:

select *,now() from pg_stat_bgwriter;

And then take another snapshot at least a few hours later, preferably the next day. With two snapshots and timestamps on them, it's possible to make some sense of the numbers.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
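As a concrete sketch of the two-snapshot idea: the snapshots can be saved into a table for later comparison. The bgwriter_snapshot table name below is just illustrative, not something from the thread.

-- Hypothetical snapshot table; the first statement creates it from the
-- current pg_stat_bgwriter contents plus a timestamp.
CREATE TABLE bgwriter_snapshot AS
    SELECT now() AS snapshot_time, * FROM pg_stat_bgwriter;

-- A few hours (or a day) later, append another snapshot:
INSERT INTO bgwriter_snapshot
    SELECT now(), * FROM pg_stat_bgwriter;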
Greg Smith wrote on 08.06.2010 19:30:
> Thomas Kellerer wrote:
>> If I want to identify a possible bottleneck with the bgwriter, what
>> kind of numbers do I need to watch out for?
> You can't do much with a single snapshot of pg_stat_bgwriter data. Try
> saving this instead:
>
> select *,now() from pg_stat_bgwriter;
>
> And then take another snapshot at least a few hours later, preferably
> the next day. With two snapshots and timestamps on them, it's
> possible to make some sense of the numbers.

Thanks for the answer. After taking a few snapshots, what would I look for?

Regards
Thomas
Greg Smith wrote:
> You can't do much with a single snapshot of pg_stat_bgwriter data. Try
> saving this instead:
> select *,now() from pg_stat_bgwriter;
> And then take another snapshot at least a few hours later, preferably
> the next day. With two snapshots and timestamps on them, it's possible
> to make some sense of the numbers.

I probably should have explained the next part. I've now shared what I do with this information at http://www.pgcon.org/2010/schedule/events/218.en.html

Basically, if you put the data from the two snapshots into one of the Statistics Spreadsheet versions, you'll get several derived numbers that pop out:

-Average checkpoint frequency
-Average size of each checkpoint
-Average rate at which new buffers are allocated
-Average rate of writes out of the buffer cache
-Percentage of writes done by checkpoints, the background writer LRU cleaner, and client backends

These are the sort of things you can actually think about in useful real-world terms. And if you tune the database by doing things like increasing checkpoint_segments/checkpoint_timeout or changing the size of shared_buffers, you can see what impact that has on overall system response by noting how it changes these numeric measures. It's a bit more exact an approach for tuning such things than what people have traditionally done here. Increasing shared_buffers and watching the total writes/second number drop significantly is more exact than just noting whether the server's "waiting for I/O" percentage dropped or not.

I also highly recommend installing pg_buffercache and looking at what's inside your buffer cache too--the data actually being cached by the server never fails to surprise people, relative to what they expected was being cached. Source for how to analyze those things is in the pg_buffercache samples shell script; the "Buffer contents summary, with percentages" is the one that gives useful output. One of the things I did in the presentation that does not come through on the slides is note how the distribution of high usage count data in your buffer cache determines whether you will benefit from making it large or not. I've reduced this to a single figure of merit now: "cumulative % of buffers that have a usage count of >=2". On a system that benefits from having a big buffer cache, that number will be quite large (86% in the first of the samples in the spreadsheet). On one that doesn't, it will be small (35% on the last system listed there). You really have to tune those two types of workload differently.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
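To make the derived numbers above concrete: assuming the hypothetical bgwriter_snapshot table from the earlier sketch and PostgreSQL 8.4 or later (for window functions), the deltas between two snapshots can be turned into rates and percentages roughly as follows. The arithmetic only mirrors what the spreadsheet does, and the default 8 kB block size is assumed.

WITH deltas AS (
    SELECT
        snapshot_time - lag(snapshot_time) OVER w AS elapsed,
        (checkpoints_timed + checkpoints_req)
            - lag(checkpoints_timed + checkpoints_req) OVER w AS checkpoints,
        buffers_checkpoint - lag(buffers_checkpoint) OVER w AS ckpt_buf,
        buffers_clean      - lag(buffers_clean)      OVER w AS clean_buf,
        buffers_backend    - lag(buffers_backend)    OVER w AS backend_buf,
        buffers_alloc      - lag(buffers_alloc)      OVER w AS alloc_buf
    FROM bgwriter_snapshot
    WINDOW w AS (ORDER BY snapshot_time)
)
SELECT
    elapsed,
    -- average checkpoint frequency
    round(extract(epoch FROM elapsed) / nullif(checkpoints, 0))
        AS avg_seconds_between_checkpoints,
    -- average checkpoint size, assuming 8 kB buffers
    round(ckpt_buf * 8192.0 / 1048576 / nullif(checkpoints, 0), 1)
        AS avg_checkpoint_mb,
    -- rate of new buffer allocations
    round((alloc_buf * 8192.0 / 1048576
           / extract(epoch FROM elapsed))::numeric, 2) AS alloc_mb_per_sec,
    -- split of writes between checkpoints, LRU cleaner and backends
    round(100.0 * ckpt_buf    / nullif(ckpt_buf + clean_buf + backend_buf, 0), 1)
        AS pct_checkpoint_writes,
    round(100.0 * clean_buf   / nullif(ckpt_buf + clean_buf + backend_buf, 0), 1)
        AS pct_lru_cleaner_writes,
    round(100.0 * backend_buf / nullif(ckpt_buf + clean_buf + backend_buf, 0), 1)
        AS pct_backend_writes
FROM deltas
WHERE elapsed IS NOT NULL;

And a rough way to get at the usage-count figure of merit with pg_buffercache, once the contrib module is installed in the database (unused buffers show up with a NULL usagecount and land in the denominator here):

-- Distribution of usage counts across the buffer cache
SELECT usagecount, count(*) AS buffers,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct_of_cache
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;

-- Single figure of merit: % of buffers with usage count >= 2
SELECT round(100.0 * sum(CASE WHEN usagecount >= 2 THEN 1 ELSE 0 END)
             / count(*), 1) AS pct_usagecount_2_or_higher
FROM pg_buffercache;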
On Tue, Jun 8, 2010 at 11:14 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Greg Smith wrote:
>>
>> You can't do much with a single snapshot of pg_stat_bgwriter data. Try
>> saving this instead:
>> select *,now() from pg_stat_bgwriter;
>> And then take another snapshot at least a few hours later, preferably the
>> next day. With two snapshots and timestamps on them, it's possible to
>> make some sense of the numbers.
>
> I probably should have explained the next part. I've now shared what I do
> with this information at
> http://www.pgcon.org/2010/schedule/events/218.en.html
>
> Basically, if you put the data from the two snapshots into one of the
> Statistics Spreadsheet versions, you'll get several derived numbers that
> pop out:
>
> -Average checkpoint frequency
> -Average size of each checkpoint
> -Average rate at which new buffers are allocated
> -Average rate of writes out of the buffer cache
> -Percentage of writes done by checkpoints, the background writer LRU
> cleaner, and client backends

I think you get all or most of that if you just log checkpoints.
Scott Marlowe wrote:
>> -Average checkpoint frequency
>> -Average size of each checkpoint
>> -Average rate at which new buffers are allocated
>> -Average rate of writes out of the buffer cache
>> -Percentage of writes done by checkpoints, the background writer LRU
>> cleaner, and client backends
>
> I think you get all or most of that if you just log checkpoints.

If you turn on log_checkpoints and go through a whole stack of logs summarizing the data it writes out, you can derive the first two of those from it. I happen to think that sampling two data points and pasting them into a spreadsheet is less hassle to deal with, and that data is definitely there by default. You can answer questions like "what's the biggest checkpoint I've seen?" from the stuff in the logs a bit easier than from pg_stat_bgwriter, though.

As for the rest, checkpoints are one of the three possible ways that buffers can be written out. The other two are the background writer's cleaning and client backends. log_checkpoints gives you the size of one of those sources of writes. You can also compute an estimated proportion between the three types of writes from a single snapshot of pg_stat_bgwriter data. But that doesn't help once you've reached the point where you want to change something and measure how the response changes afterwards. That requires more regular sampling of the data, so you have a delta between two times.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
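For reference, the single-snapshot proportion mentioned above can be approximated directly from pg_stat_bgwriter. A sketch, keeping in mind that the counters are cumulative since the statistics were last reset, so there is no time context:

SELECT
    round(100.0 * buffers_checkpoint / nullif(total, 0), 1) AS pct_checkpoint_writes,
    round(100.0 * buffers_clean      / nullif(total, 0), 1) AS pct_lru_cleaner_writes,
    round(100.0 * buffers_backend    / nullif(total, 0), 1) AS pct_backend_writes
FROM (
    SELECT *,
           buffers_checkpoint + buffers_clean + buffers_backend AS total
    FROM pg_stat_bgwriter
) s;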
Greg Smith, 09.06.2010 07:14:
> I probably should have explained the next part. I've now shared what I
> do with this information at
> http://www.pgcon.org/2010/schedule/events/218.en.html
>
> Basically, if you put the data from the two snapshots into one of the
> Statistics Spreadsheet versions, you'll get several derived numbers that
> pop out:
>
> -Average checkpoint frequency
> -Average size of each checkpoint
> -Average rate at which new buffers are allocated
> -Average rate of writes out of the buffer cache
> -Percentage of writes done by checkpoints, the background writer LRU
> cleaner, and client backends
>
> These are the sort of things you can actually think about in useful
> real-world terms. And if you tune the database by doing things like
> increasing checkpoint_segments/checkpoint_timeout or changing the size
> of shared_buffers, you can see what impact that has on overall
> system response by noting how it changes these numeric measures. It's a
> bit more exact an approach for tuning such things than what people
> have traditionally done here. Increasing shared_buffers and watching the
> total writes/second number drop significantly is more exact than just
> noting whether the server's "waiting for I/O" percentage dropped or not.
>
> I also highly recommend installing pg_buffercache and looking at what's
> inside your buffer cache too--the data actually being cached by the
> server never fails to surprise people, relative to what they expected
> was being cached. Source for how to analyze those things is in the
> pg_buffercache samples shell script; the "Buffer contents summary, with
> percentages" is the one that gives useful output. One of the things I did
> in the presentation that does not come through on the slides is note how
> the distribution of high usage count data in your buffer cache
> determines whether you will benefit from making it large or not. I've
> reduced this to a single figure of merit now: "cumulative % of buffers
> that have a usage count of >=2". On a system that benefits from having a
> big buffer cache, that number will be quite large (86% in the first of
> the samples in the spreadsheet). On one that doesn't, it will be small
> (35% on the last system listed there). You really have to tune those two
> types of workload differently.

Thanks a lot for this detailed information!

I don't think we have a caching issue though; it's just one (regular) bulk import that does some heavy writes which we need to tune. Read performance is absolutely fine, but everything around vacuum and checkpointing seems to slow down the import massively.

So would taking a snapshot before and after this import be the right way to start the analysis?

Regards
Thomas
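Since the question is how tuning the checkpoint and background writer settings affects this import, it may also help to record the relevant settings alongside each snapshot, so before/after comparisons line up. A small sketch; the parameter list covers the write-related settings mentioned in the thread plus a couple of related background writer knobs:

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers',
               'checkpoint_segments',
               'checkpoint_timeout',
               'checkpoint_completion_target',
               'bgwriter_delay',
               'bgwriter_lru_maxpages');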
Thomas Kellerer wrote:
> I don't think we have a caching issue though; it's just one (regular)
> bulk import that does some heavy writes which we need to tune. Read
> performance is absolutely fine, but everything around vacuum and
> checkpointing seems to slow down the import massively.
> So would taking a snapshot before and after this import be the right
> way to start the analysis?

Sure. You might want to grab one at those two points, then another at a point right before the next big import happens, so that you also have what a relatively idle period looks like for comparison's sake.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
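One way to capture that schedule, again assuming the hypothetical bgwriter_snapshot table from the earlier sketch (the label column is added here purely to tell the snapshots apart):

-- Hypothetical label column so import-related snapshots are easy to identify.
ALTER TABLE bgwriter_snapshot ADD COLUMN label text;

INSERT INTO bgwriter_snapshot
    SELECT now(), b.*, 'before import' FROM pg_stat_bgwriter b;
-- ... run the bulk import ...
INSERT INTO bgwriter_snapshot
    SELECT now(), b.*, 'after import' FROM pg_stat_bgwriter b;
-- ... and one more shortly before the next import, as an idle baseline:
INSERT INTO bgwriter_snapshot
    SELECT now(), b.*, 'idle baseline' FROM pg_stat_bgwriter b;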