Discussion: Re: pg_stat_bgwriter
On Sun, Oct 13, 2019 at 06:27:35PM -0700, dangal wrote:
>Dear all, I would like to share these pg_stat_bgwriter statistics with you
>to see what you think of them.
>
>postgres=# select * from pg_stat_bgwriter;
> checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_backend_fsync | buffers_alloc | stats_reset
>-------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+-------------------------------
>               338 |               6 |             247061792 |                89418 |            2939561 |      19872289 |            54876 |         6015787 |                     0 |     710682240 | 2019-10-06 19:25:30.688186-03
>(1 row)
>
>postgres=# show bgwriter_delay;
> bgwriter_delay
>----------------
> 200ms
>(1 row)
>
>postgres=# show bgwriter_lru_maxpages;
> bgwriter_lru_maxpages
>-----------------------
> 100
>(1 row)
>
>postgres=# show bgwriter_lru_multiplier;
> bgwriter_lru_multiplier
>-------------------------
> 2
>(1 row)
>
>
>Do you think I should increase bgwriter_lru_maxpages, given the value of
>maxwritten_clean?
>Do you think I should increase bgwriter_lru_maxpages and
>bgwriter_lru_multiplier, and decrease bgwriter_delay, given the value of
>buffers_backend compared to buffers_alloc?
>Do you think a modification is necessary?
>What values would you recommend?
buffers_alloc does not really matter here, IMO. You need to compare
buffers_checkpoint, buffers_clean and buffers_backend, and ideally you'd
have (checkpoints > clean > backend). In your case the ordering is quite
different:

  buffers_checkpoint | buffers_clean | buffers_backend
             2939561 |      19872289 |         6015787

You could make the bgwriter even more aggressive, but that's unlikely to be
a huge improvement. You should investigate why buffers_checkpoint is so low
compared to the other two. This is usually a sign of shared_buffers being
too small for the active set, so perhaps you need to increase
shared_buffers, or see which queries are causing this and optimize them.
Note: FWIW, a single snapshot of pg_stats* may be misleading, because
it's cumulative, so it's not clear how accurately it reflects current
state. Next time take two snapshots and subtract them.
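For example, a minimal sketch of that approach, assuming a helper table
(called pg_stat_bgwriter_snapshot here; the name, column set and capture
schedule are only illustrative):

-- One-time setup: a table holding timestamped copies of the view.
CREATE TABLE pg_stat_bgwriter_snapshot AS
SELECT now() AS now, * FROM pg_stat_bgwriter;

-- Then run this periodically (e.g. from cron, hourly or every few minutes):
INSERT INTO pg_stat_bgwriter_snapshot
SELECT now(), * FROM pg_stat_bgwriter;

Subtracting consecutive rows then gives you per-interval activity instead of
counters accumulated since the last stats reset.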
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks a lot, you are always helping.
I attached a snapshot of pg_stat_bgwriter that I take every 12 hours:

select now, buffers_checkpoint, buffers_clean, buffers_backend
  from pg_stat_bgwriter_snapshot;
              now              | buffers_checkpoint | buffers_clean | buffers_backend
-------------------------------+--------------------+---------------+-----------------
 2019-10-07 12:00:01.312067-03 |             288343 |       1182944 |          520101
 2019-10-08 00:00:02.034129-03 |             475323 |       3890772 |          975454
 2019-10-08 12:00:01.500756-03 |             616154 |       4774924 |         1205261
 2019-10-09 00:00:01.520329-03 |             784840 |       7377771 |         1601278
 2019-10-09 12:00:01.388113-03 |            1149560 |       8395288 |         2456249
 2019-10-10 00:00:01.841054-03 |            1335747 |      11023014 |         2824740
 2019-10-10 12:00:01.354555-03 |            1486963 |      11919462 |         2995211
 2019-10-11 00:00:01.519538-03 |            1649066 |      14400593 |         3360700
 2019-10-11 12:00:01.468203-03 |            1979781 |      15332086 |         4167663
 2019-10-12 00:00:01.343714-03 |            2161116 |      17791871 |         4525957
 2019-10-12 12:00:01.991429-03 |            2323194 |      18324723 |         5139418
 2019-10-13 00:00:01.251191-03 |            2453939 |      19059149 |         5306894
 2019-10-13 12:00:01.677379-03 |            2782606 |      19391676 |         5878981
 2019-10-14 00:00:01.824249-03 |            2966021 |      19915346 |         6040316
 2019-10-14 12:00:01.869126-03 |            3117659 |      20675018 |         6184214
Our server has 24 GB of RAM and 6 GB of shared_buffers.
Regarding your comment that shared_buffers may be too small, the query I run
to see what is happening is the following (the first 10 entries are inserts,
updates and an autovacuum):

select calls, shared_blks_hit, shared_blks_read, shared_blks_dirtied
  from pg_stat_statements
 where shared_blks_dirtied > 0
 order by shared_blks_dirtied desc
 limit 10;
calls | shared_blks_hit | shared_blks_read | shared_blks_dirtied
-----------+-----------------+------------------+---------------------
41526844 | 1524091324 | 74477743 | 40568348
22707516 | 1317743612 | 33153916 | 28106071
517309 | 539285911 | 24583841 | 24408950
23 | 23135504 | 187638126 | 15301103
11287105 | 383864219 | 18369813 | 13879956
2247661 | 275357344 | 9252598 | 6084363
13070036 | 244904154 | 5557321 | 5871613
54158879 | 324425993 | 5054200 | 4676472
24955177 | 125421833 | 5775788 | 4517367
142807488 | 14401507751 | 81965894 | 2661358
(10 rows)
Another query (it shows, for each relation cached in shared_buffers, how much
of it is in the cache, what fraction of shared_buffers it occupies, and what
fraction of the relation is cached):

SELECT pg_size_pretty(count(*) * 8192) AS buffered,
       round(100.0 * count(*) /
             (SELECT setting FROM pg_settings
               WHERE name = 'shared_buffers')::integer, 1) AS buffers_percent,
       round(100.0 * count(*) * 8192 / pg_table_size(c.oid), 1) AS percent_of_relation
  FROM pg_class c
       INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
       INNER JOIN pg_database d
               ON (b.reldatabase = d.oid AND d.datname = current_database())
 GROUP BY c.oid, c.relname
 ORDER BY 3 DESC LIMIT 10;
 buffered | buffers_percent | percent_of_relation
----------+-----------------+---------------------
 3938 MB  |            64.1 |                53.2
 479 MB   |             7.8 |                21.3
 261 MB   |             4.3 |                99.3
 163 MB   |             2.6 |                 0.1
 153 MB   |             2.5 |                 6.7
 87 MB    |             1.4 |                 1.2
 82 MB    |             1.3 |                81.6
 65 MB    |             1.1 |               100.0
 64 MB    |             1.0 |                 0.1
 53 MB    |             0.9 |                73.5
On Mon, Oct 14, 2019 at 08:18:47PM +0200, Tomas Vondra wrote:
> Note: FWIW, a single snapshot of pg_stats* may be misleading, because
> it's cumulative, so it's not clear how accurately it reflects current
> state. Next time take two snapshots and subtract them.

For bonus points, capture it with a timestamp and make RRD graphs. It took me
a while to get around to following this advice, but now I have 12+ months of
history at 5 minute granularity across all our customers, and I've used my
own implementation to track down inefficient queries being run periodically
from cron, and to notice other radical changes in writes/reads.

I recall seeing that the pgCluu project does this. http://pgcluu.darold.net/

Justin
On Mon, Oct 14, 2019 at 01:12:43PM -0700, dangal wrote:
>Thanks a lot, you are always helping.
>I attached a snapshot of pg_stat_bgwriter that I take every 12 hours:
>
>select now, buffers_checkpoint, buffers_clean, buffers_backend
>  from pg_stat_bgwriter_snapshot;

Please show us the deltas, i.e. subtract the preceding value (using a window
function, or something). FWIW 12 hours may be a bit too coarse, but it's
better than nothing.

>[12-hour snapshot table quoted above]
>
>Our server has 24 GB of RAM and 6 GB of shared_buffers.
>Regarding your comment that shared_buffers may be too small, the query I run
>to see what is happening is the following:

The question is how that compares to the database size, and to the size of
the active set (the fraction of the database accessed by the application /
queries). I suggest you also track & compute the shared_buffers cache hit
ratio.

>The first 10 entries are inserts, updates and an autovacuum.
>
>[pg_stat_statements output quoted above]

Unfortunately, this has the same issue as the data you shared in the first
message - it's a snapshot with data accumulated since the database was
created. It's unclear whether the workload changed over time etc. But I guess
you can use this to identify the queries producing the most dirty buffers and
maybe see if you can optimize them somehow (e.g. by removing unnecessary
indexes or something).
>Another query
>
>[pg_buffercache query and output quoted above]

It's generally a good idea to explain what a query is supposed to do, instead
of just leaving the users to figure that out. In any case, this is a snapshot
at a particular moment in time, and it's unclear how it correlates to the
activity. The fact that you've removed the names of tables and even the
queries does not really help either.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
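As an aside, a minimal sketch of the delta computation suggested above,
assuming the pg_stat_bgwriter_snapshot table from this thread; lag()
subtracts the preceding sample, so each row shows only the buffers written
during that interval (the first row has no predecessor and comes out NULL):

-- Per-interval deltas instead of cumulative counters (illustrative sketch).
SELECT now,
       buffers_checkpoint - lag(buffers_checkpoint) OVER w AS checkpoint_delta,
       buffers_clean      - lag(buffers_clean)      OVER w AS clean_delta,
       buffers_backend    - lag(buffers_backend)    OVER w AS backend_delta
  FROM pg_stat_bgwriter_snapshot
WINDOW w AS (ORDER BY now)
 ORDER BY now;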
Hi Tomas, I reset the statistics and took hourly samples for 24 hours, to see
if you can help me.
The server has 24 GB of memory and 6 GB of shared_buffers.
select now,
       pg_size_pretty(buffers_checkpoint * 8192) AS buffers_checkpoint,
       pg_size_pretty(buffers_clean * 8192) AS buffers_clean,
       pg_size_pretty(buffers_backend * 8192) AS buffers_backend,
       (buffers_checkpoint * 100) / (buffers_checkpoint + buffers_clean + buffers_backend) AS buffers_checkpoint_pct,
       (buffers_clean * 100) / (buffers_checkpoint + buffers_clean + buffers_backend) AS buffers_clean_pct,
       (buffers_backend * 100) / (buffers_checkpoint + buffers_clean + buffers_backend) AS buffers_backend_pct,
       pg_size_pretty(buffers_checkpoint * 8192 / (checkpoints_timed + checkpoints_req)) AS avg_checkpoint_write,
       pg_size_pretty(8192 * (buffers_checkpoint + buffers_clean + buffers_backend)) AS total_write
  from pg_stat_bgwriter_snapshot;
              now              | buffers_checkpoint | buffers_clean | buffers_backend | buffers_checkpoint_pct | buffers_clean_pct | buffers_backend_pct | avg_checkpoint_write | total_write
-------------------------------+--------------------+---------------+-----------------+------------------------+-------------------+---------------------+----------------------+-------------
 2019-10-15 15:00:02.070105-03 | 33 MB   | 1190 MB  | 144 MB  | 2 | 87 |  9 | 33 MB | 1367 MB
 2019-10-15 16:00:01.477785-03 | 109 MB  | 3543 MB  | 393 MB  | 2 | 87 |  9 | 36 MB | 4045 MB
 2019-10-15 17:00:01.960162-03 | 179 MB  | 6031 MB  | 703 MB  | 2 | 87 | 10 | 36 MB | 6913 MB
 2019-10-15 18:00:01.558404-03 | 252 MB  | 8363 MB  | 1000 MB | 2 | 86 | 10 | 36 MB | 9615 MB
 2019-10-15 19:00:01.170866-03 | 327 MB  | 10019 MB | 1232 MB | 2 | 86 | 10 | 36 MB | 11 GB
 2019-10-15 20:00:01.397473-03 | 417 MB  | 11 GB    | 1407 MB | 3 | 85 | 10 | 38 MB | 13 GB
 2019-10-15 21:00:01.211047-03 | 522 MB  | 12 GB    | 1528 MB | 3 | 85 | 11 | 40 MB | 14 GB
 2019-10-15 22:00:01.164853-03 | 658 MB  | 12 GB    | 1691 MB | 4 | 83 | 11 | 44 MB | 14 GB
 2019-10-15 23:00:01.116564-03 | 782 MB  | 13 GB    | 1797 MB | 5 | 83 | 11 | 46 MB | 15 GB
 2019-10-16 00:00:01.19203-03  | 887 MB  | 13 GB    | 2016 MB | 5 | 82 | 12 | 47 MB | 16 GB
 2019-10-16 01:00:01.329851-03 | 1003 MB | 14 GB    | 2104 MB | 5 | 81 | 12 | 48 MB | 17 GB
 2019-10-16 02:00:01.518606-03 | 1114 MB | 14 GB    | 2222 MB | 6 | 81 | 12 | 48 MB | 17 GB
 2019-10-16 03:00:01.673498-03 | 1227 MB | 14 GB    | 2314 MB | 6 | 80 | 12 | 49 MB | 18 GB
 2019-10-16 04:00:01.936604-03 | 1354 MB | 15 GB    | 2468 MB | 7 | 79 | 12 | 50 MB | 19 GB
 2019-10-16 05:00:01.854888-03 | 1465 MB | 15 GB    | 2518 MB | 7 | 79 | 13 | 51 MB | 19 GB
 2019-10-16 06:00:01.804182-03 | 1585 MB | 15 GB    | 2581 MB | 8 | 78 | 13 | 51 MB | 19 GB
 2019-10-16 07:00:01.889345-03 | 1677 MB | 15 GB    | 2649 MB | 8 | 78 | 13 | 51 MB | 20 GB
 2019-10-16 08:00:01.248247-03 | 1756 MB | 16 GB    | 2707 MB | 8 | 78 | 13 | 50 MB | 20 GB
 2019-10-16 09:00:01.258408-03 | 1826 MB | 16 GB    | 2763 MB | 8 | 78 | 13 | 49 MB | 21 GB
 2019-10-16 10:00:01.418323-03 | 1881 MB | 17 GB    | 2872 MB | 8 | 78 | 13 | 48 MB | 21 GB
 2019-10-16 11:00:02.077084-03 | 1951 MB | 18 GB    | 3140 MB | 8 | 78 | 13 | 48 MB | 23 GB
 2019-10-16 12:00:01.83188-03  | 2026 MB | 20 GB    | 3322 MB | 7 | 79 | 12 | 47 MB | 25 GB
 2019-10-16 13:00:01.628877-03 | 2109 MB | 22 GB    | 3638 MB | 7 | 79 | 12 | 47 MB | 28 GB
 2019-10-16 14:00:02.351529-03 | 2179 MB | 24 GB    | 3934 MB | 6 | 80 | 12 | 46 MB | 30 GB
(24 rows)
SELECT sum(heap_blks_read) as heap_read,
       sum(heap_blks_hit) as heap_hit,
       sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
  FROM pg_statio_user_tables;

  heap_read  |   heap_hit    |         ratio
-------------+---------------+------------------------
 80203672248 | 4689023850651 | 0.98318308953328194824
(1 row)
SELECT sum(idx_blks_read) as idx_read,
       sum(idx_blks_hit) as idx_hit,
       (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) as ratio
  FROM pg_statio_user_indexes;

  idx_read  |   idx_hit    |         ratio
------------+--------------+------------------------
 3307622770 | 653969845259 | 0.99494223962468783241
(1 row)
-- perform a "select pg_stat_reset();" when you want to reset counter statistics
with all_tables as
(
  SELECT *
  FROM (
    SELECT 'all'::text as table_name,
           sum( (coalesce(heap_blks_read,0) + coalesce(idx_blks_read,0) +
                 coalesce(toast_blks_read,0) + coalesce(tidx_blks_read,0)) ) as from_disk,
           sum( (coalesce(heap_blks_hit,0) + coalesce(idx_blks_hit,0) +
                 coalesce(toast_blks_hit,0) + coalesce(tidx_blks_hit,0)) ) as from_cache
    FROM pg_statio_all_tables  -- change to pg_statio_USER_tables if you want to check
                               -- only user tables (excluding postgres's own tables)
  ) a
  WHERE (from_disk + from_cache) > 0  -- discard tables without hits
),
tables as
(
  SELECT *
  FROM (
    SELECT relname as table_name,
           ( (coalesce(heap_blks_read,0) + coalesce(idx_blks_read,0) +
              coalesce(toast_blks_read,0) + coalesce(tidx_blks_read,0)) ) as from_disk,
           ( (coalesce(heap_blks_hit,0) + coalesce(idx_blks_hit,0) +
              coalesce(toast_blks_hit,0) + coalesce(tidx_blks_hit,0)) ) as from_cache
    FROM pg_statio_all_tables  -- change to pg_statio_USER_tables if you want to check
                               -- only user tables (excluding postgres's own tables)
  ) a
  WHERE (from_disk + from_cache) > 0  -- discard tables without hits
)
SELECT table_name as "table name",
       from_disk as "disk hits",
       round((from_disk::numeric / (from_disk + from_cache)::numeric)*100.0,2) as "% disk hits",
       round((from_cache::numeric / (from_disk + from_cache)::numeric)*100.0,2) as "% cache hits",
       (from_disk + from_cache) as "total hits"
  FROM (SELECT * FROM all_tables UNION ALL SELECT * FROM tables) a
 ORDER BY (case when table_name = 'all' then 0 else 1 end), from_disk desc;
   table name    |  disk hits  | % disk hits | % cache hits |  total hits
-----------------+-------------+-------------+--------------+---------------
 all             | 88000266877 |        1.60 |        98.40 | 5489558628019
 b_e_i           | 38269990257 |        2.88 |        97.12 | 1329542407426
 n_c_r_o         | 32839222402 |        1.44 |        98.56 | 2278801314997
 b_e_i_a         |  6372214550 |        4.76 |        95.24 |  133916822424
 d_d             |  2101245550 |        6.58 |        93.42 |   31936220932
 pg_toast_550140 |  2055940284 |       32.63 |        67.37 |    6300424824
 p_i             |  1421254520 |        0.36 |        99.64 |  393348432350
 n_c_e_s         |  1164509701 |       27.85 |        72.15 |    4180714300
 s_b_c_a         |  1116814156 |        0.19 |        99.81 |  595617511928
 b_e_i_l         |   624945696 |       41.13 |        58.87 |    1519594743
 p_e_i           |   525580057 |        5.27 |        94.73 |    9968414493
select s.relname,
       pg_size_pretty(pg_relation_size(relid)),
       coalesce(n_tup_ins,0) + 2 * coalesce(n_tup_upd,0) -
       coalesce(n_tup_hot_upd,0) + coalesce(n_tup_del,0) AS total_writes,
       (coalesce(n_tup_hot_upd,0)::float * 100 / (case when n_tup_upd > 0
        then n_tup_upd else 1 end)::float)::numeric(10,2) AS hot_rate,
       (select v[1] FROM regexp_matches(reloptions::text, E'fillfactor=(\\d+)')
        as r(v) limit 1) AS fillfactor
  from pg_stat_all_tables s
  join pg_class c ON c.oid = relid
 order by total_writes desc limit 50;
     relname     | pg_size_pretty | total_writes | hot_rate | fillfactor
-----------------+----------------+--------------+----------+------------
 pg_toast_550140 | 1637 GB        |    820414234 |     0.00 |
 b_e_i_a         | 168 GB         |    454229502 |     0.00 |
 s_b_c_a         | 26 MB          |    419253909 |    96.94 |
 b_e_i_a_l       | 71 GB          |    305584644 |     0.00 |
 s_b_c_a_l       | 965 MB         |    203361185 |     0.00 |
 b_e_i           | 7452 MB        |    194861425 |    62.88 |
 b_e_i_l         | 57 GB          |    144929408 |     0.00 |
 o_i_n           | 3344 kB        |     98435081 |    99.38 |
 r_h             | 1140 MB        |     33209351 |     0.11 |
 b_e             | 5808 kB        |     29608085 |    99.65 |
select calls, shared_blks_hit, shared_blks_read, shared_blks_dirtied, query
  from pg_stat_statements
 where shared_blks_dirtied > 0
 order by shared_blks_dirtied desc
 limit 10;
  calls   | shared_blks_hit | shared_blks_read | shared_blks_dirtied |            query
----------+-----------------+------------------+---------------------+------------------------------
 43374691 |      1592513886 |         77060029 |            42096885 | INSERT INTO b_e_i_a
 23762881 |      1367338973 |         34351016 |            29131240 | UPDATE b_e_i
   541120 |       564550710 |         25726748 |            25551138 | INSERT INTO d_d
       23 |        23135504 |        187638126 |            15301103 | VACUUM ANALYZE VERBOSE b_e_i;
 11804481 |       401558460 |         19124307 |            14492182 | UPDATE b_e_i_a
  2352159 |       287732134 |          9462460 |             6250734 | INSERT INTO b_e_i
 13701688 |       256215340 |          5803881 |             6142119 | INSERT into I_C_M
 56582737 |       338943996 |          5272879 |             4882863 | INSERT INTO b_e_i_a_l
 26115040 |       131274217 |          6016404 |             4712060 | INSERT INTO b_e_i_l
SELECT oid::REGCLASS::TEXT AS table_name,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
 WHERE relkind = 'r'
   AND relpages > 0
 ORDER BY pg_total_relation_size(oid) DESC
 LIMIT 20;
table_name | total_size
----------------------------------+------------
d_d | 1656 GB
b_e_i_a | 547 GB
b_e_i_a_l | 107 GB
b_e_i_l | 71 GB
b_e_i | 66 GB
n_c_e_s | 28 GB
p_e_i | 7807 MB
n_c_s | 7344 MB
e_i_n | 5971 MB
p_e_d_i | 3695 MB
Excuse me, can you tell me how I can achieve this? "The question is how that
compares to the database size, and to the size of the active set (the
fraction of the database accessed by the application / queries)."
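One possible way to approach that comparison (a rough sketch under
assumptions, not something prescribed in this thread): compare shared_buffers
with the total database size, and use pg_buffercache usage counts as a crude
proxy for the hot part of the cache.

-- Overall database size vs. the configured shared_buffers (sketch).
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size,
       current_setting('shared_buffers') AS shared_buffers;

-- Crude "active set" proxy: buffers in use, and those touched frequently.
-- Requires the pg_buffercache extension (already used earlier in this thread).
SELECT pg_size_pretty(count(*) * 8192) AS buffers_in_use,
       pg_size_pretty(count(*) FILTER (WHERE usagecount >= 3) * 8192) AS frequently_used
  FROM pg_buffercache
 WHERE relfilenode IS NOT NULL;

If the frequently used portion stays close to the full 6 GB of
shared_buffers, that supports the idea that the active set does not fit.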
Thank you very much, Justin. I am looking into installing the tool you recommended!