Re: Maximum number of WAL files in the pg_xlog directory
From | Guillaume Lelarge |
---|---|
Subject | Re: Maximum number of WAL files in the pg_xlog directory |
Date | |
Msg-id | CAECtzeWXY_v8-eBuC+mZRLs7y94z0ppLSHN2+2t3sJDkhyhb6g@mail.gmail.com |
In response to | Re: Maximum number of WAL files in the pg_xlog directory (Guillaume Lelarge <guillaume@lelarge.info>) |
List | pgsql-hackers |
Hi,

On 15 Oct 2014 at 22:25, "Guillaume Lelarge" <guillaume@lelarge.info> wrote:
>
> 2014-10-15 22:11 GMT+02:00 Jeff Janes <jeff.janes@gmail.com>:
>>
>> On Fri, Aug 8, 2014 at 12:08 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
>>>
>>> Hi,
>>>
>>> As part of our monitoring work for our customers, we stumbled upon an issue with our customers' servers that have a wal_keep_segments setting higher than 0.
>>>
>>> We have a monitoring script that checks the number of WAL files in the pg_xlog directory, according to the setting of three parameters (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments). We usually add a percentage to the usual formula:
>>>
>>> greatest(
>>>   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
>>>   checkpoint_segments + wal_keep_segments + 1
>>> )
>>
>>
>> I think the first bug is even having this formula in the documentation to start with, and in trying to use it.
>>
>
> I agree. But we have customers asking how to compute the right size for their WAL file system partitions. Right size is usually a euphemism for smallest size, and they usually tend to get it wrong, leading to huge issues. And I'm not even speaking of monitoring and alerting.
>
> A way to avoid this issue is probably to erase the formula from the documentation, and find a new way to explain to them how to size their partitions for WALs.
>
> Monitoring is another matter, and I don't really think a monitoring solution should count the WAL files. What really matters is the database availability, and that is covered by having enough disk space in the WALs partition.
>
>> "and will normally not be more than..."
>>
>> This may be "normal" for a toy system. I think that the normal state for any system worth monitoring is that it has had load spikes at some point in the past.
>>
>
> Agreed.
>
>>
>> So it is the next part of the doc, which describes how many segments it climbs back down to upon recovering from a spike, which is the important one. And that doesn't mention wal_keep_segments at all, which surely cannot be correct.
>>
>
> Agreed too.
>
>>
>> I will try to independently derive the correct formula from the code, as you did, without looking too much at your derivation first, and see if we get the same answer.
>>
>
> Thanks. I look forward to reading what you found.
>
> What seems clear to me right now is that no one has a sane explanation of the formula. Though yours definitely made sense, it didn't seem to be what the code does.
>

Did you find time to work on this? Any news?

Thanks.
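
For readers who want to see what the quoted formula gives in practice, here is a minimal Python sketch of it. It only restates the greatest(...) expression from the quoted message, which the thread itself questions, so treat the result as a rough estimate and not as a guaranteed upper bound. The function names and the `margin` parameter are hypothetical (the message only says "we usually add a percentage", without giving a value), and the 16 MB segment size assumes PostgreSQL was built with the default WAL segment size.

```python
import math

# Assumption: default WAL segment size of 16 MB (can differ on custom builds).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def estimated_max_wal_files(checkpoint_segments,
                            checkpoint_completion_target,
                            wal_keep_segments=0):
    """Evaluate the greatest(...) formula quoted in the thread.

    Returns a whole number of files, rounded up.
    """
    by_checkpoints = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    by_keep = checkpoint_segments + wal_keep_segments + 1
    return math.ceil(max(by_checkpoints, by_keep))


def alert_threshold(estimate, margin=0.10):
    """Add a safety percentage on top of the estimate (margin is hypothetical)."""
    return math.ceil(estimate * (1 + margin))


def estimated_wal_bytes(n_files):
    """Disk space taken by n_files segments of the assumed default size."""
    return n_files * WAL_SEGMENT_BYTES


if __name__ == "__main__":
    files = estimated_max_wal_files(3, 0.5, wal_keep_segments=100)
    print(files, alert_threshold(files), estimated_wal_bytes(files))
```

With wal_keep_segments at 0 the first branch of the formula dominates; once wal_keep_segments is large relative to checkpoint_segments, the second branch takes over, which is the case the original report is about.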