Re: Eager page freeze criteria clarification

From Joe Conway
Subject Re: Eager page freeze criteria clarification
Date
Msg-id 145ad2d0-8c9b-4dd8-9385-636c0e29708a@joeconway.com
In reply to Re: Eager page freeze criteria clarification  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
On 12/21/23 10:56, Melanie Plageman wrote:
> On Sat, Dec 9, 2023 at 9:24 AM Joe Conway <mail@joeconway.com> wrote:
>> However, even if we assume a more-or-less normal distribution, we should
>> consider using subgroups in a way similar to Statistical Process
>> Control[1]. The reasoning is explained in this quote:
>>
>>      The Math Behind Subgroup Size
>>
>>      The Central Limit Theorem (CLT) plays a pivotal role here. According
>>      to CLT, as the subgroup size (n) increases, the distribution of the
>>      sample means will approximate a normal distribution, regardless of
>>      the shape of the population distribution. Therefore, as your
>>      subgroup size increases, your control chart limits will narrow,
>>      making the chart more sensitive to special cause variation and more
>>      prone to false alarms.
> 
> I haven't read anything about statistical process control until you
> mentioned this. I read the link you sent and also googled around a
> bit. I was under the impression that the more samples we have, the
> better. But, it seems like this may not be the assumption in
> statistical process control?
> 
> It may help us to get more specific. I'm not sure what the
> relationship between "unsets" in my code and subgroup members would
> be.  The article you linked suggests that each subgroup should be of
> size 5 or smaller. Translating that to my code, were you imagining
> subgroups of "unsets" (each time we modify a page that was previously
> all-visible)?

Basically, yes.

It might not make sense, but I think we could test the theory by 
plotting a histogram of the raw data, and then also plotting a histogram 
based on sub-grouping every 5 sequential values in your accumulator.

If the former does not look very normal (I would guess that for most 
workloads it will be skewed with a long tail) and the latter looks more 
normal, then that would suggest we are on the right track.
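
A minimal sketch of that comparison, using only the Python standard 
library. The lognormal sample is a made-up stand-in for the accumulator 
values (real data would have to come from an instrumented build), and 
subgroup_means, skewness, and text_histogram are hypothetical helper 
names, not anything from the patch:

```python
import random
import statistics


def subgroup_means(values, size=5):
    """Average every `size` sequential values; a trailing partial group is dropped."""
    usable = len(values) - len(values) % size
    return [statistics.fmean(values[i:i + size]) for i in range(0, usable, size)]


def skewness(values):
    """Sample skewness (third standardized moment); 0 for perfectly symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def text_histogram(values, bins=10, width=50):
    """Return a crude text histogram: one line per equal-width bin."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / span * bins), bins - 1)] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + span * i / bins
        lines.append(f"{left_edge:10.2f} | {'#' * round(c / peak * width)}")
    return lines


# ASSUMPTION: synthetic right-skewed data standing in for the accumulator.
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
grouped = subgroup_means(raw, size=5)

print("raw values, skewness = %.2f" % skewness(raw))
print("\n".join(text_histogram(raw)))
print("subgroup means (n=5), skewness = %.2f" % skewness(grouped))
print("\n".join(text_histogram(grouped)))
```

With a tail this heavy, I would expect the subgroup means to still look 
somewhat skewed, just noticeably less so than the raw values.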

There are statistical tests for normality that could be applied too 
(<quickly looks> e.g. 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6350423/#sec2-13title ), 
which would be a more rigorous approach, but a quick look at histograms 
might be sufficiently convincing.
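
As one concrete possibility, here is a sketch of the Jarque-Bera test, 
which is built from sample skewness and kurtosis. I picked it only 
because it is easy to write with the standard library; it is not 
necessarily the test that article would recommend, and the input data 
is again a synthetic stand-in:

```python
import math
import random
import statistics


def jarque_bera(values):
    """Return (JB statistic, approximate p-value); the null hypothesis is normality."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under the null, JB is asymptotically chi-squared with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


# ASSUMPTION: synthetic right-skewed data standing in for the accumulator.
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
# Means of every 5 sequential values, as in the sub-grouping idea above.
grouped = [statistics.fmean(raw[i:i + 5]) for i in range(0, len(raw), 5)]

jb_raw, p_raw = jarque_bera(raw)
jb_grouped, p_grouped = jarque_bera(grouped)
print("raw:     JB = %.1f, p = %.3g" % (jb_raw, p_raw))
print("grouped: JB = %.1f, p = %.3g" % (jb_grouped, p_grouped))
```

A p-value below the chosen threshold (say 0.05) rejects normality. The 
chi-squared approximation is only asymptotic, so for small samples the 
result should be treated with some caution.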

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



