Re: Re: sampling.c and potential divisions by 0 ang log(0) with tablesample and ANALYZE in 9.5
From | Michael Paquier |
---|---|
Subject | Re: Re: sampling.c and potential divisions by 0 ang log(0) with tablesample and ANALYZE in 9.5 |
Date | |
Msg-id | CAB7nPqT+sJE8x3q7kuMk7FrDSCs8ZThJUwVFf-a-WuhS__MeYQ@mail.gmail.com |
In response to | Re: Re: sampling.c and potential divisions by 0 ang log(0) with tablesample and ANALYZE in 9.5 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Re: sampling.c and potential divisions by 0 ang log(0) with tablesample and ANALYZE in 9.5 |
List | pgsql-bugs |
On Wed, Jul 1, 2015 at 1:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> On 2015-06-25 10:01, Michael Paquier wrote:
>>> I think that we should change the returned double to be (0.0,1.0]
>
>> Agreed.
>
> I find this to be a pretty bad idea. That definition is simply weird;
> where else in the world will you find a random number generator that does
> that? What are the odds that any callers are actually designed for that
> behavior?
>
> Another problem is that we consider anl_random_fract() to be an exported
> API, and the very longstanding definition of that is that the result is
> in (0,1), excluding both endpoints. Whatever we do with
> sampler_random_fract(), we'd better make sure that anl_random_fract()
> preserves that behavior, else we are likely to break third-party modules.

Wait a minute... Yes, you are right. I clearly missed the fact that in 9.4 and earlier the range of values returned by anl_random_fract() does not include 1. I thought it did; evidently I misread the code.

> A simple fix would be to adjust sampler_random_fract to disallow 0
> as result, say by repeating the pg_erand48 call if it produces 0.
> I'm not sure if that would throw off any of the math in the new
> tablesample-related callers. If it would, I'm inclined to fix the
> problem call-site-by-call-site, rather than inventing a definition
> of sampler_random_fract() that fails to satisfy the POLA.

Agreed. Disallowing 0 in sampler_random_fract looks like a good answer to that. Looking at the tablesample code, for the Bernoulli trial I recall that the range of the success and failure probabilities is actually (0,1) (if p is the success rate, the failure rate is 1 - p). I am not sure about Knuth's Algorithm S for the system sampling, though, but that looks OK from a purely logical viewpoint.
--
Michael
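
For illustration, the retry-on-zero idea suggested above could look roughly like the following. This is only a minimal sketch, not the actual sampling.c code: it uses POSIX erand48() as a stand-in for pg_erand48(), and the function name random_fract_nonzero is hypothetical. Since erand48() returns values in [0,1), rejecting an exact 0 leaves a result in (0,1), so a caller can take log() of it or divide by it safely.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700		/* exposes erand48() in <stdlib.h> */
#endif
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a PostgreSQL function): return a random
 * fraction in (0,1), excluding both endpoints.
 *
 * erand48() stands in for pg_erand48() here; it yields a value in
 * [0,1), so we only need to reject an exact 0 and draw again.
 */
static double
random_fract_nonzero(unsigned short seed[3])
{
	double		res;

	do
	{
		res = erand48(seed);
	} while (res == 0.0);

	/* Postcondition: both endpoints are excluded. */
	assert(res > 0.0 && res < 1.0);
	return res;
}
```

The loop is expected to run once in practically every call, because an exact 0 is only one of the 2^48 possible generator outputs, so the distribution seen by callers is essentially unchanged.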