Re: [PATCH] random_normal function
От | Dean Rasheed |
---|---|
Тема | Re: [PATCH] random_normal function |
Дата | |
Msg-id | CAEZATCW=WOJqKNKsRWNjtW1fAQQvWg3KrBKxnfKADzKQz9K1XQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [PATCH] random_normal function (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: [PATCH] random_normal function
|
Список | pgsql-hackers |
On Mon, 9 Jan 2023 at 18:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I pushed Paul's patch with the previously-discussed tests, but > the more I look at random.sql the less I like it. I propose > that we nuke the existing tests from orbit and replace with > something more or less as attached. Looks like a definite improvement. > (We could probably go further > than this, like trying to verify distribution properties. But > it's been too long since college statistics for me to want to > write such tests myself, and I'm not real sure we need them.) I played around with the Kolmogorov–Smirnov test, to test random() for uniformity. The following passes roughly 99.9% of the time: CREATE OR REPLACE FUNCTION ks_test_uniform_random() RETURNS boolean AS $$ DECLARE n int := 1000; -- Number of samples c float8 := 1.94947; -- Critical value for 99.9% confidence /* c float8 := 1.62762; -- Critical value for 99% confidence */ /* c float8 := 1.22385; -- Critical value for 90% confidence */ ok boolean; BEGIN ok := ( WITH samples AS ( SELECT random() r FROM generate_series(1, n) ORDER BY 1 ), indexed_samples AS ( SELECT (row_number() OVER())-1.0 i, r FROM samples ) SELECT max(abs(i/n-r)) < c / sqrt(n) FROM indexed_samples ); RETURN ok; END $$ LANGUAGE plpgsql; and is very fast. So that gives decent confidence that random() is indeed uniform. With a one-in-a-thousand chance of failing, if you wanted something with around a one-in-a-billion chance of failing, you could just try it 3 times: SELECT ks_test_uniform_random() OR ks_test_uniform_random() OR ks_test_uniform_random(); but it feels pretty hacky, and probably not really necessary. Rigorous tests for other distributions are harder, but also probably not necessary if we have confidence that the underlying PRNG is uniform. > BTW, if this does bring the probability of failure down to the > one-in-a-billion range, I think we could also nuke the whole > "ignore:" business, simplifying pg_regress and allowing the > random test to be run in parallel with others. I didn't check the one-in-a-billion claim, but +1 for that. Regards, Dean
В списке pgsql-hackers по дате отправления: