Re: random() generates collisions too early
От | Heikki Linnakangas |
---|---|
Тема | Re: random() generates collisions too early |
Дата | |
Msg-id | 526928D0.6090707@vmware.com обсуждение исходный текст |
Ответ на | Re: random() generates collisions too early (Honza Horak <hhorak@redhat.com>) |
Ответы |
Re: random() generates collisions too early
|
Список | pgsql-bugs |
On 22.10.2013 15:41, Honza Horak wrote: > On 10/21/2013 05:14 PM, Tom Lane wrote: >> Heikki Linnakangas <hlinnakangas@vmware.com> writes: >>> On 18.10.2013 14:55, Honza Horak wrote: >>>> The results covered only 181383 distinct values, and 68 values >>>> repeated four or five times each. We should at least consider using a >>>> higher-entropy seed. >> >>> Interesting. PostgreSQL's random() function just calls the underlying >>> libc random() function. I assume you tested this on with Linux and >>> glibc. >> >> I agree with the theory that this probably isn't the fault of the >> random() >> function as such, but with our code to reset the random seed when forking >> a postmaster child process. Note that the test case is only examining the >> first random value created in each process. So basically what this is >> measuring is the number of different seed values we use. >> >> regards, tom lane >> > > After playing a bit more with random() value and the seed defined in > src/backend/postmaster/postmaster.c:4038 it seems really like the seed > doesn't have very good characteristic. The PID number used there ensures > the value to be different in child processes, but when only xor-ed with > usecs, it doesn't enlarge the total data domain of the seed, which stays > 1M values. Then having in mind the birthday paradox and probably not > ideal PRNG, it doesn't seem unrealistic to get two same random numbers > in only hundreds/thousands of tries. > > In order to enlarge possible data domain of the seed we can include sec > count as well and shift some xor-ed variables to use the whole range or > unsigned int. The attached patch uses a way that gave me much better > results (collision found approx. after a hundred thousands of values): > > - srandom((unsigned int) (MyProcPid ^ usecs)); > + srandom((unsigned int) (MyProcPid ^ (usecs << 12) ^ secs)); Seems reasonable, committed. Thanks! - Heikki
В списке pgsql-bugs по дате отправления: