Re: pgbench - add pseudo-random permutation function
От | Dean Rasheed |
---|---|
Тема | Re: pgbench - add pseudo-random permutation function |
Дата | |
Msg-id | CAEZATCVgW-hT2L_eyighcBB3+pujuRmihXAE73_vwNF4WtDLVQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: pgbench - add pseudo-random permutation function (Fabien COELHO <coelho@cri.ensmp.fr>) |
Ответы |
Re: pgbench - add pseudo-random permutation function
|
Список | pgsql-hackers |
On Wed, 31 Mar 2021 at 18:53, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > While looking at it, I have some doubts on this part: > > m = (uint64) (pg_erand48(random_state.xseed) * (mask + 1)) | 1; > r = (uint64) (pg_erand48(random_state.xseed) * (mask + 1)); > r = (uint64) (pg_erand48(random_state.xseed) * size); > > I do not understand why the random values are multiplied by anything in > the first place… > These are just random integers in the range [0,mask] and [0,size-1], formed in exactly the same way as getrand(). > This one looks like a no-op : > > r = (uint64) (pg_erand48(random_state.xseed) * size); > v = (v + r) % size; > > v = (v + r) % size > = (v + rand * size) % size > =? (v % size + rand * size % size) % size > =? (v % size + 0) % size > = v % size > = v > rand * size % size is not zero because rand is a floating point number in the range [0,1), so actually rand * size % size = rand * size. Similarly in the other case, you're forgetting that rand is not an integer. Thinking more about our use of erand48(), the only real impact it has is to limit the number of possible permutations produced, and actually 2^48 is so huge (roughly 281 million million) that I can't ever see that being an issue in practice. (In a quick dummy test, replacing erand48() with a silly "erand8()" function that only returned one of 256 distinct values, permute() still worked fine at any size, except for the fact that only up to 256 distinct permutations were produced. In other words, limitations on the source of randomness don't prevent it from producing permutations of any size, they just limit the number of distinct permutations possible. And since 2^48 is so big, that shouldn't be an issue.) Also, I think the source of the input seed is most likely to be either manually hand-picked integers or pgbench's own random() function, so the only real issue I can see is that by ignoring the upper 16-bits, there's a very small chance of always using the same random sequence if some hand-picked numbers only vary in the top 16 bits, though I think that's highly unlikely in practice. Nonetheless, it's not much more effort to make another random state and use those remaining bits of the seed and get more internal random states, so here's an update doing that. I intentionally chose to reuse the lower 16 bits of the seed in the second random function (in a different slot of the random state), since those are probably the ones most likely to vary in practice. This doesn't actually make any measurable difference to any of the tests, but it closes that potential loophole of ignoring part of the seed. In all my tests, the biggest improvement was between v23 and v24 of the patch. By comparison, the later versions have been relatively small improvements, and it's probably now "random enough" for the intended purposes. Regards, Dean
Вложения
В списке pgsql-hackers по дате отправления: