Re: [PATCH] Introduce array_shuffle() and array_sample()
От | Martin Kalcher |
---|---|
Тема | Re: [PATCH] Introduce array_shuffle() and array_sample() |
Дата | |
Msg-id | f6267e99-4448-b47c-9fe8-ec23690b931f@aboutsource.net обсуждение исходный текст |
Ответ на | Re: [PATCH] Introduce array_shuffle() and array_sample() (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Ответы |
Re: [PATCH] Introduce array_shuffle() and array_sample()
|
Список | pgsql-hackers |
Am 21.07.22 um 10:41 schrieb Dean Rasheed: > > A couple of quick comments on the current patch: Thank you for your feedback! > It's important to mark these new functions as VOLATILE, not IMMUTABLE, > otherwise they won't work as expected in queries. See > https://www.postgresql.org/docs/current/xfunc-volatility.html CREATE FUNCTION marks functions as VOLATILE by default if not explicitly marked otherwise. I assumed function definitions in pg_proc.dat have the same behavior. I will fix that. https://www.postgresql.org/docs/current/sql-createfunction.html > It would be better to use pg_prng_uint64_range() rather than rand() to > pick elements. Partly, that's because it uses a higher quality PRNG, > with a larger internal state, and it ensures that the results are > unbiased across the range. But more importantly, it interoperates with > setseed(), allowing predictable sequences of "random" numbers to be > generated -- something that's useful in writing repeatable regression > tests. I agree that we should use pg_prng_uint64_range(). However, in order to achieve interoperability with setseed() we would have to use drandom_seed (rather than pg_global_prng_state) as rng state, which is declared statically in float.c and exclusively used by random(). Do we want to expose drandom_seed to other functions? > Assuming these new functions are made to interoperate with setseed(), > which I think they should be, then they also need to be marked as > PARALLEL RESTRICTED, rather than PARALLEL SAFE. See > https://www.postgresql.org/docs/current/parallel-safety.html, which > explains why setseed() and random() are parallel restricted. As mentioned above, i assumed the default here is PARALLEL UNSAFE. I'll fix that. > In my experience, the requirement for sampling with replacement is > about as common as the requirement for sampling without replacement, > so it seems odd to provide one but not the other. Of course, we can > always add a with-replacement function later, and give it a different > name. But maybe array_sample() could be used for both, with a > "with_replacement" boolean parameter? We can also add a "with_replacement" boolean parameter which is false by default to array_sample() later. I do not have a strong opinion about that and will implement it, if desired. Any opinions about default/no-default? Martin
В списке pgsql-hackers по дате отправления: