Re: [PATCH] Introduce array_shuffle() and array_sample()
От | Martin Kalcher |
---|---|
Тема | Re: [PATCH] Introduce array_shuffle() and array_sample() |
Дата | |
Msg-id | f5af7698-d1eb-bdf4-9969-82b18ab4f8cf@aboutsource.net обсуждение исходный текст |
Ответ на | Re: [PATCH] Introduce array_shuffle() and array_sample() (Fabien COELHO <coelho@cri.ensmp.fr>) |
Ответы |
Re: [PATCH] Introduce array_shuffle() and array_sample()
|
Список | pgsql-hackers |
Am 24.07.22 um 10:15 schrieb Fabien COELHO: > > My 0,02€ about this patch: Thank you for your feedback. I attached a patch, that addresses most of your points. > Use (void) when declaring no parameters in headers or in functions. Done. > Should the exchange be skipped when i == k? The additional branch is actually slower (on my machine, tested with an 10M element int array) for 1D-arrays, which i assume is the most common case. Even with a 2D-array with a subarray size of 1M there is no difference in execution time. i == k should be relatively rare for large arrays, given a good prng. > I do not see the point of having *only* inline functions in a c file. > The external functions should not be inlined? Done. > The random and array shuffling functions share a common state. I'm > wondering whether it should ot should not be so. It seems ok, but then > ISTM that the documentation suggests implicitely that setseed applies to > random() only, which is not the case anymore after the patch. I really like the idea of a prng state owned by the user, that is used by all user facing random functions, but see that their might be concerns about this change. I would update the setseed() documentation, if this proposal is accepted. Do you think my patch should already include the documentation update? > If more samples are required than the number of elements, it does not > error out. I'm wondering whether it should. The second argument to array_sample() is treated like a LIMIT, rather than the actual number of elements. I updated the documentation. My use-case is: take max random items from an array of unknown size. > Also, the sampling should not return its result in order when the number > of elements required is the full array, ISTM that it should be shuffled > there as well. You are the second person asking for this. It's done. I am thinking about ditching array_sample() and replace it with a second signature for array_shuffle(): array_shuffle(array anyarray) -> anyarray array_shuffle(array anyarray, limit int) -> anyarray What do you think? > I must say that without significant knowledge of the array internal > implementation, the swap code looks pretty strange. ISTM that the code > would be clearer if pointers and array references style were not > intermixed. Done. Went with pointers. > Maybe you could add a test with a 3D array? Some sample with NULLs? Done. > Unrelated: I notice again that (postgre)SQL does not provide a way to > generate random integers. I do not see why not. Should we provide one? Maybe. I think it is outside the scope of this patch. Thank you, Martin
Вложения
В списке pgsql-hackers по дате отправления: