Re: Proposal to introduce a shuffle function to intarray extension
От | Martin Kalcher |
---|---|
Тема | Re: Proposal to introduce a shuffle function to intarray extension |
Дата | |
Msg-id | 7933c4a5-0e72-889e-1be9-e3574ffe333b@aboutsource.net обсуждение исходный текст |
Ответ на | Re: Proposal to introduce a shuffle function to intarray extension (Mladen Gogala <gogala.mladen@gmail.com>) |
Ответы |
Re: Proposal to introduce a shuffle function to intarray extension
|
Список | pgsql-general |
Am 16.07.22 um 18:53 schrieb Mladen Gogala: > On 7/15/22 04:36, Martin Kalcher wrote: >> Dear list, >> >> i am dealing with an application that processes fairly large arrays of >> integers. It makes heavy use of the intarray extension, which works >> great in most cases. However, there are two requirements that cannot >> be addressed by the extension and are rather slow with plain SQL. Both >> can be met with shuffling: >> >> - Taking n random members from an integer array >> - Splitting an array into n chunks, where each member is assigned to a >> random chunk >> >> Shuffling is currently implemented by unnesting the array, ordering >> the members by random() and aggregating them again. > > > Martin, have you considered PL/Python and NumPy module? Hey Mladen, thank you for your advice. Unfortunately the performance of shuffling with NumPy is about the same as with SQL. create function numpy_shuffle(arr int[]) returns int[] as $$ import numpy numpy.random.shuffle(arr) return arr $$ language 'plpython3u'; select arr[1:3]::text || ' ... ' || arr[3999998:4000000]::text from ( select numpy_shuffle(arr) arr from numbers ) shuffled; ------------------------------------------------------- {674026,3306457,1727170} ... {343875,3825484,1235246} Time: 2315.431 ms (00:02.315) Am i doing something wrong? Martin
В списке pgsql-general по дате отправления: