Re: Selecting K random rows - efficiently!
От | Paul Tillotson |
---|---|
Тема | Re: Selecting K random rows - efficiently! |
Дата | |
Msg-id | 471FFE28.2050405@adelphia.net обсуждение исходный текст |
Ответ на | Selecting K random rows - efficiently! (cluster <skrald@amossen.dk>) |
Список | pgsql-general |
cluster wrote: > It has been suggested [1] that a good way to select K random rows > efficiently from a table is to > 1) add a new column, random_number, to the table > and initialize it with random() > 2) perform the following query: > SELECT * > FROM mydata > WHERE random_number >= (SELECT RANDOM() OFFSET 0) > ORDER BY random_number ASC LIMIT <K> > Here <K> is the number of random rows to pick. E.g. 100. > > The benefit in this solution is that the "random_number" column can be > indexed, allowing the query to be performed using a simple and fast > index scan. > > However, there is a couple of major drawbacks in this method: > > 1) The smaller "random_number" is, the less likely is it that it will > be picked when using this method. Example: A random_number close to > zero will only have a very small probability to be selected. When the above query returns L rows (where L < K) then, you need to append the first K - L rows from the table to simulate a "ring" without start or end. (Conveniently, this also solves the problem of not finding K rows because the random start value was too large.) The second set of rows can certainly be fetched using a second SELECT statement. Whether this can be computed efficiently as a single SELECT statement I am not sure but you might try something like this: (SELECT 1 AS seq, * FROM mydata WHERE random_number >= (SELECT RANDOM() OFFSET 0) ORDER BY random_number ASC LIMIT <K>) UNION ALL (SELECT 2 AS seq, * FROM mydata ORDER BY random_number ASC LIMIT <K>) ORDER BY seq ASC, random_number ASC LIMIT K; This should provide each row with an equal chance of being selected while requiring the database to fetch at most 2 * K rows. Regards, Paul Tillotson
В списке pgsql-general по дате отправления: