Re: ORDER BY random() LIMIT 1 slowness
From:        Tom Lane
Subject:     Re: ORDER BY random() LIMIT 1 slowness
Msg-id:      4144.1040103853@sss.pgh.pa.us
In reply to: ORDER BY random() LIMIT 1 slowness  ("Gavin M. Roy" <gmr@justsportsusa.com>)
Responses:   Re: ORDER BY random() LIMIT 1 slowness
List:        pgsql-general
"Gavin M. Roy" <gmr@justsportsusa.com> writes: > SELECT * FROM poetry ORDER BY random() LIMIT 1; > [ is slow for 35000 rows ] Yeah. Basically this query is implemented as (a) select all 35000 rows of "poetry"; (b) compute a random() value for each row; (c) sort by the random() values; (d) take the first row, discard the rest. The good news: this gives you a pretty-durn-random selection. The bad news: you didn't really care about choosing a random ordering of the other 34999 rows, but it computed one anyway. This problem's been discussed before, but I've not seen any really ideal answer. SQL is not designed to offer unpredictable results ;-) If you have a reasonably-uniformly-distributed index key on the rows, you can try something like select * from poetry where key > (scale * random() + min) order by key limit 1; (where scale and min are chosen to make the result cover the range of index keys). Given an index on "key" this should pick a result quickly via an indexscan+limit plan. However, if the distribution of keys isn't real uniform then you won't get real random results --- keys just above larger gaps will be selected with unfairly high probability. You also have to know the min and max keys accurately. I recall some folks have suggested generating an actual random column (ie, something initialized with "default random()" and suitably indexed) but this seems like a lot of overhead ... you'd have to be mighty concerned with the exactness of your random selection to put up with that, I think. There are some other suggestions in the archives, IIRC. regards, tom lane