Re: Selecting random rows efficiently
From:        Tom Lane
Subject:     Re: Selecting random rows efficiently
Msg-id:      23400.1062254862@sss.pgh.pa.us
In reply to: Re: Selecting random rows efficiently  (Richard Jones <rj@last.fm>)
Responses:   Re: Selecting random rows efficiently
List:        pgsql-performance
Richard Jones <rj@last.fm> writes:
>>> i have a table of around 3 million rows from which i regularly (twice a
>>> second at the moment) need to select a random row from

> i was hoping there was some trickery with sequences that would allow me to
> easily pick a random valid sequence number..?

There is no magic bullet here, but if you expect that requirement to
persist then it is worth your trouble to expend effort on a real
solution.  A real solution in my mind would look like:

1. Add a column "random_id float8 default random()".  The idea here
   is that you assign a random ID to each row as it is created.

2. Add an index on the above column.

3. Your query now looks like

	SELECT * FROM table WHERE random_id >= random()
	ORDER BY random_id LIMIT 1;

This gives you a plan on the order of

	Limit  (cost=0.00..0.17 rows=1 width=8)
	  ->  Index Scan using fooi on foo  (cost=0.00..57.00 rows=334 width=8)
	        Filter: (random_id >= random())

which is fast and gives a genuinely random result row.  At least up
until you have enough rows that there start being duplicate random_ids,
which AFAIK would be 2 billion rows with a decent random() implementation.
If you're concerned about that, you could periodically re-randomize with

	UPDATE table SET random_id = random();

so that any rows that were "hidden" because they had a duplicate
random_id have another shot at being choosable.  But with only a few
mil rows I don't think you need to worry.

			regards, tom lane
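
[For concreteness, here is a minimal sketch of the three steps above as they
might be run against a hypothetical table named "foo"; the table, column, and
index names are illustrative only and do not come from the original thread.]

	-- Step 1: tag every row with a random ID; new rows get one from the
	-- column default, pre-existing rows are backfilled by the UPDATE.
	ALTER TABLE foo ADD COLUMN random_id float8;
	ALTER TABLE foo ALTER COLUMN random_id SET DEFAULT random();
	UPDATE foo SET random_id = random();

	-- Step 2: index the tag so the lookup below can use an index scan.
	CREATE INDEX foo_random_id_idx ON foo (random_id);

	-- Step 3: fetch one approximately random row via the index.
	SELECT * FROM foo
	 WHERE random_id >= random()
	 ORDER BY random_id
	 LIMIT 1;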