Re: Selecting K random rows - efficiently!
From | Pavel Stehule |
---|---|
Subject | Re: Selecting K random rows - efficiently! |
Date | |
Msg-id | 162867790710300152n276410f8l345cc401323e8be@mail.gmail.com |
In reply to | Re: Selecting K random rows - efficiently! (ptjm@interlog.com (Patrick TJ McPhee)) |
List | pgsql-general |
2007/10/26, Patrick TJ McPhee <ptjm@interlog.com>:
> In article <ffnid8$1q2t$1@news.hub.org>, cluster <skrald@amossen.dk> wrote:
> % > How important is true randomness?
> %
> % The goal is an even distribution but currently I have not seen any way
> % to produce any kind of random sampling efficiently. Notice the word
>
> How about generating the ctid randomly? You can get the number of pages
> from pg_class and estimate the number of rows either using the number
> of tuples in pg_class or just based on what you know about the data.
> Then just generate two series of random numbers, one from 0 to the number
> of pages and the other from 1 to the number of rows per page, and keep
> picking rows until you have enough numbers. Assuming there aren't too
> many dead tuples and your estimates are good, this should retrieve n rows
> with roughly n look-ups.
>
> If your estimates are low, there will be tuples which can never be selected,
> and so far as I know, there's no way to construct a random ctid in a stock
> postgres database, but apart from that it seems like a good plan. If
> efficiency is important, you could create a C function which returns a
> series of random tids and join on that.

SELECT id, ...
  FROM data
 WHERE id = ANY(ARRAY(
         SELECT (random()*max_id)::int
           FROM generate_series(1,20)))
 LIMIT 1;  -- max_id is external constant

Pavel Stehule
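The idea behind Pavel's query above (draw more candidate ids than you need, let the table filter out the ones that don't exist) can be sketched outside the database. This is a minimal Python simulation, not PostgreSQL code: the table with gaps in its id sequence and the oversample factor are both hypothetical, standing in for deleted rows and for `generate_series(1,20)` in the original query.

```python
import random

def pick_candidate_ids(max_id, k, oversample=2):
    """Draw k*oversample random ids in [0, max_id] and dedupe.

    Mirrors SELECT (random()*max_id)::int FROM generate_series(1, k*oversample).
    Oversampling compensates for duplicate draws and for ids that fall
    into gaps (deleted rows), either of which would leave fewer than k hits.
    """
    candidates = {random.randint(0, max_id) for _ in range(k * oversample)}
    return list(candidates)

# Hypothetical table whose id column has gaps (two of every three ids deleted).
existing_ids = set(range(0, 1000, 3))

# Fetch whichever candidates actually exist, keep at most 5 -- the LIMIT.
hits = [i for i in pick_candidate_ids(999, 20) if i in existing_ids][:5]
```

With dense ids and a generous oversample the candidate list almost always contains enough real rows; with very sparse ids the query can come back short, which is the same caveat that applies to the ctid-guessing approach quoted above.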