Re: Gsoc2012 Idea --- Social Network database schema
От | Andrew Dunstan |
---|---|
Тема | Re: Gsoc2012 Idea --- Social Network database schema |
Дата | |
Msg-id | 4F69FAF5.8000809@dunslane.net обсуждение исходный текст |
Ответ на | Re: Gsoc2012 Idea --- Social Network database schema (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On 03/21/2012 11:49 AM, Robert Haas wrote: > On Wed, Mar 21, 2012 at 11:34 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Robert Haas<robertmhaas@gmail.com> writes: >>> Well, the standard syntax apparently aims to reduce the number of >>> returned rows, which ORDER BY does not. Maybe you could do it with >>> ORDER BY .. LIMIT, but the idea here I think is that we'd like to >>> sample the table without reading all of it first, so that seems to >>> miss the point. >> I think actually the traditional locution is more like >> WHERE random()< constant >> where the constant is the fraction of the table you want. And yeah, >> the presumption is that you'd like it to not actually read every row. >> (Though unless the sampling density is quite a bit less than 1 row >> per page, it's not clear how much you're really going to win.) > Well, there's something mighty tempting about having a way to say > "just give me a random sample of the blocks and I'll worry about > whether that represents a random sample of the rows". > > It's occurred to me a few times that it's pretty unfortunate you can't > do that with a TID condition. > > rhaas=# explain select * from randomtext where ctid>= '(500,1)' and > ctid< '(501,1)'; > QUERY PLAN > -------------------------------------------------------------------- > Seq Scan on randomtext (cost=0.00..111764.90 rows=25000 width=31) > Filter: ((ctid>= '(500,1)'::tid) AND (ctid< '(501,1)'::tid)) > (2 rows) > > The last time this came up for me was when I was trying to find which > row in a large table as making the SELECT blow up; but it seems like > it could be used to implement a poor man's sampling method, too... it > would be nicer, in either case, to be able to specify the block > numbers you'd like to be able to read, rather than bounding the CTID > from both ends as in the above example. That would rapidly get unmanageable when you wanted lots of pages. Maybe we could do something like a pagenum pseudovar, or a wildcard match for ctid against '(123,*)'. cheers andrew
В списке pgsql-hackers по дате отправления: