Re: Gsoc2012 Idea --- Social Network database schema

Поиск

Список

Период

Сортировка

От	Andrew Dunstan
Тема	Re: Gsoc2012 Idea --- Social Network database schema
Дата	21 марта 2012 г. 13:00:22
Msg-id	4F69FAF5.8000809@dunslane.net обсуждение исходный текст
Ответ на	Re: Gsoc2012 Idea --- Social Network database schema (Robert Haas <robertmhaas@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения


On 03/21/2012 11:49 AM, Robert Haas wrote:
> On Wed, Mar 21, 2012 at 11:34 AM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Robert Haas<robertmhaas@gmail.com>  writes:
>>> Well, the standard syntax apparently aims to reduce the number of
>>> returned rows, which ORDER BY does not.  Maybe you could do it with
>>> ORDER BY .. LIMIT, but the idea here I think is that we'd like to
>>> sample the table without reading all of it first, so that seems to
>>> miss the point.
>> I think actually the traditional locution is more like
>>         WHERE random()<  constant
>> where the constant is the fraction of the table you want.  And yeah,
>> the presumption is that you'd like it to not actually read every row.
>> (Though unless the sampling density is quite a bit less than 1 row
>> per page, it's not clear how much you're really going to win.)
> Well, there's something mighty tempting about having a way to say
> "just give me a random sample of the blocks and I'll worry about
> whether that represents a random sample of the rows".
>
> It's occurred to me a few times that it's pretty unfortunate you can't
> do that with a TID condition.
>
> rhaas=# explain select * from randomtext where ctid>= '(500,1)' and
> ctid<  '(501,1)';
>                               QUERY PLAN
> --------------------------------------------------------------------
>   Seq Scan on randomtext  (cost=0.00..111764.90 rows=25000 width=31)
>     Filter: ((ctid>= '(500,1)'::tid) AND (ctid<  '(501,1)'::tid))
> (2 rows)
>
> The last time this came up for me was when I was trying to find which
> row in a large table as making the SELECT blow up; but it seems like
> it could be used to implement a poor man's sampling method, too... it
> would be nicer, in either case, to be able to specify the block
> numbers you'd like to be able to read, rather than bounding the CTID
> from both ends as in the above example.


That would rapidly get unmanageable when you wanted lots of pages.


Maybe we could do something like a pagenum pseudovar, or a wildcard 
match for ctid against '(123,*)'.

cheers

andrew

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Gsoc2012 Idea --- Social Network database schema