Re: Query only slow on first run
От | tmp |
---|---|
Тема | Re: Query only slow on first run |
Дата | |
Msg-id | 474CBF98.6070508@amossen.dk обсуждение исходный текст |
Ответ на | Re: Query only slow on first run (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Query only slow on first run
Re: Query only slow on first run Re: Query only slow on first run |
Список | pgsql-performance |
> The query's spending nearly all its time in the scan of "posts", and > I'm wondering why --- doesn't seem like it should take 6400msec to fetch > 646 rows, unless perhaps the data is just horribly misordered relative > to the index. Which may in fact be the case ... Yes, they probably are. I use the random_number column in order to receive a semi random sample subset from the large amount of rows. The technique is described in [1]. This subset is later used for some statistical investigation, but this is somewhat irrelevant here. In order to receive the sample fast, I have made an index on the random_number column. > what exactly is that > "random_number" column A random float that is initialized when the row is created and never modified afterwards. The physical row ordering will clearly not match the random_number ordering. However, other queries uses a row ordering by the primary key so I don't think it would make much sense to make the index on random_number a clustering index just in order to speed up this single query. > and why are you desirous of ordering by it? In order to simulate a random pick of K rows. See [1]. > For that matter, if it is what it sounds like, why is it sane to group > by it? You'll probably always get groups of one row ... For each random_number, another table (question_tags) holds zero or more rows satisfying a number of constraints. I need to count(*) the number of corresponding question_tag rows for each random_number. We have primarily two tables of interest here: questions (~100k rows) and posts (~400k rows). Each post refers to a question, but only the "posts" rows for which the corresponding "question.status = 1" are relevant. This reduces the number of relevant question rows to about 10k. Within the post rows corresponding to these 10k questions I would like to pick a random sample of size K. [1] http://archives.postgresql.org/pgsql-general/2007-10/msg01240.php
В списке pgsql-performance по дате отправления: