Re: Performance problem with low correlation data
От | m_lists@yahoo.it |
---|---|
Тема | Re: Performance problem with low correlation data |
Дата | |
Msg-id | 903539.75439.qm@web24615.mail.ird.yahoo.com обсуждение исходный текст |
Ответ на | Performance problem with low correlation data (Scara Maccai <m_lists@yahoo.it>) |
Список | pgsql-general |
> > testinsert contains t values between '2009-08-01' and '2009-08-09', and ne_id > from 1 to 20000. But only 800 out of 20000 ne_id have to be read; there's no > need for a table scan! > > I guess this is a reflection of the poor "correlation" on ne_id; but, as I > said, I don't really think ne_id is so bad correlated. > > In fact, doing a "select ne_id, t from testinsert limit 100000" I can see > that data is laid out pretty much by "ne_id, t", grouped by day (that is, same > ne_id for one day, then next ne_id and so on until next day). > > How is the "correlation" calculated? Can someone explain to me why, after the > procedure above,correlation is so low??? > > Did you run ANALYZE after the procedure above? Yes I did; the correlation on that column stays low. Of course, I didn't expect a correlation = 1, since data is layed out (pretty much) like this: (ne_id1) (t1 day1) (ne_id1) (t2 day1) ... (ne_id1) (tn day1) (ne_id2) (t1 day1) (ne_id2) (t2 day1) ... (ne_id2) (tn day1) ... (pretty much all the ne_ids) (ne_id1) (t1 day2) (ne_id1) (t2 day2) ... (ne_id1) (tn day2) (ne_id2) (t1 day2) (ne_id2) (t2 day2) ... (ne_id2) (tn day2) ... and so on so I ne_id is not strictly incrementing, but it is pretty much the same (sequencially) for a whole whole day...
В списке pgsql-general по дате отправления: