Re: new correlation metric
From | Greg Stark |
---|---|
Subject | Re: new correlation metric |
Date | |
Msg-id | 4B694644-E4C2-4F82-B661-9F0CEEAAC5F6@enterprisedb.com |
In reply to | Re: new correlation metric (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
I haven't looked at the patch yet -- I'm actually on a train now. I'm sorry if these questions are answered in the patch.

I think there are three questions here:

A) what metric are we going to use
B) how do we estimate the metric given a sample
C) how do we draw the conclusions we need based on this metric

I looked into this topic recently and found papers on estimating two metrics based on samples: longest sorted subsequence and number of sorted subsequences. The latter sounds like it might be equivalent to what you have. I didn't find any papers on drawing conclusions from either of these metrics, though.

I think we can just look at block numbers to avoid intra-block disorder confusing us. It does mean we effectively have a much smaller sample, though.

I don't think the other objection is worth worrying about, though. If our sample has 15263748 then it may be that readahead will save us in that particular instance, but it's clear the table isn't well clustered and it's just luck that those blocks were intermixed in a very particular way. The rest of the table is unlikely to be so lucky.

greg

On 26 Oct 2008, at 03:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Martijn van Oosterhout <kleptog@svana.org> writes:
>> On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote:
>>> I worked with Nathan Boley to come up with what we think is a better
>>> metric for measuring this cost.
>
>> I think the code is in the right direction, but I think what you want
>> is some kind of estimate of "given I've looked for tuple X, how many
>> tuples in the next k pages are near this one".
>
> ISTM that some experimental studies would be required to justify any
> proposal in this area.
>
> Having said that ... one of the things I never liked about the existing
> code is that it pays no attention to block-boundary effects. It doesn't
> matter to an indexscan how badly tuples within a single block are
> misordered --- what matters is how many block reads you have to do.
> So I think that any changes here ought to include measuring the
> correlation on the basis of block numbers not tuple numbers. But what's
> not too clear to me is whether our sampling methods would mess that up.
>
> 			regards, tom lane
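To make the two candidate metrics concrete, here is a minimal illustrative sketch (not PostgreSQL code, and not what the patch implements) that computes both the longest-sorted-subsequence length and the number of sorted runs over the block numbers a sampled index scan would visit, reading Greg's "15263748" as the block visit order 1, 5, 2, 6, 3, 7, 4, 8:

```python
# Hypothetical sketch of the two sample-based disorder metrics mentioned
# in the thread, computed over *block numbers* only, so that intra-block
# disorder is ignored as Tom suggests.
from bisect import bisect_right

def longest_sorted_subsequence(blocks):
    """Length of the longest non-decreasing subsequence (patience sorting)."""
    tails = []  # tails[i] = smallest possible tail of a subsequence of length i+1
    for b in blocks:
        i = bisect_right(tails, b)
        if i == len(tails):
            tails.append(b)
        else:
            tails[i] = b
    return len(tails)

def sorted_run_count(blocks):
    """Number of maximal non-decreasing runs; 1 means perfectly clustered."""
    if not blocks:
        return 0
    runs = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur < prev:
            runs += 1
    return runs

# Greg's example, read as the block order 1 5 2 6 3 7 4 8:
sample = [1, 5, 2, 6, 3, 7, 4, 8]
print(longest_sorted_subsequence(sample))  # 5  (e.g. 1 2 3 4 8)
print(sorted_run_count(sample))            # 4  (1 5 | 2 6 | 3 7 | 4 8)
```

On this sample both metrics flag heavy disorder even though readahead might happen to rescue this particular interleaving, which is Greg's point: the metric should charge for the disorder regardless.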