Re: TODO item: adding VERBOSE option to CLUSTER [with patch]
From | Gregory Stark |
---|---|
Subject | Re: TODO item: adding VERBOSE option to CLUSTER [with patch] |
Date | |
Msg-id | 87skr09sgo.fsf@oxford.xeocode.com |
In reply to | Re: TODO item: adding VERBOSE option to CLUSTER [with patch] (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: TODO item: adding VERBOSE option to CLUSTER [with patch] |
List | pgsql-hackers |
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

> Jim Cox wrote:
>> On Mon, Oct 13, 2008 at 8:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>>
>>> It'd be possible to count the number of order reversals during the
>>> indexscan, ie the number of tuples with CTID lower than the previous
>>> one's. But I'm not sure how useful that number really is.

Incidentally, it finally occurred to me that "sortedness" is actually a
pretty good term to search on. I found several papers on estimating
metrics of sortedness, even from samples, though the best one looks like it
requires a sample of size O(sqrt(n)), which is more than we currently take.

The two metrics which seem popular are the length of the longest
subsequence which is sorted and the number of sorted subsequences. I think
the latter is equivalent to counting the inversions.

I didn't find any papers which claimed to present good ways to draw
conclusions based on these metrics, but I only did a quick search. I imagine
if everyone is looking for ways to estimate them they must be useful for
something...

For some reason my access to the ACM digital library stopped working. Does
anyone else have access?

> It will look bad for patterns like:
> 2
> 1
> 4
> 3
> 6
> 5
> ..

Hm, you could include some measure of how far the inversion goes -- but I
think that's counter-productive. Sure, some of them will be cached, but
others won't, and that'll be equally bad regardless of how far back it goes.

> Until we have a better metric for "sortedness", my earlier suggestion to print
> it was probably a bad idea. If anything, should probably print the same
> correlation metric that ANALYZE calculates, so that it would at least match
> what the planner uses for decision-making.

I agree with that. I like the idea of printing a message though -- we should
just have it print the correlation for now, and when we improve the stats
we'll print the new metric.
--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning
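
For readers who want to compare the metrics discussed in this message, here is a minimal Python sketch. It is not taken from the patch or from the PostgreSQL source. It assumes the heap positions (CTIDs) seen during an index scan have been reduced to a plain list of integers, and all function names are hypothetical. "Order reversal" is read here as an adjacent pair where the position goes backwards; under that reading the number of sorted runs is simply the number of reversals plus one. The `correlation` function is a plain Pearson coefficient between index order and physical position, which is only a rough stand-in for the statistic ANALYZE computes from its sample.

```python
from bisect import bisect_right


def count_reversals(positions):
    """Count adjacent pairs where the physical position goes backwards."""
    return sum(1 for prev, cur in zip(positions, positions[1:]) if cur < prev)


def count_sorted_runs(positions):
    """Number of maximal non-decreasing runs (reversals + 1 when non-empty)."""
    if not positions:
        return 0
    return count_reversals(positions) + 1


def longest_sorted_subsequence(positions):
    """Length of the longest non-decreasing subsequence, in O(n log n)."""
    tails = []  # tails[k] = smallest possible tail of a subsequence of length k+1
    for p in positions:
        i = bisect_right(tails, p)
        if i == len(tails):
            tails.append(p)
        else:
            tails[i] = p
    return len(tails)


def correlation(positions):
    """Pearson correlation between index order (0..n-1) and physical position.

    Assumption: a simplified illustration, not the ANALYZE implementation.
    """
    n = len(positions)
    if n < 2:
        return 0.0
    mean_r = (n - 1) / 2
    mean_p = sum(positions) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in enumerate(positions))
    var_r = sum((r - mean_r) ** 2 for r in range(n))
    var_p = sum((p - mean_p) ** 2 for p in positions)
    if var_p == 0:
        return 0.0
    return cov / (var_r * var_p) ** 0.5


if __name__ == "__main__":
    pattern = [2, 1, 4, 3, 6, 5]  # the pattern quoted in the message
    print("reversals:", count_reversals(pattern))
    print("sorted runs:", count_sorted_runs(pattern))
    print("longest sorted subsequence:", longest_sorted_subsequence(pattern))
    print("correlation:", round(correlation(pattern), 3))
```

On the quoted 2, 1, 4, 3, 6, 5 pattern the sketch reports three reversals out of five adjacent pairs, while the correlation stays at roughly 0.83. That is the mismatch raised in the quoted text: a reversal count makes a nearly ordered table look bad, whereas the correlation still reflects that it is close to sorted.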