Re: O(samplesize) tuple sampling, proof of concept
От | Manfred Koizar |
---|---|
Тема | Re: O(samplesize) tuple sampling, proof of concept |
Дата | |
Msg-id | s4j370l6ra56tvodcbg9baaf682q6cu3pn@email.aon.at обсуждение исходный текст |
Ответ на | Re: O(samplesize) tuple sampling, proof of concept (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-patches |
On Mon, 05 Apr 2004 15:37:07 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >I wouldn't bother with a GUC variable for the production patch. Among other things the GUC variable will be thrown out for the final version. >> Once a block is selected for inspection, all tuples of this >> block are accessed to get a good estimation of the live : dead row >> ratio. > >Why are you bothering to compute the live:dead ratio? That was basically bad wording. It should have been "... to get a good estimation of live rows per page." Counting dead rows turned out to be trivial, so I just did it and included the number in the debug messages. Then it happened to be useful for method 2. >> Because I was afraid that method 1 might be too expensive in terms of >> CPU cycles, I implemented a small variation that skips tuples without >> checking them for visibility; this is sampling_method 2. > >And? Does it matter? There's a clearly visible difference for mid-size relations. I'm not sure whether this can be attributed to visibility bit updating or other noise-contributing factors. Method 2 gives a row count estimation error between 10 and 17% where method 1 is less than 1% off. (My updates generated dead tuples at very evenly distributed locations by using conditions like WHERE mod(twenty, 7) = 0). >If that's as bad as it gets I think we are OK. You should redo the test >with larger sample size though (try stats target = 100) to see if the >answer changes. Will do. >I find -u diffs close to unreadable for reviewing purposes. Please >submit diffs in -c format in future. De gustibus non est disputandum :-) Fortunately this patch wouldn't look much different. There is just a bunch of "+" lines. >AFAICS the rows will *always* be sorted already, and so this qsort is an >utter waste of cycles. No? I don't think so. We get the *blocks* in the correct order. But tuples are still sampled by the Vitter method which starts to replace random tuples after the pool is filled. BTW, thanks for the paper! Servus Manfred
В списке pgsql-patches по дате отправления: