Re: [HACKERS] Hash support for grouping sets
From | Andrew Gierth
Subject | Re: [HACKERS] Hash support for grouping sets
Date |
Msg-id | 87y3vw7z9g.fsf@news-spur.riddles.org.uk
In reply to | Re: [HACKERS] Hash support for grouping sets (Mark Dilger <hornschnorter@gmail.com>)
Responses | Re: [HACKERS] Hash support for grouping sets
List | pgsql-hackers
>>>>> "Mark" == Mark Dilger <hornschnorter@gmail.com> writes: Mark> You define DiscreteKnapsack to take integer weights and doubleMark> values, and perform the usual Dynamic Programmingalgorithm toMark> solve. But the only place you call this, you pass in NULL forMark> the values, indicating thatall the values are equal. I'mMark> confused why in this degenerate case you bother with the DP atMark> all. Can't youjust load the knapsack from lightest to heaviestMark> after sorting, reducing the runtime complexity to O(nlogn)? It'sMark>been a long day. Sorry if I am missing something obvious. That's actually a very good question. (Though in passing I should point out that the runtime cost is going to be negligible in all practical cases. Even an 8-way cube (256 grouping sets) has only 70 rollups, and a 4-way cube has only 6; the limit of 4096 grouping sets was only chosen because it was a nice round number that was larger than comparable limits in other database products. Other stuff in the grouping sets code has worse time bounds; the bipartite-match code used to minimize the number of rollups is potentially O(n^2.5) in the number of grouping sets.) Thinking about it, it's not at all obvious what we should be optimizing for here in the absence of accurate sort costs. Right now what matters is saving as many sort steps as possible, since that's a saving likely to be many orders of magnitude greater than the differences between two sorts. The one heuristic that might be useful is to prefer the smaller estimated size if other factors are equal, just on memory usage grounds, but even that seems a bit tenuous. I'm inclined to leave things as they are in the current patch, and maybe revisit it during beta if we get any relevant feedback from people actually using grouping sets? Mark> I'm guessing you intend to extend the code at some later date toMark> have different values for items. Is that right? Well, as I mentioned in a reply to Andres, we probably should eventually optimize for greatest reduction in cost, and the code as it stands would allow that to be added easily, but that would require having more accurate cost info than we have now. cost_sort doesn't take into consideration the number or types of sort columns or the costs of their comparison functions, so all sorts of the same input data are costed equally. -- Andrew (irc:RhodiumToad)