Re: [HACKERS] longer-term optimizer musings
From: Erik Riedel
Subject: Re: [HACKERS] longer-term optimizer musings
Date:
Msg-id: sqy7Fd600anI01i2E0@andrew.cmu.edu
In reply to: Re: [HACKERS] longer-term optimizer musings (Bruce Momjian <maillist@candle.pha.pa.us>)
List: pgsql-hackers
> Interesting.  The problem I see is that trying to do a char(20) column
> with min(A) and max(B) can have 256^19 possible unique values from A to
> B, so it kind of kills many general cases.  Floats have the same
> problem.

Right, in most general cases, there isn't much you can do. Although, if this seemed like an important thing, one could imagine an extension to 'vacuum analyze' and pg_statistic that tried to track the number of unique values while it finds the min and max. Maybe track some fixed number (10?) of unique attr values and stop searching once the count exceeds 10 different values (or maybe some tiny fraction of the tuples in the relation, whatever gives a decent balance of memory and CPU at analyze time). Basically, to find out if the selectivity might be 0.01% instead of the 10% default you suggest below.

This would work for a database that tracks all the CDs owned by "Bob" and "Alice", even with char(20) first names. For floats, it wouldn't be very good for prices at Tiffany's, but should work pretty well for the Everything's $1 store.

> A nice general fix would be to assume GROUP BY/AGG returns only 10% of
> the existing rows.  I don't even know if an Aggregate without a group by
> knows it only returns one row.  Oops, I guess not:
>
> test=> explain select max(relpages) from pg_class;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=2.58 size=0 width=0)
>   ->  Seq Scan on pg_class  (cost=2.58 size=48 width=4)

Yup, this would be easy to add (both the 10% for GROUP BY and a size of 1 for non-grouped aggregates). The current code just passes along the cost and zeros the size and width in all Sort, Group, and Aggregate nodes (this was the issue flagged as Problem 1 in my earlier message, where I tried to give line numbers for where that would have to be fixed). Note that cost_sort() seems to work reasonably enough, but it has a non-obvious "sortkeys" argument that it does nothing with.

> So I guess I am saying that your ideas are good, but we need to walk
> before we can run with this optimizer.

Understood.
I am not discouraged, and will continue throwing these things out as I see them, whenever I think I have a reasonable explanation.

Erik