WIP: further sorting speedup
From | Tom Lane |
---|---|
Subject | WIP: further sorting speedup |
Date | |
Msg-id | 15464.1140403246@sss.pgh.pa.us |
Responses |
Re: WIP: further sorting speedup
Re: WIP: further sorting speedup Re: WIP: further sorting speedup |
List | pgsql-patches |
After applying Simon's recent sort patch, I was doing some profiling and noticed that sorting spends an unreasonably large fraction of its time extracting datums from tuples (heap_getattr or index_getattr). The attached patch does something about this by pulling out the leading sort column of a tuple when it is received by the sort code or re-read from a "tape". This increases the space needed by 8 or 12 bytes (depending on sizeof(Datum)) per in-memory tuple, but doesn't cost anything as far as the on-disk representation goes. The effort needed to extract the datum at this point is well repaid because the tuple will normally undergo multiple comparisons while it remains in memory. In some quick tests the patch seemed to make for a significant speedup, on the order of 30%, despite increasing the number of runs emitted because of the smaller available memory.

The choice to pull out just the leading column, rather than all columns, is driven by concerns of (a) code complexity and (b) memory space. Having the extra columns pre-extracted wouldn't buy anything anyway in the common case where the leading key determines the result of a comparison.

This is still WIP because it leaks memory intra-query (I need to fix it to clean up palloc'd space better). I thought I'd post it now in case anyone wants to try some measurements for their own favorite test cases. In particular it would be interesting to see what happens for a multi-column sort with lots of duplicated keys in the first column, which is the case where the least advantage would be gained.

Comments?

regards, tom lane
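For readers who want the idea in concrete form, here is a minimal toy sketch in C of caching the leading sort key next to the tuple pointer. It is not the patch and does not use any PostgreSQL API: ToyTuple, SortEntry, toy_getattr, sort_entry_init, sort_entry_cmp and sort_entries are all hypothetical names made up for illustration, and a plain long stands in for a Datum. It assumes every tuple in one sort has the same number of key columns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a stored tuple; in the real system, getting
 * a column out of a tuple is the expensive step (heap_getattr etc.). */
typedef struct ToyTuple {
    int  ncols;
    long cols[4];
} ToyTuple;

/* Hypothetical in-memory sort entry: the tuple pointer plus a cached
 * copy of the leading sort column. This is the extra per-tuple space. */
typedef struct SortEntry {
    ToyTuple *tuple;
    long      key1;
} SortEntry;

/* Counts extractions so the saving can be observed in a test. */
static long extract_calls = 0;

/* Toy "extraction" of one column from a tuple. */
static long toy_getattr(const ToyTuple *t, int col)
{
    assert(col >= 0 && col < t->ncols);
    extract_calls++;
    return t->cols[col];
}

/* Extract the leading key once, when the tuple enters the sort. */
static void sort_entry_init(SortEntry *e, ToyTuple *t)
{
    e->tuple = t;
    e->key1 = toy_getattr(t, 0);
}

/* Compare on the cached key first; touch the tuples only on a tie. */
static int sort_entry_cmp(const void *a, const void *b)
{
    const SortEntry *ea = a;
    const SortEntry *eb = b;

    if (ea->key1 != eb->key1)
        return (ea->key1 < eb->key1) ? -1 : 1;

    /* Leading keys are equal: fall back to the remaining columns. */
    for (int col = 1; col < ea->tuple->ncols; col++) {
        long va = toy_getattr(ea->tuple, col);
        long vb = toy_getattr(eb->tuple, col);
        if (va != vb)
            return (va < vb) ? -1 : 1;
    }
    return 0;
}

/* Build the entries (one extraction per tuple) and sort them. */
static void sort_entries(SortEntry *entries, ToyTuple *tuples, size_t n)
{
    for (size_t i = 0; i < n; i++)
        sort_entry_init(&entries[i], &tuples[i]);
    qsort(entries, n, sizeof(SortEntry), sort_entry_cmp);
}
```

In this toy version the comparator calls toy_getattr only when two leading keys are equal, so a sort over mostly distinct first-column values does roughly one extraction per tuple, while a sort with many duplicated first-column values falls back to per-comparison extraction of the later columns, which is the least favorable case mentioned above.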