Re: B-Tree support function number 3 (strxfrm() optimization)
From | Peter Geoghegan |
---|---|
Subject | Re: B-Tree support function number 3 (strxfrm() optimization) |
Date | |
Msg-id | CAM3SWZSAtQOVbPZmVMcaj-_fQEvKcZ5MKeaAUyH6Wz40hmvL3A@mail.gmail.com |
In response to | Re: B-Tree support function number 3 (strxfrm() optimization) (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: B-Tree support function number 3 (strxfrm() optimization)
Re: B-Tree support function number 3 (strxfrm() optimization) |
List | pgsql-hackers |
On Wed, Sep 3, 2014 at 2:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I guess it should still be a configure option, then. Or maybe there
> should just be a USE_ABBREV_KEYS macro within pg_config_manual.h.

The attached additional patches are intended to be applied on top of most of the patches posted on September 2nd [1]. Note that you should not apply patch 0001-* from that set to master, since it has already been committed to master [2]. However, while rebasing I revised patch/commit 0005-* so that abbreviation is used on all platforms, including 32-bit platforms (the prior 0005-* patch just re-enabled the optimization on Darwin/Apple), so you should discard the earlier 0005-* patch.

In a later commit I also properly formalize the idea that we always do opportunistic "memcmp() == 0" checks, no matter what context a sortsupport-accelerated text comparison occurs in. That seems like a good idea, but it's broken out in a separate commit in case you are not in agreement.

While I gave serious consideration to your idea of having a dedicated abbreviation comparator, and not duplicating sortsupport state when abbreviated keys are used (going so far as to almost fully implement the idea), I ultimately decided that my vote is that we don't do that. It seemed to me that the benefits were negligible and did not justify the increased complexity. In particular, I didn't want to burden tuplesort with having to worry about whether abbreviation was aborted during tuple copying, or was not used by the opclass in the first place. Implementing your scheme makes that distinction relevant. It's very convenient to have comparetup_heap() "compare the leading sort key" (the step that specifically looks at SortTuple.datum1 pairs) using the same comparator for the "abbreviated" and "not abbreviated" cases indifferently. comparetup_heap() does not seem like a great place to burden with caring about each combination any more than is strictly necessary.

I like that I don't have to care about every combination, and can treat abbreviation abortion as the special case with the extra step, in line with how I think of the optimization conceptually. Does that make sense? Otherwise, there'd have to be an ApplySortComparator() call *and* an "ApplySortComparatorAbbreviated()" call with SortTuple.datum1 pairs passed, as appropriate for each opclass (and abortion state), as well as a heap_getattr() tie-breaker call for the latter case alone (when we got an inconclusive answer, OR when abbreviation was aborted). Finally, just as things are now, there'd have to be a loop where the second and subsequent attributes are dealt with by ApplySortComparator()'ing. So AFAICT under your scheme there are 4 ApplySortComparator* call sites required, rather than 3 as under mine.

Along similar lines, I thought about starting from nkey = 0 within comparetup_heap() when abortion occurs (so that there'd only be 2 ApplySortComparator() call sites, no increase from master), but that turns out to be messy, plus I like those special tie-breaker assertions.

I will be away for much of next week, and will have limited access to e-mail. I will be around tomorrow, though. I hope that what I've posted is suitable to commit without further input from me.

[1] http://www.postgresql.org/message-id/CAM3SWZTEtQcKc24LhWKDLasJf-b-cCNn4q0OYjhGBX+NcpNRpg@mail.gmail.com
[2] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d8d4965dc29263462932be03d4206aa694e2cd7e

--
Peter Geoghegan
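
For readers following the call-site argument in the message above, here is a toy model in C of the three-step comparison it describes: one leading-key comparison on datum1 that is the same call whether or not the key is abbreviated, one tie-breaker on the full value, and one pass over the remaining attributes. This is only a sketch and not PostgreSQL code. ToyTuple, ToySortSupport and every toy_* name are hypothetical; the abbreviated key is simply the first 8 bytes of the string packed into an integer (the real optimization uses strxfrm() output), strcmp() stands in for strcoll(), and abbreviation abort is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a sort tuple with a cached leading key. */
typedef struct ToyTuple
{
    uint64_t    datum1;     /* abbreviated form of key1 (toy encoding) */
    const char *key1;       /* full leading attribute */
    int         key2;       /* second attribute */
} ToyTuple;

typedef int (*ToyCmp) (const ToyTuple *a, const ToyTuple *b);

/* Hypothetical stand-in for per-key sort support state. */
typedef struct ToySortSupport
{
    ToyCmp      leading;    /* applied to the leading key in every case */
    ToyCmp      tiebreak;   /* full comparator; NULL when not abbreviating */
} ToySortSupport;

/* Pack the first 8 bytes big-endian so integer order matches byte order. */
static uint64_t
toy_abbreviate(const char *s)
{
    uint64_t    v = 0;
    size_t      len = strlen(s);

    for (size_t i = 0; i < 8; i++)
        v = (v << 8) | (i < len ? (unsigned char) s[i] : 0);
    return v;
}

static int
toy_cmp_abbrev(const ToyTuple *a, const ToyTuple *b)
{
    return (a->datum1 > b->datum1) - (a->datum1 < b->datum1);
}

static int
toy_cmp_full(const ToyTuple *a, const ToyTuple *b)
{
    size_t      la = strlen(a->key1);
    size_t      lb = strlen(b->key1);

    /* Opportunistic "memcmp() == 0" check before the costlier comparison. */
    if (la == lb && memcmp(a->key1, b->key1, la) == 0)
        return 0;
    return strcmp(a->key1, b->key1);    /* stand-in for strcoll() */
}

const ToySortSupport toy_with_abbrev = {toy_cmp_abbrev, toy_cmp_full};
const ToySortSupport toy_without_abbrev = {toy_cmp_full, NULL};

ToyTuple
toy_make(const char *key1, int key2)
{
    ToyTuple    t = {toy_abbreviate(key1), key1, key2};

    return t;
}

int
toy_comparetup(const ToyTuple *a, const ToyTuple *b, const ToySortSupport *ss)
{
    /* Call site 1: leading key, identical call in both configurations. */
    int         c = ss->leading(a, b);

    if (c != 0)
        return c;

    /* Call site 2: abbreviated keys were inconclusive, use full values. */
    if (ss->tiebreak != NULL)
    {
        assert(a->datum1 == b->datum1);
        c = ss->tiebreak(a, b);
        if (c != 0)
            return c;
    }

    /* Call site 3: second and subsequent attributes (just one here). */
    return (a->key2 > b->key2) - (a->key2 < b->key2);
}
```

In this sketch the caller picks toy_with_abbrev or toy_without_abbrev once, up front, and toy_comparetup() never branches on which one it was given except for the single tie-breaker step. A dedicated abbreviated comparator would instead need a separate leading-key call per configuration, which is the extra call site the message counts.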