Re: avoiding tuple copying in btree index builds
От | Robert Haas |
---|---|
Тема | Re: avoiding tuple copying in btree index builds |
Дата | |
Msg-id | CA+Tgmoao0WNzZ+bc7nAKy9yLpsHknWZaXSPkkw2KcwXZzWGVKw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: avoiding tuple copying in btree index builds (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: avoiding tuple copying in btree index builds
|
Список | pgsql-hackers |
On Tue, Jun 3, 2014 at 4:38 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Jun 1, 2014 at 3:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Tue, May 6, 2014 at 12:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Mon, May 5, 2014 at 2:13 PM, Andres Freund <andres@2ndquadrant.com> >>> wrote: >>> > On 2014-05-05 13:52:39 -0400, Robert Haas wrote: >>> >> Today, I discovered that when building a btree index, the btree code >>> >> uses index_form_tuple() to create an index tuple from the heap tuple, >>> >> calls tuplesort_putindextuple() to copy that tuple into the sort's >>> >> memory context, and then frees the original one it built. This seemed >>> >> inefficient, so I wrote a patch to eliminate the tuple copying. It >>> >> works by adding a function tuplesort_putindextuplevalues(), which >>> >> builds the tuple in the sort's memory context and thus avoids the need >>> >> for a separate copy. I'm not sure if that's the best approach, but >>> >> the optimization seems wortwhile. >>> > >>> > Hm. It looks like we could quite easily just get rid of >>> > tuplesort_putindextuple(). The hash usage doesn't look hard to convert. >>> >>> I glanced at that, but it wasn't obvious to me how to convert the hash >>> usage. If you have an idea, I'm all ears. >> >> I also think it's possible to have similar optimization for hash index >> incase it has to spool the tuple for sorting. >> >> In function hashbuildCallback(), when buildstate->spool is true, we >> can avoid to form index tuple. To check for nulls before calling >> >> _h_spool(), we can traverse the isnull array. > > Hmm, that might work. Arguably it's less efficient, but on the other > hand if it avoids forming the tuple sometimes it might be MORE > efficient. And anyway the difference might not be enough to matter. On further review, this is definitely the way to go: it's a straight-up win. The isnull array is never more than one element in length, so testing the single element is quite trivial. The attached, revised patch provides a modest but useful speedup for both hash and btree index builds. Anyone see any reason NOT to do this? So far it looks like a slam-dunk from here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: