Re: [HACKERS] Write Ahead Logging for Hash Indexes
От | Amit Kapila |
---|---|
Тема | Re: [HACKERS] Write Ahead Logging for Hash Indexes |
Дата | |
Msg-id | CAA4eK1+TuLSo0k9DCFi8U4xTxCe6trUHauxTDp06Ng6U=Bs=Cg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Write Ahead Logging for Hash Indexes (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On Sun, Mar 12, 2017 at 8:06 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Mar 11, 2017 at 12:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> /* >>> + * Change the shared buffer state in critical section, >>> + * otherwise any error could make it unrecoverable after >>> + * recovery. >>> + */ >>> + START_CRIT_SECTION(); >>> + >>> + /* >>> * Insert tuple on new page, using _hash_pgaddtup to ensure >>> * correct ordering by hashkey. This is a tad inefficient >>> * since we may have to shuffle itempointers repeatedly. >>> * Possible future improvement: accumulate all the items for >>> * the new page and qsort them before insertion. >>> */ >>> (void) _hash_pgaddtup(rel, nbuf, itemsz, new_itup); >>> >>> + END_CRIT_SECTION(); >>> >>> No way. You have to start the critical section before making any page >>> modifications and keep it alive until all changes have been logged. >>> >> >> I think what we need to do here is to accumulate all the tuples that >> need to be added to new bucket page till either that page has no more >> space or there are no more tuples remaining in an old bucket. Then in >> a critical section, add them to the page using _hash_pgaddmultitup and >> log the entire new bucket page contents as is currently done in patch >> log_split_page(). > > I agree. > Okay, I have changed like that in the patch. >> Now, here we can choose to log the individual >> tuples as well instead of a complete page, however not sure if there >> is any benefit for doing the same because XLogRecordAssemble() will >> anyway remove the empty space from the page. Let me know if you have >> something else in mind. > > Well, if you have two pages that are 75% full, and you move a third of > the tuples from one of them into the other, it's going to be about > four times more efficient to log only the moved tuples than the whole > page. > I don't see how this could happen during split? I mean if you are moving 25% tuples from old bucket page to a new bucket page which is 75% full, it will log the new bucket page only after it gets full. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: