Re: [PoC] Improve dead tuple storage for lazy vacuum

Поиск
Список
Период
Сортировка
От John Naylor
Тема Re: [PoC] Improve dead tuple storage for lazy vacuum
Дата
Msg-id CAFBsxsFps1MReb3HQYEGc=M=1NXRjSu3TSaEVEXkVrCR3g-hxg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Список pgsql-hackers
I didn't get any closer to radix-tree regression, but I did find some inefficiencies in tidstore_add_tids() that are worth talking about first, addressed in a rough fashion in the attached .txt addendums that I can clean up and incorporate later.

To start, I can reproduce the regression with this test as well:

select * from bench_tidstore_load(0, 10 * 1000 * 1000);

v15 + v26 store + adjustments:
 mem_allocated | load_ms
---------------+---------
      98202152 |    1676

v26 0001-0008
 mem_allocated | load_ms
---------------+---------
      98202032 |    1826

...and reverting to the alternate way to update the parent didn't help:

v26 0001-6, 0008, insert_inner w/ null parent

 mem_allocated | load_ms
---------------+---------
      98202032 |    1825

...and I'm kind of glad that wasn't the problem, because going back to that would be a pain for the shmem case.

Running perf doesn't show anything much different in the proportions (note that rt_set must have been inlined when declared locally in v26):

v15 + v26 store + adjustments:
  65.88%  postgres  postgres             [.] tidstore_add_tids
  10.74%  postgres  postgres             [.] rt_set
   9.20%  postgres  postgres             [.] palloc0
   6.49%  postgres  postgres             [.] rt_node_insert_leaf

v26 0001-0008
  78.50%  postgres  postgres             [.] tidstore_add_tids
   8.88%  postgres  postgres             [.] palloc0
   6.24%  postgres  postgres             [.] local_rt_node_insert_leaf

v2699-0001: The first thing I noticed is that palloc0 is taking way more time than it should, and it's because the compiler doesn't know the values[] array is small. One reason we need to zero the array is to make the algorithm agnostic about what order the offsets come in, as I requested in a previous review. Thinking some more, I was way too paranoid about that. As long as access methods scan the line pointer array in the usual way, maybe we can just assert that the keys we create are in order, and zero any unused array entries as we find them. (I admit I can't actually think of a reason we would ever encounter offsets out of order.) Also, we can keep track of the last key we need to consider for insertion into the radix tree, and ignore the rest. That might shave a few cycles during the exclusive lock when the max offset of an LP_DEAD item < 64 on a given page, which I think would be common in the wild. I also got rid of the special case for non-encoding, since shifting by zero should work the same way. These together led to a nice speedup on the v26 branch:

 mem_allocated | load_ms
---------------+---------
      98202032 |    1386

v2699-0002: The next thing I noticed is forming a full ItemIdPointer to pass to tid_to_key_off(). That's bad for tidstore_add_tids() because ItemPointerSetBlockNumber() must do this in order to allow the struct to be SHORTALIGN'd:

static inline void
BlockIdSet(BlockIdData *blockId, BlockNumber blockNumber)
{
blockId->bi_hi = blockNumber >> 16;
blockId->bi_lo = blockNumber & 0xffff;
}

Then, tid_to_key_off() calls ItemPointerGetBlockNumber(), which must reverse the above process:

static inline BlockNumber
BlockIdGetBlockNumber(const BlockIdData *blockId)
{
return (((BlockNumber) blockId->bi_hi) << 16) | ((BlockNumber) blockId->bi_lo);
}

There is no reason to do any of this if we're not reading/writing directly to/from an on-disk tid etc. To avoid this, I created a new function encode_key_off() [name could be better], which deals with the raw block number that we already have. Then turn tid_to_key_off() into a wrapper around that, since we still need the full conversion for tidstore_lookup_tid().

v2699-0003: Get rid of all the remaining special cases for encoding/or not. I am unaware of the need to optimize that case or treat it in any way differently. I haven't tested this on an installation with non-default blocksize and didn't measure this separately, but 0002+0003 gives:

 mem_allocated | load_ms
---------------+---------
      98202032 |    1259

If these are acceptable, I can incorporate them into a later patchset. In any case, speeding up tidstore_add_tids() will make any regressions in the backing radix tree more obvious. I will take a look at that next week.

--
John Naylor
EDB: http://www.enterprisedb.com
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: ICU locale validation / canonicalization
Следующее
От: "Takamichi Osumi (Fujitsu)"
Дата:
Сообщение: RE: Time delayed LR (WAS Re: logical replication restrictions)