Batch TIDs lookup in ambulkdelete
From | Masahiko Sawada
---|---
Subject | Batch TIDs lookup in ambulkdelete
Date |
Msg-id | CAD21AoBK-wBFimEx=4HyTa1HiCWrg1K4QRjCjVT62gj4BY6zSg@mail.gmail.com
Responses | Re: Batch TIDs lookup in ambulkdelete
List | pgsql-hackers
Hi all,

During index bulk-deletion in lazy vacuum, we currently check the
deletability of each index tuple individually using the vac_tid_reaped()
function. The attached proof-of-concept patches propose batching multiple
TID lookups for deletability checks to reduce overhead. This optimization
aims to minimize redundant function calls and repeated TidStore entry
retrievals for TIDs on the same page.

I have conducted benchmarks across several scenarios to evaluate the
performance impact.

# Case-1 (btree tuples are regular tuples and dead TIDs are concentrated):
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select generate_series(1, ${NROWS});
create index on test (c);
delete from test where c < ${NROWS} * 0.3;

# Case-2 (btree tuples are regular tuples and dead TIDs are sparse):
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select generate_series(1, ${NROWS});
create index on test (c);
delete from test where random() < 0.3;

# Case-3 (btree tuples are deduplicated tuples):
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select c % 1000 from generate_series(1, ${NROWS}) c;
create index on test (c);
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where (ctid::text::point)[0] < ((:'relpages')::int * 0.3);

# Case-4 (btree tuples are deduplicated tuples and table is clustered):
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select c % 1000 from generate_series(1, ${NROWS}) c;
create index on test (c);
cluster test using test_c_idx;
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where (ctid::text::point)[0] < ((:'relpages')::int * 0.3);

# Case-5 (creating btree index on UUID column):
create unlogged table test (c uuid) with (autovacuum_enabled = off);
insert into test select uuidv4() from generate_series(1, ${NROWS}) c;
create index on test (c);
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where (ctid::text::point)[0] < ((:'relpages')::int * 0.3);

Here are the results (NROWS = 50000000):

           HEAD       PATCHED    DIFF
case-1:    3,021 ms   2,818 ms    93.29%
case-2:    5,697 ms   5,545 ms    97.34%
case-3:    2,833 ms   2,790 ms    98.48%
case-4:    2,564 ms   2,279 ms    88.86%
case-5:    4,657 ms   4,706 ms   101.04%

I've measured a 6 ~ 11% improvement in btree bulk-deletion. Here is a
summary of each attached patch:

0001: Introduce TidStoreIsMemberMulti(), which performs the IsMember check
for multiple TIDs in one function call. If the given TIDs are sorted (at
least by block number), we can save radix tree lookups for the same page
entry.

0002: Convert IndexBulkDeleteCallback() to a batched operation.

0003: Use batch TIDs lookup in btree index bulk-deletion.

In patch 0003, we implement batch TID lookups both for each deduplicated
index tuple and for all remaining regular index tuples, which seems to be
the most straightforward approach. While further optimizations are
possible, such as performing batch TID lookups for all index tuples on a
single page, these could introduce additional overhead from sorting and
re-sorting TIDs. Moreover, considering that radix tree lookups are
relatively inexpensive, the benefits of sorting TIDs and using
TidStoreIsMemberMulti() might be minimal. Nevertheless, these potential
optimizations warrant further evaluation to determine their actual impact
on performance.

Also, the patches include batch TIDs lookup support only for btree
indexes, but we could potentially improve other index AMs too.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com