Обсуждение: Switching to 64-bit Bitmapsets

Поиск
Список
Период
Сортировка

Switching to 64-bit Bitmapsets

От
David Rowley
Дата:
As many of you will know, part of my planned work for PG12 is to
further improve the performance of querying partitioned tables. Amit
Langote is doing quite a bit of work on the planner side of things to
remove the per-partition overhead to reduce that to just for
partitions that survive partition pruning.  I've mostly turned my
attention to tuning the executor.

Back in PG11, we got faster partition pruning and also run-time
partition pruning. The pruning code uses Bitmapsets to communicate
which partitions survived partition pruning.  When many partitions
exist, these Bitmapsets can become quite large, quite a bit larger
than Bitmapsets were designed to be.

A comment in bitmapset.c mentions:

 * A bitmap set can represent any set of nonnegative integers, although
 * it is mainly intended for sets where the maximum value is not large,
 * say at most a few hundred.

I imagine the reason for this comment is that we must allocate storage
space for all bits below the more significant set bit. 64-bit sets
don't change that, but they do speed up various operations that are
commonly performed on Bitmapsets:

* Reduces the number of memory allocations by half for incrementally built sets.
* Faster skipping of 0 words. This improves bms_num_members,
bms_next_member, bms_prev_member for sparsely populated sets.
* Sets can be copied more quickly with bms_copy()
* Improvements to various other operations; bms_overlap,
bms_nonempty_difference, bms_singleton_member, to name a few.

If I patch master with [1] to delay the locking of partitions until
after run-time pruning, then I'm able to see a small gain in
performance with a 10k partition partitioned table. Currently, the
improvement is only around 1%, but at the moment the TPS of the query
is just around 900 tps. I have local patches which push this to around
25k tps, so that 1% overhead becomes much larger in that case.

I tried to design a benchmark that exercises a large Bitmapset. The
32-bit version (master) will use 313 bitmapwords to store this and the
patched version uses just 157 words.

create table listp (a int) partition by list(a);
select 'create table listp'||x::text || ' partition of listp for
values in('||x::text||');' from generate_Series(1,10000)x;
\gexec

select.sql:
\set p_a 5000
select * from listp where a = :p_a and a > (select 0);

Patched:

$ pgbench -n -f select.sql -T 60 -M prepared postgres

tps
901.570528
976.756117
906.503725
932.156778
942.404224
960.889453
962.468714
907.756816
940.468028
906.351704
978.957677
Unpatched:

$ pgbench -n -f select.sql -T 60 -M prepared postgres

tps
903.4159
941.853886
913.956495
949.313422
907.267227
964.118809
941.43603
952.701788
916.757846
929.572794
891.220856
917.235864

Average increase = 0.89%, median = 1.17%

I've attached the trivial patch which implements 64-bit Bitmapsets.

One caveat about this may be that it will likely slow performance for
32-bit machines.  My current thinking about that is that such a
platform is likely not that common a target for the latest version of
PostgreSQL. Even mobile phones and Raspberry PIs are all 64-bit these
days. However, I doubt it would take much more effort to maintain
using 32-bit sets on 32-bit machines.  If someone feels strongly about
that then I can adjust the patch to allow that.

I'm going to add this to the January commitfest.

[1] https://commitfest.postgresql.org/21/1897/

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Switching to 64-bit Bitmapsets

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've attached the trivial patch which implements 64-bit Bitmapsets.

> One caveat about this may be that it will likely slow performance for
> 32-bit machines.  My current thinking about that is that such a
> platform is likely not that common a target for the latest version of
> PostgreSQL.

Yeah, I think we've been optimizing for 64-bit platforms for awhile.
I'm not prepared to abandon "it works on 32-bit", but I don't see a
reason to sacrifice 64-bit performance to improve the 32-bit case.

> However, I doubt it would take much more effort to maintain
> using 32-bit sets on 32-bit machines.  If someone feels strongly about
> that then I can adjust the patch to allow that.

Hm, are you thinking of making BITS_PER_BITMAPWORD match sizeof(Pointer)
or something like that?  That seems like a good compromise from here.

            regards, tom lane


Re: Switching to 64-bit Bitmapsets

От
David Rowley
Дата:
On Thu, 20 Dec 2018 at 17:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > However, I doubt it would take much more effort to maintain
> > using 32-bit sets on 32-bit machines.  If someone feels strongly about
> > that then I can adjust the patch to allow that.
>
> Hm, are you thinking of making BITS_PER_BITMAPWORD match sizeof(Pointer)
> or something like that?  That seems like a good compromise from here.

Yeah, something along those lines.  I've implemented that in the attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Switching to 64-bit Bitmapsets

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Thu, 20 Dec 2018 at 17:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hm, are you thinking of making BITS_PER_BITMAPWORD match sizeof(Pointer)
>> or something like that?  That seems like a good compromise from here.

> Yeah, something along those lines.  I've implemented that in the attached.

Pushed with some fiddling with the comment.

I wasn't excited about the test case you offered --- on HEAD, it pretty
much all devolves to file access operations (probably, checking the
current length of all the child relations).  Instead I experimented
with an old test case I had for a complex-to-plan query, and got these
results:

HEAD, asserts off:

pgbench reports:
latency average = 11704.992 ms
tps = 0.085434 (including connections establishing)
tps = 0.085437 (excluding connections establishing)

Top hits according to perf:
+   65.42%    65.16%        158499  postmaster       postgres                       [.]
match_eclasses_to_foreign_key_col
+    6.85%     6.82%         16634  postmaster       postgres                       [.] bms_overlap
+    6.25%     6.22%         15173  postmaster       postgres                       [.]
generate_join_implied_equalities_for_ecs
+    3.91%     3.88%          9465  postmaster       postgres                       [.] make_canonical_pathkey
+    3.04%     3.01%          7351  postmaster       postgres                       [.] have_relevant_eclass_joinclause
+    2.06%     2.05%          4992  postmaster       postgres                       [.] bms_is_subset
+    1.19%     1.18%          2887  postmaster       postgres                       [.] equal
+    0.95%     0.95%          2309  postmaster       postgres                       [.] get_eclass_for_sort_expr
+    0.90%     0.90%          2189  postmaster       postgres                       [.] add_paths_to_joinrel

With patch:

latency average = 10741.595 ms
tps = 0.093096 (including connections establishing)
tps = 0.093099 (excluding connections establishing)

+   69.03%    68.76%        178278  postmaster       postgres                     [.] match_eclasses_to_foreign_key_col
+    5.85%     5.82%         15138  postmaster       postgres                     [.]
generate_join_implied_equalities_for_ecs
+    4.58%     4.55%         11842  postmaster       postgres                     [.] bms_overlap
+    4.38%     4.36%         11338  postmaster       postgres                     [.] make_canonical_pathkey
+    2.77%     2.75%          7155  postmaster       postgres                     [.] have_relevant_eclass_joinclause
+    1.30%     1.29%          3364  postmaster       postgres                     [.] equal
+    1.26%     1.25%          3261  postmaster       postgres                     [.] bms_is_subset
+    1.06%     1.05%          2722  postmaster       postgres                     [.] get_eclass_for_sort_expr
+    0.70%     0.70%          1813  postmaster       postgres                     [.] add_paths_to_joinrel

Ignoring the, um, elephant in the room, this shows a pretty clear win
for the performance of bms_overlap and bms_is_subset, confirming
the thesis that this change is worthwhile.

I'll start a separate thread about match_eclasses_to_foreign_key_col.

            regards, tom lane


Re: Switching to 64-bit Bitmapsets

От
David Rowley
Дата:
On Fri, 21 Dec 2018 at 06:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pushed with some fiddling with the comment.

Great. Thanks!

> I wasn't excited about the test case you offered --- on HEAD, it pretty
> much all devolves to file access operations (probably, checking the
> current length of all the child relations).  Instead I experimented
> with an old test case I had for a complex-to-plan query, and got these
> results:

oh meh. I forgot to mention I was running with plan_cache_mode =
force_generic_plan and max_parallel_workers_per_gather = 0. If you
tried without those then you'd have seen a massively different result.

Your test seems better, regardless.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services