Thread: pgsql: Support parallel btree index builds.


pgsql: Support parallel btree index builds.

From
Robert Haas
Date:
Support parallel btree index builds.

To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/9da0cc35284bdbe8d442d732963303ff0e0a40bc

Modified Files
--------------
contrib/bloom/blinsert.c                      |   3 +-
doc/src/sgml/config.sgml                      |  44 +-
doc/src/sgml/monitoring.sgml                  |  12 +-
doc/src/sgml/ref/create_index.sgml            |  58 ++
doc/src/sgml/ref/create_table.sgml            |   4 +-
src/backend/access/brin/brin.c                |   4 +-
src/backend/access/gin/gininsert.c            |   2 +-
src/backend/access/gist/gistbuild.c           |   2 +-
src/backend/access/hash/hash.c                |   2 +-
src/backend/access/hash/hashsort.c            |   1 +
src/backend/access/heap/heapam.c              |  28 +-
src/backend/access/nbtree/nbtree.c            | 134 +---
src/backend/access/nbtree/nbtsort.c           | 878 +++++++++++++++++++++++++-
src/backend/access/spgist/spginsert.c         |   3 +-
src/backend/access/transam/parallel.c         |  12 +-
src/backend/bootstrap/bootstrap.c             |   2 +-
src/backend/catalog/heap.c                    |   2 +-
src/backend/catalog/index.c                   | 123 +++-
src/backend/catalog/toasting.c                |   1 +
src/backend/commands/cluster.c                |   3 +-
src/backend/commands/indexcmds.c              |   7 +-
src/backend/executor/execParallel.c           |   2 +-
src/backend/executor/nodeAgg.c                |   6 +-
src/backend/executor/nodeSort.c               |   2 +-
src/backend/optimizer/path/allpaths.c         |  18 +-
src/backend/optimizer/path/costsize.c         |   4 +-
src/backend/optimizer/plan/planner.c          | 136 ++++
src/backend/postmaster/pgstat.c               |   3 +
src/backend/storage/file/buffile.c            |  61 +-
src/backend/storage/file/fd.c                 |  10 +
src/backend/utils/adt/orderedsetaggs.c        |   2 +
src/backend/utils/init/globals.c              |   1 +
src/backend/utils/misc/guc.c                  |  10 +
src/backend/utils/misc/postgresql.conf.sample |   3 +-
src/backend/utils/probes.d                    |   2 +-
src/backend/utils/sort/logtape.c              | 199 +++++-
src/backend/utils/sort/tuplesort.c            | 595 ++++++++++++++---
src/include/access/nbtree.h                   |  14 +-
src/include/access/parallel.h                 |   4 +-
src/include/access/relscan.h                  |   1 +
src/include/catalog/index.h                   |   9 +-
src/include/miscadmin.h                       |   1 +
src/include/nodes/execnodes.h                 |   6 +-
src/include/optimizer/paths.h                 |   2 +-
src/include/optimizer/planner.h               |   1 +
src/include/pgstat.h                          |   1 +
src/include/storage/buffile.h                 |   2 +
src/include/storage/fd.h                      |   1 +
src/include/utils/logtape.h                   |  39 +-
src/include/utils/tuplesort.h                 | 132 +++-
src/tools/pgindent/typedefs.list              |   6 +
51 files changed, 2237 insertions(+), 361 deletions(-)


Re: pgsql: Support parallel btree index builds.

From
Andres Freund
Date:
Hi,

On 2018-02-02 18:37:11 +0000, Robert Haas wrote:
> Support parallel btree index builds.

Wheee! Congrats Peter, Rushabh, and everyone else involved!

- Andres


Re: pgsql: Support parallel btree index builds.

From
Peter Geoghegan
Date:
On Sun, Feb 4, 2018 at 9:42 AM, Andres Freund <andres@anarazel.de> wrote:
> Wheee! Congrats Peter, Rushabh, and everyone else involved!

Thanks!

-- 
Peter Geoghegan


Re: pgsql: Support parallel btree index builds.

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-02-02 18:37:11 +0000, Robert Haas wrote:
>> Support parallel btree index builds.

> Wheee! Congrats Peter, Rushabh, and everyone else involved!

I'll be happier about it when the valgrind buildfarm animals are
happy.

            regards, tom lane


Re: pgsql: Support parallel btree index builds.

From
Peter Geoghegan
Date:
On Sun, Feb 4, 2018 at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'll be happier about it when the valgrind buildfarm animals are
> happy.

I don't know if you noticed, but I did post a patch for that on Friday.

-- 
Peter Geoghegan


Re: pgsql: Support parallel btree index builds.

From
Robert Haas
Date:
On Sun, Feb 4, 2018 at 1:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'll be happier about it when the valgrind buildfarm animals are
> happy.

Me too, but it's not clear what the right fix is.  One thing that
would help is if you put in an appearance on the thread where this is
being discussed and cast a vote.  (Ditto to Andres.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql: Support parallel btree index builds.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Feb 4, 2018 at 1:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'll be happier about it when the valgrind buildfarm animals are
>> happy.

> Me too, but it's not clear what the right fix is.  One thing that
> would help is if you put in an appearance on the thread where this is
> being discussed and cast a vote.  (Ditto to Andres.)

If you mean do I like fixing this by adding a valgrind suppression,
no I do not.  Valgrind suppressions are last-resort band-aids IMO,
to be applied only when it's clearly understood what behavior we're
masking and why it's more reasonable to mask it than make it better
defined.  I, at least, don't have that understanding from looking
at the thread.  For one thing, Peter has not explained why this issue
appears now with parallel index build when it did not before; it's
not like logtape.c isn't old enough to vote.

Even granting that a suppression is the way to fix it, the proposed
suppression seems pretty darn broad, and hence likely to mask things
we'd wish it hadn't.

            regards, tom lane


Re: pgsql: Support parallel btree index builds.

From
Robert Haas
Date:
On Tue, Feb 6, 2018 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Feb 4, 2018 at 1:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'll be happier about it when the valgrind buildfarm animals are
>>> happy.
>
>> Me too, but it's not clear what the right fix is.  One thing that
>> would help is if you put in an appearance on the thread where this is
>> being discussed and cast a vote.  (Ditto to Andres.)
>
> If you mean do I like fixing this by adding a valgrind suppression,
> no I do not.  Valgrind suppressions are last-resort band-aids IMO,
> to be applied only when it's clearly understood what behavior we're
> masking and why it's more reasonable to mask it than make it better
> defined.  I, at least, don't have that understanding from looking
> at the thread.  For one thing, Peter has not explained why this issue
> appears now with parallel index build when it did not before; it's
> not like logtape.c isn't old enough to vote.

Yeah, he has actually.  In other cases, the buffer is guaranteed to
have been filled at least once (and thus, from valgrind's point of
view, is initialized) because if that weren't going to happen then we
would not have switched to a tape-sort in the first place.  You
can't set work_mem smaller than 8kB.  But in the parallel case each
worker must always produce a tape, so it can happen if a worker is
unlucky enough to get only a very small slice of the data (because the
other participants gobble it all up before that process really gets
going).

> Even granting that a suppression is the way to fix it, the proposed
> suppression seems pretty darn broad, and hence likely to mask things
> we'd wish it hadn't.

Well, he talked about that at some length too.  I don't know how
you're not seeing it on the thread.  But what I really need here is
some input on an option you do like, not just a list of things you
don't like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql: Support parallel btree index builds.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Feb 6, 2018 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... I, at least, don't have that understanding from looking
>> at the thread.  For one thing, Peter has not explained why this issue
>> appears now with parallel index build when it did not before; it's
>> not like logtape.c isn't old enough to vote.

> Yeah, he has actually.  In other cases, the buffer is guaranteed to
> have been filled at least once (and thus, from valgrind's point of
> view, is initialized) because if that weren't going to happen then we
> would not have switched to a tape-sort in the first place.  You
> can't set work_mem smaller than 8kB.  But in the parallel case each
> worker must always produce a tape, so it can happen if a worker is
> unlucky enough to get only a very small slice of the data (because the
> other participants gobble it all up before that process really gets
> going).

Ah, I see.  So this is really a problem that's been latent all along,
but was never exposed in any previous use-case for logtape.c.

> But what I really need here is
> some input on an option you do like, not just a list of things you
> don't like.

I like the option of doing VALGRIND_MAKE_MEM_DEFINED on the tail
portion of the buffer before writing it.  That seems pretty tightly
tied to the behavior we're decreeing valid, whereas the suppression
is not.

            regards, tom lane


Re: pgsql: Support parallel btree index builds.

From
Peter Geoghegan
Date:
On Tue, Feb 6, 2018 at 8:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I like the option of doing VALGRIND_MAKE_MEM_DEFINED on the tail
> portion of the buffer before writing it.  That seems pretty tightly
> tied to the behavior we're decreeing valid, whereas the suppression
> is not.

I think that the suppression is actually slightly better scoped than
that, since, for example, that won't just affect writes of
uninitialized bytes from the buffer. But I'll do it that way.

-- 
Peter Geoghegan
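

Editor's illustration: to make the option agreed on above concrete, here
is a minimal sketch of marking the unfilled tail of a block-sized buffer
as defined just before the whole block is written out.  It is not the
patch that was eventually committed and it is not code from logtape.c.
SketchTape, sketch_flush_block and SKETCH_BLCKSZ are hypothetical names
invented for the example, and the USE_VALGRIND guard with a no-op
fallback is an assumption so that the sketch builds without the Valgrind
headers installed.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
/* Assumed no-op fallback so the sketch compiles without Valgrind. */
#define VALGRIND_MAKE_MEM_DEFINED(addr, size) ((void) 0)
#endif

/* Hypothetical block size for the sketch (8kB). */
#define SKETCH_BLCKSZ 8192

/* Hypothetical stand-in for a tape write buffer; not logtape.c's struct. */
typedef struct SketchTape
{
	char		buffer[SKETCH_BLCKSZ];
	size_t		nbytes;			/* bytes the caller actually filled */
} SketchTape;

/*
 * Write one whole block even though only the first nbytes were filled.
 * Returns the number of bytes handed to fwrite().
 */
size_t
sketch_flush_block(SketchTape *tape, FILE *fp)
{
	/*
	 * The tail beyond nbytes may never have been stored to.  Writing it
	 * is harmless because readers only consume the first nbytes, so tell
	 * Valgrind that this range should be treated as defined.
	 */
	VALGRIND_MAKE_MEM_DEFINED(tape->buffer + tape->nbytes,
							  SKETCH_BLCKSZ - tape->nbytes);

	return fwrite(tape->buffer, 1, SKETCH_BLCKSZ, fp);
}
```

Note that the marking is not limited to the write: once the tail has
been marked defined, Valgrind will presumably stay quiet about any later
use of those bytes as well, which appears to be the scoping concern
Peter raises in his last message.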