Re: [HACKERS] WIP: [[Parallel] Shared] Hash

From: Thomas Munro
Subject: Re: [HACKERS] WIP: [[Parallel] Shared] Hash
Msg-id: CAEepm=3g=dMG+84083fkFzLvgMJ7HdhbGB=AeZABNukbZm3hpA@mail.gmail.com
In reply to: Re: [HACKERS] WIP: [[Parallel] Shared] Hash  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: [HACKERS] WIP: [[Parallel] Shared] Hash  (Andres Freund <andres@anarazel.de>)
           Re: [HACKERS] WIP: [[Parallel] Shared] Hash  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Wed, Mar 1, 2017 at 10:40 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I'm testing a new version which incorporates feedback from Andres and
> Ashutosh, and is refactored to use a new SharedBufFileSet component to
> handle batch files, replacing the straw-man implementation from the v5
> patch series.  I've set this to waiting-on-author and will post v6
> tomorrow.

I created a system for reference-counted, partitioned temporary files
called SharedBufFileSet: see 0007-hj-shared-buf-file.patch.  Then I
ripped out the code for sharing batch files that I previously had
cluttering up nodeHashjoin.c, and refactored it into a new component
called a SharedTuplestore which wraps a SharedBufFileSet and gives it
a tuple-based interface: see 0008-hj-shared-tuplestore.patch.  The
name implies aspirations of becoming a more generally useful shared
analogue of tuplestore, but for now it supports only the exact access
pattern needed for hash join batches ($10 wrench).
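
To give a feel for the shape of the interface, here's a hypothetical
sketch (the identifiers below are illustrative only, not necessarily
the ones in 0008-hj-shared-tuplestore.patch):

  /* Illustrative declarations only; see the patch for the real API. */
  typedef struct SharedTuplestore SharedTuplestore;  /* lives in DSM */
  typedef struct SharedTuplestoreAccessor SharedTuplestoreAccessor;

  /* Attach to a SharedTuplestore whose files live in 'fileset'. */
  extern SharedTuplestoreAccessor *sts_attach(SharedTuplestore *sts,
                                              int participant,
                                              SharedBufFileSet *fileset);

  /* Append a tuple to a partition, with optional fixed-size
   * per-tuple meta-data (more on that below). */
  extern void sts_puttuple(SharedTuplestoreAccessor *accessor,
                           int partition,
                           void *meta_data,
                           MinimalTuple tuple);

  /* Once all writers have finished, scan a partition back. */
  extern MinimalTuple sts_gettuple(SharedTuplestoreAccessor *accessor,
                                   int partition,
                                   void *meta_data);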

A SharedBufFileSet creates temporary files like this:

  base/pgsql_tmp/pgsql_tmp[pid].[set].[partition].[participant].[segment]
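
For example (with made-up values), participant 2 writing segment 0 of
partition 5 in set 1 from a backend with PID 12345 would get
base/pgsql_tmp/pgsql_tmp12345.1.5.2.0.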

I'm not sure why nodeHashjoin.c is doing raw batch file read/write
operations anyway; why not use tuplestore.c for that (as
tuplestore.c's comments incorrectly say is the case)?  Maybe because
Tuplestore's interface doesn't support storing the extra hash value.
In SharedTuplestore I solved that problem by introducing an optional
fixed-size piece of per-tuple meta-data.  Another thing that is
different about SharedTuplestore is that it supports partitions, which
is convenient for this project and probably other parallel projects
too.
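
Continuing the made-up interface sketched above, usage in a hash join
batch might look something like this (again illustrative, not the
patch's literal code; ExecStoreMinimalTuple() and
ExecHashTableInsert() are the existing executor routines):

  uint32       hashvalue;   /* computed once by ExecHashGetHashValue() */
  MinimalTuple tuple;

  /* Writing: stash the hash value as the tuple's meta-data. */
  sts_puttuple(accessor, batchno, &hashvalue, tuple);

  /* Reading the batch back later: the hash value arrives with each
   * tuple, so it never has to be recomputed. */
  while ((tuple = sts_gettuple(accessor, batchno, &hashvalue)) != NULL)
      ExecHashTableInsert(hashtable,
                          ExecStoreMinimalTuple(tuple, slot, false),
                          hashvalue);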

In order for workers to be able to participate in reference-counting
schemes based on DSM segment lifetime, I had to give the
Exec*InitializeWorker() functions access to the dsm_segment object,
whereas previously they received only the shm_toc in order to access
its contents.  I invented ParallelWorkerContext which has just two
members 'seg' and 'toc': see
0005-hj-let-node-have-seg-in-worker.patch.  I didn't touch the FDW API
or custom scan API where they currently take toc, though I can see
that there is an argument that they should; changing those APIs seems
like a bigger deal.  Another approach would be to use ParallelContext,
as passed into ExecXXXInitializeDSM, with the members that are not
applicable to workers zeroed out.  Thoughts?
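
For reference, the context is as small as it sounds (a sketch of
0005-hj-let-node-have-seg-in-worker.patch; the hash join signature is
just a hypothetical example of how a node would receive it):

  typedef struct ParallelWorkerContext
  {
      dsm_segment *seg;   /* the worker's handle on the DSM segment */
      shm_toc    *toc;    /* table of contents within that segment */
  } ParallelWorkerContext;

  /* e.g. */
  void ExecHashJoinInitializeWorker(HashJoinState *state,
                                    ParallelWorkerContext *pwcxt);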

I got rid of the ExecDetachXXX stuff I had invented in the last
version, because acf555bc fixed the problem a better way.

I found that I needed to use more than one toc entry for a single
executor node, in order to reserve space for the inner and outer
SharedTuplestore objects.  So I invented a way to generate extra keys
with PARALLEL_KEY_EXECUTOR_NTH(plan_node_id, N).
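
One plausible way such a macro could pack the two values into a 64-bit
toc key, steering clear of execParallel.c's fixed PARALLEL_KEY_*
constants and the plain plan_node_id keys (purely illustrative; the
patch may encode it differently):

  #define PARALLEL_KEY_EXECUTOR_NTH(plan_node_id, n) \
      (UINT64CONST(0xE100000000000000) | \
       ((uint64) (plan_node_id) << 16) | (uint64) (n))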

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments
