Обсуждение: pgsql: Disk-based Hash Aggregation.

Поиск
Список
Период
Сортировка

pgsql: Disk-based Hash Aggregation.

От
Jeff Davis
Дата:
Disk-based Hash Aggregation.

While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".

In spill mode, new groups are not created in the hash table(s), but
existing groups continue to be advanced if input tuples match. Tuples
that would cause a new group to be created are instead spilled to a
logical tape to be processed later.

The tuples are spilled in a partitioned fashion. When all tuples from
the outer plan are processed (either by advancing the group or
spilling the tuple), finalize and emit the groups from the hash
table. Then, create new batches of work from the spilled partitions,
and select one of the saved batches and process it (possibly spilling
recursively).

Author: Jeff Davis
Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman
Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/1f39bce021540fde00990af55b4432c55ef4b3c7

Modified Files
--------------
doc/src/sgml/config.sgml                      |   32 +
src/backend/commands/explain.c                |   37 +
src/backend/executor/nodeAgg.c                | 1092 ++++++++++++++++++++++++-
src/backend/optimizer/path/costsize.c         |   70 +-
src/backend/optimizer/plan/planner.c          |   19 +-
src/backend/optimizer/prep/prepunion.c        |    2 +-
src/backend/optimizer/util/pathnode.c         |   14 +-
src/backend/utils/misc/guc.c                  |   20 +
src/include/executor/nodeAgg.h                |    8 +
src/include/nodes/execnodes.h                 |   22 +-
src/include/optimizer/cost.h                  |    4 +-
src/test/regress/expected/aggregates.out      |  184 +++++
src/test/regress/expected/groupingsets.out    |  122 +++
src/test/regress/expected/select_distinct.out |   62 ++
src/test/regress/expected/sysviews.out        |    4 +-
src/test/regress/sql/aggregates.sql           |  131 +++
src/test/regress/sql/groupingsets.sql         |  103 +++
src/test/regress/sql/select_distinct.sql      |   62 ++
18 files changed, 1950 insertions(+), 38 deletions(-)


Re: pgsql: Disk-based Hash Aggregation.

От
Tom Lane
Дата:
Jeff Davis <jdavis@postgresql.org> writes:
> Disk-based Hash Aggregation.

I noticed that the regression tests seemed suddenly slower than they
have been.  A bit of poking around reveals that this patch made
groupingsets.sql take approximately 8X longer than it used to,
and more than twice as long as any other core regression test.

This is absolutely, positively, not acceptable for a test that gets
run hundreds of times a day by lots of people and buildfarm animals.

If there's no way to test the feature in some significantly-cheaper way,
perhaps we should move this test out to a separate script that's not run
by default.

            regards, tom lane



Re: pgsql: Disk-based Hash Aggregation.

От
Jeff Davis
Дата:
On Mon, 2020-03-23 at 02:05 -0400, Tom Lane wrote:
> If there's no way to test the feature in some significantly-cheaper
> way,
> perhaps we should move this test out to a separate script that's not
> run
> by default.

I'll rework the tests. I wanted to be a bit more aggressive about
testing right after the checkin to shake out any problems, but I don't
think that's necessary any more.

Regards,
    Jeff Davis





Re: pgsql: Disk-based Hash Aggregation.

От
Tom Lane
Дата:
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2020-03-23 at 02:05 -0400, Tom Lane wrote:
>> If there's no way to test the feature in some significantly-cheaper
>> way,
>> perhaps we should move this test out to a separate script that's not
>> run
>> by default.

> I'll rework the tests. I wanted to be a bit more aggressive about
> testing right after the checkin to shake out any problems, but I don't
> think that's necessary any more.

Fair enough.

            regards, tom lane