Обсуждение: "Truncated" tuples for tuple hash tables

Поиск
Список
Период
Сортировка

"Truncated" tuples for tuple hash tables

От
Tom Lane
Дата:
While looking at the recently-noticed problem that HashAggregate nodes
store more columns of the input than they need to, I couldn't help
noticing how much of the hashtable space goes into HeapTuple header
overhead.  A couple months ago we were able to get a useful improvement
in sorting by not storing unnecessary header fields in sort files, and
I'm strongly tempted to do the same in tuple hash tables.

Unlike the case with sort temp files, it's important to be able to
access the stored data without moving/copying it.  So, not wishing to
duplicate all the tuple access machinery we have already, I'm
envisioning a compromise design that leaves a couple bytes on the table
but looks enough like a standard tuple to be directly usable.
Specifically, something like this:

typedef struct TruncatedTupleData
{   uint32        t_len;            /* length of tuple */
   char          pad[...];         /* see below */
   int16        t_natts;           /* number of attributes */   ... the rest matching HeapTupleHeaderData ...
}

The padding would be chosen such that the offset of t_natts would have
the same value modulo MAXIMUM_ALIGNOF as it has in HeapTupleHeaderData.
This ensures that if a TruncatedTuple is stored starting on a MAXALIGN
boundary, data within it is correctly aligned the same as it would be
in a normal tuple.  With the current struct definitions, 2 bytes of
padding would be needed on all supported platforms, and a
TruncatedTuple would be 16 bytes shorter than a regular tuple.
However, because we are also eliminating a HeapTupleData struct, the
total savings in tuple hash tables would be 36 to 40 bytes per tuple.

To make use of a TruncatedTuple, we'd set up a temporary HeapTupleData
struct with its t_data field pointing 16 bytes before the start of the
TruncatedTuple.  As long as the code using it never tries to access any
of the missing fields (t_xmin through t_ctid), this would work exactly
like a normal HeapTuple.

Going forward, we'd have to be careful to preserve the existing
field-ordering separation between transaction status fields and data
content fields in tuple headers, but that's just a matter of adding some
comments to htup.h.

It's tempting to think about using this same representation for tuples
stored in memory by tuplesort.c and tuplestore.c.  That'd probably
require some changes in their APIs, but I think it's doable.

Comments?  Anyone think this is too ugly for words?
        regards, tom lane


Re: "Truncated" tuples for tuple hash tables

От
Martijn van Oosterhout
Дата:
On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote:
> While looking at the recently-noticed problem that HashAggregate nodes
> store more columns of the input than they need to, I couldn't help
> noticing how much of the hashtable space goes into HeapTuple header
> overhead.  A couple months ago we were able to get a useful improvement
> in sorting by not storing unnecessary header fields in sort files, and
> I'm strongly tempted to do the same in tuple hash tables.
>
> Unlike the case with sort temp files, it's important to be able to
> access the stored data without moving/copying it.  So, not wishing to
> duplicate all the tuple access machinery we have already, I'm
> envisioning a compromise design that leaves a couple bytes on the table
> but looks enough like a standard tuple to be directly usable.

I considered this, but ran into the problem that heap_getattr fell back
to fastgetattr, which wouldn't know what kind of tuple it was given.
Now, if you're going to add a special heap_getattr for these tuples,
then ofcourse there's no problem.

Maybe create a version of heap_getattr that takes the fallback function
as a parameter?

Anyway, I think it's a good idea. Most places in the backend after the
SeqScan/IndexScan node really don't care about most of the header
fields and being able to drop them would be nice.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: "Truncated" tuples for tuple hash tables

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote:
>> Unlike the case with sort temp files, it's important to be able to
>> access the stored data without moving/copying it.  So, not wishing to
>> duplicate all the tuple access machinery we have already, I'm
>> envisioning a compromise design that leaves a couple bytes on the table
>> but looks enough like a standard tuple to be directly usable.

> I considered this, but ran into the problem that heap_getattr fell back
> to fastgetattr, which wouldn't know what kind of tuple it was given.
> Now, if you're going to add a special heap_getattr for these tuples,
> then ofcourse there's no problem.

No, I'm specifically *not* going to do that.  AFAICS the proposed
representation requires no changes in heap_getattr; if it did, it'd
be too invasive to contemplate :-(.  It should look exactly like any
other HeapTuple structure, except that the "transaction status" fields
will contain garbage.  Do you see something I missed?
        regards, tom lane


Re: "Truncated" tuples for tuple hash tables

От
"Bort, Paul"
Дата:
Tom Lane said:
>
> To make use of a TruncatedTuple, we'd set up a temporary HeapTupleData
> struct with its t_data field pointing 16 bytes before the start of the
> TruncatedTuple.  As long as the code using it never tries to
> access any
> of the missing fields (t_xmin through t_ctid), this would work exactly
> like a normal HeapTuple.
>

This sounds like a security risk. What's the worst thing that could be
in
those 16 bytes, and could that be used to bite (sorry) us?

If those 16 bytes could be user data in another tuple, there might be an
attack there.



Re: "Truncated" tuples for tuple hash tables

От
Tom Lane
Дата:
"Bort, Paul" <pbort@tmwsystems.com> writes:
> Tom Lane said:
>> As long as the code using it never tries to access any
>> of the missing fields (t_xmin through t_ctid), this would work exactly
>> like a normal HeapTuple.

> This sounds like a security risk.

No more than any other wild-pointer problem.  At the level of C code
it's always trivial to break anything ;-).  The reason we don't need
to worry is that in the upper levels of the executor there is no such
thing as access to system columns --- any attempt to fetch a system
column is converted to a Var reference that appears in the initial
table-scan node, and thereafter it's an ordinary user column.  This
must be so because trying to keep the system columns in their normal
privileged positions breaks down as soon as you consider a join; there
would only be room for one, and the query might well be trying to fetch
(say) ctid from more than one table.  So any code that was trying to
fetch these fields would be buggy anyway.

In the case of the TupleHashTable code, the only access that need be
provided is through a TupleTableSlot.  We could get a little bit of
protection against programming mistakes by using the "virtual tuple"
feature of slots to disallow attempts to fetch any system columns from
a truncated tuple.  I'm not sure if this would be feasible for tuplesort
though; haven't looked at how it's used yet.
        regards, tom lane


Re: "Truncated" tuples for tuple hash tables

От
Simon Riggs
Дата:
On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote:
> On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote:
> > While looking at the recently-noticed problem that HashAggregate nodes
> > store more columns of the input than they need to, I couldn't help
> > noticing how much of the hashtable space goes into HeapTuple header
> > overhead.  A couple months ago we were able to get a useful improvement
> > in sorting by not storing unnecessary header fields in sort files, and
> > I'm strongly tempted to do the same in tuple hash tables.

> Anyway, I think it's a good idea. Most places in the backend after the
> SeqScan/IndexScan node really don't care about most of the header
> fields and being able to drop them would be nice.

I understood Tom meant to do this only for HashAgg and Tuplestore. Tom,
is it possible to extend this further across the executor as Martijn
suggests? That would be useful, even if it is slightly harder to measure
the benefit than it is with the can-spill-to-disk cases.

IMHO it would be better to call the format TupleWithoutVisibilityData so
its very clear that accessing the visibility fields aren't present and
any attempt to access them would be a mistake. TruncatedTuple isn't
clear about what's missing or its consequences.

--  Simon Riggs EnterpriseDB          http://www.enterprisedb.com



Re: "Truncated" tuples for tuple hash tables

От
Tom Lane
Дата:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote:
>> Anyway, I think it's a good idea. Most places in the backend after the
>> SeqScan/IndexScan node really don't care about most of the header
>> fields and being able to drop them would be nice.

> I understood Tom meant to do this only for HashAgg and Tuplestore. Tom,
> is it possible to extend this further across the executor as Martijn
> suggests? That would be useful, even if it is slightly harder to measure
> the benefit than it is with the can-spill-to-disk cases.

There isn't any benefit except where we collect lots of tuples, which is
to say tuplesort/tuplestore/tuplehashtable.  In other places in the
executor, there's basically only one transient tuple in existence per
plan node; jumping through hoops to save 16 bytes per plan node is just
silly.  (What's more, as of 8.1 most of those tuples will be in "virtual
tuple" format anyway, and so the optimization wouldn't make any
difference at all...)

> IMHO it would be better to call the format TupleWithoutVisibilityData so
> its very clear that accessing the visibility fields aren't present and
> any attempt to access them would be a mistake. TruncatedTuple isn't
> clear about what's missing or its consequences.

I'm not wedded to "TruncatedTuple", but I don't like your suggestion
better; it presumes too much about what the difference might be between
truncated and full tuples.  Even today, I don't think
"TupleWithoutVisibilityData" makes it clear that t_ctid is missing;
and down the road there might be other header fields that we don't need
to have in in-memory tuples.  Another small problem is that given our
naming conventions for structs vs pointers to structs, using "Data" as
the last word of a struct name is a bad idea --- for consistency,
pointers to it would be typedef TupleWithoutVisibility, which seems a
bit weird.  For consistency with the existing struct names, I think we
need to choose a name of the form "<adjective>Tuple".

I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but
that seems too generic.  Any other thoughts?
        regards, tom lane


Re: "Truncated" tuples for tuple hash tables

От
Simon Riggs
Дата:
On Mon, 2006-06-26 at 14:36 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote:
> >> Anyway, I think it's a good idea. Most places in the backend after the
> >> SeqScan/IndexScan node really don't care about most of the header
> >> fields and being able to drop them would be nice.
> 
> > I understood Tom meant to do this only for HashAgg and Tuplestore. Tom,
> > is it possible to extend this further across the executor as Martijn
> > suggests? That would be useful, even if it is slightly harder to measure
> > the benefit than it is with the can-spill-to-disk cases.
> 
> There isn't any benefit 

OK, see that... 

> I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but
> that seems too generic.  Any other thoughts?

I like MemoryTuple but since we only use it when we go to disk...

ExecutorTuple, MinimalTuple, DataOnlyTuple, MultTuple, TempFileTuple

Pick one: I'm sorry I opined.

--  Simon Riggs EnterpriseDB          http://www.enterprisedb.com



Re: "Truncated" tuples for tuple hash tables

От
Tom Lane
Дата:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 2006-06-26 at 14:36 -0400, Tom Lane wrote:
>> I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but
>> that seems too generic.  Any other thoughts?

> I like MemoryTuple but since we only use it when we go to disk...

> ExecutorTuple, MinimalTuple, DataOnlyTuple, MultTuple, TempFileTuple

MinimalTuple seems good to me ...
        regards, tom lane


Re: "Truncated" tuples for tuple hash tables

От
Tom Lane
Дата:
I wrote:
> There isn't any benefit except where we collect lots of tuples, which is
> to say tuplesort/tuplestore/tuplehashtable.  In other places in the
> executor, there's basically only one transient tuple in existence per
> plan node; jumping through hoops to save 16 bytes per plan node is just
> silly.  (What's more, as of 8.1 most of those tuples will be in "virtual
> tuple" format anyway, and so the optimization wouldn't make any
> difference at all...)

After further study of the code, here's my hit-list of places that could
make worthwhile use of MinimalTuples:
tuplesort.c (in-memory, on-disk case done already)tuplestore.c (in-memory and on-disk)TupleHashTable (execGrouping.c
---used by nodeAgg and nodeSubplan)hash joins (in-memory hash table and tuple "batch" files)analyze.c (tuples collected
in-memoryfor stats analysis)
 

It looks like there is actually not anyplace else in the executor where
we "materialize" tuples anymore, except for execMain.c's INSERT/UPDATE
code, which of course is going to want full tuples it can stash on disk.
Everything else is dealing in TupleTableSlots that probably contain
virtual tuples.

So in one sense this *is* "all across the executor".  But the amount of
code to touch seems pretty limited.
        regards, tom lane


Re: "Truncated" tuples for tuple hash tables

От
Simon Riggs
Дата:
On Mon, 2006-06-26 at 16:43 -0400, Tom Lane wrote:
> I wrote:
> > There isn't any benefit except where we collect lots of tuples, which is
> > to say tuplesort/tuplestore/tuplehashtable.  In other places in the
> > executor, there's basically only one transient tuple in existence per
> > plan node; jumping through hoops to save 16 bytes per plan node is just
> > silly.  (What's more, as of 8.1 most of those tuples will be in "virtual
> > tuple" format anyway, and so the optimization wouldn't make any
> > difference at all...)
> 
> After further study of the code, here's my hit-list of places that could
> make worthwhile use of MinimalTuples:
> 
>     tuplesort.c (in-memory, on-disk case done already)
>     tuplestore.c (in-memory and on-disk)
>     TupleHashTable (execGrouping.c --- used by nodeAgg and nodeSubplan)
>     hash joins (in-memory hash table and tuple "batch" files)

Thats the list I thought you meant.

>     analyze.c (tuples collected in-memory for stats analysis)

Do we save enough there for us to care?

Will that allow us to increase the sample size for larger tables?

--  Simon Riggs EnterpriseDB          http://www.enterprisedb.com



Re: "Truncated" tuples for tuple hash tables

От
Tom Lane
Дата:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 2006-06-26 at 16:43 -0400, Tom Lane wrote:
>> analyze.c (tuples collected in-memory for stats analysis)

> Do we save enough there for us to care?

Possibly not --- it's certainly low-priority, but I listed it for
completeness.
        regards, tom lane