Thread: About Custom Aggregates, C Extensions and Memory


About Custom Aggregates, C Extensions and Memory

From:
Marthin Laubscher
Date:

Hi,

 

I’ll skip over context (until someone asks or I can showcase the results) and cut to the chase:

 

A custom aggregate seems the best vehicle for what I seek to implement. Given the processing involved, it’s probably best written in C.

 

That makes the aggregate value opaque: encoded and compressed into an internal format that allows direct equality testing and comparison. For everything else it needs to be decoded into memory, worked on, and then encoded back into a value as expected by the database ecosystem.

 

The challenge is that decoding and encoding present a massive overhead (easily two orders of magnitude or more) compared to the lightning-fast operations to, e.g., add or remove a value from the aggregate while in memory, killing performance and limiting potential.

 

Naturally I’m looking for feasible options to retain and reuse accumulator values decoded in memory, at least between successive calls when aggregating a set of values, and ideally also when the aggregate is used again later in the same session or query, within reason of course.

 

I’ve tried, but failed, to gain sufficient understanding from the documentation of the life and limits of palloc/palloc0 memory in the aggregate and C extension context, so as to reformulate my algorithms for such an environment around whatever opportunities might exist.

 

My project needs it, and I cannot postpone much longer. But I need a sherpa: someone who knows the terrain, pathways and pitfalls, who’ll engage with me for a little while to alleviate my ignorance and uncertainties. If I knew in advance what all the questions were, they’d no longer be questions, but I don’t. However async and terse you like, I need a conversation that can discard unworkable alternatives early and focus instead on whatever is compatible with the environment.

 

When we’re done, I’d be happy to write up and suggest the documentation updates I believe would have circumvented this cry for help.

 

Who’s willing to help me find my way up this mountain or turn it into a mole-hill?

 

Regards,

Marthin Laubscher

Re: About Custom Aggregates, C Extensions and Memory

From:
Tom Lane
Date:
Marthin Laubscher <postgres@lobeshare.co.za> writes:
> A custom aggregate seems the best vehicle for what I seek to implement. Given the processing involved, it’s probably best written in C.
> That makes the aggregate value opaque: encoded and compressed into an internal format that allows direct equality testing and comparison. For everything else it needs to be decoded into memory, worked on, and then encoded back into a value as expected by the database ecosystem.
> The challenge is that decoding and encoding present a massive overhead (easily two orders of magnitude or more) compared to the lightning-fast operations to, e.g., add or remove a value from the aggregate while in memory, killing performance and limiting potential.

Yeah.  What you want is to declare the aggregate as having transtype
"internal" (which basically means that ExecAgg will store a pointer
for you) and make that pointer point to a data structure kept in the
"aggcontext", which will have a suitable lifespan.  json_agg() might
be a suitable example to look at.  Keep in mind that the finalfn
mustn't modify the stored state, as there are optimizations where
it'll be applied more than once.
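A minimal sketch of that pattern (the names `MyAggState`, `my_agg_transfn` and the payload are illustrative, not from any shipped code):

```c
#include "postgres.h"
#include "fmgr.h"
#include "utils/memutils.h"

/* Hypothetical accumulator, kept alive across all transition calls */
typedef struct MyAggState
{
    int64   count;
    /* ... decoded in-memory representation would live here ... */
} MyAggState;

PG_FUNCTION_INFO_V1(my_agg_transfn);

Datum
my_agg_transfn(PG_FUNCTION_ARGS)
{
    MemoryContext aggcontext;
    MyAggState *state;

    /* Errors out if we're not called as an aggregate transition function */
    if (!AggCheckCallContext(fcinfo, &aggcontext))
        elog(ERROR, "my_agg_transfn called in non-aggregate context");

    if (PG_ARGISNULL(0))
    {
        /*
         * First call: allocate the state in aggcontext, so it survives
         * between transition calls and is reclaimed automatically when
         * the aggregation is done.
         */
        state = (MyAggState *) MemoryContextAllocZero(aggcontext,
                                                      sizeof(MyAggState));
    }
    else
        state = (MyAggState *) PG_GETARG_POINTER(0);

    if (!PG_ARGISNULL(1))
        state->count += PG_GETARG_INT64(1);  /* the cheap in-memory update */

    PG_RETURN_POINTER(state);
}
```

The matching SQL declaration would be along the lines of `CREATE AGGREGATE my_agg(int8) (sfunc = my_agg_transfn, stype = internal, finalfunc = my_agg_finalfn)`, with the finalfn reading, but never modifying, the state.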

            regards, tom lane



Re: About Custom Aggregates, C Extensions and Memory

From:
Marthin Laubscher
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Yeah. What you want is to declare the aggregate as having transtype
> "internal" (which basically means that ExecAgg will store a pointer
> for you) and make that pointer point to a data structure kept in the
> "aggcontext", which will have a suitable lifespan. json_agg() might
> be a suitable example to look at. Keep in mind that the finalfn
> mustn't modify the stored state, as there are optimizations where
> it'll be applied more than once.

Thank you.

Stunned, once more, by the incredible depth of prior art in the Postgres ecosystem, allow me to ask before underestimating it again.
 
 
I was fixing on (ab)using user-defined aggregates for a concept that would, on second thought, be better described as a key-only table persisted and manipulated as a single opaque value that is only ever expanded to a table/set/collection in memory. Initial focus is on storing 8-byte integer IDs but bigger UUIDs might be needed down the line.
 

I've leaned away from involving any composite or table semantics for no better reason than inadequate understanding of in-memory table opportunities and limitations. I ended up planning a user-defined type with the last base type on the list (the pseudo-types, which include "internal") that I would define the aggregate to produce values for.
 

Then I read your mail, some more documentation, and discovered
a) moving-aggregate mode,
b) which has a separate implementation to account for forward and inverse transitions (which is relevant to my use case),
c) is accessible mainly via window functions (which are not that relevant to my use case),
d) doesn't cover my "other" other operations, including set operations like intersection, difference and union,
e) which would all benefit from being able to reuse the same decoded-into-memory representation as the aggregate.

Could PostgreSQL be so brilliantly put together that the internal pseudo base type for a user-defined type and the internal transtype in a user-defined aggregate are one and the same thing, essentially extending the lifespan and utility of the in-memory version further towards what I really need?
 

What would be your thoughts and advice here? Are there more prior-art gems that tamed this wilderness before? Am I justifiably or too easily scared by in-memory table semantics and implications when what I'm dealing with really amounts to sets rather than (opaque) scalar values? I definitely want to store values representing an entire set at a time in a column value; there's no place for names, as the opaque value IS the identity of the set (a mathematical bijection), which runs orthogonal to the original Sequel, now SQL. But are there other facilities around that I'm duplicating here, or have enough in common with to leverage?
 

Thanks again for your time and your utterly unique and dedicated approach, keeping such firm yet friendly control of PostgreSQL. It is truly appreciated and noticed, especially in the chaotic context of open source. Eight years your junior, I spent a life as design authority, but in a corporate setting, meaning I find dealing with open-source people worse than herding cats, so I think you're doing incredibly well. Thank you.
 

Regards,
Marthin Laubscher





Re: About Custom Aggregates, C Extensions and Memory

From:
Tom Lane
Date:
Marthin Laubscher <postgres@lobeshare.co.za> writes:
> I was fixing on (ab)using user-defined aggregates for a concept that would, on second thought, be better described as a key-only table persisted and manipulated as a single opaque value that is only ever expanded to a table/set/collection in memory. Initial focus is on storing 8-byte integer IDs but bigger UUIDs might be needed down the line.

Hm.  We do not have in-memory tables, although in some cases a
temporary table is close enough.  But there is one other pre-existing
mechanism that might help you: "expanded objects".  The idea there is
that you have some "flat" representation of your data type that can
go into a table at need, but you also have an in-memory representation
that is better suited to computation, and most of your complicated
operations prefer to work on the expanded representation.  The name of
the game then becomes how to minimize the number of times a value gets
flattened and expanded as you push it around in a computation.
As of v18, we have a pretty decent story for that when it comes to
values you are manipulating within pl/pgsql functions, although less
so if you need to use other languages.  (The expanded-object
functionality exists before v18, but it's hard for user-defined types
to avoid excess flattening in earlier versions.)

If that sounds promising, see

src/include/utils/expandeddatum.h
src/backend/utils/adt/expandeddatum.c

as well as the two in-core uses of the concept:

src/backend/utils/adt/array_expanded.c
src/backend/utils/adt/expandedrecord.c

This thread that led to the v18 improvements might be useful too:

https://www.postgresql.org/message-id/flat/CACxu%3DvJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg%40mail.gmail.com
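A hedged sketch of the skeleton those files define, with illustrative names throughout (`MyExpandedSet`, its payload, and the stubbed method bodies are placeholders, not code from the tree):

```c
#include "postgres.h"
#include "utils/expandeddatum.h"
#include "utils/memutils.h"

/* Hypothetical expanded form: the header must come first, followed by
 * whatever in-memory representation suits computation. */
typedef struct MyExpandedSet
{
    ExpandedObjectHeader hdr;
    int         nmembers;       /* placeholder payload */
    int64      *members;
} MyExpandedSet;

static Size
my_set_get_flat_size(ExpandedObjectHeader *eohptr)
{
    MyExpandedSet *eset = (MyExpandedSet *) eohptr;

    /* Report how many bytes the flat (varlena) encoding will occupy */
    return VARHDRSZ + eset->nmembers * sizeof(int64);
}

static void
my_set_flatten_into(ExpandedObjectHeader *eohptr,
                    void *result, Size allocated_size)
{
    MyExpandedSet *eset = (MyExpandedSet *) eohptr;

    /* Encode the in-memory form into the flat varlena at *result */
    SET_VARSIZE(result, allocated_size);
    memcpy(VARDATA(result), eset->members,
           eset->nmembers * sizeof(int64));
}

static const ExpandedObjectMethods my_set_methods = {
    my_set_get_flat_size,
    my_set_flatten_into
};

/* Create an expanded set living in its own child memory context;
 * deleting that context reclaims the whole object. */
static MyExpandedSet *
make_expanded_set(MemoryContext parentcontext)
{
    MemoryContext objcxt;
    MyExpandedSet *eset;

    objcxt = AllocSetContextCreate(parentcontext,
                                   "expanded set",
                                   ALLOCSET_DEFAULT_SIZES);
    eset = (MyExpandedSet *)
        MemoryContextAllocZero(objcxt, sizeof(MyExpandedSet));
    EOH_init_header(&eset->hdr, &my_set_methods, objcxt);
    return eset;
}
```

The type's functions then pass the value around as `EOHPGetRWDatum(&eset->hdr)` (or the read-only variant), and the value is flattened through the methods table only when it actually must be stored.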

            regards, tom lane



Re: About Custom Aggregates, C Extensions and Memory

From:
Marthin Laubscher
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:


> Hm. We do not have in-memory tables, although in some cases a temporary table is close enough.

Yay, I didn't somehow overlook them.

> But there is one other pre-existing mechanism that might help you: "expanded objects". The idea there is that you have some "flat" representation of your data type that can go into a table at need, but you also have an in-memory representation that is better suited to computation, and most of your complicated operations prefer to work on the expanded representation. The name of the game then becomes how to minimize the number of times a value gets flattened and expanded as you push it around in a computation.
>
> As of v18, we have a pretty decent story for that when it comes to values you are manipulating within pl/pgsql functions, although less so if you need to use other languages. (The expanded-object functionality exists before v18, but it's hard for user-defined types to avoid excess flattening in earlier versions.)
>
> If that sounds promising, see...

I don't yet command the internals well enough to engage with the complexities of expanded objects, which I thought formed part of the TOAST domain I was trying not to get too caught up in just yet.

I suspect that putting emphasis on the opacity of the aggregate value made it unclear that I'd like values of my UDT to look a lot like ordinary variable-length binary data values, for which the <, > and = operations apply with no decoding required. I only need the sticky decoding for more involved operations like aggregation, various set-type operations (including the simplistic adding or removing of a scalar value rather than a set of size one), and testing set membership of scalar index values with whichever is more appropriate between IN or ANY. I implemented the first trivial/naïve attempts in plpgsql, but that environment really wasn't conducive to getting the real functions written. I once spoke C better than English (not a joke), so I'm pretty confident I can and should do it there.

Another, who shall not be named, suggested I take a look at the different memory contexts, so I had a look. If I understood correctly, the user-defined type itself doesn't get anything analogous to the aggcontext in an aggregate, but in the input and output functions, as well as other operators and functions on the UDT, I could define an internal memory structure, switch to a memory context with a suitable lifespan, allocate and manage memory using the appropriate functions along the lines StringInfo does, and remember to switch back to the original memory context each function gets called with. There's even a suggestion that UDT-specific custom memory contexts may be created in the TopMemoryContext using AllocSetContextCreate with some name or identifier. I'd have to figure out what to use for such an identifier. Maybe the UDT instance already has a database or tuple ID that can be adopted for that purpose. It looks like I'd have to find a way to broker between "normal function memory contexts" and per-tuple aggregate memory contexts, but there is language in the comments around that about what is and isn't kosher that I don't know how to interpret.
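For what it's worth, the mechanical pattern being paraphrased there is small and real; a hedged sketch (the context name and helper function are illustrative):

```c
#include "postgres.h"
#include "utils/memutils.h"

/* A private long-lived context, created lazily, once per backend */
static MemoryContext MySetContext = NULL;

static void *
alloc_longlived(Size size)
{
    MemoryContext oldcxt;
    void       *ptr;

    if (MySetContext == NULL)
        MySetContext = AllocSetContextCreate(TopMemoryContext,
                                             "my set cache",
                                             ALLOCSET_DEFAULT_SIZES);

    /* Switch in, allocate, and always switch back to the caller's context */
    oldcxt = MemoryContextSwitchTo(MySetContext);
    ptr = palloc0(size);
    MemoryContextSwitchTo(oldcxt);

    return ptr;
}
```

The catch is exactly the lifespan: anything hung off TopMemoryContext lives until the backend exits, so whoever allocates there also owns the problem of freeing it.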

I don't particularly trust the source. I've paraphrased the above, but not all the terms mentioned could be found in the GitHub code base. So it could be utter bollocks, outdated information, never meant for public consumption, or entirely accurate and useful.

Does hearing me mention such things cause you nightmares, either for the perils I'd be facing or the mess it is likely to cause in your beloved database, or could it be the start of a plan that just might work?

Essentially the aggregate functions would still be front and centre as defined for the user-defined type, and though the user-defined type itself would be largely unaware of it, all the individual functions that manipulate values of the UDT would go through the same process of getting access to the value in decoded form if it already exists, before calling the decoding routines if it doesn't. If I choose the right memory context, would that simply age out when the session, transaction, query or aggregate is done, or what else would know we're done with the memory so we can let go of it? As for transitioning between the per-tuple aggregate context and normal function context, a plan can be devised, if the transition points can be detected, to copy stuff across in memory. It should be possible, e.g., to do that in a final function, even without disturbing the value on the aggregate side of things as advised before.

You're forgiven for thinking I am crazy to consider manually manipulating memory contexts simpler than the high-level support functions created to toast and detoast values at the plpgsql level of abstraction. I'm a big fan of failing early and hard when things (besides user input) aren't as expected, and also a big fan of explicit programming, i.e. as few magical side-effects as possible.

The SuiteSparse:GraphBLAS stuff that sparked the conversation you referred to is very likely to play a role in my project somewhere down the line. They don't know the first thing about what I'm doing and why, I might not always see eye to eye with all the players from the domains where their work has found fertile ground, and I'm far from ready for such complications, but it seems inevitable that our paths will cross at some point. Thanks for the reference, however unintentional.

Regards,
Marthin Laubscher







Re: About Custom Aggregates, C Extensions and Memory

From:
Tom Lane
Date:
Marthin Laubscher <postgres@lobeshare.co.za> writes:
> Essentially the aggregate functions would still be front and centre as defined for the user-defined type, and though the user-defined type itself would be largely unaware of it, all the individual functions that manipulate values of the UDT would go through the same process of getting access to the value in decoded form if it already exists, before calling the decoding routines if it doesn't. If I choose the right memory context, would that simply age out when the session, transaction, query or aggregate is done, or what else would know we're done with the memory so we can let go of it?

Well, yeah, that's the problem.  You can certainly maintain your own
persistent data structure somewhere, but then it's entirely on your
head to manage it and avoid memory leakage/bloating as you process
more and more data.  The mechanisms I pointed you at provide a
structure that makes sure space gets reclaimed when there's no longer
a reference to it, but if you go the roll-your-own route then it's
a lot messier.

A mechanism that might work well enough is a transaction-lifespan
hash table.  You could look at, for example, uncommitted_enum_types
in pg_enum.c for sample code.
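A sketch of that idea using the dynahash API; the entry layout and the choice of a 64-bit key are assumptions for illustration (`SetCacheEntry`, `set_cache` are made-up names):

```c
#include "postgres.h"
#include "utils/hsearch.h"
#include "utils/memutils.h"

/* Cache of decoded set values, keyed by a (hypothetical) 64-bit id,
 * living only for the current transaction. */
typedef struct SetCacheEntry
{
    uint64      key;            /* hash key; must be first field */
    void       *decoded;        /* pointer to the decoded representation */
} SetCacheEntry;

static HTAB *set_cache = NULL;

static SetCacheEntry *
lookup_set_cache(uint64 key)
{
    bool        found;

    if (set_cache == NULL)
    {
        HASHCTL     ctl;

        ctl.keysize = sizeof(uint64);
        ctl.entrysize = sizeof(SetCacheEntry);
        /* Allocate the table in TopTransactionContext so its memory
         * goes away automatically at transaction end. */
        ctl.hcxt = TopTransactionContext;
        set_cache = hash_create("decoded set cache", 16, &ctl,
                                HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
    }
    return (SetCacheEntry *) hash_search(set_cache, &key,
                                         HASH_ENTER, &found);
}
```

One wrinkle: the memory vanishes at transaction end but the static pointer does not, so the extension must also reset `set_cache` to NULL from a callback registered with RegisterXactCallback(), as pg_enum.c does for its list.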

            regards, tom lane



Re: About Custom Aggregates, C Extensions and Memory

From:
Marthin Laubscher
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Well, yeah, that's the problem. You can certainly maintain your own persistent data structure somewhere, but then it's entirely on your head to manage it and avoid memory leakage/bloating as you process more and more data. The mechanisms I pointed you at provide a structure that makes sure space gets reclaimed when there's no longer a reference to it, but if you go the roll-your-own route then it's a lot messier.
 

Of course, I'm sole owner and admin of every instance of PostgreSQL involved, but a great many things have always been and will remain on my head if I "gots it wrongly", so yes, rolling my own would be an option of last resort. I was able to "see" how the aggregate memory mechanism worked, but confess that I must still connect the dots in the expanded-datum case. I know you're seeing something there I don't yet recognise, so I'll look again.
 

> A mechanism that might work well enough is a transaction-lifespan hash table. You could look at, for example, uncommitted_enum_types in pg_enum.c for sample code.
 

And at that, naturally.

Trying hard not to get too far ahead of myself here, perhaps I should try running into the foreseen performance issue before addressing it. How about I make a trivial implementation (skipping over the complex optimised processing, compression and advanced logic ensuring canonical compressed values) of the aggregate and the user-defined type the aggregate would calculate, and put that up for review first?
 

If all goes to script, the result ought to be that the aggregate would nicely reuse the decoded version of the aggregate in memory and forget it existed when the final function has been run. But then each user-defined-type function getting called would decode the stored byte array, do its bit on the data, and encode to a byte array again.
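That trivial version would follow the ordinary varlena pattern in each function; a sketch, with the decode/modify/encode steps stubbed out as a copy (`my_set_add` and its signature are illustrative):

```c
#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(my_set_add);

/* Hypothetical: add one int8 member to a stored set value, the slow way */
Datum
my_set_add(PG_FUNCTION_ARGS)
{
    bytea      *stored = PG_GETARG_BYTEA_PP(0);  /* detoasted on demand */
    int64       member = PG_GETARG_INT64(1);
    Size        len = VARSIZE_ANY_EXHDR(stored);
    bytea      *result;

    (void) member;      /* placeholder: would be added to the decoded set */

    /*
     * 1. decode VARDATA_ANY(stored) into an in-memory structure
     * 2. apply the cheap in-memory operation
     * 3. encode back into a freshly palloc'd varlena (stubbed as a copy)
     */
    result = (bytea *) palloc(VARHDRSZ + len);
    SET_VARSIZE(result, VARHDRSZ + len);
    memcpy(VARDATA(result), VARDATA_ANY(stored), len);

    PG_RETURN_BYTEA_P(result);
}
```

Every call pays the full decode/encode cost, which is exactly the overhead described earlier, but it exercises the whole type and aggregate setup end to end.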
 

It won't be optimal, but it will be simple, and go a long way towards ensuring I got the whole type and aggregate ecosystem set up as it should be. It will also settle any doubts as to whether membership tests would use "IN" or "ANY" semantics.

Step next would be to use, say, the transaction memory context to retain the decoded version in memory between calls to different (non-aggregate) UDT functions made in the same transaction. Beyond finding the appropriate context to use, the challenge, I understand, would be to identify the particular instance of the UDT in memory, which is where you're suggesting using a hash table. With fewer outstanding vagaries and uncertainties, the way forward might be quite obvious by then, but from where I stand right now I can only think of worse ways than hashing, so it'll most likely be hashing.


That said, the two most important reality checks we'd have to consider at that point would be:

a) whether there are ever enough, as in more than three, of these values present in the same memory context to warrant the hash calculation overhead, since a full comparison will be required anyway, and
 

b) whether the special characteristic of my UDT values, where identical values have, by definition, identical structure in memory, offers enough opportunity to use a simple reference count to decide if a value can/should be changed in place, if the changed value should be written to its own fresh slot, or if it just needs an increased reference count on memory that already represents that value.
 

Once we've cracked the intra-function nut, we could look for a way to share memory between the aggregate and other UDT functions.

Either way, I've got a boatload more reading and writing to do. Thank you ever so much for your time and attention. It is a great kindness, well appreciated.
 

Regards,
Marthin Laubscher