Обсуждение: domain_in performance considerations

Поиск
Список
Период
Сортировка

domain_in performance considerations

От
Tom Lane
Дата:
Neil Conway wrote some pretty nice things here:
http://www.advogato.org/person/nconway/diary.html?start=26
but commented
> It would be worthwhile to investigate whether this results in a
> performance regression, though: there's no easy way to cache the
> executor machinery needed to evaluate a CHECK constraint in this
> design, whereas the prior design allowed each call-site to implement
> its own EState caching.
which echoes some performance worries I'd expressed in the original
proposal back here:
http://archives.postgresql.org/pgsql-hackers/2005-07/msg00320.php

I wanted to mention a couple of things for the archives, before they
fade from short-term memory:

The patch-as-committed arranges to cache the results of
GetDomainConstraints, which I think is the worst performance hit
involved, since it requires at least one and possibly several indexscans
of the pg_constraint system catalog.  The cache has lifetime equal to
the caller's caching of FmgrInfo for the domain_in function, which is
ordinarily query-lifespan, which I think is about right.  And the output
of GetDomainConstraints is just a palloc'd node tree, so it'll go away
when the memory context containing the FmgrInfo is freed.  So I don't
see any particular issues there.

What is not so pleasant is that the domains.c code instantiates and
destroys an EState and subsidiary structure for each call, if there are
any CHECK constraints to be checked.  This is normally only a few
malloc's, so it's not *that* big a deal, but it could easily be a
performance-limiting factor.

It would be just a small change to make the code cache the EState across
calls, saving a link to it in the FmgrInfo, but I am worried about that.
If the EState's query context is made to be a child of the memory
context containing the caller's FmgrInfo, then there is no problem as
far as memory management goes --- destroying that parent context will
make all the memory associated with the EState go away.  The problem is
that an EState might have other open resources, principally buffer pins,
and there would not be any clean way to close down those resources when
the EState is no longer needed.  We don't have any sort of shutdown
callback that domain_in could make use of in the general case.

It's possible that caching the EState would be OK without any shutdown
callback, because I suspect that an EState used only for legal CHECK
constraint expressions (which may not contain sub-selects) would own no
non-memory resources.  But it'd take some effort to verify that, and it
seems like a pretty fragile assumption over the long haul anyway.

Another idea I had looked at seriously was to make callers of input
functions pass in an EState if they had one available (this could be
done via the "context" field we already have in FunctionCallInfoData).
However, inspection of the existing callers of input functions showed
that that'd only readily work for COPY.  We could maybe have made it
work for plpgsql, but I had a lot of questions as to whether the
available EState would have the right lifespan relative to the FmgrInfo.
Everybody else was out in the cold anyway, because they had no ready
access to any suitable EState.

So I felt that I should commit a version-zero of the patch that I was
pretty confident would *work*, and not get into these dubious
performance-enhancing tweaks.  We can try to spin it up from here,
if needed.  It'd be worth first profiling some domain-intensive queries
and seeing if there's any reason to worry or not.

In any case, I'll fall back on the wisdom of the sage who said
"I can make this program arbitrarily fast ... if it doesn't have to
give the right answer".  Domains are giving measurably more-right
answers today than they were yesterday.  Making 'em fast comes after.
        regards, tom lane


Re: domain_in performance considerations

От
Alvaro Herrera
Дата:
Tom Lane wrote:

Hi,

> It would be just a small change to make the code cache the EState across
> calls, saving a link to it in the FmgrInfo, but I am worried about that.
> If the EState's query context is made to be a child of the memory
> context containing the caller's FmgrInfo, then there is no problem as
> far as memory management goes --- destroying that parent context will
> make all the memory associated with the EState go away.  The problem is
> that an EState might have other open resources, principally buffer pins,
> and there would not be any clean way to close down those resources when
> the EState is no longer needed.  We don't have any sort of shutdown
> callback that domain_in could make use of in the general case.

This is a shot in the dark, but I remember you commenting awhile back
that there was a way to register a callback to be called on memory
context reset or delete.  I wonder if it would be possible to create a
separate ResourceOwner for the EState, and save the EState in a certain
memory context so that a callback would release the buffer pins or
whatever.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: domain_in performance considerations

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> This is a shot in the dark, but I remember you commenting awhile back
> that there was a way to register a callback to be called on memory
> context reset or delete.

AFAIR there's no such thing associated with memory contexts per se.
There is one for EStates, but that requires access to an outer EState,
which is exactly what we lack for the other solution too.

I wonder though if we shouldn't invent one for memory contexts.  Maybe
even replace the EState-specific shutdown callback with mcontext cleanup
on its query context?
        regards, tom lane