Обсуждение: Re: When Update balloons memory

Поиск
Список
Период
Сортировка

Re: When Update balloons memory

От
Tom Lane
Дата:
[ redirecting to pgsql-bugs ]

Klaudie Willis <Klaudie.Willis@protonmail.com> writes:
> Turns out the base case is simpler than I thought. Not involving partitions at all

> CREATE TABLE public.part_main (
>     txid bigint,
>     actiondate timestamp without time zone NOT NULL
> );

> insert into part_main
> select x, '2019-06-01'::timestamp + x%365 * interval '1 day'
> from generate_series(1, 30 * 1E6) as x;

> CREATE INDEX partindx ON public.part_main USING btree ((actiondate)::date);  -- mem bug?
> -- mem runaway follows
> update part_main set txid = txid + 1;

ITYM "((actiondate::date))", but yeah, this leaks memory like there's
no tomorrow.  I traced it to 9dc718bdf (Pass down "logically unchanged
index" hint), which has added a function index_unchanged_by_update()
that (a) looks fairly expensive, (b) leaks a copy of every expression
tree it examines, and (c) is invoked over again for each row, even
though AFAICS the answer shouldn't change across rows.  This seems very
poorly thought through.  Peter?

            regards, tom lane

PS: personally I would have used pull_varnos() instead of reinventing
that wheel.  But in any case the real problem is repeated invocation.



Re: When Update balloons memory

От
Peter Geoghegan
Дата:
On Tue, Dec 14, 2021 at 7:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ITYM "((actiondate::date))", but yeah, this leaks memory like there's
> no tomorrow.  I traced it to 9dc718bdf (Pass down "logically unchanged
> index" hint), which has added a function index_unchanged_by_update()
> that (a) looks fairly expensive, (b) leaks a copy of every expression
> tree it examines, and (c) is invoked over again for each row, even
> though AFAICS the answer shouldn't change across rows.  This seems very
> poorly thought through.  Peter?

Ugh, what a howler. Clearly I am at fault here. Apologies.

Are you sure that it would really be worth the trouble of caching our
answer? It's not clear that that has only minimal maintenance burden.
I have always suspected that index_unchanged_by_update() was at least
slightly over-engineered.

The fact is that most individual aminsert() calls that get the hint
will never actually apply it in any way. In practice the hint is only
relevant when there isn't enough space on an nbtree leaf page to fit
the incoming item. Even then, it won't be used when there are LP_DEAD
bits set on the leaf page -- we prefer to perform a conventional index
deletion over a bottom-up index deletion. And so there is a fair
practical argument to be made in favor of assuming that we should give
the hint in cases where we can't rule it out inexpensively. Of course
that assumes that there will be no other use for the hint in the
future. I'm not making this argument myself, but it does seem like a
factor worth considering.

-- 
Peter Geoghegan



Re: When Update balloons memory

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> Are you sure that it would really be worth the trouble of caching our
> answer? It's not clear that that has only minimal maintenance burden.

I'd be inclined to do so if we can find a suitable place to put it.
But wouldn't a field in IndexInfo serve?  Letting the field default
to "not optimizable" would cover most cases.

> The fact is that most individual aminsert() calls that get the hint
> will never actually apply it in any way.

Yeah, you could make an argument that just not trying to optimize when
there are index expressions would be fine for this --- and we may have
to fix it that way in v14, because I'm not sure whether adding a field
in IndexInfo would be safe ABI-wise.  But ISTM that the overhead of
index_unchanged_by_update is a bit more than I care to pay per row
even when it's only considering plain index columns.  I'm generally
allergic to useless per-row computations, especially when they're
being added by an alleged performance improvement.

Another thing we ought to check into is the extent to which this
is duplicative of the setup calculations for HOT updates --- I seem
to recall that there's already roughly-similar logic somewhere else.

And, not to be too picky, but does this cope with the case where
an indexed column is changed by a BEFORE trigger, not by the
query proper?

            regards, tom lane



Re: When Update balloons memory

От
Peter Geoghegan
Дата:
On Tue, Dec 14, 2021 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd be inclined to do so if we can find a suitable place to put it.
> But wouldn't a field in IndexInfo serve?  Letting the field default
> to "not optimizable" would cover most cases.

I'll come up with a patch for that soon.

> Yeah, you could make an argument that just not trying to optimize when
> there are index expressions would be fine for this --- and we may have
> to fix it that way in v14, because I'm not sure whether adding a field
> in IndexInfo would be safe ABI-wise.  But ISTM that the overhead of
> index_unchanged_by_update is a bit more than I care to pay per row
> even when it's only considering plain index columns.  I'm generally
> allergic to useless per-row computations, especially when they're
> being added by an alleged performance improvement.

I am tempted to broach the idea of always giving the hint in the case
of a non-HOT update, actually. But that's probably too weird to
countenance when you take a broader, API-level view of things. (So
I'll skip the explanation of why I think that might be reasonable from
the point of view of the nbtree code.)

> Another thing we ought to check into is the extent to which this
> is duplicative of the setup calculations for HOT updates --- I seem
> to recall that there's already roughly-similar logic somewhere else.

That's handled fairly directly, on the heapam side. At the top of
heap_update(), with some relcache infrastructure. Unlike
heap_update(), index_unchanged_by_update() cares about which specific
indexes have "logically modified" attributes. We already know for sure
that the update can't have been a HOT UPDATE when
index_unchanged_by_update() is reached, of course.

> And, not to be too picky, but does this cope with the case where
> an indexed column is changed by a BEFORE trigger, not by the
> query proper?

No. It's much better to err in the direction of giving the hint,
rather than not giving the hint. In order for us to make the category
of error that seems like it might actually be a problem (not giving
the hint when we should), the BEFORE trigger would have to "undo" an
explicit change to an updated column.

We also want to give the hint when a partial index is subject to lots
of non-HOT updates, when successive updates make the predicate flip
between matching and not matching. That was shown to be particularly
valuable (with a workload that has such an index). So the fact that we
don't handle predicates is intentional, even though the justification
for that relies on an implementation deficiency in HOT, that might be
fixed some day.

-- 
Peter Geoghegan



Re: When Update balloons memory

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> On Tue, Dec 14, 2021 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> And, not to be too picky, but does this cope with the case where
>> an indexed column is changed by a BEFORE trigger, not by the
>> query proper?

> No. It's much better to err in the direction of giving the hint,
> rather than not giving the hint. In order for us to make the category
> of error that seems like it might actually be a problem (not giving
> the hint when we should), the BEFORE trigger would have to "undo" an
> explicit change to an updated column.

Uh ... it seems that you are writing as though "giving the hint"
means saying that the column value changed.  That seems quite
confusingly backwards to me, as that is/ought to be the expected
assumption.  Maybe you should invert the flag state while you
are at it.

            regards, tom lane



Re: When Update balloons memory

От
Peter Geoghegan
Дата:
On Tue, Dec 14, 2021 at 3:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Uh ... it seems that you are writing as though "giving the hint"
> means saying that the column value changed.  That seems quite
> confusingly backwards to me, as that is/ought to be the expected
> assumption.

That's not what I meant. When I say "give the hint", I mean pass
"indexUnchanged = true" to aminsert(). This is interpreted within
btinsert() as "the incoming index tuple is a duplicate of at least one
index, so perform a bottom-up index deletion pass if and when the
alternative is splitting the leaf page". In practice the hint always
has to be treated as a noisy signal about what might work, as a
strategy of last resort, with costs that are imposed on non-HOT
updaters.

-- 
Peter Geoghegan



Re: When Update balloons memory

От
Peter Geoghegan
Дата:
On Tue, Dec 14, 2021 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd be inclined to do so if we can find a suitable place to put it.
> But wouldn't a field in IndexInfo serve?  Letting the field default
> to "not optimizable" would cover most cases.

Attached draft HEAD-only bugfix adds two new bool fields to IndexInfo.
The first bool indicates if we've already done the required work for
this IndexInfo. The second field is used as a cache (if the cache is
set the first bool is 'true'). These two fields fit in existing
alignment padding, so the marginal space overhead is zero.

I'll probably need to greatly simplify the code for backpatch, to
avoid an ABI break. Seems fine to teach index_unchanged_by_update to
return "true" unconditionally, given how the IndexUnchanged hint is
currently applied.

I haven't made the code use pull_varnos(), which you suggested back in
December. It looks like it would be tricky to do that from the
executor, since pull_varnos() has a PlannerInfo* argument. That has
been the case since your commit 55dc86eca7 from January 2021, "Fix
pull_varnos' miscomputation of relids set for a PlaceHolderVar".
Please advise.

-- 
Peter Geoghegan

Вложения

Re: When Update balloons memory

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> I haven't made the code use pull_varnos(), which you suggested back in
> December. It looks like it would be tricky to do that from the
> executor, since pull_varnos() has a PlannerInfo* argument. That has
> been the case since your commit 55dc86eca7 from January 2021, "Fix
> pull_varnos' miscomputation of relids set for a PlaceHolderVar".
> Please advise.

Pass NULL for that, per 6867f963e:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6867f963e#patch2

We'd have to back-patch that bit, but I don't see any problem
with doing so.

            regards, tom lane



Re: When Update balloons memory

От
Peter Geoghegan
Дата:
On Tue, Jan 11, 2022 at 11:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pass NULL for that, per 6867f963e:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6867f963e#patch2

Will look into that.

> We'd have to back-patch that bit, but I don't see any problem
> with doing so.

But the back-patch fix will make index_unchanged_by_update return true
unconditionally (to avoid an ABI break). It won't actually do anything
with Vars, which, as I said, seems okay given the current way in which
we apply the hint.

-- 
Peter Geoghegan



Re: When Update balloons memory

От
Peter Geoghegan
Дата:
On Tue, Jan 11, 2022 at 11:59 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Jan 11, 2022 at 11:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Pass NULL for that, per 6867f963e:
> >
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6867f963e#patch2
>
> Will look into that.

Took a look. Not sure that using pull_varnos would represent an
improvement, though. Do you feel strongly about it?

The current approach has the advantage of allowing
index_expression_changed_walker to directly apply an
FirstLowInvalidHeapAttributeNumber offset to each Var's varattno. We
can't just use bms_overlap to check for overlap between the bms
returned by pull_varnos and our updatedColumns bms. It would be
possible to transform the bms returned by pull_varnos to compensate,
of course, but that seems rather awkward

-- 
Peter Geoghegan



Re: When Update balloons memory

От
Peter Geoghegan
Дата:
On Tue, Jan 11, 2022 at 1:15 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Took a look. Not sure that using pull_varnos would represent an
> improvement, though. Do you feel strongly about it?

Pushed a fix just now. No need to block on adding the cache/basic fix,
I think, since the pull_varnos thing is really a separate question.

Thanks
--
Peter Geoghegan