Re: index prefetching

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: index prefetching
Дата
Msg-id CAH2-WznETOgna+p=oAr2PZamyTumw1ips3EZvyqUWUV-CeB9zg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: index prefetching  (Tomas Vondra <tomas@vondra.me>)
Список pgsql-hackers
On Wed, Aug 6, 2025 at 10:12 AM Tomas Vondra <tomas@vondra.me> wrote:
> That's quite possible. What concerns me about using tables like pgbench
> accounts table is reproducibility - initially it's correlated, and then
> it gets "randomized" by the workload. But maybe the exact pattern
> depends on the workload - how many clients, how long, how it correlates
> with vacuum, etc. Reproducing the dataset might be quite tricky.

I meant a pristine/newly created pgbench_accounts_pkey index.

> That's why I prefer using "reproducible" data sets. I think the data
> sets with "fuzz" seem like a pretty good model. I plan to experiment
> with adding some duplicate values / runs, possibly with two "levels" of
> randomness (global for all runs, and smaller local perturbations).

Agreed that reproducibility is really important.

> > I bet that there's at least a couple of not-that-noticeable
> > performance bugs, for example due to some hard to pin down issue with
> > prefetch distance getting out of hand. Possibly because the read
> > stream doesn't get to see contiguous requests for TIDs that point to
> > the same heap page, but does see it when things are slightly out of
> > order. Two different queries that have approximately the same accesses
> > should have approximately the same performance -- minor variations in
> > leaf page layout or heap page layout or scan direction shouldn't be
> > confounding.
> >
>
> I think in a way cases like that are somewhat inherent, I wouldn't even
> call that "bug" probably. Any heuristics (driving the distance) will
> have such issues. Give me a heuristics and I'll construct an adversary
> case breaking it.
>
> I think the question will be how likely (and how serious) such cases
> are. If it's rare / limited to cases where we're unlikely to pick an
> index scan etc. then maybe it's OK.

It's something that needs to be considered on a case-by-case basis.
But in general when I see an inconsistency like that, I'm suspicious.
The difference that I see right now feels quite random and
unprincipled. It's not a small difference (375.752 ms vs 465.370 ms
for the backwards scan).

Maybe if I go down the road of fixing this particular issue, I'll find
myself playing performance whack-a-mole, where every change that
benefits one query comes at some cost to some other query. But I doubt
it.

> I plan to keep testing with buffered I/O (with "io_method=worker"),
> simply because that's what most systems will keep using for a while. But
> it's a good idea to test with direct I/O too.

OK.

> Same here. I read the comments about batch mode and deadlocks multiple
> times, and it's still not clear to me what exactly would be needed to
> make it safe.

It feels like the comments about READ_STREAM_USE_BATCHING could use some work.

> > The main difficulty with GiST may be that we may be obligated to fix
> > existing (unfixed!) bugs that affect index-only scans. The master
> > branch is subtly broken, but we can't in good conscience ignore those
> > problems while making these kinds of changes.
> >
>
> Right, that's a valid point.
>
> The thing that worries me a bit is that the ordered scans (e.g. with
> reordering by distance) detach the scan from the leaf pages, i.e. the
> batches are no longer "tied" to a leaf page.
>
> Perhaps "worries" is not the right word - I don't think it should be a
> problem, but it's a difference.

Obviously, the problem that GiST ordered scans create for us isn't a
new one. The new API isn't that different to the old amgettuple one in
all the ways that matter here. amgettuple has exactly the same
stipulations about holding on to buffer pins to prevent unsafe
concurrent TID recycling -- stipulations that GiST currently just
ignores (at least in the case of index-only scans, which cannot rely
on a _bt_drop_lock_and_maybe_pin-like mechanism to avoid unsafe
concurrent TID recycling hazards).

If, in the end, the only solution that really works for GiST is a more
aggressive/invasive one than we'd prefer, then making those changes
must have been inevitable all along -- even with the old amgettuple
interface. That's why I'm not too worried about GiST ordered scans;
we're not making that problem any harder to solve. It's even possible
that it'll be a bit *easier* to fix the problem with the new batch
interface, since it somewhat normalizes the idea of hanging on to
buffer pins for longer.

--
Peter Geoghegan



В списке pgsql-hackers по дате отправления: