Обсуждение: Questions regarding Index AMs and natural ordering

Поиск
Список
Период
Сортировка

Questions regarding Index AMs and natural ordering

От
Matthias van de Meent
Дата:
Hi,

Over in [0] and [1] there are patches that touch on the topic of
'natual ordering' index retrieval, and [2] also touches on the topic.
For those patches, I've been looking at how the planner and executor
indicate to index AMs that they expects the output to be ordered, and
how this ordering should work.
I've mostly found how it works for index_key opr constant, but I've
yet to find a good mental model for how the planner handles indexes
that can expose the 'intrinsic order' of data, i.e. indexes with
`amcanorder=true`, because there is very little (if any) real
documentation on what is expected from indexes when it advertises
certain features, and how the executor signals to the AM that it wants
to make use of those features.

For example, btree ignores any ordering scan keys that it is given in
btrescan, which seems fine for btree because the ordering of a btree
is static (and no other order than that is expected apart from it's
reverse order), but this becomes problematic for other indexes that
could return ordered data but would prefer not to have to go through
the motions of making sure the return order is 100% correct, rather
than a k-sorted sequence, or just the matches to the quals (like
GIST). Is returning index scan results in (reverse) natural order not
optional but required with amcanorder? If it is required, why is the
am indicator called 'canorder' instead of 'willorder', 'doesorder' or
'isordered'?

Alternatively, if an am should be using the order scan keys from
.amrescan and natural order scans also get scan keys, is there some
place I find the selection process for ordered index AMs, and how this
ordering should be interepreted? There are no good examples available
in core code because btree is special-cased, and there are no other
in-tree AMs that have facilities where both `amcanorderbyop` and
`amcanorder` are set.

I did read through indexam.sgml, but that does not give a clear answer
on this distinction of 'amcanorder' having required ordered results or
not, nor on how to interpret amrescan's orderbys argument. I also
looked at planner code where it interacts with amcanorder /
amcanorderbyop, but I couldn't wrap my head around its interactions
with indexes, either (more specifically, the ordering part of those
indexes) due to the complexity of the planner and the many layers that
the various concepts are passed through. The README in
backend/optimizer didn't answer this question for me, either.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://www.postgresql.org/message-id/flat/EB2AF704-70FC-4D73-A97A-A7884A0381B5%40kleczek.org
[1]
https://www.postgresql.org/message-id/flat/CAH2-Wz%3DksvN_sjcnD1%2BBt-WtifRA5ok48aDYnq3pkKhxgMQpcw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/e70fa091-e338-1598-9de4-6d0ef6b693e2%40enterprisedb.com



Re: Questions regarding Index AMs and natural ordering

От
Peter Geoghegan
Дата:
On Thu, Nov 23, 2023 at 9:16 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> For example, btree ignores any ordering scan keys that it is given in
> btrescan, which seems fine for btree because the ordering of a btree
> is static (and no other order than that is expected apart from it's
> reverse order), but this becomes problematic for other indexes that
> could return ordered data but would prefer not to have to go through
> the motions of making sure the return order is 100% correct, rather
> than a k-sorted sequence, or just the matches to the quals (like
> GIST). Is returning index scan results in (reverse) natural order not
> optional but required with amcanorder? If it is required, why is the
> am indicator called 'canorder' instead of 'willorder', 'doesorder' or
> 'isordered'?

I don't know. I have a hard time imagining an index AM that is
amcanorder=true that isn't either nbtree, or something very similar
(so similar that it seems unlikely that anybody would actually go to
the trouble of implementing it from scratch).

You didn't mention support for merge joins. That's one of the defining
characteristics of an amcanorder=true index AM, since an
"ammarkpos/amrestrpos function need only be provided if the access
method supports ordered scans". It's hard to imagine how that could
work with a loosely ordered index. It seems to imply that the scan
must always work with a simple linear order.

Cases where the planner uses a merge join often involve an index path
with an "interesting sort order" that "enables" the merge join.
Perhaps most of the alternative plans (that were almost as cheap as
the merge join plan) would have had to scan the same index in the same
way anyway, so it ends up making sense to use a merge join. The merge
join can get ordered results from the index "at no extra cost". (All
of this is implicit, of course -- the actual reason why the planner
chose the merge join plan is because it worked out to be the cheapest
plan.)

My impression is that this structure is fairly well baked in -- which
the optimizer/README goes into. That is, the planner likes to think of
paths as having an intrinsic order that merge joins can take advantage
of -- merge joins tend to win by being "globally optimal" without
being "locally optimal". So generating extra index paths that don't
have any intrinsic order (but can be ordered using a special kind of
index scan) seems like it might be awkward to integrate.

I'm far from an expert on the planner, so take anything that I say
about it with a grain of salt.

> Alternatively, if an am should be using the order scan keys from
> .amrescan and natural order scans also get scan keys, is there some
> place I find the selection process for ordered index AMs, and how this
> ordering should be interepreted? There are no good examples available
> in core code because btree is special-cased, and there are no other
> in-tree AMs that have facilities where both `amcanorderbyop` and
> `amcanorder` are set.

The general notion of a data type's sort order comes from its default
btree operator class, so the whole idea of a generic sort order is
deeply tied to the nbtree AM. That's why we sometimes have btree
operator classes for types that you'd never actually want to index (at
least not using a btree index).

--
Peter Geoghegan



Re: Questions regarding Index AMs and natural ordering

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Nov 23, 2023 at 9:16 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
>> Is returning index scan results in (reverse) natural order not
>> optional but required with amcanorder? If it is required, why is the
>> am indicator called 'canorder' instead of 'willorder', 'doesorder' or
>> 'isordered'?

> I don't know. I have a hard time imagining an index AM that is
> amcanorder=true that isn't either nbtree, or something very similar
> (so similar that it seems unlikely that anybody would actually go to
> the trouble of implementing it from scratch).

Agreed on that, but I don't have that hard a time imagining cases
where it might be useful for btree not to guarantee ordered output.
IIRC, it currently has to do extra pushups to ensure that behavior
in ScalarArrayOp cases.  We've not bothered to expand the planner
infrastructure to distinguish "could, but doesn't" paths for btree
scans, but that's more about it not being a priority than because it
wouldn't make sense.  If we did put work into that, we'd probably
generate multiple indexscan Paths for the same index and same index
conditions, some of which are marked with sort ordering PathKeys and
some of which aren't (and have a flag that would eventually tell the
index AM not to bother with sorting at runtime).

> The general notion of a data type's sort order comes from its default
> btree operator class, so the whole idea of a generic sort order is
> deeply tied to the nbtree AM.

Right.  There hasn't been a reason to decouple that, just as our
notions of hashing are tied to the hash AM.  This doesn't entirely
foreclose other AMs that handle sorted output, but it constrains
them to use btree's opclasses to represent the ordering.

            regards, tom lane



Re: Questions regarding Index AMs and natural ordering

От
Peter Geoghegan
Дата:
On Thu, Nov 23, 2023 at 11:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Agreed on that, but I don't have that hard a time imagining cases
> where it might be useful for btree not to guarantee ordered output.
> IIRC, it currently has to do extra pushups to ensure that behavior
> in ScalarArrayOp cases.  We've not bothered to expand the planner
> infrastructure to distinguish "could, but doesn't" paths for btree
> scans, but that's more about it not being a priority than because it
> wouldn't make sense.

I'm glad that you brought that up, because it's an important issue for
my ScalarArrayOp patch (which Matthias referenced). My patch simply
removes this restriction from the planner -- now ScalarArrayOp clauses
aren't treated as a special case when generating path keys. This works
in all cases because the patch generalizes nbtree's approach to native
ScalarArrayOp execution, allowing index scans to work as one
continuous index scan in all cases.

As you might recall, the test case that exercises the issue is:

SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;

It doesn't actually make much sense to execute this as two primitive
index scans, though. The most efficient approach is to perform a
single index descent, while still being able to use a true index qual
for "tenthous" (since using a filter qual is so much slower due to the
overhead of accessing the heap just to filter out non-matching
tuples). That's what the patch does.

This would be true even without the "ORDER BY" -- accessing the leaf
page once is faster than accessing it twice (same goes for the root).
So I see no principled reason why we'd ever really want to start a
primitive index scan that wasn't "anchored to an equality constraint".
This is reliably faster, while also preserving index sort order,
almost as a side-effect.

--
Peter Geoghegan



Re: Questions regarding Index AMs and natural ordering

От
Matthias van de Meent
Дата:
On Thu, 23 Nov 2023 at 19:52, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Nov 23, 2023 at 9:16 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > For example, btree ignores any ordering scan keys that it is given in
> > btrescan, which seems fine for btree because the ordering of a btree
> > is static (and no other order than that is expected apart from it's
> > reverse order), but this becomes problematic for other indexes that
> > could return ordered data but would prefer not to have to go through
> > the motions of making sure the return order is 100% correct, rather
> > than a k-sorted sequence, or just the matches to the quals (like
> > GIST). Is returning index scan results in (reverse) natural order not
> > optional but required with amcanorder? If it is required, why is the
> > am indicator called 'canorder' instead of 'willorder', 'doesorder' or
> > 'isordered'?
>
> I don't know. I have a hard time imagining an index AM that is
> amcanorder=true that isn't either nbtree, or something very similar
> (so similar that it seems unlikely that anybody would actually go to
> the trouble of implementing it from scratch).

Well, BRIN (with minmax opclasses) could return ordered results if it
needs to (see [0]; though that implements it as a distinct plan node).
Ordering the tuples correctly takes quite some effort, but is quite
likely to use less effort and/or scratch space than a table/bitmap
scan + sort, because we won't have to manage all tuples of the table
at the same time. However, it woould be extremely expensive if the
planner expects this to always return the data in btree order.

For GIST with the btree_gist opclasses it is even easier to return
ordered results (patch over at [1]), but then still it prefers not to
have to make a strict ordering as it adds overhead vs 'normal' index
scans.

Also, was that a confirmation that amcanorder is a requirement for the
AM to return data in index order (unless amrescan's orderbys is not
null), or just a comment on the reason for the name of 'amcanorder'
being unclear?

> You didn't mention support for merge joins. That's one of the defining
> characteristics of an amcanorder=true index AM, since an
> "ammarkpos/amrestrpos function need only be provided if the access
> method supports ordered scans". It's hard to imagine how that could
> work with a loosely ordered index. It seems to imply that the scan
> must always work with a simple linear order.

I probably didn't think of merge join support because 'merge join' is
not mentioned as such in the index AM api - I knew of
ammarkpos/amrestrpos, but hadn't yet gone into detail what they're
used for.

> Cases where the planner uses a merge join often involve an index path
> with an "interesting sort order" that "enables" the merge join.
> Perhaps most of the alternative plans (that were almost as cheap as
> the merge join plan) would have had to scan the same index in the same
> way anyway, so it ends up making sense to use a merge join. The merge
> join can get ordered results from the index "at no extra cost". (All
> of this is implicit, of course -- the actual reason why the planner
> chose the merge join plan is because it worked out to be the cheapest
> plan.)

Couldn't the merge join (or scan node) use a tuple store to return to
some earlier point in the index scan when a native version of markpos
is not easily supported? It would add (potentially very significant)
IO overhead, but it would also allow merge joins on ordered paths that
currently don't have a natural way of marking their position.

> > Alternatively, if an am should be using the order scan keys from
> > .amrescan and natural order scans also get scan keys, is there some
> > place I find the selection process for ordered index AMs, and how this
> > ordering should be interepreted? There are no good examples available
> > in core code because btree is special-cased, and there are no other
> > in-tree AMs that have facilities where both `amcanorderbyop` and
> > `amcanorder` are set.
>
> The general notion of a data type's sort order comes from its default
> btree operator class, so the whole idea of a generic sort order is
> deeply tied to the nbtree AM. That's why we sometimes have btree
> operator classes for types that you'd never actually want to index (at
> least not using a btree index).

Yes, the part where btree opclasses determine a type's ordering is
clear. But what I'm looking for is "how do I, as an index AM
implementation, get the signal that I need to return column-ordered
data?" If that signal is "index AM marked amcanorder == index must
return ordered data", then that's suboptimal for the index AM writer,
but understandable. If amcanorder does not imply always ordered
retrieval, then I'd like to know how it is signaled to the AM. But as
of yet I've not found a clear and conclusive answer either way.

Kind regards,

Mattthias van de Meent.

[0] https://www.postgresql.org/message-id/flat/e70fa091-e338-1598-9de4-6d0ef6b693e2%40enterprisedb.com
[1] https://www.postgresql.org/message-id/flat/EB2AF704-70FC-4D73-A97A-A7884A0381B5%40kleczek.org



Re: Questions regarding Index AMs and natural ordering

От
Peter Geoghegan
Дата:
On Fri, Nov 24, 2023 at 8:44 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Also, was that a confirmation that amcanorder is a requirement for the
> AM to return data in index order (unless amrescan's orderbys is not
> null), or just a comment on the reason for the name of 'amcanorder'
> being unclear?

It was both, I suppose.

> > Cases where the planner uses a merge join often involve an index path
> > with an "interesting sort order" that "enables" the merge join.
> > Perhaps most of the alternative plans (that were almost as cheap as
> > the merge join plan) would have had to scan the same index in the same
> > way anyway, so it ends up making sense to use a merge join. The merge
> > join can get ordered results from the index "at no extra cost". (All
> > of this is implicit, of course -- the actual reason why the planner
> > chose the merge join plan is because it worked out to be the cheapest
> > plan.)
>
> Couldn't the merge join (or scan node) use a tuple store to return to
> some earlier point in the index scan when a native version of markpos
> is not easily supported?

You can materialize any executor node, allowing it to be accessed in
random order, as required by merge joins (in many cases). But any
index AM that relies on that clearly isn't an amcanorder=true index
AM, which is what you asked about.

Whether or not you should actually care about whether your index AM
can meet the expectations that the API has for amcanorder=true index
AMs is far from clear. In the end the design has to make sense, and
integrate into the existing API in some way or other -- but the
details are likely to depend on things that nobody thought of just
yet. I don't think that it's all that useful to discuss it while
everything remains so abstract.

> It would add (potentially very significant)
> IO overhead, but it would also allow merge joins on ordered paths that
> currently don't have a natural way of marking their position.

I don't know. Maybe it's possible. It might even be practically
achievable, while delivering a compelling benefit to users. This is
the kind of thing that I don't tend to speculate about, at least not
before I have more concrete information about what is possible through
some kind of prototyping process.

> Yes, the part where btree opclasses determine a type's ordering is
> clear. But what I'm looking for is "how do I, as an index AM
> implementation, get the signal that I need to return column-ordered
> data?" If that signal is "index AM marked amcanorder == index must
> return ordered data", then that's suboptimal for the index AM writer,
> but understandable. If amcanorder does not imply always ordered
> retrieval, then I'd like to know how it is signaled to the AM. But as
> of yet I've not found a clear and conclusive answer either way.

I suppose that amcanorder=true cannot mean that, since we have the
SAOP path key thing (at least for now). But that wasn't true until bug
fix commit 807a40c5, so prior to that point I guess it was tacitly the
case that B-Tree scans always returned ordered results.

Here we are trying to divine the intentions of an "abstract" API by
discussing highly obscure bugs that were either bugs in nbtree, or
bugs in what we once expected of nbtree. Seems more than a bit silly
to me.

I'm not suggesting that the idea of an abstract API doesn't need to be
taken seriously at all -- far from it. Just that "case law precedent"
can play an important role in how it is interpreted. The fact that
some things remain ambiguous isn't necessarily a problem that needs to
be solved. If there is a real problem, then IMV it should always start
with a concrete complaint about how the API doesn't support certain
requirements. In other words, I'm of the opinion that such a complaint
should end with the API, as opposed to starting with it.

--
Peter Geoghegan



Re: Questions regarding Index AMs and natural ordering

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Nov 24, 2023 at 8:44 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
>> Yes, the part where btree opclasses determine a type's ordering is
>> clear. But what I'm looking for is "how do I, as an index AM
>> implementation, get the signal that I need to return column-ordered
>> data?" If that signal is "index AM marked amcanorder == index must
>> return ordered data", then that's suboptimal for the index AM writer,
>> but understandable. If amcanorder does not imply always ordered
>> retrieval, then I'd like to know how it is signaled to the AM. But as
>> of yet I've not found a clear and conclusive answer either way.

> I suppose that amcanorder=true cannot mean that, since we have the
> SAOP path key thing (at least for now).

As things stand, amcanorder definitely means that the index always
returns ordered data, since the planner will unconditionally apply
pathkeys to the indexscan Paths generated for it (see plancat.c's
get_relation_info which sets up info->sortopfamily, and
build_index_pathkeys which uses that).  We could reconsider that
definition if there were a reason to, but so far it hasn't seemed
interesting.  The hack in 807a40c5 is a hack, without a doubt, but
that doesn't necessarily mean we should spend time on generalizing it,
and even less that we should have done so in 2012.  Maybe by now there
are non-core index AMs that have cases where it's worth being pickier.
We'd have to invent some API that allows the index AM to have a say in
what pathkeys get applied.

            regards, tom lane



Re: Questions regarding Index AMs and natural ordering

От
Peter Geoghegan
Дата:
On Fri, Nov 24, 2023 at 10:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > I suppose that amcanorder=true cannot mean that, since we have the
> > SAOP path key thing (at least for now).
>
> As things stand, amcanorder definitely means that the index always
> returns ordered data, since the planner will unconditionally apply
> pathkeys to the indexscan Paths generated for it (see plancat.c's
> get_relation_info which sets up info->sortopfamily, and
> build_index_pathkeys which uses that).  We could reconsider that
> definition if there were a reason to, but so far it hasn't seemed
> interesting.  The hack in 807a40c5 is a hack, without a doubt, but
> that doesn't necessarily mean we should spend time on generalizing it,
> and even less that we should have done so in 2012.

That is certainly my preferred interpretation. Not least because I am
in the process of removing the hack completely, which has shown very
significant benefits for queries with SAOPs that now get to depend on
the sort order in some crucial way.

> Maybe by now there
> are non-core index AMs that have cases where it's worth being pickier.
> We'd have to invent some API that allows the index AM to have a say in
> what pathkeys get applied.

Matthias and I recently discussed a design sketched by James Coleman
some years back, which Matthias seemed particularly interested in:

https://www.postgresql.org/message-id/flat/CAAaqYe-SsHgXKXPpjn7WCTUnB_RQSxXjpSaJd32aA%3DRquv0AgQ%40mail.gmail.com

James' test case benefits from my own patch in the obvious way: it can
use SAOP index quals, while still being able to get an ordered scan
that terminates early via a LIMIT.  But the design he sketched
proposes to go much further than that -- it's far more targeted. His
design reconstructs a useful sort order by "multiplexing" different
parts of the index (for different SAOP constants), merging together
multiple streams of ordered tuples into one stream. This means that
the index produces tuples in a useful order, sufficient to terminate
the scan early -- but it's a sort order that doesn't match the index
at all. Obviously, that's a totally novel idea.

It's possible that something like that might just make sense -- it's
theoretically optimal, at least. My guess is that if it really did
happen then the planner would treat it like the special case that it
is. It very much reminds me of loose index scan, where the index AM
API has to be revised so that high level semantic information can be
pushed down into the index AM.

If something like that can offer stellar performance, that just isn't
achievable in any other way, then I guess it's worth accepting the
cost of an uglified index AM API. Whether or not such a feature really
can be that compelling likely depends on a myriad of factors that we
cannot possibly anticipate ahead of time. There are just too many
things in this general area that *might* make sense someday.

As we discussed back in 2022, I think that MDAM style "skip scan"
(meaning support for skipping around an index given a query with
"missing key predicates") shouldn't be a special case in the planner
-- only costing needs to know anything about skipping. In general I
find that it's most useful to classify all of these techniques
according to whether or not they are compatible with the orthodox
definition of amcanorder that you described. In other words, to
classify them based on whether they involve pushing down high level
semantic information to the index AM.

--
Peter Geoghegan



Re: Questions regarding Index AMs and natural ordering

От
Matthias van de Meent
Дата:
On Fri, 24 Nov 2023, 19:58 Tom Lane, <tgl@sss.pgh.pa.us> wrote:
>
> Peter Geoghegan <pg@bowt.ie> writes:
> > On Fri, Nov 24, 2023 at 8:44 AM Matthias van de Meent
> > <boekewurm+postgres@gmail.com> wrote:
> >> Yes, the part where btree opclasses determine a type's ordering is
> >> clear. But what I'm looking for is "how do I, as an index AM
> >> implementation, get the signal that I need to return column-ordered
> >> data?" If that signal is "index AM marked amcanorder == index must
> >> return ordered data", then that's suboptimal for the index AM writer,
> >> but understandable. If amcanorder does not imply always ordered
> >> retrieval, then I'd like to know how it is signaled to the AM. But as
> >> of yet I've not found a clear and conclusive answer either way.
>
> > I suppose that amcanorder=true cannot mean that, since we have the
> > SAOP path key thing (at least for now).
>
> As things stand, amcanorder definitely means that the index always
> returns ordered data, since the planner will unconditionally apply
> pathkeys to the indexscan Paths generated for it (see plancat.c's
> get_relation_info which sets up info->sortopfamily, and
> build_index_pathkeys which uses that).  We could reconsider that
> definition if there were a reason to, but so far it hasn't seemed
> interesting.

For GIST there is now a case for improving the support for optionally
ordered retrieval, as there is a patch that tries to hack ORDER BY
support into GIST. Right now that patch applies (what I consider) a
hack by implicitly adding an operator ordering clause for ORDER BY
column with the column type's btree ordering operator, but with
improved APIs that shouldn't need such a hacky approach.

> The hack in 807a40c5 is a hack, without a doubt, but
> that doesn't necessarily mean we should spend time on generalizing it,
> and even less that we should have done so in 2012.  Maybe by now there
> are non-core index AMs that have cases where it's worth being pickier.
> We'd have to invent some API that allows the index AM to have a say in
> what pathkeys get applied.

I think that would be quite useful, as it would allow indexes to
return ordered results in other orders than the defined key order, and
it would allow e.g. BRIN to run its sort for ordered retrieval inside
the index scan node (rather than requiring its own sort node type).

CC: Tomas, maybe you have some ideas about this as well? What was the
reason for moving BRIN-assisted sort into its own node? Was there more
to it than "BRIN currently doesn't have amgettuple, and amgettuple
can't always be used"?

Kind regards,

Matthias van de Meent