Обсуждение: Reverse-sort indexes and NULLS FIRST/LAST sorting
The SQL2003 spec adds optional "NULLS FIRST" and "NULLS LAST" modifiers for ORDER BY clauses. Teodor proposed an implementation here: http://archives.postgresql.org/pgsql-patches/2006-12/msg00019.php which I didn't care for at all: http://archives.postgresql.org/pgsql-hackers/2006-12/msg00133.php Doing this right is going to require introducing the nulls-first-or-last concept into all the system's handling of sort ordering. Messy as that sounds, I think it will end up logically cleaner than what we have now, because it will let us fix some issues involving descending-order index opclasses and backwards-sort mergejoins. Neither of those can really work correctly right now, the reason being exactly that we lack a framework for dealing with variable sort positioning of NULLs. I'm hoping to fix this as a consequence of the work I'm doing with operator families for 8.3. What I'd like to come out of it is support for both NULLS FIRST/LAST and reverse-sort index columns. Reverse-sort indexes are already in the TODO list, the application being to create an index whose sort order matches a query like "ORDER BY x ASC, y DESC". There are some user-visible decisions to be made first, so this message is to start a discussion about what we want. One way we could handle this is to say that reverse-sort indexes are implemented by adding explicit catalog entries for reverse-sort opclasses, with no additions to the underlying btree index mechanisms. So you might make an index using a command like CREATE INDEX fooi ON foo (x, y reverse_int4_ops); btree indexes would always sort by the given opclass with NULLS LAST. So the two possible orderings that could be derived from this index (using forward or backward scan respectively) are ORDER BY x ASC NULLS LAST, y DESC NULLS LASTORDER BY x DESC NULLS FIRST, y ASC NULLS FIRST The other way that seems like it could win acceptance is to make REVERSE an explicit optional property of an index column; and if we do that we might as well allow NULLS FIRST/LAST to be an optional property as well. Then you could say something like CREATE INDEX fooi ON foo (x, y REVERSE NULLS FIRST); (Or maybe use DESC instead of REVERSE as the keyword --- not very important at this point.) This index would support scans with these two sort orderings: ORDER BY x ASC NULLS LAST, y DESC NULLS FIRSTORDER BY x DESC NULLS FIRST, y ASC NULLS LAST This second way is more flexible in that it allows indexes to support mixed null orderings; another attraction is that we'd not have to create explicit reverse-sort opclasses, which would be a tedious bit of extra work for every datatype. On the other hand, adding these extra flag bits to indexes seems like a horribly ugly wart, mainly because they're irrelevant to anything except a btree index. (Or at least irrelevant to anything that doesn't support ordered scans, but in practice that's only btree for the foreseeable future.) Also, having to account for these options in the btree code would make it more complex and perhaps slower. Comments? I've got mixed feelings about which way to jump myself. regards, tom lane
On Mon, Jan 01, 2007 at 05:53:35PM -0500, Tom Lane wrote: > The SQL2003 spec adds optional "NULLS FIRST" and "NULLS LAST" modifiers > for ORDER BY clauses. Teodor proposed an implementation here: > http://archives.postgresql.org/pgsql-patches/2006-12/msg00019.php > which I didn't care for at all: > http://archives.postgresql.org/pgsql-hackers/2006-12/msg00133.php <snip> > One way we could handle this is to say that reverse-sort indexes are > implemented by adding explicit catalog entries for reverse-sort opclasses, > with no additions to the underlying btree index mechanisms. So you > might make an index using a command like > > CREATE INDEX fooi ON foo (x, y reverse_int4_ops); Personally I favour this approach. It's also the approach similar to what I did with the COLLATE stuff. It's IMHO the cleanest because it encapsulates the order at the level where it's important. In particular, NULLS FIRST/LAST makes sense for btree, but no other index type, so storing the order seperatly is wasted space for any other index type. But in a sense this doesn't go far enough. In general a column can be ordered four ways, and like you say later, it doesn't allow mixed NULLS FIRST/LAST orderins. > The other way that seems like it could win acceptance is to make REVERSE > an explicit optional property of an index column; and if we do that we > might as well allow NULLS FIRST/LAST to be an optional property as well. > Then you could say something like > > CREATE INDEX fooi ON foo (x, y REVERSE NULLS FIRST); While the syntax is nice, I think this method of implementation is a bad idea. Like I said it's wasted processing for non-btree index types. Issues which you havn't addressed are: - Pathkeys: How is the forward/reverse/nulls first/last going to be encoded in the pathkey? I don't think the current method (using the operator OID) is going to stretch far enough. But that leaves you with deciding whether to keep support for SORT_LT/GTFUNC? - How do you deal with people asking for NULLS FIRST/LAST which is the opposite of how the index is defined. Say you can't use the index? > Comments? I've got mixed feelings about which way to jump myself. Somehow neither is quite satisfying. My COLLATE patch solved it by adding an extra layer on top of the operator classes to encode the ordering nulls first/last, but I don't think we really want that. One totally whacked out idea is to allowed the btree code to call the operator to decide nulls first/last, that would allow you to factor that part out at least. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
I'd like to see this implemented with more general collation support in mind. In general, each index column can be ordered by one collation. A query matching the index collation can use the index directly, a query asking for another collation needs to convert. The trivial way to convert from one collation to another is to sort according to the new collation. For other conversions, there can be cheaper ways. I'd like to have explicit support for collations and converting between them, perhaps by introducing a new ConvertCollation node type. The default implementation would be to sort, but for example to convert from NULLS FIRST to NULLS LAST could be implemented by buffering the null columns to temporary storage and returning all non-null rows first. There's similar tricks that could be used to convert between many Western European collations, without resorting to full sort. The NULLS FIRST/LAST support, as well as ascending and descending orderings would be special cases of the general collation and collation conversion machinery. I don't know how much work that would be to implement, compared to what you're proposing. It would require adding an extra collation concept to all the places that care about sort ordering, but you're proposing to add nulls-first-or-last flag to all those places anyway. Tom Lane wrote: > The SQL2003 spec adds optional "NULLS FIRST" and "NULLS LAST" modifiers > for ORDER BY clauses. Teodor proposed an implementation here: > http://archives.postgresql.org/pgsql-patches/2006-12/msg00019.php > which I didn't care for at all: > http://archives.postgresql.org/pgsql-hackers/2006-12/msg00133.php > > Doing this right is going to require introducing the nulls-first-or-last > concept into all the system's handling of sort ordering. Messy as that > sounds, I think it will end up logically cleaner than what we have now, > because it will let us fix some issues involving descending-order index > opclasses and backwards-sort mergejoins. Neither of those can really work > correctly right now, the reason being exactly that we lack a framework for > dealing with variable sort positioning of NULLs. > > I'm hoping to fix this as a consequence of the work I'm doing with > operator families for 8.3. What I'd like to come out of it is support > for both NULLS FIRST/LAST and reverse-sort index columns. Reverse-sort > indexes are already in the TODO list, the application being to create > an index whose sort order matches a query like "ORDER BY x ASC, y DESC". > There are some user-visible decisions to be made first, so this message > is to start a discussion about what we want. > > One way we could handle this is to say that reverse-sort indexes are > implemented by adding explicit catalog entries for reverse-sort opclasses, > with no additions to the underlying btree index mechanisms. So you > might make an index using a command like > > CREATE INDEX fooi ON foo (x, y reverse_int4_ops); > > btree indexes would always sort by the given opclass with NULLS LAST. > So the two possible orderings that could be derived from this index > (using forward or backward scan respectively) are > > ORDER BY x ASC NULLS LAST, y DESC NULLS LAST > ORDER BY x DESC NULLS FIRST, y ASC NULLS FIRST > > The other way that seems like it could win acceptance is to make REVERSE > an explicit optional property of an index column; and if we do that we > might as well allow NULLS FIRST/LAST to be an optional property as well. > Then you could say something like > > CREATE INDEX fooi ON foo (x, y REVERSE NULLS FIRST); > > (Or maybe use DESC instead of REVERSE as the keyword --- not very > important at this point.) This index would support scans with these > two sort orderings: > > ORDER BY x ASC NULLS LAST, y DESC NULLS FIRST > ORDER BY x DESC NULLS FIRST, y ASC NULLS LAST > > This second way is more flexible in that it allows indexes to support > mixed null orderings; another attraction is that we'd not have to create > explicit reverse-sort opclasses, which would be a tedious bit of extra > work for every datatype. On the other hand, adding these extra flag bits > to indexes seems like a horribly ugly wart, mainly because they're > irrelevant to anything except a btree index. (Or at least irrelevant to > anything that doesn't support ordered scans, but in practice that's only > btree for the foreseeable future.) Also, having to account for these > options in the btree code would make it more complex and perhaps slower. > > Comments? I've got mixed feelings about which way to jump myself. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Martijn van Oosterhout <kleptog@svana.org> writes: > Issues which you havn't addressed are: > - Pathkeys: How is the forward/reverse/nulls first/last going to be > encoded in the pathkey? I'm envisioning a struct with operator OID and null-ordering flag. If we implement the explicit REVERSE variant then we'd have to be prepared to match the OID against either the LessThan or GreaterThan member of an index opclass depending --- it wouldn't add a third field to pathkeys. > But that leaves you with > deciding whether to keep support for SORT_LT/GTFUNC? I'm kind of inclined to drop the LTFUNC/GTFUNC business and insist that every operator used as a sort operator must be a btree opclass member. But that decision seems entirely orthogonal to the rest of it. > - How do you deal with people asking for NULLS FIRST/LAST which is the > opposite of how the index is defined. Say you can't use the index? That's right, you can't. Just like any other ordering incompatibility. > One totally whacked out idea is to allowed the btree code to call the > operator to decide nulls first/last, that would allow you to factor > that part out at least. This doesn't work at all unless you make all the operators non-strict, which I hardly think we want. Also, that path leads to needing FOUR opclasses per datatype to support all the orderings :-( regards, tom lane
Heikki Linnakangas <heikki@enterprisedb.com> writes: > I'd like to see this implemented with more general collation support in > mind. I'm really not prepared to buy into that, simply because it puts ICU or some equivalent large chunk of new code into the critical path to finish what I'm doing. The fact that pathkeys will become structs will probably make it easier to add collation later (adding another field to the struct won't mean a wholesale notational change), but that doesn't mean I have to do it now. > The NULLS FIRST/LAST support, as well as ascending and descending > orderings would be special cases of the general collation and collation > conversion machinery. That seems like a bad idea, because nulls first/last and asc/desc ordering are valid concepts for all btree-indexable datatypes, whereas collation is only meaningful for text. Besides, that approach just moves the bloat over from too-many-opclasses to too-many-collations; do we really want to need four collation objects for each basic collation? regards, tom lane
Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: >> I'd like to see this implemented with more general collation support in >> mind. > > I'm really not prepared to buy into that, simply because it puts ICU or > some equivalent large chunk of new code into the critical path to finish > what I'm doing. ... Yeah, I didn't mean doing that right now. Just to keep it in mind so that what we do now fits in nicely with it in the future. >> The NULLS FIRST/LAST support, as well as ascending and descending >> orderings would be special cases of the general collation and collation >> conversion machinery. > > That seems like a bad idea, because nulls first/last and asc/desc > ordering are valid concepts for all btree-indexable datatypes, whereas > collation is only meaningful for text. Besides, that approach just > moves the bloat over from too-many-opclasses to too-many-collations; do > we really want to need four collation objects for each basic collation? Hmm, I guess we don't. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Martijn van Oosterhout <kleptog@svana.org> writes: > On Mon, Jan 01, 2007 at 05:53:35PM -0500, Tom Lane wrote: >> One way we could handle this is to say that reverse-sort indexes are >> implemented by adding explicit catalog entries for reverse-sort opclasses, >> with no additions to the underlying btree index mechanisms. So you >> might make an index using a command like >> >> CREATE INDEX fooi ON foo (x, y reverse_int4_ops); > Personally I favour this approach. It's also the approach similar to > what I did with the COLLATE stuff. It's IMHO the cleanest because it > encapsulates the order at the level where it's important. > In particular, NULLS FIRST/LAST makes sense for btree, but no other > index type, so storing the order seperatly is wasted space for any > other index type. After further thought I'm leaning towards doing it the other way, ie, with explicit DESC and NULLS FIRST/LAST modifiers attached to index columns, rather than specialized opclasses. Comparing this to the concerns that have been mentioned: * Allows indexes to support mixed nulls-first-or-last orderings. Although I can't make a real strong argument why that's important, I have a feeling that we'll miss it if we can't handle it. * Solves the problem once, instead of requiring extra code in every datatype. Even though that code would largely be boilerplate copy-and-paste stuff, it's still tedious and easy to get wrong. * Possible slowdown of btree code: after looking a bit, I think this is a red herring. Most of the work would be done by flipping operator strategy numbers during scan setup. As best I can tell without actually coding it, the only addition required to the inner-loop comparison function will be one extra test-and-branch in the code paths where we've detected we are comparing a NULL to a non-NULL. That doesn't seem like a big problem. * Ugly wart added to pg_index: yeah, it is, although I have an idea that might make it more palatable. Rather than add a couple of boolean vectors (which is a datatype we don't have anyway), I'm thinking of adding an int2vector field named something like "indoption" which would store per-column flag bits with index-AM-specific meanings. DESC and NULLS FIRST/LAST would take up two of these bits for btree (and any other index AM that supports ordered scans), the rest are available for expansion. I do not know if there's any immediate use for such flag bits for GiST or GIN, but one obvious possible use for btree is to store collation info. (I suppose we can find a way to identify a collation with a dozen or less bits, so it should fit.) Another possible objection is that in the proposed CREATE INDEX syntax index-column-id [ opclass-name ] [ DESC ] [ NULLS {FIRST|LAST} ] DESC must be a fully reserved word else it can't be distinguished from an opclass name. But guess what, it already is. Comments? regards, tom lane
On Jan 4, 2007, at 13:33 , Tom Lane wrote: > Another possible objection is that in the proposed CREATE INDEX syntax > > index-column-id [ opclass-name ] [ DESC ] [ NULLS {FIRST|LAST} ] > > DESC must be a fully reserved word else it can't be distinguished from > an opclass name. But guess what, it already is. A point in favor of using DESC over REVERSE as you had earlier proposed is that DESC is already a reserved word, while REVERSE isnt' even in the list of key words. As DESC is quite closely associated with its antonym ASC wrt ordering, any thoughts of allowing ASC as an optional noise word? Users may be surprised if ASC were to throw an error. Michael Glaesemann grzm seespotcode net
Michael Glaesemann <grzm@seespotcode.net> writes: > On Jan 4, 2007, at 13:33 , Tom Lane wrote: >> index-column-id [ opclass-name ] [ DESC ] [ NULLS {FIRST|LAST} ] >> >> DESC must be a fully reserved word else it can't be distinguished from >> an opclass name. But guess what, it already is. > A point in favor of using DESC over REVERSE as you had earlier > proposed is that DESC is already a reserved word, while REVERSE isnt' > even in the list of key words. Right, that's what convinced me not to use REVERSE. Also, the parallelism of this construct to what is allowed in ORDER BY seems a bit pleasing. > As DESC is quite closely associated > with its antonym ASC wrt ordering, any thoughts of allowing ASC as an > optional noise word? Users may be surprised if ASC were to throw an > error. Yup, I'd come to the same plan. Actually ASC will not be a complete noise word: if you specify it (or a NULLS clause) on an index type that doesn't have a sort order, you'll get an error. regards, tom lane