Обсуждение: Unexpected Name Resolution Behavior with ORDER BY COLLATE Clause in PostgreSQL 16.2

Поиск
Список
Период
Сортировка

Unexpected Name Resolution Behavior with ORDER BY COLLATE Clause in PostgreSQL 16.2

От
"Bender, Patrice"
Дата:

In PostgreSQL 16.2 running on aarch64-unknown-linux-musl, an unexpected error occurs when executing a SQL query that orders the results by a column with a specified collation (using the COLLATE clause) after performing a join. Specifically, when attempting to order the results by a column that exists in both tables involved in the join, the query fails with an "ambiguous column reference" error if the ORDER BY clause includes a COLLATE specification. This behavior diverges from the expected and standard SQL behavior, where the column reference in the ORDER BY clause should first resolve to the columns listed in the SELECT statement before considering the source tables.

 

### Versions used

 

PostgreSQL 16.2 on aarch64-unknown-linux-musl, compiled by gcc (Alpine 13.2.1_git20231014) 13.2.1 20231014, 64-bit

 

### Steps to reproduce

 

1.start `psql`

2.create two tables, having one element in common:

 

```sql

    CREATE TABLE Foo (id SERIAL PRIMARY KEY, descr TEXT);

    CREATE TABLE Bar (id SERIAL PRIMARY KEY, descr TEXT, foo_id INT);

    -- optionally, insert some data

    INSERT INTO BAR (DESCR, FOO_ID) VALUES ('Description for Bar 1', 1);

    INSERT INTO FOO (DESCR) VALUES ('Description for Foo 1');

```

 

3.execute a query, which references a column of the query in an order by with a collation:

 

```sql

   SELECT

     BAR.descr as "descr"

   from BAR left join FOO on BAR.foo_id = FOO.id

   order by descr COLLATE "en-x-icu" ASC;

```

 

4.observe the error message:

 

```log

    ERROR:  column reference "descr" is ambiguous

    LINE 1: ...BAR left join FOO on BAR.foo_id = FOO.id order by descr COLL...

```

 

5. If you remove the collate, the query will work.

 

### Expected behavior

 

In the ANSI sql standard, the reference in the order by should first be looked up in the queries columns, and then in the tables.

This works if we omit the "COLLATE" clause. I'd expect the lookup to work the same way with the "COLLATE" clause.

 

 

"Bender, Patrice" <patrice.bender@sap.com> writes:
> In PostgreSQL 16.2 running on aarch64-unknown-linux-musl, an
> unexpected error occurs when executing a SQL query that orders the
> results by a column with a specified collation (using the COLLATE
> clause) after performing a join. Specifically, when attempting to
> order the results by a column that exists in both tables involved in
> the join, the query fails with an "ambiguous column reference" error
> if the ORDER BY clause includes a COLLATE specification. This
> behavior diverges from the expected and standard SQL behavior, where
> the column reference in the ORDER BY clause should first resolve to
> the columns listed in the SELECT statement before considering the
> source tables.

This behavior appears to me to be consistent with our documentation
[1]:

    Each expression can be the name or ordinal number of an output
    column (SELECT list item), or it can be an arbitrary expression
    formed from input-column values.
    ...
    If an ORDER BY expression is a simple name that matches both an
    output column name and an input column name, ORDER BY will
    interpret it as the output column name. This is the opposite of
    the choice that GROUP BY will make in the same situation. This
    inconsistency is made to be compatible with the SQL standard.

As soon as you add COLLATE, it's not a simple name anymore but an
expression, and so it is not considered to possibly match an output
column name.

As for the question of SQL standard compliance, please cite chapter
and verse supporting your position.  I believe that the spec changed
between SQL92 and SQL99.  The SQL92 standard said that ORDER BY
items could be output column names or numbers (and nothing else).
Later editions say that ORDER BY expressions are just like any
others, ie names in them are *input* column names.  Our interpretation
is a bit of a mess but is intended to preserve some compatibility
with SQL92, while following the modern spec in any case that isn't
strictly SQL92 compliant.  (I suppose the para quoted above should
be modified to explain that a little more clearly.  The compatibility
section at the bottom of the page has the details, but this bit is
kind of misleading by itself.)

            regards, tom lane

[1] https://www.postgresql.org/docs/current/sql-select.html#SQL-ORDERBY



On Mon, 2024-03-25 at 15:52 +0000, Bender, Patrice wrote:
> PostgreSQL16.2
> on aarch64-unknown-linux-musl, compiled by gcc (Alpine 13.2.1_git20231014)13.2.120231014,64-bit
>  
>     CREATE TABLE Foo (id SERIAL PRIMARY KEY, descr TEXT);
>     CREATE TABLE Bar (id SERIAL PRIMARY KEY, descr TEXT, foo_id INT);
>     -- optionally, insert some data
>     INSERT INTO BAR (DESCR, FOO_ID) VALUES ('Description for Bar 1', 1);
>     INSERT INTO FOO (DESCR) VALUES ('Description for Foo 1');
>
>    SELECT
>      BAR.descr as "descr"
>    from BAR left join FOO on BAR.foo_id = FOO.id
>    order by descr COLLATE "en-x-icu" ASC;
>
>     ERROR:  column reference "descr" is ambiguous
>     LINE 1: ...BAR left join FOO on BAR.foo_id = FOO.id order by descr COLL...
>  
> 5.If you remove the collate, the query will work.
>  
> ### Expected behavior
>  
> In the ANSI sql
> standard, the reference in the order by should first be looked up in the queries columns, and
> then
> in the tables.
> This worksif we omit the "COLLATE" clause. I'd expect the lookup to work the same way with the "COLLATE" clause.

This is not a bug.

If you say "ORDER BY descr", you are referencing the result set column with that alias.

If you say "ORDER BY descr COLLATE "en-x-icu"" or "ORDER BY descr || ' '" or anything else
that is not a plain column reference, but an expression, "descr" is *not* understood to
be a result set column, but a column of one of the involved tables, and that reference
is ambiguous.

There is little sense in quoting the SQL standard here, because as far as I can tell it
only supports column names, no expressions, in the ORDER BY clause.

Yours,
Laurenz Albe