Обсуждение: [Question] Window Function Results without ORDER BY Clause

Поиск
Список
Период
Сортировка

[Question] Window Function Results without ORDER BY Clause

От
Zhang Mingli
Дата:
Hi,

I am reaching out to discuss the behavior of window functions in Postgres, specifically regarding the use of the OVER() clause without an ORDER BY specification.
In our recent tests, we observed that the results can be unstable. 
For example, when executing the following query:


SELECT sum(unique1) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
 unique1, four
FROM tenk1 
WHERE unique1 < 10;

The case is in window.sql of regression.

explain(costs off) SELECT sum(unique1) over (rows between current row and unbounded following),
 unique1, four
FROM tenk1 WHERE unique1 < 10;
 QUERY PLAN
--------------------------------------------------------------------
 WindowAgg
 Window: w1 AS (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
 -> Index Scan using tenk1_unique1 on tenk1
 Index Cond: (unique1 < 10)
(4 rows)

regression=# SELECT sum(unique1) over (rows between current row and unbounded following),
 unique1, four
FROM tenk1 WHERE unique1 < 10;
 sum | unique1 | four
-----+---------+------
 45 | 0 | 0
 45 | 1 | 1
 44 | 2 | 2
 42 | 3 | 3
 39 | 4 | 0
 35 | 5 | 1
 30 | 6 | 2
 24 | 7 | 3
 17 | 8 | 0
 9 | 9 | 1
(10 rows)


However, after setting enable_indexscan = off, the results changed:

regression=# set enable_indexscan = off;
SET
regression=# SELECT sum(unique1) over (rows between current row and unbounded following),
 unique1, four
FROM tenk1 WHERE unique1 < 10;
 sum | unique1 | four
-----+---------+------
 45 | 4 | 0
 41 | 2 | 2
 39 | 1 | 1
 38 | 6 | 2
 32 | 9 | 1
 23 | 8 | 0
 15 | 5 | 1
 10 | 3 | 3
 7 | 7 | 3
 0 | 0 | 0
(10 rows)

regression=# explain(costs off) SELECT sum(unique1) over (rows between current row and unbounded following),
 unique1, four
FROM tenk1 WHERE unique1 < 10;
 QUERY PLAN
--------------------------------------------------------------------
 WindowAgg
 Window: w1 AS (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
 -> Seq Scan on tenk1
 Filter: (unique1 < 10)
(4 rows)


Referring to the SQL 2011 standard, it states that if ORDER BY is omitted, the order of rows in the partition is undefined. 
While using a window function without ORDER BY is valid, the resulting output seems unpredictable.


So, are both result sets technically correct given the absence of an ORDER BY clause?



--
Zhang Mingli
HashData

Re: [Question] Window Function Results without ORDER BY Clause

От
"David G. Johnston"
Дата:
On Friday, July 11, 2025, Zhang Mingli <zmlpostgres@gmail.com> wrote:

Referring to the SQL 2011 standard, it states that if ORDER BY is omitted, the order of rows in the partition is undefined. 
While using a window function without ORDER BY is valid, the resulting output seems unpredictable.

So, are both result sets technically correct given the absence of an ORDER BY clause?

The system is behaving within the requirements of the specification.  The query itself is bugged code that the query author should fix.

David J.

Re: [Question] Window Function Results without ORDER BY Clause

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Friday, July 11, 2025, Zhang Mingli <zmlpostgres@gmail.com> wrote:
>> So, are both result sets technically correct given the absence of an ORDER
>> BY clause?

> The system is behaving within the requirements of the specification.  The
> query itself is bugged code that the query author should fix.

Well, it's our own regression-test query.  I think the actual question
being asked here is "do our regression tests need to pass under random
non-default GUC settings?".  I'd say no; it'd be next door to
impossible to guarantee that.  If this query gave unstable results
in practice, we'd have noticed by now (it's been there since 2010).

            regards, tom lane



Re: [Question] Window Function Results without ORDER BY Clause

От
"DINESH NAIR"
Дата:
Hi, 

David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Friday, July 11, 2025, Zhang Mingli <zmlpostgres@gmail.com> wrote:
>> So, are both result sets technically correct given the absence of an
ORDER
>>
BY clause?

> The system is behaving within the requirements of the specification.  The
> query itself is bugged code that the query author should fix.

Well, it's our own regression-test query.  I think the actual question
being asked here is "do our regression tests need to pass under random
non-default GUC settings?".  I'd say no; it'd be next door to
impossible to guarantee that.  If this query gave unstable
results
in practice, we'd have noticed
by now (it's been there since 2010).

                        regards, tom lane

Would like to know .. 

Thanks in advance 

Dinesh



From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Friday, July 11, 2025 9:27 PM
To: David G. Johnston <david.g.johnston@gmail.com>
Cc: Zhang Mingli <zmlpostgres@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: [Question] Window Function Results without ORDER BY Clause
 
Caution: This email was sent from an external source. Please verify the sender’s identity before clicking links or opening attachments.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Friday, July 11, 2025, Zhang Mingli <zmlpostgres@gmail.com> wrote:
>> So, are both result sets technically correct given the absence of an ORDER
>> BY clause?

> The system is behaving within the requirements of the specification.  The
> query itself is bugged code that the query author should fix.

Well, it's our own regression-test query.  I think the actual question
being asked here is "do our regression tests need to pass under random
non-default GUC settings?".  I'd say no; it'd be next door to
impossible to guarantee that.  If this query gave unstable results
in practice, we'd have noticed by now (it's been there since 2010).

                        regards, tom lane


Re: [Question] Window Function Results without ORDER BY Clause

От
Zhang Mingli
Дата:

On Jul 11, 2025 at 23:57 +0800, Tom Lane <tgl@sss.pgh.pa.us>, wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
On Friday, July 11, 2025, Zhang Mingli <zmlpostgres@gmail.com> wrote:
So, are both result sets technically correct given the absence of an ORDER
BY clause?

The system is behaving within the requirements of the specification. The
query itself is bugged code that the query author should fix.

Well, it's our own regression-test query. I think the actual question
being asked here is "do our regression tests need to pass under random
non-default GUC settings?". I'd say no; it'd be next door to
impossible to guarantee that.
Hi, I didn't mean to imply that our regression tests should pass under random non-default GUC settings.
My concern arose from the development of our distributed databases, such as Greenplum and Apache Cloudberry.
In Postgres, the tuple order remains stable under constant GUC settings(Index Scan, SeqScan and there is no Parallel Seqscan for Window Agg). 
However, in a distributed environment, tuples are spread across segments, and the order in which data is returned to the Master node can be unpredictable. 
This unpredictability can affect the stability of results when calculating window functions.
I just wanted to confirm that both results are correct in the absence of an "ORDER BY" clause. I believe I've received the clarification I needed.
Thank you all!

--
Zhang Mingli
HashData