Обсуждение: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options
Adds the options RESPECT/IGNORE NULLS (null treatment clause) and FROM FIRST/LAST to the non-aggregate window functions. A previous patch (https://www.postgresql.org/message-id/CA+=vxNa5_N1q5q5OkxC0aQnNdbo2Ru6GVw+86wk+oNsUNJDLig@mail.gmail.com) partially implemented this feature. However, that patch worked by adding the null treatment clause to the window frame's frameOptions variable, and consequently had the limitation that it wasn't possible to reuse a window frame definition in a single query where two functions were called that had different null treatment options. This meant that the patch was never committed. The attached path takes a different approach which gets around this limitation. For example, the following query would not work correctly with the implementation in the old patch but does with the attached patch: WITH cte (x) AS ( select null union select 1 union select 2) SELECT x, first_value(x) over w as with_default, first_value(x) respect nulls over w as with_respect, first_value(x) ignore nulls over w as with_ignore from cte WINDOW w as (order by x nulls first rows between unbounded preceding and unbounded following); x | with_default | with_respect | with_ignore ---+--------------+--------------+------------- | | | 1 1 | | | 1 2 | | | 1 (3 rows) == Implementation == The patch adds two types to the pg_type catalog: "ignorenulls" and "fromlast". These types are of the Boolean category, and work as wrappers around the bool type. They are used as function arguments to extra versions of the window functions that take additional boolean arguments. RESPECT NULLS and FROM FIRST are ignored by the parser, but IGNORE NULLS and FROM LAST lead to the extra versions being called with arguments to ignore nulls and order from last. == Testing == Updated documentation and added regression tests. All existing tests pass. This change will need a catversion bump. Thanks to Krasiyan Andreev for initially testing this patch.
Вложения
On Fri, Jul 13, 2018 at 01:52:00PM +0100, Oliver Ford wrote: > Adds the options RESPECT/IGNORE NULLS (null treatment clause) and FROM > FIRST/LAST to the non-aggregate window functions. Please find attached an updated version for OID drift. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Вложения
Hi,
Patch applies and compiles, all included tests and building of the docs pass.
I am using last version from more than two months ago in production environment with real data and I didn't find any bugs,
so I'm marking this patch as ready for committer in the commitfest app.
На сб, 28.07.2018 г. в 22:00 ч. David Fetter <david@fetter.org> написа:
On Fri, Jul 13, 2018 at 01:52:00PM +0100, Oliver Ford wrote:
> Adds the options RESPECT/IGNORE NULLS (null treatment clause) and FROM
> FIRST/LAST to the non-aggregate window functions.
Please find attached an updated version for OID drift.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>>>>> "Krasiyan" == Krasiyan Andreev <krasiyan@gmail.com> writes: Krasiyan> Hi, Krasiyan> Patch applies and compiles, all included tests and building Krasiyan> of the docs pass. I am using last version from more than two Krasiyan> months ago in production environment with real data and I Krasiyan> didn't find any bugs, so I'm marking this patch as ready for Krasiyan> committer in the commitfest app. Unfortunately, reviewing it from a committer perspective - I can't possibly commit this as it stands, and anything I did to it would be basically a rewrite of much of it. Some of the problems could be fixed. For example the type names could be given pg_* prefixes (it's clearly not acceptable to create random special-purpose boolean subtypes in pg_catalog and _not_ give them such a prefix), and the precedence hackery in gram.y could have comments added (gram.y is already bad enough; _anything_ fancy with precedence has to be described in the comments). But I don't like that hack with the special types at all, and I think that needs a better solution. Normally I'd push hard to try and get some solution that's sufficiently generic to allow user-defined functions to make use of the feature. But I think the SQL spec people have managed to make that literally impossible in this case, what with the FROM keyword appearing in the middle of a production and not followed by anything sufficiently distinctive to even use for extra token lookahead. Also, as has been pointed out in a number of previous features, we're starting to accumulate identifiers that are reserved in subtly different ways from our basic four-category system (which is itself a significant elaboration compared to the spec's simple reserved/unreserved distinction). As I recall this objection was specifically raised for CUBE, but justified there by the existence of the contrib/cube extension (and the fact that the standard CUBE() construct is used only in very specific places in the syntax). This patch would make lead / lag / first_value / last_value / nth_value syntactically "special" while not actually reserving them (beyond having them in unreserved_keywords); I think serious consideration should be given to whether they should instead become col_name_keywords (which would, I believe, make it unnecessary to mess with precedence). Anyone have any thoughts or comments on the above? -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > Normally I'd push hard to try and get some solution that's sufficiently > generic to allow user-defined functions to make use of the feature. But > I think the SQL spec people have managed to make that literally > impossible in this case, what with the FROM keyword appearing in the > middle of a production and not followed by anything sufficiently > distinctive to even use for extra token lookahead. Yeah. Is there any appetite for a "Just Say No" approach? That is, refuse to implement the spec's syntax on the grounds that it's too brain-dead to even consider, and instead provide some less random, more extensibility-friendly way to accomplish the same thing? The FROM FIRST/LAST bit seems particularly badly thought through, because AFAICS it is flat out ambiguous with a normal FROM clause immediately following the window function call. The only way to make it not so would be to make FIRST and LAST be fully reserved, which is neither a good idea nor spec-compliant. In short, there's a really good case to be made here that the SQL committee is completely clueless about syntax design, and so we shouldn't follow this particular pied piper. > ... This patch would make lead / lag / > first_value / last_value / nth_value syntactically "special" while not > actually reserving them (beyond having them in unreserved_keywords); I > think serious consideration should be given to whether they should > instead become col_name_keywords (which would, I believe, make it > unnecessary to mess with precedence). I agree that messing with the precedence rules is likely to have unforeseen and undesirable side-effects. Generally, if you need to create a precedence rule, that's because your grammar is ambiguous. Precedence fixes that in a well-behaved way only for cases that actually are very much like operator precedence rules. Otherwise, you may just be papering over something that isn't working very well. See e.g. commits 670a6c7a2 and 12b716457 for past cases where we learned that the hard way. (The latter also points out that if you must have a precedence hack, it's safer to hack individual rules than to stick precedences onto terminal symbols.) In the case at hand, since the proposed patch doesn't make FIRST and LAST be fully reserved, it seems just about certain that it can be made to misbehave, including failing on queries that were and should remain legal. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> The FROM FIRST/LAST bit seems particularly badly thought through, Tom> because AFAICS it is flat out ambiguous with a normal FROM clause Tom> immediately following the window function call. The only way to Tom> make it not so would be to make FIRST and LAST be fully reserved, Tom> which is neither a good idea nor spec-compliant. In the actual spec syntax it's not ambiguous at all because NTH_VALUE is a reserved word (as are LEAD, LAG, FIRST_VALUE and LAST_VALUE), and OVER is a mandatory clause in its syntax, so a FROM appearing before the OVER must be part of a FROM FIRST/LAST and not introducing a FROM-clause. In our syntax, if we made NTH_VALUE etc. a col_name_keyword (and thus not legal as a function name outside its own special syntax) it would also become unambiguous. i.e. given this token sequence (with . marking the current posision): select nth_value(x) . from first ignore if we know up front that "nth_value" is a window function and not any other kind of function, we know that we have to shift the "from" rather than reducing the select-list because we haven't seen an "over" yet. (Neither "first" nor "ignore" are reserved, so "select foo(x) from first ignore;" is a valid and complete query, and without reserving the function name we'd need at least four tokens of lookahead to decide otherwise.) This is why I think the col_name_keyword option needs to be given serious consideration - it still doesn't reserve the names as strongly as the spec does, but enough to make the standard syntax work without needing any dubious hacks. -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> The FROM FIRST/LAST bit seems particularly badly thought through, > Tom> because AFAICS it is flat out ambiguous with a normal FROM clause > Tom> immediately following the window function call. The only way to > Tom> make it not so would be to make FIRST and LAST be fully reserved, > Tom> which is neither a good idea nor spec-compliant. > In the actual spec syntax it's not ambiguous at all because NTH_VALUE is > a reserved word (as are LEAD, LAG, FIRST_VALUE and LAST_VALUE), and OVER > is a mandatory clause in its syntax, so a FROM appearing before the OVER > must be part of a FROM FIRST/LAST and not introducing a FROM-clause. Hmm ... > In our syntax, if we made NTH_VALUE etc. a col_name_keyword (and thus > not legal as a function name outside its own special syntax) it would > also become unambiguous. > i.e. given this token sequence (with . marking the current posision): > select nth_value(x) . from first ignore > if we know up front that "nth_value" is a window function and not any > other kind of function, we know that we have to shift the "from" rather > than reducing the select-list because we haven't seen an "over" yet. I don't really find that to be a desirable solution, because quite aside from the extensibility problem, it would mean that a lot of errors become "syntax error" where we formerly gave a more useful message. This does open up a thought about how to proceed, though. I'd been trying to think of a way to solve this using base_yylex's ability to do some internal lookahead and change token types based on that. If you just think of recognizing FROM FIRST/LAST, you get nowhere because that's still legal in other contexts. But if you were to look for FROM followed by FIRST/LAST followed by IGNORE/RESPECT/OVER, I think that could only validly happen in this syntax. It'd take some work to extend base_yylex to look ahead 2 tokens not one, but I'm sure that could be done. (You'd also need a lookahead rule to match "IGNORE/RESPECT NULLS OVER", but that seems just as doable.) Then the relevant productions use FROM_LA, IGNORE_LA, RESPECT_LA instead of the corresponding bare tokens, and the grammar no longer has an ambiguity problem. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> If you just think of recognizing FROM FIRST/LAST, you get nowhere Tom> because that's still legal in other contexts. But if you were to Tom> look for FROM followed by FIRST/LAST followed by Tom> IGNORE/RESPECT/OVER, I think that could only validly happen in Tom> this syntax. No; you need to go four tokens ahead in total, not three. Assuming nth_value is unreserved, then select nth_value(x) from first ignore; is a valid query that has nth_value(x) as an expression, "first" as a table name and "ignore" as its alias. Only when you see NULLS after IGNORE, or OVER after FIRST/LAST, do you know that you're looking at a window function and not a from clause. So FROM_LA would have to mean "FROM" followed by any of: FIRST IGNORE NULLS LAST IGNORE NULLS FIRST RESPECT NULLS LAST RESPECT NULLS FIRST OVER LAST OVER Remember that while OVER is reserved, all of FIRST, LAST, RESPECT and IGNORE are unreserved. Tom> It'd take some work to extend base_yylex to look ahead 2 tokens Tom> not one, but I'm sure that could be done. (You'd also need a Tom> lookahead rule to match "IGNORE/RESPECT NULLS OVER", but that Tom> seems just as doable.) Then the relevant productions use FROM_LA, Tom> IGNORE_LA, RESPECT_LA instead of the corresponding bare tokens, Tom> and the grammar no longer has an ambiguity problem. Yeah, but at the cost of having to extend base_yylex to go 3 tokens ahead (not 2) rather than the current single lookahead slot. Doable, certainly (probably not much harder to do 3 than 2 actually) -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> If you just think of recognizing FROM FIRST/LAST, you get nowhere > Tom> because that's still legal in other contexts. But if you were to > Tom> look for FROM followed by FIRST/LAST followed by > Tom> IGNORE/RESPECT/OVER, I think that could only validly happen in > Tom> this syntax. > No; you need to go four tokens ahead in total, not three. Assuming > nth_value is unreserved, then > select nth_value(x) from first ignore; > is a valid query that has nth_value(x) as an expression, "first" as a > table name and "ignore" as its alias. No, because once IGNORE is a keyword, even unreserved, it's not legal as an AS-less alias. We'd be breaking queries like that no matter what. (I know there are people around here who'd like to remove that restriction, but it's not happening anytime soon IMO.) regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> select nth_value(x) from first ignore; Tom> No, because once IGNORE is a keyword, even unreserved, it's not Tom> legal as an AS-less alias. That rule only applies in the select-list, not in the FROM clause; table aliases in FROM are just ColId, so they can be anything except a fully reserved or type_func_name keyword. -- Andrew (irc:RhodiumToad)
So I've tried to rough out a decision tree for the various options on how this might be implemented (discarding the "use precedence hacks" option). Opinions? Additions? (formatted for emacs outline-mode) * 1. use lexical lookahead +: relatively straightforward parser changes +: no new reserved words +: has the option of working extensibly with all functions -: base_yylex needs extending to 3 lookahead tokens ** 1.1. Allow from/ignore clause on all (or all non-agg) window function calls If the clauses are legal on all window functions, what to do about existing window functions for which the clauses do not make sense? *** 1.1.1. Ignore the clause when the function isn't aware of it +: simple -: somewhat surprising for users perhaps? *** 1.1.2. Change the behavior of the windowapi in some consistent way Not sure if this can work. +: fairly simple(maybe?) and predictable -: changes the behavior of existing window functions ** 1.2. Allow from/ignore clause on only certain functions +: avoids any unexpected behavior -: needs some way to control what functions allow it *** 1.2.1. Check the function name in parse analysis against a fixed list. +: simple -: not extensible *** 1.2.2. Provide some option in CREATE FUNCTION +: extensible -: fairly intrusive, adding stuff to create function and pg_proc *** 1.2.3. Do something magical with function argument types +: doesn't need changes in create function / pg_proc -: it's an ugly hack * 2. reserve nth_value etc. as functions +: follows the spec reasonably well +: less of a hack than extending base_yylex -: new reserved words -: more parser rules -: not extensible (now goto 1.2.1) * 3. "just say no" to the spec e.g. add new functions like lead_ignore_nulls(), or add extra boolean args to lead() etc. telling them to skip nulls +: simple -: doesn't conform to spec -: using extra args isn't quite the right semantics -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > So I've tried to rough out a decision tree for the various options on > how this might be implemented (discarding the "use precedence hacks" > option). Opinions? Additions? I think it'd be worth at least drafting an implementation for the lexical-lookahead fix. I think it's likely that we'll need to extend base_yylex to do more lookahead in the future even if we don't do it for this, given the SQL committee's evident love for COBOL-ish syntax and lack of regard for what you can do in LALR(1). The questions of how we interface to the individual window functions are really independent of how we handle the parsing problem. My first inclination is to just pass the flags down to the window functions (store them in WindowObject and provide some additional inquiry functions in windowapi.h) and let them deal with it. > If the clauses are legal on all window functions, what to do about existing > window functions for which the clauses do not make sense? Option 1: do nothing, document that nothing happens if w.f. doesn't implement it. Option 2: record whether the inquiry functions got called. At end of query, error out if they weren't and the options were used. It's also worth wondering if we couldn't just implement the flags in some generic fashion and not need to involve the window functions at all. FROM LAST, for example, could and perhaps should be implemented by inverting the sort order. Possibly IGNORE NULLS could be implemented inside the WinGetFuncArgXXX functions? These behaviors might or might not make much sense with other window functions, but that doesn't seem like it's our problem. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> So I've tried to rough out a decision tree for the various options >> on how this might be implemented (discarding the "use precedence >> hacks" option). Opinions? Additions? Tom> I think it'd be worth at least drafting an implementation for the Tom> lexical-lookahead fix. I think it's likely that we'll need to Tom> extend base_yylex to do more lookahead in the future even if we Tom> don't do it for this, given the SQL committee's evident love for Tom> COBOL-ish syntax and lack of regard for what you can do in Tom> LALR(1). That's not _quite_ a fair criticism of the SQL committee; they're not ignoring the capabilities of parsers, they're just not making their syntax robust in the presence of the kinds of extensions and generalizations that we want to do. Without exception that I know of, every time we've run into a problem that needed ugly precedence rules or extra lookahead it's been caused by us not reserving something that is reserved in the spec, or by us allowing constructs that are not in the spec at all (postfix operators, especially), or by us deliberately generalizing what the spec allows (e.g. allowing full expressions where the spec only allows column references, or allowing extra parens around subselects) or where we've repurposed syntax from the spec in an incompatible way (as in ANY(array)). Anyway, for the time being I will mark this patch as "returned with feedback". >> If the clauses are legal on all window functions, what to do about >> existing window functions for which the clauses do not make sense? Tom> Option 1: do nothing, document that nothing happens if w.f. Tom> doesn't implement it. That was 1.1.1 on my list. Tom> Option 2: record whether the inquiry functions got called. At end Tom> of query, error out if they weren't and the options were used. Erroring at the _end_ of the query seems a bit of a potential surprise. Tom> It's also worth wondering if we couldn't just implement the flags Tom> in some generic fashion and not need to involve the window Tom> functions at all. That was what I meant by option 1.1.2 on my list. Tom> FROM LAST, for example, could and perhaps should be implemented by Tom> inverting the sort order. Actually that can't work for reasons brought up in the recent discussion of optimization of window function sorts: if you change the sort order you potentially disturb the ordering of peer rows, and the spec requires that an (nth_value(x,n) from last over w) and (otherfunc(x) over w) for order-equivalent windows "w" must see the peer rows in the same order. So FROM LAST really does have to keep the original sort order, and count backwards from the end of the window. Tom> Possibly IGNORE NULLS could be implemented inside the Tom> WinGetFuncArgXXX functions? These behaviors might or might not Tom> make much sense with other window functions, but that doesn't seem Tom> like it's our problem. That's about what I was thinking for option 1.1.2, yes. -- Andrew (irc:RhodiumToad)
>>>>> "Krasiyan" == Krasiyan Andreev <krasiyan@gmail.com> writes: Krasiyan> I am using last version from more than two months ago in Krasiyan> production environment with real data and I didn't find any Krasiyan> bugs, so I'm marking this patch as ready for committer in the Krasiyan> commitfest app. Oliver (or anyone else), do you plan to continue working on this in the immediate future, in line with the comments from myself and Tom in this thread? If so I'll bump it to the next CF, otherwise I'll mark it "returned with feedback". I'm happy to help out with further work on this patch if needed, time permitting. -- Andrew (irc:RhodiumToad)
Greetings, This seems to have died out, and that's pretty unfortunate because this is awfully useful SQL standard syntax that people look for and wish we had. * Andrew Gierth (andrew@tao11.riddles.org.uk) wrote: > So I've tried to rough out a decision tree for the various options on > how this might be implemented (discarding the "use precedence hacks" > option). Opinions? Additions? > > (formatted for emacs outline-mode) > > * 1. use lexical lookahead > > +: relatively straightforward parser changes > +: no new reserved words > +: has the option of working extensibly with all functions > > -: base_yylex needs extending to 3 lookahead tokens This sounds awful grotty and challenging to do and get right, and the alternative (just reserving these, as the spec does) doesn't seem so draconian as to be that much of an issue. > * 2. reserve nth_value etc. as functions > > +: follows the spec reasonably well > +: less of a hack than extending base_yylex > > -: new reserved words > -: more parser rules > -: not extensible > For my 2c, at least, reserving these strikes me as entirely reasonable. Yes, it sucks that we have to partially-reserve some additional keywords, but such is life. I get that we'll throw syntax errors sometimes when we might have given a better error, but I think we can accept that. > (now goto 1.2.1) Hmm, not sure this was right? but sure, I'll try... > *** 1.2.1. Check the function name in parse analysis against a fixed list. > > +: simple > -: not extensible Seems like this is more-or-less required since we'd be reserving them..? > *** 1.2.2. Provide some option in CREATE FUNCTION > > +: extensible > -: fairly intrusive, adding stuff to create function and pg_proc How would this work though, if we reserve the functions as keywords..? Maybe I'm not entirely following, but wouldn't attempts to use other functions end up with syntax errors in at least some of the cases, meaning that having other functions support this wouldn't really work? I don't particularly like the idea that some built-in functions would always work but others would work but only some of the time. > *** 1.2.3. Do something magical with function argument types > > +: doesn't need changes in create function / pg_proc > -: it's an ugly hack Not really a fan of 'ugly hack'. > * 3. "just say no" to the spec > > e.g. add new functions like lead_ignore_nulls(), or add extra boolean > args to lead() etc. telling them to skip nulls > > +: simple > -: doesn't conform to spec > -: using extra args isn't quite the right semantics Ugh, no thank you. Thanks! Stephen
Вложения
Thank you very much for feedback and yes, that is very useful SQL syntax.
Maybe you miss my previous answer, but you are right, that patch is currently dead,
because some important design questions must be discussed here, before patch rewriting.
I have dropped support of from first/last for nth_value(),
but also I reimplemented it in a different way,
by using negative number for the position argument, to be able to get the same frame in exact reverse order.
After that patch becomes much more simple and some concerns about precedence hack has gone.
I have not renamed special bool type "ignorenulls"
(I know that it is not acceptable way for calling extra version
of window functions, but also it makes things very easy and it can reuse frames),
but I removed the other special bool type "fromlast".
Attached file was for PostgreSQL 13 (master git branch, last commit fest),
everything was working and patch was at the time in very good shape, all tests was passed.
I read previous review and suggestions from Tom about special bool type and unreserved keywords and also,
that IGNORE NULLS could be implemented inside the WinGetFuncArgXXX functions,
but I am not sure how exactly to proceed (some example will be very helpful).
Maybe you miss my previous answer, but you are right, that patch is currently dead,
because some important design questions must be discussed here, before patch rewriting.
I have dropped support of from first/last for nth_value(),
but also I reimplemented it in a different way,
by using negative number for the position argument, to be able to get the same frame in exact reverse order.
After that patch becomes much more simple and some concerns about precedence hack has gone.
I have not renamed special bool type "ignorenulls"
(I know that it is not acceptable way for calling extra version
of window functions, but also it makes things very easy and it can reuse frames),
but I removed the other special bool type "fromlast".
Attached file was for PostgreSQL 13 (master git branch, last commit fest),
everything was working and patch was at the time in very good shape, all tests was passed.
I read previous review and suggestions from Tom about special bool type and unreserved keywords and also,
that IGNORE NULLS could be implemented inside the WinGetFuncArgXXX functions,
but I am not sure how exactly to proceed (some example will be very helpful).
На чт, 30.04.2020 г. в 21:58 Stephen Frost <sfrost@snowman.net> написа:
Greetings,
This seems to have died out, and that's pretty unfortunate because this
is awfully useful SQL standard syntax that people look for and wish we
had.
* Andrew Gierth (andrew@tao11.riddles.org.uk) wrote:
> So I've tried to rough out a decision tree for the various options on
> how this might be implemented (discarding the "use precedence hacks"
> option). Opinions? Additions?
>
> (formatted for emacs outline-mode)
>
> * 1. use lexical lookahead
>
> +: relatively straightforward parser changes
> +: no new reserved words
> +: has the option of working extensibly with all functions
>
> -: base_yylex needs extending to 3 lookahead tokens
This sounds awful grotty and challenging to do and get right, and the
alternative (just reserving these, as the spec does) doesn't seem so
draconian as to be that much of an issue.
> * 2. reserve nth_value etc. as functions
>
> +: follows the spec reasonably well
> +: less of a hack than extending base_yylex
>
> -: new reserved words
> -: more parser rules
> -: not extensible
>
For my 2c, at least, reserving these strikes me as entirely reasonable.
Yes, it sucks that we have to partially-reserve some additional
keywords, but such is life. I get that we'll throw syntax errors
sometimes when we might have given a better error, but I think we can
accept that.
> (now goto 1.2.1)
Hmm, not sure this was right? but sure, I'll try...
> *** 1.2.1. Check the function name in parse analysis against a fixed list.
>
> +: simple
> -: not extensible
Seems like this is more-or-less required since we'd be reserving them..?
> *** 1.2.2. Provide some option in CREATE FUNCTION
>
> +: extensible
> -: fairly intrusive, adding stuff to create function and pg_proc
How would this work though, if we reserve the functions as keywords..?
Maybe I'm not entirely following, but wouldn't attempts to use other
functions end up with syntax errors in at least some of the cases,
meaning that having other functions support this wouldn't really work?
I don't particularly like the idea that some built-in functions would
always work but others would work but only some of the time.
> *** 1.2.3. Do something magical with function argument types
>
> +: doesn't need changes in create function / pg_proc
> -: it's an ugly hack
Not really a fan of 'ugly hack'.
> * 3. "just say no" to the spec
>
> e.g. add new functions like lead_ignore_nulls(), or add extra boolean
> args to lead() etc. telling them to skip nulls
>
> +: simple
> -: doesn't conform to spec
> -: using extra args isn't quite the right semantics
Ugh, no thank you.
Thanks!
Stephen
Вложения
I revisited the thread: https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com and came up with attached POC patch (I used some varibale names appearing in the Krasiyan Andreev's patch). I really love to have RESPECT/IGNORE NULLS because I believe they are convenient for users. For FIRST/LAST I am not so excited since there are alternatives as our document stats, so FIRST/LAST are not included in the patch. Currently in the patch only nth_value is allowed to use RESPECT/IGNORE NULLS. I think it's not hard to implement it for others (lead, lag, first_value and last_value). No document nor test patches are included for now. Note that RESPECT/IGNORE are not registered as reserved keywords in this patch (but registered as unreserved keywords). I am not sure if this is acceptable or not. > The questions of how we interface to the individual window functions > are really independent of how we handle the parsing problem. My > first inclination is to just pass the flags down to the window functions > (store them in WindowObject and provide some additional inquiry functions > in windowapi.h) and let them deal with it. I agree with this. Also I do not change the prototype of nth_value. So I pass RESPECT/IGNORE NULLS information from the raw parser to parse/analysis and finally to WindowObject. > It's also worth wondering if we couldn't just implement the flags in > some generic fashion and not need to involve the window functions at > all. FROM LAST, for example, could and perhaps should be implemented > by inverting the sort order. Possibly IGNORE NULLS could be implemented > inside the WinGetFuncArgXXX functions? These behaviors might or might > not make much sense with other window functions, but that doesn't seem > like it's our problem. Yes, probably we could make WinGetFuncArgXXX a little bit smarter in this direction (not implemented in the patch at this point). Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp From 07f01f8859e159c908ada72e8f53daf51e0b8bdf Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Sat, 22 Apr 2023 16:52:50 +0900 Subject: [PATCH v1 1/3] Implement IGNORE or RESPECT NULLS parse/analysis part. Implement SQL standard's IGNORE/RESPECT NULLS clause for window functions. For now, only nth_value() can use this option. --- src/backend/parser/gram.y | 22 ++++++++++++++++++---- src/backend/parser/parse_func.c | 13 +++++++++++++ src/include/nodes/parsenodes.h | 8 ++++++++ src/include/nodes/primnodes.h | 2 ++ src/include/parser/kwlist.h | 2 ++ 5 files changed, 43 insertions(+), 4 deletions(-) diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index acf6cf4866..2980ecd666 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -276,6 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); MergeWhenClause *mergewhen; struct KeyActions *keyactions; struct KeyAction *keyaction; + NullTreatment nulltreatment; } %type <node> stmt toplevel_stmt schema_stmt routine_body_stmt @@ -661,6 +662,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); json_object_constructor_null_clause_opt json_array_constructor_null_clause_opt +%type <nulltreatment> opt_null_treatment + /* * Non-keyword token types. These are hard-wired into the "flex" lexer. * They must be listed first so that their numeric codes do not depend on @@ -718,7 +721,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); HANDLER HAVING HEADER_P HOLD HOUR_P - IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE + IDENTITY_P IF_P IGNORE_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION @@ -752,7 +755,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); RANGE READ REAL REASSIGN RECHECK RECURSIVE REF_P REFERENCES REFERENCING REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA - RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP + RESET RESPECT RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP ROUTINE ROUTINES ROW ROWS RULE SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT @@ -15213,7 +15216,7 @@ func_application: func_name '(' ')' * (Note that many of the special SQL functions wouldn't actually make any * sense as functional index entries, but we ignore that consideration here.) */ -func_expr: func_application within_group_clause filter_clause over_clause +func_expr: func_application within_group_clause filter_clause opt_null_treatment over_clause { FuncCall *n = (FuncCall *) $1; @@ -15246,7 +15249,8 @@ func_expr: func_application within_group_clause filter_clause over_clause n->agg_within_group = true; } n->agg_filter = $3; - n->over = $4; + n->null_treatment = $4; + n->over = $5; $$ = (Node *) n; } | json_aggregate_func filter_clause over_clause @@ -15790,6 +15794,14 @@ filter_clause: | /*EMPTY*/ { $$ = NULL; } ; +/* + * Window function option clauses + */ +opt_null_treatment: + RESPECT NULLS_P { $$ = RESPECT_NULLS; } + | IGNORE_P NULLS_P { $$ = IGNORE_NULLS; } + | /*EMPTY*/ { $$ = NULL_TREATMENT_NOT_SET; } + ; /* * Window Definitions @@ -17111,6 +17123,7 @@ unreserved_keyword: | HOUR_P | IDENTITY_P | IF_P + | IGNORE_P | IMMEDIATE | IMMUTABLE | IMPLICIT_P @@ -17223,6 +17236,7 @@ unreserved_keyword: | REPLACE | REPLICA | RESET + | RESPECT | RESTART | RESTRICT | RETURN diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c index b3f0b6a137..92af0d10f1 100644 --- a/src/backend/parser/parse_func.c +++ b/src/backend/parser/parse_func.c @@ -31,6 +31,7 @@ #include "parser/parse_target.h" #include "parser/parse_type.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/syscache.h" @@ -99,6 +100,7 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, bool agg_distinct = (fn ? fn->agg_distinct : false); bool func_variadic = (fn ? fn->func_variadic : false); CoercionForm funcformat = (fn ? fn->funcformat : COERCE_EXPLICIT_CALL); + NullTreatment null_treatment = (fn ? fn->null_treatment : NULL_TREATMENT_NOT_SET); bool could_be_projection; Oid rettype; Oid funcid; @@ -534,6 +536,13 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, errmsg("window function %s cannot have WITHIN GROUP", NameListToString(funcname)), parser_errposition(pstate, location))); + /* Check RESPECT NULLS or IGNORE NULLS is specified. They are only allowed with nth_value */ + if (null_treatment != NULL_TREATMENT_NOT_SET && funcid != F_NTH_VALUE) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("window function %s cannot have RESPECT NULLS or IGNORE NULLS", + NameListToString(funcname)), + parser_errposition(pstate, location))); } else if (fdresult == FUNCDETAIL_COERCION) { @@ -835,6 +844,10 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, wfunc->winagg = (fdresult == FUNCDETAIL_AGGREGATE); wfunc->aggfilter = agg_filter; wfunc->location = location; + if (null_treatment == IGNORE_NULLS) + wfunc->ignorenulls = true; + else + wfunc->ignorenulls = false; /* * agg_star is allowed for aggregate functions but distinct isn't diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index cc7b32b279..f13ae26a24 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -404,6 +404,13 @@ typedef struct RoleSpec int location; /* token location, or -1 if unknown */ } RoleSpec; +typedef enum NullTreatment +{ + NULL_TREATMENT_NOT_SET = 0, + RESPECT_NULLS, + IGNORE_NULLS +} NullTreatment; + /* * FuncCall - a function or aggregate invocation * @@ -431,6 +438,7 @@ typedef struct FuncCall bool agg_distinct; /* arguments were labeled DISTINCT */ bool func_variadic; /* last argument was labeled VARIADIC */ CoercionForm funcformat; /* how to display this node */ + NullTreatment null_treatment; /* RESPECT_NULLS or IGNORE NULLS */ int location; /* token location, or -1 if unknown */ } FuncCall; diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h index be9c29f0bf..213297dbd3 100644 --- a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -559,6 +559,8 @@ typedef struct WindowFunc bool winstar pg_node_attr(query_jumble_ignore); /* is function a simple aggregate? */ bool winagg pg_node_attr(query_jumble_ignore); + /* true if IGNORE NULLS, false if RESPECT NULLS */ + bool ignorenulls pg_node_attr(query_jumble_ignore); /* token location, or -1 if unknown */ int location; } WindowFunc; diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index f5b2e61ca5..c7e61a8f0e 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -198,6 +198,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL) PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("ignore", IGNORE_P, UNRESERVED_KEYWORD, AS_LABEL) PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL) PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL) @@ -360,6 +361,7 @@ PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("respect", RESPECT, UNRESERVED_KEYWORD, AS_LABEL) PG_KEYWORD("restart", RESTART, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("restrict", RESTRICT, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("return", RETURN, UNRESERVED_KEYWORD, BARE_LABEL) -- 2.25.1 From 3dc6f4bb897f76247589db018716bf5680d5331c Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Sat, 22 Apr 2023 16:58:48 +0900 Subject: [PATCH v1 2/3] Implement IGNORE or RESPECT NULLS planner part. --- src/backend/optimizer/util/clauses.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index a9c7bc342e..40fc62c447 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -2474,6 +2474,7 @@ eval_const_expressions_mutator(Node *node, newexpr->winref = expr->winref; newexpr->winstar = expr->winstar; newexpr->winagg = expr->winagg; + newexpr->ignorenulls = expr->ignorenulls; newexpr->location = expr->location; return (Node *) newexpr; -- 2.25.1 From a78feec9bf7b08c644c2b3089b2de9237d4fcd9e Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Sat, 22 Apr 2023 17:00:06 +0900 Subject: [PATCH v1 3/3] Implement IGNORE or RESPECT NULLS executor and window functions part. --- src/backend/executor/nodeWindowAgg.c | 11 ++++++++++ src/backend/utils/adt/windowfuncs.c | 30 +++++++++++++++++++++++++--- src/include/windowapi.h | 2 ++ 3 files changed, 40 insertions(+), 3 deletions(-) diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index 3ac581a711..7e2affb12c 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -69,6 +69,7 @@ typedef struct WindowObjectData int readptr; /* tuplestore read pointer for this fn */ int64 markpos; /* row that markptr is positioned on */ int64 seekpos; /* row that readptr is positioned on */ + WindowFunc *wfunc; /* WindowFunc of this function */ } WindowObjectData; /* @@ -2617,6 +2618,7 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) winobj->winstate = winstate; winobj->argstates = wfuncstate->args; winobj->localmem = NULL; + winobj->wfunc = wfunc; perfuncstate->winobj = winobj; /* It's a real window function, so set up to call it. */ @@ -3620,3 +3622,12 @@ WinGetFuncArgCurrent(WindowObject winobj, int argno, bool *isnull) return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), econtext, isnull); } + +/* + * Return current WindowFunc + */ +WindowFunc * +WinGetWindowFunc(WindowObject winobj) +{ + return winobj->wfunc; +} diff --git a/src/backend/utils/adt/windowfuncs.c b/src/backend/utils/adt/windowfuncs.c index b87a624fb2..919295ba13 100644 --- a/src/backend/utils/adt/windowfuncs.c +++ b/src/backend/utils/adt/windowfuncs.c @@ -693,6 +693,7 @@ window_nth_value(PG_FUNCTION_ARGS) bool const_offset; Datum result; bool isnull; + bool isout; int32 nth; nth = DatumGetInt32(WinGetFuncArgCurrent(winobj, 1, &isnull)); @@ -705,9 +706,32 @@ window_nth_value(PG_FUNCTION_ARGS) (errcode(ERRCODE_INVALID_ARGUMENT_FOR_NTH_VALUE), errmsg("argument of nth_value must be greater than zero"))); - result = WinGetFuncArgInFrame(winobj, 0, - nth - 1, WINDOW_SEEK_HEAD, const_offset, - &isnull, NULL); + if (WinGetWindowFunc(winobj)->ignorenulls) + { + int i, n; + + i = n = 0; + + for (;;) + { + result = WinGetFuncArgInFrame(winobj, 0, + i++, WINDOW_SEEK_HEAD, false, + &isnull, &isout); + if (isout) + break; + + if (!isnull) + { + if (n == nth - 1) + break; + n++; + } + } + } + else + result = WinGetFuncArgInFrame(winobj, 0, + nth - 1, WINDOW_SEEK_HEAD, const_offset, + &isnull, NULL); if (isnull) PG_RETURN_NULL(); diff --git a/src/include/windowapi.h b/src/include/windowapi.h index b8c2c565d1..64f7d4c84d 100644 --- a/src/include/windowapi.h +++ b/src/include/windowapi.h @@ -61,4 +61,6 @@ extern Datum WinGetFuncArgInFrame(WindowObject winobj, int argno, extern Datum WinGetFuncArgCurrent(WindowObject winobj, int argno, bool *isnull); +extern WindowFunc *WinGetWindowFunc(WindowObject winobj); + #endif /* WINDOWAPI_H */ -- 2.25.1
On Sat, 22 Apr 2023, 13:14 Tatsuo Ishii, <ishii@sraoss.co.jp> wrote:
I revisited the thread:
https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com
and came up with attached POC patch (I used some varibale names
appearing in the Krasiyan Andreev's patch). I really love to have
RESPECT/IGNORE NULLS because I believe they are convenient for
users. For FIRST/LAST I am not so excited since there are alternatives
as our document stats, so FIRST/LAST are not included in the patch.
Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
NULLS. I think it's not hard to implement it for others (lead, lag,
first_value and last_value). No document nor test patches are
included for now.
I've actually recently been looking at this feature again recently as well. One thing I wondered, but would need consensus, is to take the SEEK_HEAD/SEEK_TAIL case statements out of WinGetFuncArgInPartition. This function is only called by leadlag_common, which uses SEEK_CURRENT, so those case statements are never reached. Taking them out simplifies the code as it is but means future features might need it re-added (although I'm not sure the use case for it, as that function is for window funcs that ignore the frame options).
Note that RESPECT/IGNORE are not registered as reserved keywords in
this patch (but registered as unreserved keywords). I am not sure if
this is acceptable or not.
> The questions of how we interface to the individual window functions
> are really independent of how we handle the parsing problem. My
> first inclination is to just pass the flags down to the window functions
> (store them in WindowObject and provide some additional inquiry functions
> in windowapi.h) and let them deal with it.
I agree with this. Also I do not change the prototype of
nth_value. So I pass RESPECT/IGNORE NULLS information from the raw
parser to parse/analysis and finally to WindowObject.
This is a much better option than my older patch which needed to change the functions.
> It's also worth wondering if we couldn't just implement the flags in
> some generic fashion and not need to involve the window functions at
> all. FROM LAST, for example, could and perhaps should be implemented
> by inverting the sort order. Possibly IGNORE NULLS could be implemented
> inside the WinGetFuncArgXXX functions? These behaviors might or might
> not make much sense with other window functions, but that doesn't seem
> like it's our problem.
Yes, probably we could make WinGetFuncArgXXX a little bit smarter in
this direction (not implemented in the patch at this point).
+1 for doing it here. Maybe also refactor WinGetFuncArgInFrame, putting the exclusion checks in a static function as that function is already pretty big?
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On 4/22/23 14:14, Tatsuo Ishii wrote: > I revisited the thread: > https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com > > and came up with attached POC patch (I used some varibale names > appearing in the Krasiyan Andreev's patch). I really love to have > RESPECT/IGNORE NULLS because I believe they are convenient for > users. Excellent. I was thinking about picking my version of this patch up again, but I think this might be better than mine. I am curious why set_mark is false in the IGNORE version instead of also being const_offset. Surely the nth non-null in the frame will never go backwards. Dealing with marks was the main reason (I think) that my patch was not accepted. > For FIRST/LAST I am not so excited since there are alternatives > as our document stats, I disagree with this. The point of having FROM LAST is to avoid calculating a new window and running a new pass over it. > so FIRST/LAST are not included in the patch. I do agree that we can have <null treatment> without <from first or last> so let's move forward with this and handle the latter later. > Currently in the patch only nth_value is allowed to use RESPECT/IGNORE > NULLS. This should not be hard coded. It should be a new field in pg_proc (with a sanity check that it is only true for window functions). That way custom window functions can implement it. > I think it's not hard to implement it for others (lead, lag, > first_value and last_value). It doesn't seem like it should be, no. > No document nor test patches are included for now. I can volunteer to work on these if you want. > Note that RESPECT/IGNORE are not registered as reserved keywords in > this patch (but registered as unreserved keywords). I am not sure if > this is acceptable or not. For me, this is perfectly okay. Keep them at the lowest level of reservation as possible. -- Vik Fearing
Vik Fearing <vik@postgresfriends.org> writes: > On 4/22/23 14:14, Tatsuo Ishii wrote: >> Note that RESPECT/IGNORE are not registered as reserved keywords in >> this patch (but registered as unreserved keywords). I am not sure if >> this is acceptable or not. > For me, this is perfectly okay. Keep them at the lowest level of > reservation as possible. Yeah, keep them unreserved if at all possible. Any higher reservation level risks breaking existing applications that might be using these words as column or function names. regards, tom lane
> Excellent. I was thinking about picking my version of this patch up > again, but I think this might be better than mine. Thanks. > I am curious why set_mark is false in the IGNORE version instead of > also being const_offset. Surely the nth non-null in the frame will > never go backwards. Initially I thought that too. But when I used const_offset instead of false. I got an error: ERROR: cannot fetch row before WindowObject's mark position > I do agree that we can have <null treatment> without <from first or > last> so let's move forward with this and handle the latter later. Agreed. >> Currently in the patch only nth_value is allowed to use RESPECT/IGNORE >> NULLS. > > This should not be hard coded. It should be a new field in pg_proc > (with a sanity check that it is only true for window functions). That > way custom window functions can implement it. There were some discussions on this in the past. https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com It seems Tom and Andrew thought that "1.1.2. Change the behavior of the windowapi in some consistent way" is ambitious. If we follow this direction, I think each window function should check WindowFunc struct passed by WinGetWindowFunc (added in my patch) to check whether IGNORE NULLS can be applied or not in the function. If not, error out. This way, we don't need to add a new field to pg_proc. >> No document nor test patches are included for now. > > I can volunteer to work on these if you want. Thanks! I think you can work on top of the last patch posted by Krasiyan Andreev: https://www.postgresql.org/message-id/CAN1PwonAnC-KkRyY%2BDtRmxQ8rjdJw%2BgcOsHruLr6EnF7zSMH%3DQ%40mail.gmail.com Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
> Vik Fearing <vik@postgresfriends.org> writes: >> On 4/22/23 14:14, Tatsuo Ishii wrote: >>> Note that RESPECT/IGNORE are not registered as reserved keywords in >>> this patch (but registered as unreserved keywords). I am not sure if >>> this is acceptable or not. > >> For me, this is perfectly okay. Keep them at the lowest level of >> reservation as possible. > > Yeah, keep them unreserved if at all possible. Any higher reservation > level risks breaking existing applications that might be using these > words as column or function names. Agreed. Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
On Sun, Apr 23, 2023 at 4:29 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> Vik Fearing <vik@postgresfriends.org> writes:
>
>> For me, this is perfectly okay. Keep them at the lowest level of
>> reservation as possible.
>
> Yeah, keep them unreserved if at all possible. Any higher reservation
> level risks breaking existing applications that might be using these
> words as column or function names.
Agreed.
Attached is a new version of the code and tests to implement this. There's now no modification to windowfuncs.c or the catalog,
it's only a bool added to FuncCall which if set to true, ignores nulls. It adds IGNORE/RESPECT at the Unreserved, As Label level.
The implementation also aims at better performance over previous versions by not disabling set_mark, and using an array to
track previous non-null positions in SEEK_HEAD or SEEK_CURRENT with Forward (lead, but not lag). The mark is set if a row
is out of frame and further rows can't be in frame (to ensure it works with an exclusion clause).
The attached test patch is mostly the same as in the previous patch set, but it doesn't fail on row_number anymore as the main patch
only rejects aggregate functions. The test patch also adds a test for EXCLUDE CURRENT ROW and for two contiguous null rows.
I've not yet tested custom window functions with the patch, but I'm happy to add them to the test patch in v2 if we want to go this way
in implementing this feature.
Вложения
> The attached test patch is mostly the same as in the previous patch > set, but it doesn't fail on row_number anymore as the main patch > only rejects aggregate functions. The test patch also adds a test for > +SELECT sum(orbit) RESPECT NULLS OVER () FROM planets; -- succeeds I think the standard does not allow to specify RESPECT NULLS other than lead, lag, first_value, last_value and nth_value. Unless we agree that PostgreSQL violates the standard in this regard, you should not allow to use RESPECT NULLS for the window functions, expect lead etc. and aggregates. See my patch. > +/* > + * Window function option clauses > + */ > +opt_null_treatment: > + RESPECT NULLS_P { $$ = RESPECT_NULLS; } > + | IGNORE_P NULLS_P { $$ = IGNORE_NULLS; } > + | /*EMPTY*/ { $$ = NULL_TREATMENT_NOT_SET; } > + ; With this, you can check if null treatment clause is used or not in each window function. In my previous patch I did the check in parse/analysis but I think it's better to be checked in each window function. This way, - need not to add a column to pg_proc. - allow user defined window functions to decide by themselves whether they can accept null treatment option. Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
>> The attached test patch is mostly the same as in the previous patch >> set, but it doesn't fail on row_number anymore as the main patch >> only rejects aggregate functions. The test patch also adds a test for > >> +SELECT sum(orbit) RESPECT NULLS OVER () FROM planets; -- succeeds > > I think the standard does not allow to specify RESPECT NULLS other > than lead, lag, first_value, last_value and nth_value. Unless we agree > that PostgreSQL violates the standard in this regard, you should not > allow to use RESPECT NULLS for the window functions, expect lead etc. > and aggregates. > > See my patch. > >> +/* >> + * Window function option clauses >> + */ >> +opt_null_treatment: >> + RESPECT NULLS_P { $$ = RESPECT_NULLS; } >> + | IGNORE_P NULLS_P { $$ = IGNORE_NULLS; } >> + | /*EMPTY*/ { $$ = NULL_TREATMENT_NOT_SET; } >> + ; > > With this, you can check if null treatment clause is used or not in > each window function. > > In my previous patch I did the check in parse/analysis but I think > it's better to be checked in each window function. This way, > > - need not to add a column to pg_proc. > > - allow user defined window functions to decide by themselves whether > they can accept null treatment option. Attached is the patch to implement this (on top of your patch). test=# SELECT row_number() RESPECT NULLS OVER () FROM (SELECT 1) AS s; ERROR: window function row_number cannot have RESPECT NULLS or IGNORE NULLS Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index fac0e05dee..b8519d9890 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -74,6 +74,7 @@ typedef struct WindowObjectData int64 *win_nonnulls; /* tracks non-nulls in ignore nulls mode */ int nonnulls_size; /* track size of the win_nonnulls array */ int nonnulls_len; /* track length of the win_nonnulls array */ + WindowFunc *wfunc; /* WindowFunc of this function */ } WindowObjectData; /* @@ -2634,6 +2635,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) winobj->nonnulls_len = 0; } + winobj->wfunc = wfunc; + /* It's a real window function, so set up to call it. */ fmgr_info_cxt(wfunc->winfnoid, &perfuncstate->flinfo, econtext->ecxt_per_query_memory); @@ -3881,3 +3884,24 @@ WinGetFuncArgCurrent(WindowObject winobj, int argno, bool *isnull) return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), econtext, isnull); } + +/* + * Error out that this window function cannot have null treatement. + */ +void +ErrorOutNullTreatment(WindowObject winobj) +{ + char *fname; + + Assert(WindowObjectIsValid(winobj)); + + if (winobj->wfunc->null_treatment == NULL_TREATMENT_NOT_SET) + return; + + fname = get_func_name(winobj->wfunc->winfnoid); + + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("window function %s cannot have RESPECT NULLS or IGNORE NULLS", + fname))); +} diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index 01fd16acf9..05e64c4569 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -2475,6 +2475,7 @@ eval_const_expressions_mutator(Node *node, newexpr->winstar = expr->winstar; newexpr->winagg = expr->winagg; newexpr->ignore_nulls = expr->ignore_nulls; + newexpr->null_treatment = expr->null_treatment; newexpr->location = expr->location; return (Node *) newexpr; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 58c00bfd4f..e131428e85 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -276,6 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); MergeWhenClause *mergewhen; struct KeyActions *keyactions; struct KeyAction *keyaction; + NullTreatment nulltreatment; } %type <node> stmt toplevel_stmt schema_stmt routine_body_stmt @@ -633,7 +634,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); opt_frame_clause frame_extent frame_bound %type <ival> opt_window_exclusion_clause %type <str> opt_existing_window_name -%type <boolean> null_treatment %type <boolean> opt_if_not_exists %type <boolean> opt_unique_null_treatment %type <ival> generated_when override_kind @@ -662,6 +662,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); json_object_constructor_null_clause_opt json_array_constructor_null_clause_opt +%type <nulltreatment> null_treatment + /* * Non-keyword token types. These are hard-wired into the "flex" lexer. * They must be listed first so that their numeric codes do not depend on @@ -15247,7 +15249,7 @@ func_expr: func_application within_group_clause filter_clause null_treatment ove n->agg_within_group = true; } n->agg_filter = $3; - n->ignore_nulls = $4; + n->null_treatment = $4; n->over = $5; $$ = (Node *) n; } @@ -15797,9 +15799,9 @@ filter_clause: * Window Definitions */ null_treatment: - IGNORE_P NULLS_P { $$ = true; } - | RESPECT_P NULLS_P { $$ = false; } - | /*EMPTY*/ { $$ = false; } + RESPECT_P NULLS_P { $$ = RESPECT_NULLS; } + | IGNORE_P NULLS_P { $$ = IGNORE_NULLS; } + | /*EMPTY*/ { $$ = NULL_TREATMENT_NOT_SET; } ; window_clause: diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c index afa4bcc8d1..63af8ca6aa 100644 --- a/src/backend/parser/parse_func.c +++ b/src/backend/parser/parse_func.c @@ -98,7 +98,7 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, bool agg_star = (fn ? fn->agg_star : false); bool agg_distinct = (fn ? fn->agg_distinct : false); bool func_variadic = (fn ? fn->func_variadic : false); - bool ignore_nulls = (fn ? fn->ignore_nulls : false); + NullTreatment null_treatment = (fn ? fn->null_treatment : NULL_TREATMENT_NOT_SET); CoercionForm funcformat = (fn ? fn->funcformat : COERCE_EXPLICIT_CALL); bool could_be_projection; Oid rettype; @@ -516,11 +516,12 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, NameListToString(funcname)), parser_errposition(pstate, location))); - /* It also can't treat nulls as a window function */ - if (ignore_nulls) + /* Aggregate functions cannot have null treatment clause */ + if (null_treatment != NULL_TREATMENT_NOT_SET) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("aggregate functions do not accept RESPECT/IGNORE NULLS"), + errmsg("aggregate function %s cannot have RESPECT NULLS or IGNORE NULLS", + NameListToString(funcname)), parser_errposition(pstate, location))); } } @@ -842,7 +843,8 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, wfunc->winstar = agg_star; wfunc->winagg = (fdresult == FUNCDETAIL_AGGREGATE); wfunc->aggfilter = agg_filter; - wfunc->ignore_nulls = ignore_nulls; + wfunc->null_treatment = null_treatment; + wfunc->ignore_nulls = (null_treatment == IGNORE_NULLS); wfunc->location = location; /* diff --git a/src/backend/utils/adt/windowfuncs.c b/src/backend/utils/adt/windowfuncs.c index b87a624fb2..297e787927 100644 --- a/src/backend/utils/adt/windowfuncs.c +++ b/src/backend/utils/adt/windowfuncs.c @@ -85,6 +85,9 @@ window_row_number(PG_FUNCTION_ARGS) WindowObject winobj = PG_WINDOW_OBJECT(); int64 curpos = WinGetCurrentPosition(winobj); + /* row_number() does not support null treatment */ + ErrorOutNullTreatment(winobj); + WinSetMarkPosition(winobj, curpos); PG_RETURN_INT64(curpos + 1); } @@ -140,6 +143,9 @@ window_rank(PG_FUNCTION_ARGS) rank_context *context; bool up; + /* rank() does not support null treatment */ + ErrorOutNullTreatment(winobj); + up = rank_up(winobj); context = (rank_context *) WinGetPartitionLocalMemory(winobj, sizeof(rank_context)); @@ -202,6 +208,9 @@ window_dense_rank(PG_FUNCTION_ARGS) rank_context *context; bool up; + /* dense_rank() does not support null treatment */ + ErrorOutNullTreatment(winobj); + up = rank_up(winobj); context = (rank_context *) WinGetPartitionLocalMemory(winobj, sizeof(rank_context)); @@ -266,6 +275,9 @@ window_percent_rank(PG_FUNCTION_ARGS) Assert(totalrows > 0); + /* percent_rank() does not support null treatment */ + ErrorOutNullTreatment(winobj); + up = rank_up(winobj); context = (rank_context *) WinGetPartitionLocalMemory(winobj, sizeof(rank_context)); @@ -335,6 +347,9 @@ window_cume_dist(PG_FUNCTION_ARGS) Assert(totalrows > 0); + /* cume_dist() does not support null treatment */ + ErrorOutNullTreatment(winobj); + up = rank_up(winobj); context = (rank_context *) WinGetPartitionLocalMemory(winobj, sizeof(rank_context)); @@ -412,6 +427,9 @@ window_ntile(PG_FUNCTION_ARGS) WindowObject winobj = PG_WINDOW_OBJECT(); ntile_context *context; + /* ntile() does not support null treatment */ + ErrorOutNullTreatment(winobj); + context = (ntile_context *) WinGetPartitionLocalMemory(winobj, sizeof(ntile_context)); diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 073e2469ba..32fbab46a0 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -432,6 +432,7 @@ typedef struct FuncCall bool agg_distinct; /* arguments were labeled DISTINCT */ bool func_variadic; /* last argument was labeled VARIADIC */ CoercionForm funcformat; /* how to display this node */ + NullTreatment null_treatment; /* RESPECT_NULLS or IGNORE NULLS */ int location; /* token location, or -1 if unknown */ } FuncCall; diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h index 221b5e6218..545b5e5ac8 100644 --- a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -532,6 +532,13 @@ typedef struct GroupingFunc int location; } GroupingFunc; +typedef enum NullTreatment +{ + NULL_TREATMENT_NOT_SET = 0, + RESPECT_NULLS, + IGNORE_NULLS +} NullTreatment; + /* * WindowFunc * @@ -559,7 +566,8 @@ typedef struct WindowFunc bool winstar pg_node_attr(query_jumble_ignore); /* is function a simple aggregate? */ bool winagg pg_node_attr(query_jumble_ignore); - /* ignore nulls */ + /* null treatement */ + NullTreatment null_treatment pg_node_attr(query_jumble_ignore); bool ignore_nulls; /* token location, or -1 if unknown */ int location; diff --git a/src/include/windowapi.h b/src/include/windowapi.h index b8c2c565d1..8a50478ee9 100644 --- a/src/include/windowapi.h +++ b/src/include/windowapi.h @@ -61,4 +61,6 @@ extern Datum WinGetFuncArgInFrame(WindowObject winobj, int argno, extern Datum WinGetFuncArgCurrent(WindowObject winobj, int argno, bool *isnull); +extern void ErrorOutNullTreatment(WindowObject winobj); + #endif /* WINDOWAPI_H */
On Sat, 6 May 2023, 04:57 Tatsuo Ishii, <ishii@sraoss.co.jp> wrote:
Attached is the patch to implement this (on top of your patch).
test=# SELECT row_number() RESPECT NULLS OVER () FROM (SELECT 1) AS s;
ERROR: window function row_number cannot have RESPECT NULLS or IGNORE NULLS
The last time this was discussed (https://www.postgresql.org/message-id/1037735.1610402426%40sss.pgh.pa.us) it was suggested to make the feature generalizable, beyond what the standard says it should be limited to.
With it generalizable, there would need to be extra checks for custom functions, such as if they allow multiple column arguments (which I'll add in v2 of the patch if the design's accepted).
So I think we need a consensus on whether to stick to limiting it to several specific functions, or making it generalized yet agreeing the rules to limit it (such as no agg functions, and no functions with multiple column arguments).
> The last time this was discussed ( > https://www.postgresql.org/message-id/1037735.1610402426%40sss.pgh.pa.us) > it was suggested to make the feature generalizable, beyond what the > standard says it should be limited to. I have read the mail. In my understanding nobody said that standard window functions should all accept the null treatment clause. Also Tom said: https://www.postgresql.org/message-id/5567.1537884439%40sss.pgh.pa.us > The questions of how we interface to the individual window functions > are really independent of how we handle the parsing problem. My > first inclination is to just pass the flags down to the window functions > (store them in WindowObject and provide some additional inquiry functions > in windowapi.h) and let them deal with it. As I said before I totally agree with this. With my patch if a (custom) window function does not want to accept null treatment clause, it just calls ErrorOutNullTreatment(). It will raise an error if IGNORE NULLS or RESPECT NULLS is provided. If it does call the function, it is up to the function how to deal with the null treatment. In another word, the infrastructure does not have fixed rules to allow/disallow null treatment clause for each window function. It's "delegated" to each window function. Anyway we can change the rule for other than nth_value etc. later easily once my patch is brought in. > With it generalizable, there would need to be extra checks for custom > functions, such as if they allow multiple column arguments (which I'll add > in v2 of the patch if the design's accepted). I am not sure if allowing-multiple-column-arguments patch should be provided with null-treatment patch. > So I think we need a consensus on whether to stick to limiting it to > several specific functions, or making it generalized yet agreeing the rules > to limit it (such as no agg functions, and no functions with multiple > column arguments). Let's see the discussion... Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
On 9/7/24 22:25, Oliver Ford wrote: > On Sat, May 6, 2023 at 9:41 AM Oliver Ford <ojford@gmail.com> wrote: >> >> >> >> On Sat, 6 May 2023, 04:57 Tatsuo Ishii, <ishii@sraoss.co.jp> wrote: >>> >>> Attached is the patch to implement this (on top of your patch). >>> >>> test=# SELECT row_number() RESPECT NULLS OVER () FROM (SELECT 1) AS s; >>> ERROR: window function row_number cannot have RESPECT NULLS or IGNORE NULLS >> >> >> The last time this was discussed (https://www.postgresql.org/message-id/1037735.1610402426%40sss.pgh.pa.us) it was suggestedto make the feature generalizable, beyond what the standard says it should be limited to. >> >> With it generalizable, there would need to be extra checks for custom functions, such as if they allow multiple columnarguments (which I'll add in v2 of the patch if the design's accepted). >> >> So I think we need a consensus on whether to stick to limiting it to several specific functions, or making it generalizedyet agreeing the rules to limit it (such as no agg functions, and no functions with multiple column arguments). > > Reviving this thread, I've attached a rebased patch with code, docs, > and tests and added it to November commitfest. Excellent! One of these days we'll get this in. :-) I have a problem with this test, though: SELECT sum(orbit) RESPECT NULLS OVER () FROM planets; -- succeeds Why should that succeed? Especially since aggregates such as SUM() will ignore nulls! The error message on its partner seems to confirm this: SELECT sum(orbit) IGNORE NULLS OVER () FROM planets; -- fails ERROR: aggregate functions do not accept RESPECT/IGNORE NULLS I believe they should both fail. -- Vik Fearing
>> >> On Sat, 6 May 2023, 04:57 Tatsuo Ishii, <ishii@sraoss.co.jp> wrote: >> >>> >> >>> Attached is the patch to implement this (on top of your patch). >> >>> >> >>> test=# SELECT row_number() RESPECT NULLS OVER () FROM (SELECT 1) AS s; >> >>> ERROR: window function row_number cannot have RESPECT NULLS or IGNORE NULLS >> >> >> >> >> >> The last time this was discussed (https://www.postgresql.org/message-id/1037735.1610402426%40sss.pgh.pa.us) it wassuggested to make the feature generalizable, beyond what the standard says it should be limited to. >> >> >> >> With it generalizable, there would need to be extra checks for custom functions, such as if they allow multiple columnarguments (which I'll add in v2 of the patch if the design's accepted). >> >> >> >> So I think we need a consensus on whether to stick to limiting it to several specific functions, or making it generalizedyet agreeing the rules to limit it (such as no agg functions, and no functions with multiple column arguments). It seems you allow to use IGNORE NULLS for all window functions. If the case, you should explicitely stat that in the docs. Otherwise users will be confused because; 1) The SQL standard says IGNORE NULLS only for lead, lag, first_value, last_value and nth_value. 2) Some window function returns same rows with IGNORE NULLS/RESPECT NULLS. Consider following case. test=# create table t1(i int); CREATE TABLE test=# insert into t1 values(NULL),(NULL); INSERT 0 2 test=# select * from t1; i --- (2 rows) test=# SELECT row_number() IGNORE NULLS OVER w FROM t1 WINDOW w AS (ORDER BY i); row_number ------------ 1 2 (2 rows) The t1 table only contains NULL rows. By using IGNORE NULLS, I think it's no wonder that a user expects 0 rows returned, if there's no mention in the docs that actually IGNORE NULLS/RESPECT NULLS are just ignored in some window functions. Instead I think it's better that other than lead, lag, first_value, last_value and nth_value each window function errors out if IGNORE NULLS/RESPECT NULL are passed to these window functions. I take a look at the patch and noticed that following functions have no comments on what they are doing and what are the arguments. Please look into other functions in nodeWindowAgg.c and add appropriate comments to those functions. +static void increment_notnulls(WindowObject winobj, int64 pos) +static Datum ignorenulls_getfuncarginpartition(WindowObject winobj, int argno, + int relpos, int seektype, bool set_mark, bool *isnull, bool *isout) { +static Datum ignorenulls_getfuncarginframe(WindowObject winobj, int argno, + int relpos, int seektype, bool set_mark, bool *isnull, bool *isout) { Also the coding style does not fit into our coding standard. They should be written something like: static void increment_notnulls(WindowObject winobj, int64 pos) static Datum ignorenulls_getfuncarginpartition(WindowObject winobj, int argno, int relpos, int seektype, bool set_mark, bool *isnull, bool *isout) { static Datum ignorenulls_getfuncarginframe(WindowObject winobj, int argno, int relpos, int seektype, bool set_mark, bool *isnull, bool *isout) { See also: https://www.postgresql.org/docs/current/source-format.html + int ignore_nulls; /* ignore nulls */ You should add more comment here. I.e. what values are possible for ignore_nulls. I also notice that you have an array in memory which records non-null row positions in a partition. The position is represented in int64, which means 1 entry consumes 8 bytes. If my understanding is correct, the array continues to grow up to the partition size. Also the array is created for each window function (is it really necessary?). I worry about this because it might consume excessive memory for big partitions. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
On Wednesday, September 11, 2024, Tatsuo Ishii <ishii@postgresql.org> wrote:
test=# SELECT row_number() IGNORE NULLS OVER w FROM t1 WINDOW w AS (ORDER BY i);
row_number
------------
1
2
(2 rows)
The t1 table only contains NULL rows. By using IGNORE NULLS, I think
it's no wonder that a user expects 0 rows returned, if there's no
mention in the docs that actually IGNORE NULLS/RESPECT NULLS are just
ignored in some window functions.
My nieve understanding of the nulls treatment is computations are affected, therefore a zero-argument function is incapable of abiding by this clause (it should error…). Your claim that this should somehow produce zero rows confuses me on two fronts. One, window function should be incapable of affecting how many rows are returned. The query must output two rows regardless of the result of the window expression (it should at worse produce the null value). Two, to produce said null value you have to be ignoring the row due to the order by clause seeing a null. But the order by isn’t part of the computation.
David J.
> On Wednesday, September 11, 2024, Tatsuo Ishii <ishii@postgresql.org> wrote: > >> >> test=# SELECT row_number() IGNORE NULLS OVER w FROM t1 WINDOW w AS (ORDER >> BY i); >> row_number >> ------------ >> 1 >> 2 >> (2 rows) >> >> The t1 table only contains NULL rows. By using IGNORE NULLS, I think >> it's no wonder that a user expects 0 rows returned, if there's no >> mention in the docs that actually IGNORE NULLS/RESPECT NULLS are just >> ignored in some window functions. >> > > My nieve understanding of the nulls treatment is computations are affected, > therefore a zero-argument function is incapable of abiding by this clause > (it should error…). Yes. I actually claimed that row_number() should error out if the clause is provided. > Instead I think it's better that other than lead, lag, first_value, > last_value and nth_value each window function errors out if IGNORE > NULLS/RESPECT NULL are passed to these window functions. > Your claim that this should somehow produce zero rows > confuses me on two fronts. One, window function should be incapable of > affecting how many rows are returned. The query must output two rows > regardless of the result of the window expression (it should at worse > produce the null value). Two, to produce said null value you have to be > ignoring the row due to the order by clause seeing a null. But the order > by isn’t part of the computation. Well I did not claim that. I just gave a possible example what users could misunderstand. Probably my example was not so good. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Thanks for updating the patch. >> It seems you allow to use IGNORE NULLS for all window functions. If >> the case, you should explicitely stat that in the docs. Otherwise >> users will be confused because; > > The latest version restricts it to lag, lead, first_value, last_value, > and nth_value. We can extend it in a subsequent patch if there's > demand? The restriction is required by the SQL standard. So I don't think we need to extend to other window functions. >> I take a look at the patch and noticed that following functions have >> no comments on what they are doing and what are the arguments. Please >> look into other functions in nodeWindowAgg.c and add appropriate >> comments to those functions. > > Latest version has more comments and should be in the standard coding style. Still I see non standard coding stiles and indentations. See attached patch for nodeWindowAgg.c, which is fixed by pgindent, for example. (Other files may need fixing too). >> I also notice that you have an array in memory which records non-null >> row positions in a partition. The position is represented in int64, >> which means 1 entry consumes 8 bytes. If my understanding is correct, >> the array continues to grow up to the partition size. Also the array >> is created for each window function (is it really necessary?). I worry >> about this because it might consume excessive memory for big >> partitions. > > It's an int64 because it stores the abs_pos/mark_pos which are int64. > Keeping an array for each function is needed for the mark optimization > to work correctly. Ok. Here are some comments regarding the patch: (1) I noticed that ignorenulls_getfuncarginframe() does not take account EXCLUSION frame options. The code path is in WinGetFuncArgInFrame(): /* * Account for exclusion option if one is active, but advance only * abs_pos not mark_pos. This prevents changes of the current * row's peer group from resulting in trying to fetch a row before * some previous mark position. : : I guess ignorenulls_getfuncarginframe() was created by modifying WinGetFuncArgInFrame() so I don't see the reason why ignorenulls_getfuncarginframe() does not take account EXCLUSION frame options. (2) New member ignore_nulls are added to some structs. Its value is 0, 1 or -1. It's better to use a DEFINE for the value of ignore_nulls, rather than 0, 1, or -1. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index e1117857dc0..520e7e1bfcb 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -2624,7 +2624,10 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) elog(ERROR, "WindowFunc with winref %u assigned to WindowAgg with winref %u", wfunc->winref, node->winref); - /* Look for a previous duplicate window function, which needs the same ignore_nulls value */ + /* + * Look for a previous duplicate window function, which needs the same + * ignore_nulls value + */ for (i = 0; i <= wfuncno; i++) { if (equal(wfunc, perfunc[i].wfunc) && @@ -3379,8 +3382,8 @@ increment_nonnulls(WindowObject winobj, int64 pos) winobj->nonnulls_size *= 2; winobj->win_nonnulls = repalloc_array(winobj->win_nonnulls, - int64, - winobj->nonnulls_size); + int64, + winobj->nonnulls_size); } winobj->win_nonnulls[winobj->nonnulls_len] = pos; winobj->nonnulls_len++; @@ -3394,7 +3397,8 @@ increment_nonnulls(WindowObject winobj, int64 pos) static Datum ignorenulls_getfuncarginpartition(WindowObject winobj, int argno, int relpos, int seektype, bool set_mark, - bool *isnull, bool *isout) { + bool *isnull, bool *isout) +{ WindowAggState *winstate; ExprContext *econtext; TupleTableSlot *slot; @@ -3416,27 +3420,27 @@ ignorenulls_getfuncarginpartition(WindowObject winobj, int argno, switch (seektype) { - case WINDOW_SEEK_CURRENT: - abs_pos = winstate->currentpos; - break; - case WINDOW_SEEK_HEAD: - abs_pos = 0; - break; - case WINDOW_SEEK_TAIL: - spool_tuples(winstate, -1); - abs_pos = winstate->spooled_rows - 1; - break; - default: - elog(ERROR, "unrecognized window seek type: %d", seektype); - abs_pos = 0; /* keep compiler quiet */ - break; + case WINDOW_SEEK_CURRENT: + abs_pos = winstate->currentpos; + break; + case WINDOW_SEEK_HEAD: + abs_pos = 0; + break; + case WINDOW_SEEK_TAIL: + spool_tuples(winstate, -1); + abs_pos = winstate->spooled_rows - 1; + break; + default: + elog(ERROR, "unrecognized window seek type: %d", seektype); + abs_pos = 0; /* keep compiler quiet */ + break; } if (forward == -1) goto check_partition; /* if we're moving forward, store previous rows */ - for (i=0; i < winobj->nonnulls_len; ++i) + for (i = 0; i < winobj->nonnulls_len; ++i) { if (winobj->win_nonnulls[i] > abs_pos) { @@ -3448,7 +3452,7 @@ ignorenulls_getfuncarginpartition(WindowObject winobj, int argno, *isout = false; window_gettupleslot(winobj, abs_pos, slot); econtext->ecxt_outertuple = slot; - return ExecEvalExpr((ExprState *)list_nth(winobj->argstates, argno), + return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), econtext, isnull); } } @@ -3465,13 +3469,13 @@ check_partition: if (isout) *isout = true; *isnull = true; - return (Datum)0; + return (Datum) 0; } if (isout) *isout = false; econtext->ecxt_outertuple = slot; - datum = ExecEvalExpr((ExprState *)list_nth(winobj->argstates, argno), + datum = ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), econtext, isnull); if (!*isnull) @@ -3494,7 +3498,8 @@ check_partition: static Datum ignorenulls_getfuncarginframe(WindowObject winobj, int argno, int relpos, int seektype, bool set_mark, - bool *isnull, bool *isout) { + bool *isnull, bool *isout) +{ WindowAggState *winstate; ExprContext *econtext; TupleTableSlot *slot; @@ -3511,7 +3516,7 @@ ignorenulls_getfuncarginframe(WindowObject winobj, int argno, winstate = winobj->winstate; econtext = winstate->ss.ps.ps_ExprContext; slot = winstate->temp_slot_1; - datum = (Datum)0; + datum = (Datum) 0; notnull_offset = 0; notnull_relpos = abs(relpos); @@ -3549,70 +3554,72 @@ ignorenulls_getfuncarginframe(WindowObject winobj, int argno, */ for (i = 0; i < winobj->nonnulls_len; ++i) { - int inframe; - if (winobj->win_nonnulls[i] < winobj->markpos) - continue; - if (!window_gettupleslot(winobj, winobj->win_nonnulls[i], slot)) - continue; + int inframe; - inframe = row_is_in_frame(winstate, winobj->win_nonnulls[i], slot); - if (inframe <= 0) - { - if (inframe == -1 && set_mark) - WinSetMarkPosition(winobj, winobj->win_nonnulls[i]); - continue; - } + if (winobj->win_nonnulls[i] < winobj->markpos) + continue; + if (!window_gettupleslot(winobj, winobj->win_nonnulls[i], slot)) + continue; - abs_pos = winobj->win_nonnulls[i] + 1; - ++notnull_offset; + inframe = row_is_in_frame(winstate, winobj->win_nonnulls[i], slot); + if (inframe <= 0) + { + if (inframe == -1 && set_mark) + WinSetMarkPosition(winobj, winobj->win_nonnulls[i]); + continue; + } - if (notnull_offset > notnull_relpos) - { - if (isout) + abs_pos = winobj->win_nonnulls[i] + 1; + ++notnull_offset; + + if (notnull_offset > notnull_relpos) + { + if (isout) *isout = false; - econtext->ecxt_outertuple = slot; - return ExecEvalExpr((ExprState *)list_nth(winobj->argstates, argno), - econtext, isnull); - } + econtext->ecxt_outertuple = slot; + return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), + econtext, isnull); + } } check_frame: do { - int inframe; - if (!window_gettupleslot(winobj, abs_pos, slot)) - goto out_of_frame; + int inframe; - inframe = row_is_in_frame(winstate, abs_pos, slot); - if (inframe == -1) - goto out_of_frame; - else if (inframe == 0) - goto advance; + if (!window_gettupleslot(winobj, abs_pos, slot)) + goto out_of_frame; - gottuple = window_gettupleslot(winobj, abs_pos, slot); + inframe = row_is_in_frame(winstate, abs_pos, slot); + if (inframe == -1) + goto out_of_frame; + else if (inframe == 0) + goto advance; - if (!gottuple) - { - if (isout) - *isout = true; - *isnull = true; - return (Datum)0; - } + gottuple = window_gettupleslot(winobj, abs_pos, slot); + if (!gottuple) + { if (isout) - *isout = false; - econtext->ecxt_outertuple = slot; - datum = ExecEvalExpr((ExprState *)list_nth(winobj->argstates, argno), - econtext, isnull); + *isout = true; + *isnull = true; + return (Datum) 0; + } - if (!*isnull) - { - ++notnull_offset; - increment_nonnulls(winobj, abs_pos); - } + if (isout) + *isout = false; + econtext->ecxt_outertuple = slot; + datum = ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), + econtext, isnull); + + if (!*isnull) + { + ++notnull_offset; + increment_nonnulls(winobj, abs_pos); + } advance: - abs_pos += forward; + abs_pos += forward; } while (notnull_offset <= notnull_relpos); return datum; @@ -3660,7 +3667,7 @@ WinGetFuncArgInPartition(WindowObject winobj, int argno, if (winobj->ignore_nulls == 1 && relpos != 0) return ignorenulls_getfuncarginpartition(winobj, argno, relpos, seektype, - set_mark, isnull, isout); + set_mark, isnull, isout); switch (seektype) { @@ -3752,7 +3759,7 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno, if (winobj->ignore_nulls == 1) return ignorenulls_getfuncarginframe(winobj, argno, relpos, seektype, - set_mark, isnull, isout); + set_mark, isnull, isout); switch (seektype) {
Tatsuo Ishii <ishii@postgresql.org> writes: >> The latest version restricts it to lag, lead, first_value, last_value, >> and nth_value. We can extend it in a subsequent patch if there's >> demand? > The restriction is required by the SQL standard. So I don't think we > need to extend to other window functions. The SQL spec does not believe that user-defined window functions are a thing. So its opinion on this point is useless. I would think that IGNORE NULLS is potentially useful for user-defined window functions, and we should not be building anything that restricts the feature to specific functions. regards, tom lane
> Tatsuo Ishii <ishii@postgresql.org> writes: >>> The latest version restricts it to lag, lead, first_value, last_value, >>> and nth_value. We can extend it in a subsequent patch if there's >>> demand? > >> The restriction is required by the SQL standard. So I don't think we >> need to extend to other window functions. > > The SQL spec does not believe that user-defined window functions are a > thing. So its opinion on this point is useless. Of course the standard does not mention anything about the user defined window functions and the restriction is not apply to the user defined window functions. > I would think that > IGNORE NULLS is potentially useful for user-defined window functions, > and we should not be building anything that restricts the feature to > specific functions. So you want to allow to use IGNORE NULLS to other built-in window functions? -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes: >> I would think that >> IGNORE NULLS is potentially useful for user-defined window functions, >> and we should not be building anything that restricts the feature to >> specific functions. > So you want to allow to use IGNORE NULLS to other built-in window > functions? No, there needs to be a way for the individual window function to throw error if that's specified for a function that can't handle it. I'm just saying I don't want that to be hard-wired in some centralized spot. regards, tom lane
>> So you want to allow to use IGNORE NULLS to other built-in window >> functions? > > No, there needs to be a way for the individual window function to > throw error if that's specified for a function that can't handle it. > I'm just saying I don't want that to be hard-wired in some centralized > spot. I agree. That's the right direction. -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
On Mon, Jan 20, 2025 at 12:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Tatsuo Ishii <ishii@postgresql.org> writes: > >> I would think that > >> IGNORE NULLS is potentially useful for user-defined window functions, > >> and we should not be building anything that restricts the feature to > >> specific functions. > > > So you want to allow to use IGNORE NULLS to other built-in window > > functions? > > No, there needs to be a way for the individual window function to > throw error if that's specified for a function that can't handle it. > I'm just saying I don't want that to be hard-wired in some centralized > spot. Would it be acceptable to add a bool column to pg_proc, say "pronulltreatment"? It would default to false, and an error would be thrown if the null clause is specified for a function where it's set to false?
>> No, there needs to be a way for the individual window function to >> throw error if that's specified for a function that can't handle it. >> I'm just saying I don't want that to be hard-wired in some centralized >> spot. > > Would it be acceptable to add a bool column to pg_proc, say > "pronulltreatment"? It would default to false, and an error would be > thrown if the null clause is specified for a function where it's set > to false? It needs lots of work including modifying CREATE FUNCTION command. Instead you could add an API to WinObject access functions to export ignore_nulls value. Then let each window function check it. If the window function should not take IGNORE/RESPECT NULLS option, throw an error. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes: > It needs lots of work including modifying CREATE FUNCTION > command. Instead you could add an API to WinObject access functions to > export ignore_nulls value. Then let each window function check it. If > the window function should not take IGNORE/RESPECT NULLS option, throw > an error. Yeah, that would be my first thought too. The only question is whether a function that fails to check that could crash. If it merely gives surprising answers, I think this way is fine. regards, tom lane
> Attached version moves the setting of IGNORE_NULLS to the window > function itself, with the functions that don't allow it erroring out. > This is done with a new api: WinCheckAndInitializeNullTreatment. > > Custom functions that don't call this will simply not have the > IGNORE_NULLS option set as this api initializes the option and the > array. As per the previous discussion, it should have correct > formatting and handle the Exclusion clauses correctly. I played with the v4 patch. It seems lead() produces incorrect result: test=# SELECT x,y,lead(y) IGNORE NULLS OVER (ORDER BY x) FROM (VALUES(1,NULL),(2,2),(3,NULL)) AS v(x,y); x | y | lead ---+---+------ 1 | | 2 2 | 2 | 2 3 | | 2 (3 rows) I think correct result of "lead" column is 2, NULL, NULL. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
> Hello, > I also played with the v4 patch and it produces correct result: > test=# SELECT x,y,lead(y) IGNORE NULLS OVER (ORDER BY x) FROM > (VALUES(1,NULL),(2,2),(3,NULL)) AS v(x,y); > x | y | lead > ---+---+------ > 1 | | 2 > 2 | 2 | > 3 | | > (3 rows) > > test=# > It is from today's git, clean compile and install with only v4 patch > applied, make check also passes without errors. I guess you are just lucky. In my case I enabled --enable-cassert to build PostgreSQL and it automatically turn on CLOBBER_FREED_MEMORY and freed memory area is scrambled. If I look the patch closer, I found a problem: +void +WinCheckAndInitializeNullTreatment(WindowObject winobj, : : + winobj->win_nonnulls = palloc_array(int64, 16); WinCheckAndInitializeNullTreatment is called in each built-in window function. Window functions are called in the per tuple memory context, which means win_nonnulls disappears when next tuple is supplied to the window function. If my understanding is correct, winobj->win_nonnulls needs to survive across processing tuples. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Hi,
I was able to reproduce exactly the problem, with clean compile and --enable-cassert:
test=# SELECT x,y,lead(y) IGNORE NULLS OVER (ORDER BY x) FROM (VALUES(1,NULL),(2,2),(3,NULL)) AS v(x,y);
x | y | lead
---+---+------
1 | | 2
2 | 2 | 2
3 | | 2
(3 rows)
test=#
x | y | lead
---+---+------
1 | | 2
2 | 2 | 2
3 | | 2
(3 rows)
test=#
Also, make check errors out at window test (without --enable-cassert it was passed in previous compile):
krasiyan@fedora:~/pgsql-src/postgresql$ cat /home/krasiyan/pgsql-src/postgresql/src/test/regress/regression.diffs
diff -U3 /home/krasiyan/pgsql-src/postgresql/src/test/regress/expected/window.out /home/krasiyan/pgsql-src/postgresql/src/test/regress/results/window.out
--- /home/krasiyan/pgsql-src/postgresql/src/test/regress/expected/window.out 2025-01-22 21:25:47.114508215 +0200
+++ /home/krasiyan/pgsql-src/postgresql/src/test/regress/results/window.out 2025-01-23 07:58:26.784659592 +0200
@@ -5477,12 +5477,12 @@
name | orbit | lead | lead_respect | lead_ignore
---------+-------+-------+--------------+-------------
earth | | 4332 | 4332 | 4332
- jupiter | 4332 | | | 88
+ jupiter | 4332 | | |
mars | | 88 | 88 | 88
mercury | 88 | 60182 | 60182 | 60182
neptune | 60182 | 90560 | 90560 | 90560
pluto | 90560 | 24491 | 24491 | 24491
- saturn | 24491 | | | 224
+ saturn | 24491 | | |
uranus | | 224 | 224 | 224
venus | 224 | | |
xyzzy | | | |
@@ -5577,13 +5577,13 @@
name | orbit | first_value | last_value | nth_value | lead_ignore | lag_ignore
---------+-------+-------------+------------+-----------+-------------+------------
earth | | 4332 | 4332 | | 4332 |
- jupiter | 4332 | 88 | 88 | | 88 |
- mars | | 4332 | 60182 | 88 | 88 | 4332
- mercury | 88 | 4332 | 90560 | 60182 | 60182 | 4332
+ jupiter | 4332 | 88 | 88 | | 60182 |
+ mars | | 88 | 60182 | 60182 | 60182 | 4332
+ mercury | 88 | 4332 | 90560 | 90560 | 90560 | 4332
neptune | 60182 | 88 | 24491 | 90560 | 90560 | 88
- pluto | 90560 | 88 | 24491 | 60182 | 24491 | 60182
- saturn | 24491 | 60182 | 224 | 90560 | 224 | 90560
- uranus | | 90560 | 224 | 24491 | 224 | 24491
+ pluto | 90560 | 88 | 24491 | 60182 | 60182 | 60182
+ saturn | 24491 | 60182 | 224 | 90560 | 90560 | 90560
+ uranus | | 90560 | 224 | 24491 | 24491 | 24491
venus | 224 | 24491 | 24491 | | | 24491
xyzzy | | 224 | 224 | | | 224
(10 rows)
@@ -5646,14 +5646,14 @@
name | orbit | first_value | last_value | nth_value | lead_ignore | lag_ignore
---------+-------+-------------+------------+-----------+-------------+------------
earth | | | | | 88 |
- jupiter | | 88 | 88 | | 88 |
- mars | | 88 | 60182 | 60182 | 88 |
+ jupiter | | 88 | 88 | | 60182 |
+ mars | | 88 | 60182 | 60182 | 60182 |
mercury | 88 | 88 | 90560 | 60182 | 60182 |
- neptune | 60182 | 88 | 24491 | 60182 | 90560 | 88
- pluto | 90560 | 88 | 24491 | 60182 | 24491 | 60182
- saturn | 24491 | 60182 | 224 | 90560 | 224 | 90560
- uranus | | 90560 | 224 | 24491 | 224 | 24491
- venus | 224 | 24491 | 224 | 224 | | 24491
+ neptune | 60182 | 88 | 24491 | 60182 | 60182 | 88
+ pluto | 90560 | 88 | 24491 | 60182 | 60182 | 60182
+ saturn | 24491 | 60182 | 224 | 90560 | 90560 | 90560
+ uranus | | 90560 | 224 | 24491 | 24491 | 24491
+ venus | 224 | 24491 | 224 | 224 | 224 | 24491
xyzzy | | 224 | 224 | | | 224
(10 rows)
diff -U3 /home/krasiyan/pgsql-src/postgresql/src/test/regress/expected/window.out /home/krasiyan/pgsql-src/postgresql/src/test/regress/results/window.out
--- /home/krasiyan/pgsql-src/postgresql/src/test/regress/expected/window.out 2025-01-22 21:25:47.114508215 +0200
+++ /home/krasiyan/pgsql-src/postgresql/src/test/regress/results/window.out 2025-01-23 07:58:26.784659592 +0200
@@ -5477,12 +5477,12 @@
name | orbit | lead | lead_respect | lead_ignore
---------+-------+-------+--------------+-------------
earth | | 4332 | 4332 | 4332
- jupiter | 4332 | | | 88
+ jupiter | 4332 | | |
mars | | 88 | 88 | 88
mercury | 88 | 60182 | 60182 | 60182
neptune | 60182 | 90560 | 90560 | 90560
pluto | 90560 | 24491 | 24491 | 24491
- saturn | 24491 | | | 224
+ saturn | 24491 | | |
uranus | | 224 | 224 | 224
venus | 224 | | |
xyzzy | | | |
@@ -5577,13 +5577,13 @@
name | orbit | first_value | last_value | nth_value | lead_ignore | lag_ignore
---------+-------+-------------+------------+-----------+-------------+------------
earth | | 4332 | 4332 | | 4332 |
- jupiter | 4332 | 88 | 88 | | 88 |
- mars | | 4332 | 60182 | 88 | 88 | 4332
- mercury | 88 | 4332 | 90560 | 60182 | 60182 | 4332
+ jupiter | 4332 | 88 | 88 | | 60182 |
+ mars | | 88 | 60182 | 60182 | 60182 | 4332
+ mercury | 88 | 4332 | 90560 | 90560 | 90560 | 4332
neptune | 60182 | 88 | 24491 | 90560 | 90560 | 88
- pluto | 90560 | 88 | 24491 | 60182 | 24491 | 60182
- saturn | 24491 | 60182 | 224 | 90560 | 224 | 90560
- uranus | | 90560 | 224 | 24491 | 224 | 24491
+ pluto | 90560 | 88 | 24491 | 60182 | 60182 | 60182
+ saturn | 24491 | 60182 | 224 | 90560 | 90560 | 90560
+ uranus | | 90560 | 224 | 24491 | 24491 | 24491
venus | 224 | 24491 | 24491 | | | 24491
xyzzy | | 224 | 224 | | | 224
(10 rows)
@@ -5646,14 +5646,14 @@
name | orbit | first_value | last_value | nth_value | lead_ignore | lag_ignore
---------+-------+-------------+------------+-----------+-------------+------------
earth | | | | | 88 |
- jupiter | | 88 | 88 | | 88 |
- mars | | 88 | 60182 | 60182 | 88 |
+ jupiter | | 88 | 88 | | 60182 |
+ mars | | 88 | 60182 | 60182 | 60182 |
mercury | 88 | 88 | 90560 | 60182 | 60182 |
- neptune | 60182 | 88 | 24491 | 60182 | 90560 | 88
- pluto | 90560 | 88 | 24491 | 60182 | 24491 | 60182
- saturn | 24491 | 60182 | 224 | 90560 | 224 | 90560
- uranus | | 90560 | 224 | 24491 | 224 | 24491
- venus | 224 | 24491 | 224 | 224 | | 24491
+ neptune | 60182 | 88 | 24491 | 60182 | 60182 | 88
+ pluto | 90560 | 88 | 24491 | 60182 | 60182 | 60182
+ saturn | 24491 | 60182 | 224 | 90560 | 90560 | 90560
+ uranus | | 90560 | 224 | 24491 | 24491 | 24491
+ venus | 224 | 24491 | 224 | 224 | 224 | 24491
xyzzy | | 224 | 224 | | | 224
(10 rows)
На чт, 23.01.2025 г. в 6:25 Tatsuo Ishii <ishii@postgresql.org> написа:
> Hello,
> I also played with the v4 patch and it produces correct result:
> test=# SELECT x,y,lead(y) IGNORE NULLS OVER (ORDER BY x) FROM
> (VALUES(1,NULL),(2,2),(3,NULL)) AS v(x,y);
> x | y | lead
> ---+---+------
> 1 | | 2
> 2 | 2 |
> 3 | |
> (3 rows)
>
> test=#
> It is from today's git, clean compile and install with only v4 patch
> applied, make check also passes without errors.
I guess you are just lucky. In my case I enabled --enable-cassert to
build PostgreSQL and it automatically turn on CLOBBER_FREED_MEMORY and
freed memory area is scrambled. If I look the patch closer, I found a
problem:
+void
+WinCheckAndInitializeNullTreatment(WindowObject winobj,
:
:
+ winobj->win_nonnulls = palloc_array(int64, 16);
WinCheckAndInitializeNullTreatment is called in each built-in window
function. Window functions are called in the per tuple memory context,
which means win_nonnulls disappears when next tuple is supplied to the
window function. If my understanding is correct, winobj->win_nonnulls
needs to survive across processing tuples.
Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
> Hi, > I was able to reproduce exactly the problem, with clean compile > and --enable-cassert: > test=# SELECT x,y,lead(y) IGNORE NULLS OVER (ORDER BY x) FROM > (VALUES(1,NULL),(2,2),(3,NULL)) AS v(x,y); > x | y | lead > ---+---+------ > 1 | | 2 > 2 | 2 | 2 > 3 | | 2 > (3 rows) > > test=# > Also, make check errors out at window test (without --enable-cassert it was > passed in previous compile): Yeah, same here. Another possible problem is, probably the code does not work well if there are multiple partitions. Since win_nonnulls stores currentpos in a partition, when the partition ends, win_nonnulls needs to be reset. Otherwise, it mistakenly represents currentpos in the previous partition. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
> The attached patch should fix both of these. I've added extra tests > with a PARTITION BY in the window clause to test for multiple > partitions. I have looked through the v5 patch. Here are review comments. From 5268754b33103fefc511b57ec546103899f70dbe Mon Sep 17 00:00:00 2001 From: Oliver Ford <ojford@gmail.com> Date: Thu, 23 Jan 2025 20:11:17 +0000 Subject: [PATCH] :ignore nulls --- diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index 9a1acce2b5..d93a44633e 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -69,6 +69,10 @@ typedef struct WindowObjectData int readptr; /* tuplestore read pointer for this fn */ int64 markpos; /* row that markptr is positioned on */ int64 seekpos; /* row that readptr is positioned on */ + int ignore_nulls; /* ignore nulls */ + int64 *win_nonnulls; /* tracks non-nulls in ignore nulls mode */ After ignore_nulls, there will be a 4-byte hole because win_nonnulls is an 8-byte variable. It would be better to swap them. @@ -1263,6 +1268,15 @@ begin_partition(WindowAggState *winstate) winobj->markpos = -1; winobj->seekpos = -1; + + + /* reallocate null check */ + if (perfuncstate->winobj->ignore_nulls == IGNORE_NULLS) + { + perfuncstate->winobj->win_nonnulls = palloc_array(int64, 16); + perfuncstate->winobj->nonnulls_size = 16; Those 2 lines above are not necessary. Since win_nonnulls are allocated in ExecInitWindowAgg() in the per query query context, it survives across partitions. You only need initialize nonnulls_len to 0. + perfuncstate->winobj->nonnulls_len = 0; + } } } @@ -1383,7 +1397,9 @@ release_partition(WindowAggState *winstate) /* Release any partition-local state of this window function */ if (perfuncstate->winobj) + { perfuncstate->winobj->localmem = NULL; + } You accidentally added unnecessary curly braces. @@ -2679,6 +2698,13 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) winobj->argstates = wfuncstate->args; winobj->localmem = NULL; perfuncstate->winobj = winobj; + winobj->ignore_nulls = wfunc->ignore_nulls; + if (winobj->ignore_nulls == PARSER_IGNORE_NULLS) + { + winobj->win_nonnulls = palloc_array(int64, 16); + winobj->nonnulls_size = 16; + winobj->nonnulls_len = 0; + } I don't like to see two "16" here. It would better to use #define or something. It will be better to declare the prototype of increment_nonnulls, ignorenulls_getfuncarginpartition, and ignorenulls_getfuncarginframe in the begging of the file as other static functions already do. +/* + * ignorenulls_getfuncarginframe + * For IGNORE NULLS, get the next nonnull value in the frame, moving forward or backward + * until we find a value or reach the frame's end. + */ +static Datum +ignorenulls_getfuncarginframe(WindowObject winobj, int argno, Do you assume that win_nonnulls is sorted by pos? I think it's necessarily true that pos in win_nonnulls array is sorted. Is that ok? + /* + * Store previous rows. Only possible in SEEK_HEAD mode + */ + for (i = 0; i < winobj->nonnulls_len; ++i) + { + int inframe; + + if (winobj->win_nonnulls[i] < winobj->markpos) There are too many "winobj->win_nonnulls[i]". You could assign to a variable "winobj->win_nonnulls[i]" and use the variable. + continue; + if (!window_gettupleslot(winobj, winobj->win_nonnulls[i], slot)) + continue; + + inframe = row_is_in_frame(winstate, winobj->win_nonnulls[i], slot); + if (inframe <= 0) + { + if (inframe == -1 && set_mark) + WinSetMarkPosition(winobj, winobj->win_nonnulls[i]); I think in most cases inframe returns 0 and WinSetMarkPosition is not called. What use case do you have in your mind when inframe is -1? +check_frame: + do + { + int inframe; + + if (!window_gettupleslot(winobj, abs_pos, slot)) + goto out_of_frame; + + inframe = row_is_in_frame(winstate, abs_pos, slot); + if (inframe == -1) + goto out_of_frame; + else if (inframe == 0) + goto advance; + + gottuple = window_gettupleslot(winobj, abs_pos, slot); Do you really need to call window_gettupleslot here? It's already called above. --- a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -576,6 +576,18 @@ typedef struct GroupingFunc * Collation information is irrelevant for the query jumbling, as is the * internal state information of the node like "winstar" and "winagg". */ + +/* + * Null Treatment options. If specified, initially set to PARSER_IGNORE + * or PARSER_RESPECT. PARSER_IGNORE_NULLS is then converted to IGNORE_NULLS + * if the window function allows the null treatment clause. + */ +#define IGNORE_NULLS 4 +#define RESPECT_NULLS 3 +#define PARSER_IGNORE_NULLS 2 +#define PARSER_RESPECT_NULLS 1 +#define NO_NULLTREATMENT 0 This looks strange to me. Why do you start the define value from 4 down to 0? Also there is no place to use RESPECT_NULLS. Do we need it? Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
>> +/* >> + * ignorenulls_getfuncarginframe >> + * For IGNORE NULLS, get the next nonnull value in the frame, moving forward or backward >> + * until we find a value or reach the frame's end. >> + */ >> +static Datum >> +ignorenulls_getfuncarginframe(WindowObject winobj, int argno, >> >> Do you assume that win_nonnulls is sorted by pos? I think it's >> necessarily true that pos in win_nonnulls array is sorted. Is that ok? > > Yes it must be sorted on my understanding of the code. Then the patch has a problem. I ran a query below and examined win_nonnulls. It seems it was not sorted out. SELECT x,y, nth_value(y,1) IGNORE NULLS OVER w FROM (VALUES (1,1), (2,2), (3,NULL), (4,4), (5,NULL), (6,6), (7,7)) AS t(x,y) WINDOW w AS (ORDER BY x ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE CURRENT ROW); (gdb) p *winobj->win_nonnulls @ winobj->nonnulls_len $8 = {1, 0, 3, 6, 5} Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
On Tue, Jan 28, 2025 at 9:02 AM Tatsuo Ishii <ishii@postgresql.org> wrote: > > >> +/* > >> + * ignorenulls_getfuncarginframe > >> + * For IGNORE NULLS, get the next nonnull value in the frame, moving forward or backward > >> + * until we find a value or reach the frame's end. > >> + */ > >> +static Datum > >> +ignorenulls_getfuncarginframe(WindowObject winobj, int argno, > >> > >> Do you assume that win_nonnulls is sorted by pos? I think it's > >> necessarily true that pos in win_nonnulls array is sorted. Is that ok? > > > > Yes it must be sorted on my understanding of the code. > > Then the patch has a problem. I ran a query below and examined > win_nonnulls. It seems it was not sorted out. > > SELECT > x,y, > nth_value(y,1) IGNORE NULLS OVER w > FROM (VALUES (1,1), (2,2), (3,NULL), (4,4), (5,NULL), (6,6), (7,7)) AS t(x,y) > WINDOW w AS (ORDER BY x ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE CURRENT ROW); > > > (gdb) p *winobj->win_nonnulls @ winobj->nonnulls_len > $8 = {1, 0, 3, 6, 5} > I've looked at it again and I think the code is correct, but I miswrote that the array needs to be sorted. The above query returns: x | y | nth_value ---+---+----------- 1 | 1 | 2 2 | 2 | 1 3 | | 2 4 | 4 | 5 | | 4 6 | 6 | 7 7 | 7 | 6 (7 rows) This is correct, for values of x: 1: The first non-null value of y is at position 0, however we have EXCLUDE CURRENT ROW so it picks the next non-null value at position 1 and stores it in the array, returning 2. 2: We can now take the first non-null value of y at position 0 and store it in the array, returning 1. 3. We take 1 preceding, using the position stored in the array, returning 2. 4. 1 preceding and 1 following are both null, and we exclude the current row, so returning null. 5. 1 preceding is at position 3, store it in the array, returning 4. 6. 1 preceding is null and we exclude the current row, so store position 6 in the array, returning 7. 7. 1 preceding is at position 5, store it in the array and return 6. It will be unordered when the EXCLUDE clause is used but the code should handle this correctly.
> I've looked at it again and I think the code is correct, Good news! I will look into your explanation. > but I > miswrote that the array needs to be sorted. The above query returns: > x | y | nth_value > ---+---+----------- > 1 | 1 | 2 > 2 | 2 | 1 > 3 | | 2 > 4 | 4 | > 5 | | 4 > 6 | 6 | 7 > 7 | 7 | 6 > (7 rows) > > This is correct, for values of x: > > 1: The first non-null value of y is at position 0, however we have > EXCLUDE CURRENT ROW so it picks the next non-null value at position 1 > and stores it in the array, returning 2. > 2: We can now take the first non-null value of y at position 0 and > store it in the array, returning 1. > 3. We take 1 preceding, using the position stored in the array, returning 2. > 4. 1 preceding and 1 following are both null, and we exclude the > current row, so returning null. > 5. 1 preceding is at position 3, store it in the array, returning 4. > 6. 1 preceding is null and we exclude the current row, so store > position 6 in the array, returning 7. > 7. 1 preceding is at position 5, store it in the array and return 6. > > It will be unordered when the EXCLUDE clause is used but the code > should handle this correctly. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
> I've looked at it again and I think the code is correct, but I > miswrote that the array needs to be sorted. The above query returns: > x | y | nth_value > ---+---+----------- > 1 | 1 | 2 > 2 | 2 | 1 > 3 | | 2 > 4 | 4 | > 5 | | 4 > 6 | 6 | 7 > 7 | 7 | 6 > (7 rows) > > This is correct, for values of x: > > 1: The first non-null value of y is at position 0, however we have > EXCLUDE CURRENT ROW so it picks the next non-null value at position 1 > and stores it in the array, returning 2. > 2: We can now take the first non-null value of y at position 0 and > store it in the array, returning 1. > 3. We take 1 preceding, using the position stored in the array, returning 2. > 4. 1 preceding and 1 following are both null, and we exclude the > current row, so returning null. > 5. 1 preceding is at position 3, store it in the array, returning 4. > 6. 1 preceding is null and we exclude the current row, so store > position 6 in the array, returning 7. > 7. 1 preceding is at position 5, store it in the array and return 6. > > It will be unordered when the EXCLUDE clause is used but the code > should handle this correctly. I ran this query (not using IGNORE NULLS) and get a result. SELECT x, nth_value(x,2) OVER w FROM generate_series(1,5) g(x) WINDOW w AS (ORDER BY x ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING EXCLUDE CURRENT ROW); x | nth_value ---+----------- 1 | 3 2 | 3 3 | 2 4 | 3 5 | 4 (5 rows) Since there's no NULL in x column, I expected the same result using IGNORE NULLS, but it was not: SELECT x, nth_value(x,2) IGNORE NULLS OVER w FROM generate_series(1,5) g(x) WINDOW w AS (ORDER BY x ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING EXCLUDE CURRENT ROW); x | nth_value ---+----------- 1 | 3 2 | 4 3 | 4 4 | 3 5 | 4 (5 rows) I suspect the difference is in the code path of ignorenulls_getfuncarginframe and the code path in WinGetFuncArgInFrame, which takes care of EXCLUDE like this. case FRAMEOPTION_EXCLUDE_CURRENT_ROW: if (abs_pos >= winstate->currentpos && winstate->currentpos >= winstate->frameheadpos) abs_pos++; Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
> Attached version doesn't use the nonnulls array if an Exclude is > specified, as I think it's not going to work with exclusions (as it's > only an optimization, this is ok and can be taken out entirely if you > prefer). I've also added your tests above to the tests. I applied the v7 patch and ran regression and tap test. There was no errors. Great! BTW, I noticed that in the code path where ignorenulls_getfuncarginframe() is called, WinSetMarkPosition() is never called? Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
>> BTW, I noticed that in the code path where >> ignorenulls_getfuncarginframe() is called, WinSetMarkPosition() is >> never called? >> >> Attached version uses the mark_pos at the end. I did simple performance test against v8. EXPLAIN ANALYZE SELECT x, nth_value(x,2) IGNORE NULLS OVER w FROM generate_series(1,$i) g(x) WINDOW w AS (ORDER BY x ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING); I changed $i = 1k, 2k, 3k, 4k, 5k... 10k and got this: Number Time (ms) of rows ---------------- 1000 28.977 2000 96.556 3000 212.019 4000 383.615 5000 587.05 6000 843.23 7000 1196.177 8000 1508.52 9000 1920.593 10000 2514.069 As you can see, when the number of rows = 1k, it took 28 ms. For 10k rows, it took 2514 ms, which is 86 times slower than the 1k case. Can we enhance this? Graph attached. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Вложения
On Fri, Feb 28, 2025 at 11:49 AM Tatsuo Ishii <ishii@postgresql.org> wrote:
>> BTW, I noticed that in the code path where
>> ignorenulls_getfuncarginframe() is called, WinSetMarkPosition() is
>> never called?
>>
>> Attached version uses the mark_pos at the end.
I did simple performance test against v8.
EXPLAIN ANALYZE
SELECT
x,
nth_value(x,2) IGNORE NULLS OVER w
FROM generate_series(1,$i) g(x)
WINDOW w AS (ORDER BY x ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING);
I changed $i = 1k, 2k, 3k, 4k, 5k... 10k and got this:
Number Time (ms)
of rows
----------------
1000 28.977
2000 96.556
3000 212.019
4000 383.615
5000 587.05
6000 843.23
7000 1196.177
8000 1508.52
9000 1920.593
10000 2514.069
As you can see, when the number of rows = 1k, it took 28 ms. For 10k
rows, it took 2514 ms, which is 86 times slower than the 1k case. Can
we enhance this?
Attached version removes the non-nulls array. That seems to speed everything up. Running the above query with 1 million rows averages 450ms, similar when using lead/lag.
Вложения
> Attached version removes the non-nulls array. That seems to speed > everything up. Running the above query with 1 million rows averages 450ms, > similar when using lead/lag. Great. However, CFbot complains about the patch: https://cirrus-ci.com/task/6364194477441024 Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
On Sun, Mar 9, 2025 at 6:40 AM Tatsuo Ishii <ishii@postgresql.org> wrote:
> Attached version removes the non-nulls array. That seems to speed
> everything up. Running the above query with 1 million rows averages 450ms,
> similar when using lead/lag.
Great. However, CFbot complains about the patch:
https://cirrus-ci.com/task/6364194477441024
Attached fixes the headerscheck locally.
Вложения
On Sun, 9 Mar 2025, 20:07 Oliver Ford, <ojford@gmail.com> wrote:
On Sun, Mar 9, 2025 at 6:40 AM Tatsuo Ishii <ishii@postgresql.org> wrote:> Attached version removes the non-nulls array. That seems to speed
> everything up. Running the above query with 1 million rows averages 450ms,
> similar when using lead/lag.
Great. However, CFbot complains about the patch:
https://cirrus-ci.com/task/6364194477441024Attached fixes the headerscheck locally.
v11 attached because the previous version was broken by commit 8b1b342
Вложения
Hi,
Patch applies and compiles, all included tests passed and after the latest fixes for non-nulls array, performance is near to lead/lag without support of "ignore nulls".
I have been using the last version for more than one month in a production environment with real data and didn't find any bugs, so It is ready for committer status.
Patch applies and compiles, all included tests passed and after the latest fixes for non-nulls array, performance is near to lead/lag without support of "ignore nulls".
I have been using the last version for more than one month in a production environment with real data and didn't find any bugs, so It is ready for committer status.
На чт, 13.03.2025 г. в 9:49 Oliver Ford <ojford@gmail.com> написа:
On Sun, 9 Mar 2025, 20:07 Oliver Ford, <ojford@gmail.com> wrote:On Sun, Mar 9, 2025 at 6:40 AM Tatsuo Ishii <ishii@postgresql.org> wrote:> Attached version removes the non-nulls array. That seems to speed
> everything up. Running the above query with 1 million rows averages 450ms,
> similar when using lead/lag.
Great. However, CFbot complains about the patch:
https://cirrus-ci.com/task/6364194477441024Attached fixes the headerscheck locally.v11 attached because the previous version was broken by commit 8b1b342
> Hi, > Patch applies and compiles, all included tests passed and after the latest > fixes for non-nulls array, performance is near to lead/lag without support > of "ignore nulls". > I have been using the last version for more than one month in a production > environment with real data and didn't find any bugs, so It is ready for > committer status. One thing I worry about the patch is, now the non-nulls array optimization was removed. Since then I have been thinking about if there could be other way to optimize searching for non null rows. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
> One thing I worry about the patch is, now the non-nulls array > optimization was removed. Since then I have been thinking about if > there could be other way to optimize searching for non null rows. Here is the v12 patch to implement the optimization on top of Oliver's v11 patch. Only src/backend/executor/nodeWindowAgg.c was modified (especially ignorenulls_getfuncarginframe). In the patch I created 2-bit not null information array, representing following status for each row: UNKNOWN: the row is not determined whether it's NULL or NOT yet. This is the initial value. NULL: the row has been determined to be NULL. NOT NULL: the row has been determined to be NOT NULL. In ignorenulls_getfuncarginframe: For the first time window function visits a row in a frame, the row is fetched using window_gettupleslot() and it is checked whether it is in the frame using row_is_in_frame(). If it's in the frame and the information in the array is UNKNOWN, ExecEvalExpr() is executed to find out if the expression on the function argument is NULL or not. And the result (NULL or NOT NULL) is stored in the array. If the information in the array is not UNKNOWN, we can skip calling ExecEvalExpr() because the information is already in the array. Note that I do not skip calling window_gettupleslot() and row_is_in_frame(), skip only calling ExecEvalExpr(), because whether a row is in a frame or not could be changing as the current row position moves while processing window functions. With this technique I observed around 40% speed up in my environment using the script attached, comparing with Oliver's v11 patch. v11: rows duration (msec) 1000 41.019 2000 148.957 3000 248.291 4000 442.478 5000 687.395 v12: rows duration (msec) 1000 27.515 2000 78.913 3000 174.737 4000 311.412 5000 482.156 The patch is now generated using the standard git format-patch. Also I have slightly adjusted the coding style so that it aligns with the one used in nodeWindowAgg.c, and ran pgindent. Note that I have not modified ignorenulls_getfuncarginpartition yet. I think we could optimize it using the not null info infrastructure as well. Will come up with it. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp for i in 1000 2000 3000 4000 5000 do echo "$i rows: " pos=`expr $i / 2` psql -a test <<EOF \timing explain analyze SELECT x, nth_value(x,$pos) IGNORE NULLS OVER w FROM generate_series(1,$i) g(x) WINDOW w AS (ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING); EOF done | egrep "rows:|Time:" | egrep -v "Planning|Execution"| sed -e 's/rows: *//' -e 's/Time: //' -e 's/ms//'
Вложения
> Here is the v12 patch to implement the optimization on top of Oliver's > v11 patch. Only src/backend/executor/nodeWindowAgg.c was modified > (especially ignorenulls_getfuncarginframe). In the patch I created > 2-bit not null information array, representing following status for > each row: > > UNKNOWN: the row is not determined whether it's NULL or NOT yet. > This is the initial value. > NULL: the row has been determined to be NULL. > NOT NULL: the row has been determined to be NOT NULL. > > In ignorenulls_getfuncarginframe: > > For the first time window function visits a row in a frame, the row is > fetched using window_gettupleslot() and it is checked whether it is in > the frame using row_is_in_frame(). If it's in the frame and the > information in the array is UNKNOWN, ExecEvalExpr() is executed to > find out if the expression on the function argument is NULL or > not. And the result (NULL or NOT NULL) is stored in the array. > > If the information in the array is not UNKNOWN, we can skip calling > ExecEvalExpr() because the information is already in the array. > > Note that I do not skip calling window_gettupleslot() and > row_is_in_frame(), skip only calling ExecEvalExpr(), because whether a > row is in a frame or not could be changing as the current row position > moves while processing window functions. > > With this technique I observed around 40% speed up in my environment > using the script attached, comparing with Oliver's v11 patch. > > v11: > rows duration (msec) > 1000 41.019 > 2000 148.957 > 3000 248.291 > 4000 442.478 > 5000 687.395 > > v12: > rows duration (msec) > 1000 27.515 > 2000 78.913 > 3000 174.737 > 4000 311.412 > 5000 482.156 > > The patch is now generated using the standard git format-patch. Also > I have slightly adjusted the coding style so that it aligns with the > one used in nodeWindowAgg.c, and ran pgindent. > > Note that I have not modified ignorenulls_getfuncarginpartition yet. I > think we could optimize it using the not null info infrastructure as > well. Will come up with it. Attached is the v13 patch to address this: i.e. optimize window functions (lead/lag) working in a partition. In summary I get 40x speed up comparing with v12 patch. Here is the test script. EXPLAIN ANALYZE SELECT lead(x, 5000) IGNORE NULLS OVER () FROM generate_series(1,10000) g(x); This looks for the 5k th row in a partition including 10k rows. The average duration of 3 trials are: v12: 2563.665 ms v13: 126.259 ms So I got 40x speed up with the v13 patch. In v12, we needed to scan 10k row partition over and over again. In v13 I used the same NULL/NOT NULL cache infrastructure created in v12, and we need to scan the partition only once. This is the reason for the speed up. Also I removed ignorenulls_getfuncarginpartition(), which was the work horse for null treatment for window functions working for partitions in v12 patch. Basically it was a copy and modified version of WinGetFuncArgInPartition(), thus had quite a few code duplication. In v13, instead I modified WinGetFuncArgInPartition() so that it can handle directly null treatment procedures. BTW I am still not satisfied by the performance improvement for window functions for frames, that was only 40%. I will study the code to look for more optimization. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Вложения
> BTW I am still not satisfied by the performance improvement for window > functions for frames, that was only 40%. I will study the code to look > for more optimization. So I come up with more optimization for window functions working on frames (i.e. first_value, last_value and nth_value). Attached v14 patch does it. There are 3 major functions used here. 1) window_gettupleslot (get a row) 2) row_is_in_frame (check whether row is in frame or not) 3) ExecEvalExpr (evaluate arg on the row) In v12 (and v13), we eliminate #3 in some cases but the saving was only 40%. In v14, I found some cases where we don't need to call #1. row_is_in_frame requires a row ("tuple" argument), which is provided by #1. However row_is_in_frame actually uses the row argument only when frame clause is "RANGE" or "GROUPS" and frame end is "CURRENT ROW". In other cases it does not use "tuple" argument at all. So I check the frame clause and the frame end, and if they are not the case, I can omit #1. Plus if the not null cache for the row has been already created, we can omit #3 as well. The optimization contributes to the performance. I observe 2.7x (1k rows case) to 5.2x (3k rows case) speed up when I compare the performance of v13 patch and v14 patch using the same script (see attached). v13: rows duration (msec) 1000 34.740 2000 91.169 3000 205.847 4000 356.142 5000 557.063 v14: rows duration (msec) 1000 12.807 2000 21.782 3000 39.248 4000 69.123 5000 101.220 I am not sure how the case where frame clause is "RANGE" or "GROUPS" and frame end is "CURRENT ROW" is majority of window function use cases. If it's majority, the optimization in v14 does not help much because v14 does not optimize the case. However if it's not, the v14 patch is close to commitable form, I think. Comments are welcome. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp for i in 1000 2000 3000 4000 5000 do echo "$i rows: " pos=`expr $i / 2` psql -a test <<EOF \timing explain analyze SELECT x, nth_value(x,$pos) IGNORE NULLS OVER w FROM generate_series(1,$i) g(x) WINDOW w AS (ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING); EOF done | egrep "rows:|Time:" | egrep -v "Planning|Execution"| sed -e 's/rows: *//' -e 's/Time: //' -e 's/ms//'
Вложения
Hi,
Patch applies and compiles, all included tests passed and performance gain is really impressive. I have been using the latest versions for months with real data and didn't find any bugs, so It is definitely ready for committer status.
Patch applies and compiles, all included tests passed and performance gain is really impressive. I have been using the latest versions for months with real data and didn't find any bugs, so It is definitely ready for committer status.
На пн, 30.06.2025 г. в 8:26 Tatsuo Ishii <ishii@postgresql.org> написа:
Attached is the v15 patch to fix CFbot complains.
Other than that, nothing has been changed since v14.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Krasiyan, > Hi, > Patch applies and compiles, all included tests passed and performance gain > is really impressive. I have been using the latest versions for months with > real data and didn't find any bugs, so It is definitely ready for committer > status. Thanks for testing the patch. The CF status has been already set to "ready for committer". I just changed the target version from 18 to 19. > I am not sure how the case where frame clause is "RANGE" or "GROUPS" > and frame end is "CURRENT ROW" is majority of window function use > cases. If it's majority, the optimization in v14 does not help much > because v14 does not optimize the case. However if it's not, the v14 > patch is close to commitable form, I think. Comments are welcome. Have you tested cases where the frame option is "RANGE" or "GROUPS" and the frame end is "CURRENT ROW"? I am asking because in these cases the optimization in the v14 (and v15) patches do not apply and you may not be satisfied by the performance. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Attached is the v16 patch. In this patch I have changed row_is_in_frame() API from: static int row_is_in_frame(WindowAggState *winstate, int64 pos, TupleTableSlot *slot); to: static int row_is_in_frame(WindowObject winobj, int64 pos, TupleTableSlot *slot, bool fetch_tuple); The function is used to decide whether a row specified by pos is in a frame or not. Previously we needed to always pass "slot" parameter which is the row in question, fetched by window_gettupleslot. If IGNORE NULLS option is not passed to window functions, this is fine because they need to return the row anyway. However if IGNORE NULLS specified, we need to throw away null rows until we find the non null row requested by the caller. In reality, not in all window frames it is required to pass a row to row_is_in_frame: only when specific frame options are specified. For example RANGE or GROUP options plus CURRENT ROW frame end option. In previous patch, I explicitly checked these frame options before calling row_is_in_frame. However I dislike the way because it's a layer abstraction violation. So in this patch I added "fetch_tuple" option to row_is_in_frame so that it fetches row itself when necessary. A caller now don't need to fetch the row to pass if fetch_tuple is false. This way, not only we can avoid the layer violation problem, but performance is enhanced because tuple is fetched only when it's necessary. Note that now the first argument of row_is_in_frame has been changed from WindowAggState to WindowObject so that row_is_in_frame can call window_gettupleslot inside. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Вложения
Currently the patch set include some regression test additions. I wanted to expand the test coverage and ended up an idea: generate new test cases from the existing window function regression test (window.sql). Attached "insert_ignore_nulls.sh" script reads window.sql and inserts "ignore nulls" before "over" clause of each window functions that accept IGNORE NULLS option. This way, although the generated test cases do not cover the case where NULL is included, at least covers all non NULL cases, which is better than nothing, I think. I replaced the existing window.sql with the modified one, and ran the regression test. Indeed the test failed because expected file is for non IGNORE NULLS options. However, the differences should be just for SQL statements, not the output of the SQL statements since the data set used does not include NULLs. I did an eyeball check the diff and the result was what I expected. For those who are interested this test, I attached some files. insert_ignore_nulls.sh: shell script to insert "ignore nulls" window.sql: modified regression script by insert_ignore_nulls.sh window.diff: diff of original window.out and modified window.out Question is, how could we put this kind of test into core if it worth the effort? The simplest idea is just adding the modified window.sql to the end of existing window.sql and update window.out. Thoughts? Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp #! /bin/sh # Script to generate test cases for IGNORE NULLS option of some window # functions (lead, lag, first_value, last_value, and nth_value) in the # regression test SQL (window.sql). This script supposed to read # window.sql from stdin and inserts "ignore null" before "over" on all # occurrences of the window functions that accept IGNORE NULLS, then # print it to stdout. Since currently the data set used for the # existing regression test (window.sql) does not include NULLs, the # test result using the results of this script is expected be # identical to the existing results of existing window.sql (of course # except the SQL statements). sed -r \ -e "s/lead\(([^)]*)\) over/lead(\1) ignore nulls over/g" \ -e "s/lag\(([^)]*)\) over/lag(\1) ignore nulls over/g" \ -e "s/first_value\(([^)]*)\) over/first_value(\1) ignore nulls over/g" \ -e "s/last_value\(([^)]*)\) over/last_value(\1) ignore nulls over/g" \ -e "s/nth_value\(([^)]*)\) over/nth_value(\1) ignore nulls over/g" -- -- WINDOW FUNCTIONS -- CREATE TEMPORARY TABLE empsalary ( depname varchar, empno bigint, salary int, enroll_date date ); INSERT INTO empsalary VALUES ('develop', 10, 5200, '2007-08-01'), ('sales', 1, 5000, '2006-10-01'), ('personnel', 5, 3500, '2007-12-10'), ('sales', 4, 4800, '2007-08-08'), ('personnel', 2, 3900, '2006-12-23'), ('develop', 7, 4200, '2008-01-01'), ('develop', 9, 4500, '2008-01-01'), ('sales', 3, 4800, '2007-08-01'), ('develop', 8, 6000, '2006-10-01'), ('develop', 11, 5200, '2007-08-15'); SELECT depname, empno, salary, sum(salary) OVER (PARTITION BY depname) FROM empsalary ORDER BY depname, salary; SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary) FROM empsalary; -- with GROUP BY SELECT four, ten, SUM(SUM(four)) OVER (PARTITION BY four), AVG(ten) FROM tenk1 GROUP BY four, ten ORDER BY four, ten; SELECT depname, empno, salary, sum(salary) OVER w FROM empsalary WINDOW w AS (PARTITION BY depname); SELECT depname, empno, salary, rank() OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary) ORDER BY rank()OVER w; -- empty window specification SELECT COUNT(*) OVER () FROM tenk1 WHERE unique2 < 10; SELECT COUNT(*) OVER w FROM tenk1 WHERE unique2 < 10 WINDOW w AS (); -- no window operation SELECT four FROM tenk1 WHERE FALSE WINDOW w AS (PARTITION BY ten); -- cumulative aggregate SELECT sum(four) OVER (PARTITION BY ten ORDER BY unique2) AS sum_1, ten, four FROM tenk1 WHERE unique2 < 10; SELECT row_number() OVER (ORDER BY unique2) FROM tenk1 WHERE unique2 < 10; SELECT rank() OVER (PARTITION BY four ORDER BY ten) AS rank_1, ten, four FROM tenk1 WHERE unique2 < 10; SELECT dense_rank() OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT percent_rank() OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT cume_dist() OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT ntile(3) OVER (ORDER BY ten, four), ten, four FROM tenk1 WHERE unique2 < 10; SELECT ntile(NULL) OVER (ORDER BY ten, four), ten, four FROM tenk1 LIMIT 2; SELECT lag(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT lag(ten, four) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT lag(ten, four, 0) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT lag(ten, four, 0.7) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY four,ten; SELECT lead(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT lead(ten * 2, 1) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT lead(ten * 2, 1, -1) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT lead(ten * 2, 1, -1.4) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY four,ten; SELECT first_value(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; -- last_value returns the last row of the frame, which is CURRENT ROW in ORDER BY window. SELECT last_value(four) OVER (ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT last_value(ten) OVER (PARTITION BY four), ten, four FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten)s ORDER BY four, ten; SELECT nth_value(ten, four + 1) OVER (PARTITION BY four), ten, four FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten)s; SELECT ten, two, sum(hundred) AS gsum, sum(sum(hundred)) OVER (PARTITION BY two ORDER BY ten) AS wsum FROM tenk1 GROUP BY ten, two; SELECT count(*) OVER (PARTITION BY four), four FROM (SELECT * FROM tenk1 WHERE two = 1)s WHERE unique2 < 10; SELECT (count(*) OVER (PARTITION BY four ORDER BY ten) + sum(hundred) OVER (PARTITION BY four ORDER BY ten))::varchar AS cntsum FROM tenk1 WHERE unique2 < 10; -- opexpr with different windows evaluation. SELECT * FROM( SELECT count(*) OVER (PARTITION BY four ORDER BY ten) + sum(hundred) OVER (PARTITION BY two ORDER BY ten) AS total, count(*) OVER (PARTITION BY four ORDER BY ten) AS fourcount, sum(hundred) OVER (PARTITION BY two ORDER BY ten) AS twosum FROM tenk1 )sub WHERE total <> fourcount + twosum; SELECT avg(four) OVER (PARTITION BY four ORDER BY thousand / 100) FROM tenk1 WHERE unique2 < 10; SELECT ten, two, sum(hundred) AS gsum, sum(sum(hundred)) OVER win AS wsum FROM tenk1 GROUP BY ten, two WINDOW win AS (PARTITION BY two ORDER BY ten); -- more than one window with GROUP BY SELECT sum(salary), row_number() OVER (ORDER BY depname), sum(sum(salary)) OVER (ORDER BY depname DESC) FROM empsalary GROUP BY depname; -- identical windows with different names SELECT sum(salary) OVER w1, count(*) OVER w2 FROM empsalary WINDOW w1 AS (ORDER BY salary), w2 AS (ORDER BY salary); -- subplan SELECT lead(ten, (SELECT two FROM tenk1 WHERE s.unique2 = unique2)) OVER (PARTITION BY four ORDER BY ten) FROM tenk1 s WHERE unique2 < 10; -- empty table SELECT count(*) OVER (PARTITION BY four) FROM (SELECT * FROM tenk1 WHERE FALSE)s; -- mixture of agg/wfunc in the same window SELECT sum(salary) OVER w, rank() OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary DESC); -- strict aggs SELECT empno, depname, salary, bonus, depadj, MIN(bonus) OVER (ORDER BY empno), MAX(depadj) OVER () FROM( SELECT *, CASE WHEN enroll_date < '2008-01-01' THEN 2008 - extract(YEAR FROM enroll_date) END * 500 AS bonus, CASE WHEN AVG(salary) OVER (PARTITION BY depname) < salary THEN 200 END AS depadj FROM empsalary )s; -- window function over ungrouped agg over empty row set (bug before 9.1) SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42; -- window function with ORDER BY an expression involving aggregates (9.1 bug) select ten, sum(unique1) + sum(unique2) as res, rank() over (order by sum(unique1) + sum(unique2)) as rank from tenk1 group by ten order by ten; -- window and aggregate with GROUP BY expression (9.2 bug) explain (costs off) select first_value(max(x)) over (), y from (select unique1 as x, ten+four as y from tenk1) ss group by y; -- window functions returning pass-by-ref values from different rows select x, lag(x, 1) ignore nulls over (order by x), lead(x, 3) ignore nulls over (order by x) from (select x::numeric as x from generate_series(1,10) x); -- test non-default frame specifications SELECT four, ten, sum(ten) over (partition by four order by ten), last_value(ten) ignore nulls over (partition by four order by ten) FROM (select distinct ten, four from tenk1) ss; SELECT four, ten, sum(ten) over (partition by four order by ten range between unbounded preceding and current row), last_value(ten) ignore nulls over (partition by four order by ten range between unbounded preceding and current row) FROM (select distinct ten, four from tenk1) ss; SELECT four, ten, sum(ten) over (partition by four order by ten range between unbounded preceding and unbounded following), last_value(ten) ignore nulls over (partition by four order by ten range between unbounded preceding and unbounded following) FROM (select distinct ten, four from tenk1) ss; SELECT four, ten/4 as two, sum(ten/4) over (partition by four order by ten/4 range between unbounded preceding and current row), last_value(ten/4) ignore nulls over (partition by four order by ten/4 range between unbounded preceding and current row) FROM (select distinct ten, four from tenk1) ss; SELECT four, ten/4 as two, sum(ten/4) over (partition by four order by ten/4 rows between unbounded preceding and current row), last_value(ten/4) ignore nulls over (partition by four order by ten/4 rows between unbounded preceding and current row) FROM (select distinct ten, four from tenk1) ss; SELECT sum(unique1) over (order by four range between current row and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between current row and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between 2 preceding and 2 following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between 2 preceding and 2 following exclude no others), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between 2 preceding and 2 following exclude current row), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between 2 preceding and 2 following exclude group), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between 2 preceding and 2 following exclude ties), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT first_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude current row), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT first_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude group), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT first_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude ties), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT last_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude current row), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT last_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude group), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT last_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude ties), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between 2 preceding and 1 preceding), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between 1 following and 3 following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between unbounded preceding and 1 following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (w range between current row and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four); SELECT sum(unique1) over (w range between unbounded preceding and current row exclude current row), unique1, four FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four); SELECT sum(unique1) over (w range between unbounded preceding and current row exclude group), unique1, four FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four); SELECT sum(unique1) over (w range between unbounded preceding and current row exclude ties), unique1, four FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four); SELECT first_value(unique1) ignore nulls over w, nth_value(unique1, 2) ignore nulls over w AS nth_2, last_value(unique1) ignore nulls over w, unique1, four FROM tenk1 WHERE unique1 < 10 WINDOW w AS (order by four range between current row and unbounded following); SELECT sum(unique1) over (order by unique1 rows (SELECT unique1 FROM tenk1 ORDER BY unique1 LIMIT 1) + 1 PRECEDING), unique1 FROM tenk1 WHERE unique1 < 10; CREATE TEMP VIEW v_window AS SELECT i, sum(i) over (order by i rows between 1 preceding and 1 following) as sum_rows FROM generate_series(1, 10) i; SELECT * FROM v_window; SELECT pg_get_viewdef('v_window'); CREATE OR REPLACE TEMP VIEW v_window AS SELECT i, sum(i) over (order by i rows between 1 preceding and 1 following exclude current row) as sum_rows FROM generate_series(1, 10) i; SELECT * FROM v_window; SELECT pg_get_viewdef('v_window'); CREATE OR REPLACE TEMP VIEW v_window AS SELECT i, sum(i) over (order by i rows between 1 preceding and 1 following exclude group) as sum_rows FROM generate_series(1, 10) i; SELECT * FROM v_window; SELECT pg_get_viewdef('v_window'); CREATE OR REPLACE TEMP VIEW v_window AS SELECT i, sum(i) over (order by i rows between 1 preceding and 1 following exclude ties) as sum_rows FROM generate_series(1, 10) i; SELECT * FROM v_window; SELECT pg_get_viewdef('v_window'); CREATE OR REPLACE TEMP VIEW v_window AS SELECT i, sum(i) over (order by i rows between 1 preceding and 1 following exclude no others) as sum_rows FROM generate_series(1, 10) i; SELECT * FROM v_window; SELECT pg_get_viewdef('v_window'); CREATE OR REPLACE TEMP VIEW v_window AS SELECT i, sum(i) over (order by i groups between 1 preceding and 1 following) as sum_rows FROM generate_series(1, 10)i; SELECT * FROM v_window; SELECT pg_get_viewdef('v_window'); DROP VIEW v_window; CREATE TEMP VIEW v_window AS SELECT i, min(i) over (order by i range between '1 day' preceding and '10 days' following) as min_i FROM generate_series(now(), now()+'100 days'::interval, '1 hour') i; SELECT pg_get_viewdef('v_window'); -- RANGE offset PRECEDING/FOLLOWING tests SELECT sum(unique1) over (order by four range between 2::int8 preceding and 1::int2 preceding), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four desc range between 2::int8 preceding and 1::int2 preceding), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four range between 2::int8 preceding and 1::int2 preceding exclude no others), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four range between 2::int8 preceding and 1::int2 preceding exclude current row), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four range between 2::int8 preceding and 1::int2 preceding exclude group), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four range between 2::int8 preceding and 1::int2 preceding exclude ties), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four range between 2::int8 preceding and 6::int2 following exclude ties), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four range between 2::int8 preceding and 6::int2 following exclude group), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (partition by four order by unique1 range between 5::int8 preceding and 6::int2 following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (partition by four order by unique1 range between 5::int8 preceding and 6::int2 following exclude current row),unique1, four FROM tenk1 WHERE unique1 < 10; select sum(salary) over (order by enroll_date range between '1 year'::interval preceding and '1 year'::interval following), salary, enroll_date from empsalary; select sum(salary) over (order by enroll_date desc range between '1 year'::interval preceding and '1 year'::interval following), salary, enroll_date from empsalary; select sum(salary) over (order by enroll_date desc range between '1 year'::interval following and '1 year'::interval following), salary, enroll_date from empsalary; select sum(salary) over (order by enroll_date range between '1 year'::interval preceding and '1 year'::interval following exclude current row), salary, enroll_date from empsalary; select sum(salary) over (order by enroll_date range between '1 year'::interval preceding and '1 year'::interval following exclude group), salary, enroll_date from empsalary; select sum(salary) over (order by enroll_date range between '1 year'::interval preceding and '1 year'::interval following exclude ties), salary, enroll_date from empsalary; select first_value(salary) ignore nulls over(order by salary range between 1000 preceding and 1000 following), lead(salary) ignore nulls over(order by salary range between 1000 preceding and 1000 following), nth_value(salary, 1) ignore nulls over(order by salary range between 1000 preceding and 1000 following), salary from empsalary; select last_value(salary) ignore nulls over(order by salary range between 1000 preceding and 1000 following), lag(salary) ignore nulls over(order by salary range between 1000 preceding and 1000 following), salary from empsalary; select first_value(salary) ignore nulls over(order by salary range between 1000 following and 3000 following exclude current row), lead(salary) ignore nulls over(order by salary range between 1000 following and 3000 following exclude ties), nth_value(salary, 1) ignore nulls over(order by salary range between 1000 following and 3000 following exclude ties), salary from empsalary; select last_value(salary) ignore nulls over(order by salary range between 1000 following and 3000 following exclude group), lag(salary) ignore nulls over(order by salary range between 1000 following and 3000 following exclude group), salary from empsalary; select first_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing exclude ties), last_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::interval following), salary, enroll_date from empsalary; select first_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing exclude ties), last_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::interval following exclude ties), salary, enroll_date from empsalary; select first_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing exclude group), last_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::interval following exclude group), salary, enroll_date from empsalary; select first_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing exclude current row), last_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::interval following exclude current row), salary, enroll_date from empsalary; -- RANGE offset PRECEDING/FOLLOWING with null values select x, y, first_value(y) ignore nulls over w, last_value(y) ignore nulls over w from (select x, x as y from generate_series(1,5) as x union all select null, 42 union all select null, 43) ss window w as (order by x asc nulls first range between 2 preceding and 2 following); select x, y, first_value(y) ignore nulls over w, last_value(y) ignore nulls over w from (select x, x as y from generate_series(1,5) as x union all select null, 42 union all select null, 43) ss window w as (order by x asc nulls last range between 2 preceding and 2 following); select x, y, first_value(y) ignore nulls over w, last_value(y) ignore nulls over w from (select x, x as y from generate_series(1,5) as x union all select null, 42 union all select null, 43) ss window w as (order by x desc nulls first range between 2 preceding and 2 following); select x, y, first_value(y) ignore nulls over w, last_value(y) ignore nulls over w from (select x, x as y from generate_series(1,5) as x union all select null, 42 union all select null, 43) ss window w as (order by x desc nulls last range between 2 preceding and 2 following); -- There is a syntactic ambiguity in the SQL standard. Since -- UNBOUNDED is a non-reserved word, it could be the name of a -- function parameter and be used as an expression. There is a -- grammar hack to resolve such cases as the keyword. The following -- tests record this behavior. CREATE FUNCTION unbounded_syntax_test1a(x int) RETURNS TABLE (a int, b int, c int) LANGUAGE SQL BEGIN ATOMIC SELECT sum(unique1) over (rows between x preceding and x following), unique1, four FROM tenk1 WHERE unique1 < 10; END; CREATE FUNCTION unbounded_syntax_test1b(x int) RETURNS TABLE (a int, b int, c int) LANGUAGE SQL AS $$ SELECT sum(unique1) over (rows between x preceding and x following), unique1, four FROM tenk1 WHERE unique1 < 10; $$; -- These will apply the argument to the window specification inside the function. SELECT * FROM unbounded_syntax_test1a(2); SELECT * FROM unbounded_syntax_test1b(2); CREATE FUNCTION unbounded_syntax_test2a(unbounded int) RETURNS TABLE (a int, b int, c int) LANGUAGE SQL BEGIN ATOMIC SELECT sum(unique1) over (rows between unbounded preceding and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10; END; CREATE FUNCTION unbounded_syntax_test2b(unbounded int) RETURNS TABLE (a int, b int, c int) LANGUAGE SQL AS $$ SELECT sum(unique1) over (rows between unbounded preceding and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10; $$; -- These will not apply the argument but instead treat UNBOUNDED as a keyword. SELECT * FROM unbounded_syntax_test2a(2); SELECT * FROM unbounded_syntax_test2b(2); DROP FUNCTION unbounded_syntax_test1a, unbounded_syntax_test1b, unbounded_syntax_test2a, unbounded_syntax_test2b; -- Other tests with token UNBOUNDED in potentially problematic position CREATE FUNCTION unbounded(x int) RETURNS int LANGUAGE SQL IMMUTABLE RETURN x; SELECT sum(unique1) over (rows between 1 preceding and 1 following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between unbounded(1) preceding and unbounded(1) following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (rows between unbounded.x preceding and unbounded.x following), unique1, four FROM tenk1, (values (1)) as unbounded(x) WHERE unique1 < 10; DROP FUNCTION unbounded; -- Check overflow behavior for various integer sizes select x, last_value(x) ignore nulls over (order by x::smallint range between current row and 2147450884 following) from generate_series(32764, 32766) x; select x, last_value(x) ignore nulls over (order by x::smallint desc range between current row and 2147450885 following) from generate_series(-32766, -32764) x; select x, last_value(x) ignore nulls over (order by x range between current row and 4 following) from generate_series(2147483644, 2147483646) x; select x, last_value(x) ignore nulls over (order by x desc range between current row and 5 following) from generate_series(-2147483646, -2147483644) x; select x, last_value(x) ignore nulls over (order by x range between current row and 4 following) from generate_series(9223372036854775804, 9223372036854775806) x; select x, last_value(x) ignore nulls over (order by x desc range between current row and 5 following) from generate_series(-9223372036854775806, -9223372036854775804) x; -- Test in_range for other numeric datatypes create temp table numerics( id int, f_float4 float4, f_float8 float8, f_numeric numeric ); insert into numerics values (0, '-infinity', '-infinity', '-infinity'), (1, -3, -3, -3), (2, -1, -1, -1), (3, 0, 0, 0), (4, 1.1, 1.1, 1.1), (5, 1.12, 1.12, 1.12), (6, 2, 2, 2), (7, 100, 100, 100), (8, 'infinity', 'infinity', 'infinity'), (9, 'NaN', 'NaN', 'NaN'); select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float4 range between 1 preceding and 1 following); select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float4 range between 1 preceding and 1.1::float4 following); select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float4 range between 'inf' preceding and 'inf' following); select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float4 range between 'inf' preceding and 'inf' preceding); select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float4 range between 'inf' following and 'inf' following); select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float4 range between 1.1 preceding and 'NaN' following); -- error, NaN disallowed select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float8 range between 1 preceding and 1 following); select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float8 range between 1 preceding and 1.1::float8 following); select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float8 range between 'inf' preceding and 'inf' following); select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float8 range between 'inf' preceding and 'inf' preceding); select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float8 range between 'inf' following and 'inf' following); select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_float8 range between 1.1 preceding and 'NaN' following); -- error, NaN disallowed select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_numeric range between 1 preceding and 1 following); select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_numeric range between 1 preceding and 1.1::numeric following); select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_numeric range between 1 preceding and 1.1::float8 following); -- currently unsupported select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_numeric range between 'inf' preceding and 'inf' following); select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_numeric range between 'inf' preceding and 'inf' preceding); select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_numeric range between 'inf' following and 'inf' following); select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from numerics window w as (order by f_numeric range between 1.1 preceding and 'NaN' following); -- error, NaN disallowed -- Test in_range for other datetime datatypes create temp table datetimes( id int, f_time time, f_timetz timetz, f_interval interval, f_timestamptz timestamptz, f_timestamp timestamp ); insert into datetimes values (0, '10:00', '10:00 BST', '-infinity', '-infinity', '-infinity'), (1, '11:00', '11:00 BST', '1 year', '2000-10-19 10:23:54+01', '2000-10-19 10:23:54'), (2, '12:00', '12:00 BST', '2 years', '2001-10-19 10:23:54+01', '2001-10-19 10:23:54'), (3, '13:00', '13:00 BST', '3 years', '2001-10-19 10:23:54+01', '2001-10-19 10:23:54'), (4, '14:00', '14:00 BST', '4 years', '2002-10-19 10:23:54+01', '2002-10-19 10:23:54'), (5, '15:00', '15:00 BST', '5 years', '2003-10-19 10:23:54+01', '2003-10-19 10:23:54'), (6, '15:00', '15:00 BST', '5 years', '2004-10-19 10:23:54+01', '2004-10-19 10:23:54'), (7, '17:00', '17:00 BST', '7 years', '2005-10-19 10:23:54+01', '2005-10-19 10:23:54'), (8, '18:00', '18:00 BST', '8 years', '2006-10-19 10:23:54+01', '2006-10-19 10:23:54'), (9, '19:00', '19:00 BST', '9 years', '2007-10-19 10:23:54+01', '2007-10-19 10:23:54'), (10, '20:00', '20:00 BST', '10 years', '2008-10-19 10:23:54+01', '2008-10-19 10:23:54'), (11, '21:00', '21:00 BST', 'infinity', 'infinity', 'infinity'); select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_time range between '70 min'::interval preceding and '2 hours'::interval following); select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_time desc range between '70 min' preceding and '2 hours' following); select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_time desc range between '-70 min' preceding and '2 hours' following); -- error, negative offset disallowed select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_time range between 'infinity'::interval preceding and 'infinity'::interval following); select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_time range between 'infinity'::interval preceding and 'infinity'::interval preceding); select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_time range between 'infinity'::interval following and 'infinity'::interval following); select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_time range between '-infinity'::interval following and 'infinity'::interval following); -- error, negative offset disallowed select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timetz range between '70 min'::interval preceding and '2 hours'::interval following); select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timetz desc range between '70 min' preceding and '2 hours' following); select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timetz desc range between '70 min' preceding and '-2 hours' following); -- error, negative offset disallowed select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timetz range between 'infinity'::interval preceding and 'infinity'::interval following); select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timetz range between 'infinity'::interval preceding and 'infinity'::interval preceding); select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timetz range between 'infinity'::interval following and 'infinity'::interval following); select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timetz range between 'infinity'::interval following and '-infinity'::interval following); -- error, negative offset disallowed select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_interval range between '1 year'::interval preceding and '1 year'::interval following); select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_interval desc range between '1 year' preceding and '1 year' following); select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_interval desc range between '-1 year' preceding and '1 year' following); -- error, negative offset disallowed select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_interval range between 'infinity'::interval preceding and 'infinity'::interval following); select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_interval range between 'infinity'::interval preceding and 'infinity'::interval preceding); select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_interval range between 'infinity'::interval following and 'infinity'::interval following); select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_interval range between '-infinity'::interval following and 'infinity'::interval following); -- error, negative offset disallowed select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamptz range between '1 year'::interval preceding and '1 year'::interval following); select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamptz desc range between '1 year' preceding and '1 year' following); select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamptz desc range between '1 year' preceding and '-1 year' following); -- error, negative offset disallowed select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamptz range between 'infinity'::interval preceding and 'infinity'::interval following); select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamptz range between 'infinity'::interval preceding and 'infinity'::interval preceding); select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamptz range between 'infinity'::interval following and 'infinity'::interval following); select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamptz range between '-infinity'::interval following and 'infinity'::interval following); -- error, negative offset disallowed select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamp range between '1 year'::interval preceding and '1 year'::interval following); select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamp desc range between '1 year' preceding and '1 year' following); select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamp desc range between '-1 year' preceding and '1 year' following); -- error, negative offset disallowed select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamp range between 'infinity'::interval preceding and 'infinity'::interval following); select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamp range between 'infinity'::interval preceding and 'infinity'::interval preceding); select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamp range between 'infinity'::interval following and 'infinity'::interval following); select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w from datetimes window w as (order by f_timestamp range between '-infinity'::interval following and 'infinity'::interval following); -- error, negative offset disallowed -- RANGE offset PRECEDING/FOLLOWING error cases select sum(salary) over (order by enroll_date, salary range between '1 year'::interval preceding and '2 years'::intervalfollowing exclude ties), salary, enroll_date from empsalary; select sum(salary) over (range between '1 year'::interval preceding and '2 years'::interval following exclude ties), salary, enroll_date from empsalary; select sum(salary) over (order by depname range between '1 year'::interval preceding and '2 years'::interval following exclude ties), salary, enroll_date from empsalary; select max(enroll_date) over (order by enroll_date range between 1 preceding and 2 following exclude ties), salary, enroll_date from empsalary; select max(enroll_date) over (order by salary range between -1 preceding and 2 following exclude ties), salary, enroll_date from empsalary; select max(enroll_date) over (order by salary range between 1 preceding and -2 following exclude ties), salary, enroll_date from empsalary; select max(enroll_date) over (order by salary range between '1 year'::interval preceding and '2 years'::interval following exclude ties), salary, enroll_date from empsalary; select max(enroll_date) over (order by enroll_date range between '1 year'::interval preceding and '-2 years'::interval following exclude ties), salary, enroll_date from empsalary; -- GROUPS tests SELECT sum(unique1) over (order by four groups between unbounded preceding and current row), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between unbounded preceding and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between current row and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between 1 preceding and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between 1 following and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between unbounded preceding and 2 following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between 2 preceding and 1 preceding), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between 2 preceding and 1 following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between 0 preceding and 0 following), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between 2 preceding and 1 following exclude current row), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between 2 preceding and 1 following exclude group), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (order by four groups between 2 preceding and 1 following exclude ties), unique1, four FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (partition by ten order by four groups between 0 preceding and 0 following),unique1, four, ten FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (partition by ten order by four groups between 0 preceding and 0 following exclude current row), unique1, four, ten FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (partition by ten order by four groups between 0 preceding and 0 following exclude group), unique1, four, ten FROM tenk1 WHERE unique1 < 10; SELECT sum(unique1) over (partition by ten order by four groups between 0 preceding and 0 following exclude ties), unique1, four, ten FROM tenk1 WHERE unique1 < 10; select first_value(salary) ignore nulls over(order by enroll_date groups between 1 preceding and 1 following), lead(salary) ignore nulls over(order by enroll_date groups between 1 preceding and 1 following), nth_value(salary, 1) ignore nulls over(order by enroll_date groups between 1 preceding and 1 following), salary, enroll_date from empsalary; select last_value(salary) ignore nulls over(order by enroll_date groups between 1 preceding and 1 following), lag(salary) ignore nulls over(order by enroll_date groups between 1 preceding and 1 following), salary, enroll_date from empsalary; select first_value(salary) ignore nulls over(order by enroll_date groups between 1 following and 3 following exclude current row), lead(salary) ignore nulls over(order by enroll_date groups between 1 following and 3 following exclude ties), nth_value(salary, 1) ignore nulls over(order by enroll_date groups between 1 following and 3 following exclude ties), salary, enroll_date from empsalary; select last_value(salary) ignore nulls over(order by enroll_date groups between 1 following and 3 following exclude group), lag(salary) ignore nulls over(order by enroll_date groups between 1 following and 3 following exclude group), salary, enroll_date from empsalary; -- Show differences in offset interpretation between ROWS, RANGE, and GROUPS WITH cte (x) AS ( SELECT * FROM generate_series(1, 35, 2) ) SELECT x, (sum(x) over w) FROM cte WINDOW w AS (ORDER BY x rows between 1 preceding and 1 following); WITH cte (x) AS ( SELECT * FROM generate_series(1, 35, 2) ) SELECT x, (sum(x) over w) FROM cte WINDOW w AS (ORDER BY x range between 1 preceding and 1 following); WITH cte (x) AS ( SELECT * FROM generate_series(1, 35, 2) ) SELECT x, (sum(x) over w) FROM cte WINDOW w AS (ORDER BY x groups between 1 preceding and 1 following); WITH cte (x) AS ( select 1 union all select 1 union all select 1 union all SELECT * FROM generate_series(5, 49, 2) ) SELECT x, (sum(x) over w) FROM cte WINDOW w AS (ORDER BY x rows between 1 preceding and 1 following); WITH cte (x) AS ( select 1 union all select 1 union all select 1 union all SELECT * FROM generate_series(5, 49, 2) ) SELECT x, (sum(x) over w) FROM cte WINDOW w AS (ORDER BY x range between 1 preceding and 1 following); WITH cte (x) AS ( select 1 union all select 1 union all select 1 union all SELECT * FROM generate_series(5, 49, 2) ) SELECT x, (sum(x) over w) FROM cte WINDOW w AS (ORDER BY x groups between 1 preceding and 1 following); -- with UNION SELECT count(*) OVER (PARTITION BY four) FROM (SELECT * FROM tenk1 UNION ALL SELECT * FROM tenk2)s LIMIT 0; -- check some degenerate cases create temp table t1 (f1 int, f2 int8); insert into t1 values (1,1),(1,2),(2,2); select f1, sum(f1) over (partition by f1 range between 1 preceding and 1 following) from t1 where f1 = f2; -- error, must have order by explain (costs off) select f1, sum(f1) over (partition by f1 order by f2 range between 1 preceding and 1 following) from t1 where f1 = f2; select f1, sum(f1) over (partition by f1 order by f2 range between 1 preceding and 1 following) from t1 where f1 = f2; select f1, sum(f1) over (partition by f1, f1 order by f2 range between 2 preceding and 1 preceding) from t1 where f1 = f2; select f1, sum(f1) over (partition by f1, f2 order by f2 range between 1 following and 2 following) from t1 where f1 = f2; select f1, sum(f1) over (partition by f1 groups between 1 preceding and 1 following) from t1 where f1 = f2; -- error, must have order by explain (costs off) select f1, sum(f1) over (partition by f1 order by f2 groups between 1 preceding and 1 following) from t1 where f1 = f2; select f1, sum(f1) over (partition by f1 order by f2 groups between 1 preceding and 1 following) from t1 where f1 = f2; select f1, sum(f1) over (partition by f1, f1 order by f2 groups between 2 preceding and 1 preceding) from t1 where f1 = f2; select f1, sum(f1) over (partition by f1, f2 order by f2 groups between 1 following and 2 following) from t1 where f1 = f2; -- ordering by a non-integer constant is allowed SELECT rank() OVER (ORDER BY length('abc')); -- can't order by another window function SELECT rank() OVER (ORDER BY rank() OVER (ORDER BY random())); -- some other errors SELECT * FROM empsalary WHERE row_number() OVER (ORDER BY salary) < 10; SELECT * FROM empsalary INNER JOIN tenk1 ON row_number() OVER (ORDER BY salary) < 10; SELECT rank() OVER (ORDER BY 1), count(*) FROM empsalary GROUP BY 1; SELECT * FROM rank() OVER (ORDER BY random()); DELETE FROM empsalary WHERE (rank() OVER (ORDER BY random())) > 10; DELETE FROM empsalary RETURNING rank() OVER (ORDER BY random()); SELECT count(*) OVER w FROM tenk1 WINDOW w AS (ORDER BY unique1), w AS (ORDER BY unique1); SELECT rank() OVER (PARTITION BY four, ORDER BY ten) FROM tenk1; SELECT count() OVER () FROM tenk1; SELECT generate_series(1, 100) OVER () FROM empsalary; SELECT ntile(0) OVER (ORDER BY ten), ten, four FROM tenk1; SELECT nth_value(four, 0) OVER (ORDER BY ten), ten, four FROM tenk1; -- filter SELECT sum(salary), row_number() OVER (ORDER BY depname), sum( sum(salary) FILTER (WHERE enroll_date > '2007-01-01') ) FILTER (WHERE depname <> 'sales') OVER (ORDER BY depname DESC) AS "filtered_sum", depname FROM empsalary GROUP BY depname; -- -- Test SupportRequestOptimizeWindowClause's ability to de-duplicate -- WindowClauses -- -- Ensure WindowClause frameOptions are changed so that only a single -- WindowAgg exists in the plan. EXPLAIN (COSTS OFF) SELECT empno, depname, row_number() OVER (PARTITION BY depname ORDER BY enroll_date) rn, rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk, dense_rank() OVER (PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN CURRENT ROW AND CURRENT ROW) drnk, ntile(10) OVER (PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) nt, percent_rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) pr, cume_dist() OVER (PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) cd FROM empsalary; -- Ensure WindowFuncs which cannot support their WindowClause's frameOptions -- being changed are untouched EXPLAIN (COSTS OFF, VERBOSE) SELECT empno, depname, row_number() OVER (PARTITION BY depname ORDER BY enroll_date) rn, rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk, count(*) OVER (PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN CURRENT ROW AND CURRENT ROW) cnt FROM empsalary; -- Ensure the above query gives us the expected results SELECT empno, depname, row_number() OVER (PARTITION BY depname ORDER BY enroll_date) rn, rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk, count(*) OVER (PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN CURRENT ROW AND CURRENT ROW) cnt FROM empsalary; -- Test pushdown of quals into a subquery containing window functions -- pushdown is safe because all PARTITION BY clauses include depname: EXPLAIN (COSTS OFF) SELECT * FROM (SELECT depname, sum(salary) OVER (PARTITION BY depname) depsalary, min(salary) OVER (PARTITION BY depname || 'A', depname) depminsalary FROM empsalary) emp WHERE depname = 'sales'; -- pushdown is unsafe because there's a PARTITION BY clause without depname: EXPLAIN (COSTS OFF) SELECT * FROM (SELECT depname, sum(salary) OVER (PARTITION BY enroll_date) enroll_salary, min(salary) OVER (PARTITION BY depname) depminsalary FROM empsalary) emp WHERE depname = 'sales'; -- Test window function run conditions are properly pushed down into the -- WindowAgg EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, row_number() OVER (ORDER BY empno) rn FROM empsalary) emp WHERE rn < 3; -- The following 3 statements should result the same result. SELECT * FROM (SELECT empno, row_number() OVER (ORDER BY empno) rn FROM empsalary) emp WHERE rn < 3; SELECT * FROM (SELECT empno, row_number() OVER (ORDER BY empno) rn FROM empsalary) emp WHERE 3 > rn; SELECT * FROM (SELECT empno, row_number() OVER (ORDER BY empno) rn FROM empsalary) emp WHERE 2 >= rn; -- Ensure r <= 3 is pushed down into the run condition of the window agg EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, rank() OVER (ORDER BY salary DESC) r FROM empsalary) emp WHERE r <= 3; SELECT * FROM (SELECT empno, salary, rank() OVER (ORDER BY salary DESC) r FROM empsalary) emp WHERE r <= 3; -- Ensure dr = 1 is converted to dr <= 1 to get all rows leading up to dr = 1 EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, dense_rank() OVER (ORDER BY salary DESC) dr FROM empsalary) emp WHERE dr = 1; SELECT * FROM (SELECT empno, salary, dense_rank() OVER (ORDER BY salary DESC) dr FROM empsalary) emp WHERE dr = 1; -- Check COUNT() and COUNT(*) EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, count(*) OVER (ORDER BY salary DESC) c FROM empsalary) emp WHERE c <= 3; SELECT * FROM (SELECT empno, salary, count(*) OVER (ORDER BY salary DESC) c FROM empsalary) emp WHERE c <= 3; EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, count(empno) OVER (ORDER BY salary DESC) c FROM empsalary) emp WHERE c <= 3; SELECT * FROM (SELECT empno, salary, count(empno) OVER (ORDER BY salary DESC) c FROM empsalary) emp WHERE c <= 3; EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, count(*) OVER (ORDER BY salary DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) c FROM empsalary) emp WHERE c >= 3; EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, count(*) OVER () c FROM empsalary) emp WHERE 11 <= c; EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, count(*) OVER (ORDER BY salary DESC) c, dense_rank() OVER (ORDER BY salary DESC) dr FROM empsalary) emp WHERE dr = 1; -- Ensure we get a run condition when there's a PARTITION BY clause EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, depname, row_number() OVER (PARTITION BY depname ORDER BY empno) rn FROM empsalary) emp WHERE rn < 3; -- and ensure we get the correct results from the above plan SELECT * FROM (SELECT empno, depname, row_number() OVER (PARTITION BY depname ORDER BY empno) rn FROM empsalary) emp WHERE rn < 3; -- ensure that "unused" subquery columns are not removed when the column only -- exists in the run condition EXPLAIN (COSTS OFF) SELECT empno, depname FROM (SELECT empno, depname, row_number() OVER (PARTITION BY depname ORDER BY empno) rn FROM empsalary) emp WHERE rn < 3; -- likewise with count(empno) instead of row_number() EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, depname, salary, count(empno) OVER (PARTITION BY depname ORDER BY salary DESC) c FROM empsalary) emp WHERE c <= 3; -- and again, check the results are what we expect. SELECT * FROM (SELECT empno, depname, salary, count(empno) OVER (PARTITION BY depname ORDER BY salary DESC) c FROM empsalary) emp WHERE c <= 3; -- Ensure we get the correct run condition when the window function is both -- monotonically increasing and decreasing. EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, depname, salary, count(empno) OVER () c FROM empsalary) emp WHERE c = 1; -- Try another case with a WindowFunc with a byref return type SELECT * FROM (SELECT row_number() OVER (PARTITION BY salary) AS rn, lead(depname) OVER (PARTITION BY salary) || ' Department' AS n_dep FROM empsalary) emp WHERE rn < 1; -- Some more complex cases with multiple window clauses EXPLAIN (COSTS OFF) SELECT * FROM (SELECT *, count(salary) OVER (PARTITION BY depname || '') c1, -- w1 row_number() OVER (PARTITION BY depname) rn, -- w2 count(*) OVER (PARTITION BY depname) c2, -- w2 count(*) OVER (PARTITION BY '' || depname) c3, -- w3 ntile(2) OVER (PARTITION BY depname) nt -- w2 FROM empsalary ) e WHERE rn <= 1 AND c1 <= 3 AND nt < 2; -- Ensure we correctly filter out all of the run conditions from each window SELECT * FROM (SELECT *, count(salary) OVER (PARTITION BY depname || '') c1, -- w1 row_number() OVER (PARTITION BY depname) rn, -- w2 count(*) OVER (PARTITION BY depname) c2, -- w2 count(*) OVER (PARTITION BY '' || depname) c3, -- w3 ntile(2) OVER (PARTITION BY depname) nt -- w2 FROM empsalary ) e WHERE rn <= 1 AND c1 <= 3 AND nt < 2; -- Ensure we remove references to reduced outer joins as nulling rels in run -- conditions EXPLAIN (COSTS OFF) SELECT 1 FROM (SELECT ntile(e2.salary) OVER (PARTITION BY e1.depname) AS c FROM empsalary e1 LEFT JOIN empsalary e2 ON TRUE WHERE e1.empno = e2.empno) s WHERE s.c = 1; -- Ensure the run condition optimization is used in cases where the WindowFunc -- has a Var from another query level EXPLAIN (COSTS OFF) SELECT 1 FROM (SELECT ntile(s1.x) OVER () AS c FROM (SELECT (SELECT 1) AS x) AS s1) s WHERE s.c = 1; -- Tests to ensure we don't push down the run condition when it's not valid to -- do so. -- Ensure we don't push down when the frame options show that the window -- function is not monotonically increasing EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, count(*) OVER (ORDER BY salary DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) c FROM empsalary) emp WHERE c <= 3; -- Ensure we don't push down when the window function's monotonic properties -- don't match that of the clauses. EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, count(*) OVER (ORDER BY salary) c FROM empsalary) emp WHERE 3 <= c; -- Ensure we don't use a run condition when there's a volatile function in the -- WindowFunc EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, count(random()) OVER (ORDER BY empno DESC) c FROM empsalary) emp WHERE c = 1; -- Ensure we don't use a run condition when the WindowFunc contains subplans EXPLAIN (COSTS OFF) SELECT * FROM (SELECT empno, salary, count((SELECT 1)) OVER (ORDER BY empno DESC) c FROM empsalary) emp WHERE c = 1; -- Test Sort node collapsing EXPLAIN (COSTS OFF) SELECT * FROM (SELECT depname, sum(salary) OVER (PARTITION BY depname order by empno) depsalary, min(salary) OVER (PARTITION BY depname, empno order by enroll_date) depminsalary FROM empsalary) emp WHERE depname = 'sales'; -- Ensure that the evaluation order of the WindowAggs results in the WindowAgg -- with the same sort order that's required by the ORDER BY is evaluated last. EXPLAIN (COSTS OFF) SELECT empno, enroll_date, depname, sum(salary) OVER (PARTITION BY depname order by empno) depsalary, min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary FROM empsalary ORDER BY depname, empno; -- As above, but with an adjusted ORDER BY to ensure the above plan didn't -- perform only 2 sorts by accident. EXPLAIN (COSTS OFF) SELECT empno, enroll_date, depname, sum(salary) OVER (PARTITION BY depname order by empno) depsalary, min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary FROM empsalary ORDER BY depname, enroll_date; SET enable_hashagg TO off; -- Ensure we don't get a sort for both DISTINCT and ORDER BY. We expect the -- sort for the DISTINCT to provide presorted input for the ORDER BY. EXPLAIN (COSTS OFF) SELECT DISTINCT empno, enroll_date, depname, sum(salary) OVER (PARTITION BY depname order by empno) depsalary, min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary FROM empsalary ORDER BY depname, enroll_date; -- As above but adjust the ORDER BY clause to help ensure the plan with the -- minimum amount of sorting wasn't a fluke. EXPLAIN (COSTS OFF) SELECT DISTINCT empno, enroll_date, depname, sum(salary) OVER (PARTITION BY depname order by empno) depsalary, min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary FROM empsalary ORDER BY depname, empno; RESET enable_hashagg; -- Test Sort node reordering EXPLAIN (COSTS OFF) SELECT lead(1) OVER (PARTITION BY depname ORDER BY salary, enroll_date), lag(1) OVER (PARTITION BY depname ORDER BY salary,enroll_date,empno) FROM empsalary; -- Test incremental sorting EXPLAIN (COSTS OFF) SELECT * FROM (SELECT depname, empno, salary, enroll_date, row_number() OVER (PARTITION BY depname ORDER BY enroll_date) AS first_emp, row_number() OVER (PARTITION BY depname ORDER BY enroll_date DESC) AS last_emp FROM empsalary) emp WHERE first_emp = 1 OR last_emp = 1; SELECT * FROM (SELECT depname, empno, salary, enroll_date, row_number() OVER (PARTITION BY depname ORDER BY enroll_date) AS first_emp, row_number() OVER (PARTITION BY depname ORDER BY enroll_date DESC) AS last_emp FROM empsalary) emp WHERE first_emp = 1 OR last_emp = 1; -- cleanup DROP TABLE empsalary; -- test user-defined window function with named args and default args CREATE FUNCTION nth_value_def(val anyelement, n integer = 1) RETURNS anyelement LANGUAGE internal WINDOW IMMUTABLE STRICT AS 'window_nth_value'; SELECT nth_value_def(n := 2, val := ten) OVER (PARTITION BY four), ten, four FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten) s; SELECT nth_value_def(ten) OVER (PARTITION BY four), ten, four FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten) s; -- -- Test the basic moving-aggregate machinery -- -- create aggregates that record the series of transform calls (these are -- intentionally not true inverses) CREATE FUNCTION logging_sfunc_nonstrict(text, anyelement) RETURNS text AS $$ SELECT COALESCE($1, '') || '*' || quote_nullable($2) $$ LANGUAGE SQL IMMUTABLE; CREATE FUNCTION logging_msfunc_nonstrict(text, anyelement) RETURNS text AS $$ SELECT COALESCE($1, '') || '+' || quote_nullable($2) $$ LANGUAGE SQL IMMUTABLE; CREATE FUNCTION logging_minvfunc_nonstrict(text, anyelement) RETURNS text AS $$ SELECT $1 || '-' || quote_nullable($2) $$ LANGUAGE SQL IMMUTABLE; CREATE AGGREGATE logging_agg_nonstrict (anyelement) ( stype = text, sfunc = logging_sfunc_nonstrict, mstype = text, msfunc = logging_msfunc_nonstrict, minvfunc = logging_minvfunc_nonstrict ); CREATE AGGREGATE logging_agg_nonstrict_initcond (anyelement) ( stype = text, sfunc = logging_sfunc_nonstrict, mstype = text, msfunc = logging_msfunc_nonstrict, minvfunc = logging_minvfunc_nonstrict, initcond = 'I', minitcond = 'MI' ); CREATE FUNCTION logging_sfunc_strict(text, anyelement) RETURNS text AS $$ SELECT $1 || '*' || quote_nullable($2) $$ LANGUAGE SQL STRICT IMMUTABLE; CREATE FUNCTION logging_msfunc_strict(text, anyelement) RETURNS text AS $$ SELECT $1 || '+' || quote_nullable($2) $$ LANGUAGE SQL STRICT IMMUTABLE; CREATE FUNCTION logging_minvfunc_strict(text, anyelement) RETURNS text AS $$ SELECT $1 || '-' || quote_nullable($2) $$ LANGUAGE SQL STRICT IMMUTABLE; CREATE AGGREGATE logging_agg_strict (text) ( stype = text, sfunc = logging_sfunc_strict, mstype = text, msfunc = logging_msfunc_strict, minvfunc = logging_minvfunc_strict ); CREATE AGGREGATE logging_agg_strict_initcond (anyelement) ( stype = text, sfunc = logging_sfunc_strict, mstype = text, msfunc = logging_msfunc_strict, minvfunc = logging_minvfunc_strict, initcond = 'I', minitcond = 'MI' ); -- test strict and non-strict cases SELECT p::text || ',' || i::text || ':' || COALESCE(v::text, 'NULL') AS row, logging_agg_nonstrict(v) over wnd as nstrict, logging_agg_nonstrict_initcond(v) over wnd as nstrict_init, logging_agg_strict(v::text) over wnd as strict, logging_agg_strict_initcond(v) over wnd as strict_init FROM (VALUES (1, 1, NULL), (1, 2, 'a'), (1, 3, 'b'), (1, 4, NULL), (1, 5, NULL), (1, 6, 'c'), (2, 1, NULL), (2, 2, 'x'), (3, 1, 'z') ) AS t(p, i, v) WINDOW wnd AS (PARTITION BY P ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) ORDER BY p, i; -- and again, but with filter SELECT p::text || ',' || i::text || ':' || CASE WHEN f THEN COALESCE(v::text, 'NULL') ELSE '-' END as row, logging_agg_nonstrict(v) filter(where f) over wnd as nstrict_filt, logging_agg_nonstrict_initcond(v) filter(where f) over wnd as nstrict_init_filt, logging_agg_strict(v::text) filter(where f) over wnd as strict_filt, logging_agg_strict_initcond(v) filter(where f) over wnd as strict_init_filt FROM (VALUES (1, 1, true, NULL), (1, 2, false, 'a'), (1, 3, true, 'b'), (1, 4, false, NULL), (1, 5, false, NULL), (1, 6, false, 'c'), (2, 1, false, NULL), (2, 2, true, 'x'), (3, 1, true, 'z') ) AS t(p, i, f, v) WINDOW wnd AS (PARTITION BY p ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) ORDER BY p, i; -- test that volatile arguments disable moving-aggregate mode SELECT i::text || ':' || COALESCE(v::text, 'NULL') as row, logging_agg_strict(v::text) over wnd as inverse, logging_agg_strict(v::text || CASE WHEN random() < 0 then '?' ELSE '' END) over wnd as noinverse FROM (VALUES (1, 'a'), (2, 'b'), (3, 'c') ) AS t(i, v) WINDOW wnd AS (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) ORDER BY i; SELECT i::text || ':' || COALESCE(v::text, 'NULL') as row, logging_agg_strict(v::text) filter(where true) over wnd as inverse, logging_agg_strict(v::text) filter(where random() >= 0) over wnd as noinverse FROM (VALUES (1, 'a'), (2, 'b'), (3, 'c') ) AS t(i, v) WINDOW wnd AS (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) ORDER BY i; -- test that non-overlapping windows don't use inverse transitions SELECT logging_agg_strict(v::text) OVER wnd FROM (VALUES (1, 'a'), (2, 'b'), (3, 'c') ) AS t(i, v) WINDOW wnd AS (ORDER BY i ROWS BETWEEN CURRENT ROW AND CURRENT ROW) ORDER BY i; -- test that returning NULL from the inverse transition functions -- restarts the aggregation from scratch. The second aggregate is supposed -- to test cases where only some aggregates restart, the third one checks -- that one aggregate restarting doesn't cause others to restart. CREATE FUNCTION sum_int_randrestart_minvfunc(int4, int4) RETURNS int4 AS $$ SELECT CASE WHEN random() < 0.2 THEN NULL ELSE $1 - $2 END $$ LANGUAGE SQL STRICT; CREATE AGGREGATE sum_int_randomrestart (int4) ( stype = int4, sfunc = int4pl, mstype = int4, msfunc = int4pl, minvfunc = sum_int_randrestart_minvfunc ); WITH vs AS ( SELECT i, (random() * 100)::int4 AS v FROM generate_series(1, 100) AS i ), sum_following AS ( SELECT i, SUM(v) OVER (ORDER BY i DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS s FROM vs ) SELECT DISTINCT sum_following.s = sum_int_randomrestart(v) OVER fwd AS eq1, -sum_following.s = sum_int_randomrestart(-v) OVER fwd AS eq2, 100*3+(vs.i-1)*3 = length(logging_agg_nonstrict(''::text) OVER fwd) AS eq3 FROM vs JOIN sum_following ON sum_following.i = vs.i WINDOW fwd AS ( ORDER BY vs.i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ); -- -- Test various built-in aggregates that have moving-aggregate support -- -- test inverse transition functions handle NULLs properly SELECT i,AVG(v::bigint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1),(2,2),(3,NULL),(4,NULL)) t(i,v); SELECT i,AVG(v::int) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1),(2,2),(3,NULL),(4,NULL)) t(i,v); SELECT i,AVG(v::smallint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1),(2,2),(3,NULL),(4,NULL)) t(i,v); SELECT i,AVG(v::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1.5),(2,2.5),(3,NULL),(4,NULL)) t(i,v); SELECT i,AVG(v::interval) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,'1 sec'),(2,'2 sec'),(3,NULL),(4,NULL)) t(i,v); -- moving aggregates over infinite intervals SELECT x ,avg(x) OVER(ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) as curr_next_avg ,avg(x) OVER(ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) as prev_curr_avg ,sum(x) OVER(ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) as curr_next_sum ,sum(x) OVER(ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) as prev_curr_sum FROM (VALUES (NULL::interval), ('infinity'::interval), ('-2147483648 days -2147483648 months -9223372036854775807 usecs'), -- extreme interval value ('-infinity'::interval), ('2147483647 days 2147483647 months 9223372036854775806 usecs'), -- extreme interval value ('infinity'::interval), ('6 days'::interval), ('7 days'::interval), (NULL::interval), ('-infinity'::interval)) v(x); --should fail. SELECT x, avg(x) OVER(ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) FROM (VALUES (NULL::interval), ('3 days'::interval), ('infinity'::timestamptz - now()), ('6 days'::interval), ('-infinity'::interval)) v(x); --should fail. SELECT x, sum(x) OVER(ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) FROM (VALUES (NULL::interval), ('3 days'::interval), ('infinity'::timestamptz - now()), ('6 days'::interval), ('-infinity'::interval)) v(x); SELECT i,SUM(v::smallint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1),(2,2),(3,NULL),(4,NULL)) t(i,v); SELECT i,SUM(v::int) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1),(2,2),(3,NULL),(4,NULL)) t(i,v); SELECT i,SUM(v::bigint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1),(2,2),(3,NULL),(4,NULL)) t(i,v); SELECT i,SUM(v::money) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,'1.10'),(2,'2.20'),(3,NULL),(4,NULL)) t(i,v); SELECT i,SUM(v::interval) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,'1 sec'),(2,'2 sec'),(3,NULL),(4,NULL)) t(i,v); SELECT i,SUM(v::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1.1),(2,2.2),(3,NULL),(4,NULL)) t(i,v); SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n); SELECT i,COUNT(v) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1),(2,2),(3,NULL),(4,NULL)) t(i,v); SELECT i,COUNT(*) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,1),(2,2),(3,NULL),(4,NULL)) t(i,v); SELECT VAR_POP(n::bigint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VAR_POP(n::int) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VAR_POP(n::smallint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VAR_POP(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VAR_SAMP(n::bigint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VAR_SAMP(n::int) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VAR_SAMP(n::smallint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VAR_SAMP(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VARIANCE(n::bigint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VARIANCE(n::int) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VARIANCE(n::smallint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT VARIANCE(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT STDDEV_POP(n::bigint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,NULL),(2,600),(3,470),(4,170),(5,430),(6,300)) r(i,n); SELECT STDDEV_POP(n::int) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,NULL),(2,600),(3,470),(4,170),(5,430),(6,300)) r(i,n); SELECT STDDEV_POP(n::smallint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,NULL),(2,600),(3,470),(4,170),(5,430),(6,300)) r(i,n); SELECT STDDEV_POP(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,NULL),(2,600),(3,470),(4,170),(5,430),(6,300)) r(i,n); SELECT STDDEV_SAMP(n::bigint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,NULL),(2,600),(3,470),(4,170),(5,430),(6,300)) r(i,n); SELECT STDDEV_SAMP(n::int) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,NULL),(2,600),(3,470),(4,170),(5,430),(6,300)) r(i,n); SELECT STDDEV_SAMP(n::smallint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,NULL),(2,600),(3,470),(4,170),(5,430),(6,300)) r(i,n); SELECT STDDEV_SAMP(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,NULL),(2,600),(3,470),(4,170),(5,430),(6,300)) r(i,n); SELECT STDDEV(n::bigint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(0,NULL),(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT STDDEV(n::int) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(0,NULL),(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT STDDEV(n::smallint) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(0,NULL),(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); SELECT STDDEV(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(0,NULL),(1,600),(2,470),(3,170),(4,430),(5,300)) r(i,n); -- test that inverse transition functions work with various frame options SELECT i,SUM(v::int) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND CURRENT ROW) FROM (VALUES(1,1),(2,2),(3,NULL),(4,NULL)) t(i,v); SELECT i,SUM(v::int) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) FROM (VALUES(1,1),(2,2),(3,NULL),(4,NULL)) t(i,v); SELECT i,SUM(v::int) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM (VALUES(1,1),(2,2),(3,3),(4,4)) t(i,v); -- ensure aggregate over numeric properly recovers from NaN values SELECT a, b, SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) FROM (VALUES(1,1::numeric),(2,2),(3,'NaN'),(4,3),(5,4)) t(a,b); -- It might be tempting for someone to add an inverse trans function for -- float and double precision. This should not be done as it can give incorrect -- results. This test should fail if anyone ever does this without thinking too -- hard about it. SELECT to_char(SUM(n::float8) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING),'999999999999999999999D9') FROM (VALUES(1,1e20),(2,1)) n(i,n); SELECT i, b, bool_and(b) OVER w, bool_or(b) OVER w FROM (VALUES (1,true), (2,true), (3,false), (4,false), (5,true)) v(i,b) WINDOW w AS (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING); -- -- Test WindowAgg costing takes into account the number of rows that need to -- be fetched before the first row can be output. -- -- Ensure we get a cheap start up plan as the WindowAgg can output the first -- row after reading 1 row from the join. EXPLAIN (COSTS OFF) SELECT COUNT(*) OVER (ORDER BY t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.tenthous LIMIT 1; -- Ensure we get a cheap total plan. Lack of ORDER BY in the WindowClause -- means that all rows must be read from the join, so a cheap startup plan -- isn't a good choice. EXPLAIN (COSTS OFF) SELECT COUNT(*) OVER () FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.tenthous WHERE t2.two = 1 LIMIT 1; -- Ensure we get a cheap total plan. This time use UNBOUNDED FOLLOWING, which -- needs to read all join rows to output the first WindowAgg row. EXPLAIN (COSTS OFF) SELECT COUNT(*) OVER (ORDER BY t1.unique1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.tenthous LIMIT 1; -- Ensure we get a cheap total plan. This time use 10000 FOLLOWING so we need -- to read all join rows. EXPLAIN (COSTS OFF) SELECT COUNT(*) OVER (ORDER BY t1.unique1 ROWS BETWEEN UNBOUNDED PRECEDING AND 10000 FOLLOWING) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.tenthous LIMIT 1; -- Tests for problems with failure to walk or mutate expressions -- within window frame clauses. -- test walker (fails with collation error if expressions are not walked) SELECT array_agg(i) OVER w FROM generate_series(1,5) i WINDOW w AS (ORDER BY i ROWS BETWEEN (('foo' < 'foobar')::integer) PRECEDING AND CURRENT ROW); -- test mutator (fails when inlined if expressions are not mutated) CREATE FUNCTION pg_temp.f(group_size BIGINT) RETURNS SETOF integer[] AS $$ SELECT array_agg(s) OVER w FROM generate_series(1,5) s WINDOW w AS (ORDER BY s ROWS BETWEEN CURRENT ROW AND GROUP_SIZE FOLLOWING) $$ LANGUAGE SQL STABLE; EXPLAIN (costs off) SELECT * FROM pg_temp.f(2); SELECT * FROM pg_temp.f(2); -- IGNORE NULLS tests CREATE TEMPORARY TABLE planets ( name text, distance text, orbit integer ); INSERT INTO planets VALUES ('mercury', 'close', 88), ('venus', 'close', 224), ('earth', 'close', NULL), ('mars', 'close', NULL), ('jupiter', 'close', 4332), ('saturn', 'far', 24491), ('uranus', 'far', NULL), ('neptune', 'far', 60182), ('pluto', 'far', 90560), ('xyzzy', 'far', NULL); -- test ruleutils CREATE VIEW planets_view AS SELECT name, orbit, lag(orbit) OVER w AS lag, lag(orbit) RESPECT NULLS OVER w AS lag_respect, lag(orbit) IGNORE NULLS OVER w AS lag_ignore FROM planets WINDOW w AS (ORDER BY name) ; SELECT pg_get_viewdef('planets_view'); -- lag SELECT name, orbit, lag(orbit) OVER w AS lag, lag(orbit) RESPECT NULLS OVER w AS lag_respect, lag(orbit) IGNORE NULLS OVER w AS lag_ignore FROM planets WINDOW w AS (ORDER BY name) ; -- lead SELECT name, orbit, lead(orbit) OVER w AS lead, lead(orbit) RESPECT NULLS OVER w AS lead_respect, lead(orbit) IGNORE NULLS OVER w AS lead_ignore FROM planets WINDOW w AS (ORDER BY name) ; -- first_value SELECT name, orbit, first_value(orbit) RESPECT NULLS OVER w1, first_value(orbit) IGNORE NULLS OVER w1, first_value(orbit) RESPECT NULLS OVER w2, first_value(orbit) IGNORE NULLS OVER w2 FROM planets WINDOW w1 AS (ORDER BY name ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), w2 AS (ORDER BY name ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) ; -- nth_value SELECT name, orbit, nth_value(orbit, 2) RESPECT NULLS OVER w1, nth_value(orbit, 2) IGNORE NULLS OVER w1, nth_value(orbit, 2) RESPECT NULLS OVER w2, nth_value(orbit, 2) IGNORE NULLS OVER w2 FROM planets WINDOW w1 AS (ORDER BY name ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), w2 AS (ORDER BY name ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) ; -- last_value SELECT name, orbit, last_value(orbit) RESPECT NULLS OVER w1, last_value(orbit) IGNORE NULLS OVER w1, last_value(orbit) RESPECT NULLS OVER w2, last_value(orbit) IGNORE NULLS OVER w2 FROM planets WINDOW w1 AS (ORDER BY name ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), w2 AS (ORDER BY name ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) ; -- exclude current row SELECT name, orbit, first_value(orbit) IGNORE NULLS OVER w, last_value(orbit) IGNORE NULLS OVER w, nth_value(orbit, 2) IGNORE NULLS OVER w, lead(orbit, 1) IGNORE NULLS OVER w AS lead_ignore, lag(orbit, 1) IGNORE NULLS OVER w AS lag_ignore FROM planets WINDOW w AS (ORDER BY name ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING EXCLUDE CURRENT ROW) ; -- valid and invalid functions SELECT sum(orbit) OVER () FROM planets; -- succeeds SELECT sum(orbit) RESPECT NULLS OVER () FROM planets; -- fails SELECT sum(orbit) IGNORE NULLS OVER () FROM planets; -- fails SELECT row_number() OVER () FROM planets; -- succeeds SELECT row_number() RESPECT NULLS OVER () FROM planets; -- fails SELECT row_number() IGNORE NULLS OVER () FROM planets; -- fails -- test two consecutive nulls update planets set orbit=null where name='jupiter'; SELECT name, orbit, first_value(orbit) IGNORE NULLS OVER w, last_value(orbit) IGNORE NULLS OVER w, nth_value(orbit, 2) IGNORE NULLS OVER w, lead(orbit, 1) IGNORE NULLS OVER w AS lead_ignore, lag(orbit, 1) IGNORE NULLS OVER w AS lag_ignore FROM planets WINDOW w AS (ORDER BY name ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) ; -- test partitions SELECT name, distance, orbit, first_value(orbit) IGNORE NULLS OVER w, last_value(orbit) IGNORE NULLS OVER w, nth_value(orbit, 2) IGNORE NULLS OVER w, lead(orbit, 1) IGNORE NULLS OVER w AS lead_ignore, lag(orbit, 1) IGNORE NULLS OVER w AS lag_ignore FROM planets WINDOW w AS (PARTITION BY distance ORDER BY name ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) ; -- nth_value without nulls SELECT x, nth_value(x,2) IGNORE NULLS OVER w FROM generate_series(1,5) g(x) WINDOW w AS (ORDER BY x ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING EXCLUDE CURRENT ROW); SELECT x, nth_value(x,2) IGNORE NULLS OVER w FROM generate_series(1,5) g(x) WINDOW w AS (ORDER BY x ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING); --cleanup DROP TABLE planets CASCADE; 662c662 < select x, lag(x, 1) over (order by x), lead(x, 3) over (order by x) --- > select x, lag(x, 1) ignore nulls over (order by x), lead(x, 3) ignore nulls over (order by x) 681c681 < last_value(ten) over (partition by four order by ten) --- > last_value(ten) ignore nulls over (partition by four order by ten) 709c709 < last_value(ten) over (partition by four order by ten range between unbounded preceding and current row) --- > last_value(ten) ignore nulls over (partition by four order by ten range between unbounded preceding and current row) 737c737 < last_value(ten) over (partition by four order by ten range between unbounded preceding and unbounded following) --- > last_value(ten) ignore nulls over (partition by four order by ten range between unbounded preceding and unbounded following) 765c765 < last_value(ten/4) over (partition by four order by ten/4 range between unbounded preceding and current row) --- > last_value(ten/4) ignore nulls over (partition by four order by ten/4 range between unbounded preceding and currentrow) 793c793 < last_value(ten/4) over (partition by four order by ten/4 rows between unbounded preceding and current row) --- > last_value(ten/4) ignore nulls over (partition by four order by ten/4 rows between unbounded preceding and currentrow) 938c938 < SELECT first_value(unique1) over (ORDER BY four rows between current row and 2 following exclude current row), --- > SELECT first_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude current row), 955c955 < SELECT first_value(unique1) over (ORDER BY four rows between current row and 2 following exclude group), --- > SELECT first_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude group), 972c972 < SELECT first_value(unique1) over (ORDER BY four rows between current row and 2 following exclude ties), --- > SELECT first_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude ties), 989c989 < SELECT last_value(unique1) over (ORDER BY four rows between current row and 2 following exclude current row), --- > SELECT last_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude current row), 1006c1006 < SELECT last_value(unique1) over (ORDER BY four rows between current row and 2 following exclude group), --- > SELECT last_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude group), 1023c1023 < SELECT last_value(unique1) over (ORDER BY four rows between current row and 2 following exclude ties), --- > SELECT last_value(unique1) ignore nulls over (ORDER BY four rows between current row and 2 following exclude ties), 1159,1161c1159,1161 < SELECT first_value(unique1) over w, < nth_value(unique1, 2) over w AS nth_2, < last_value(unique1) over w, unique1, four --- > SELECT first_value(unique1) ignore nulls over w, > nth_value(unique1, 2) ignore nulls over w AS nth_2, > last_value(unique1) ignore nulls over w, unique1, four 1631,1633c1631,1633 < select first_value(salary) over(order by salary range between 1000 preceding and 1000 following), < lead(salary) over(order by salary range between 1000 preceding and 1000 following), < nth_value(salary, 1) over(order by salary range between 1000 preceding and 1000 following), --- > select first_value(salary) ignore nulls over(order by salary range between 1000 preceding and 1000 following), > lead(salary) ignore nulls over(order by salary range between 1000 preceding and 1000 following), > nth_value(salary, 1) ignore nulls over(order by salary range between 1000 preceding and 1000 following), 1649,1650c1649,1650 < select last_value(salary) over(order by salary range between 1000 preceding and 1000 following), < lag(salary) over(order by salary range between 1000 preceding and 1000 following), --- > select last_value(salary) ignore nulls over(order by salary range between 1000 preceding and 1000 following), > lag(salary) ignore nulls over(order by salary range between 1000 preceding and 1000 following), 1666c1666 < select first_value(salary) over(order by salary range between 1000 following and 3000 following --- > select first_value(salary) ignore nulls over(order by salary range between 1000 following and 3000 following 1668,1669c1668,1669 < lead(salary) over(order by salary range between 1000 following and 3000 following exclude ties), < nth_value(salary, 1) over(order by salary range between 1000 following and 3000 following --- > lead(salary) ignore nulls over(order by salary range between 1000 following and 3000 following exclude ties), > nth_value(salary, 1) ignore nulls over(order by salary range between 1000 following and 3000 following 1686c1686 < select last_value(salary) over(order by salary range between 1000 following and 3000 following --- > select last_value(salary) ignore nulls over(order by salary range between 1000 following and 3000 following 1688c1688 < lag(salary) over(order by salary range between 1000 following and 3000 following exclude group), --- > lag(salary) ignore nulls over(order by salary range between 1000 following and 3000 following exclude group), 1704c1704 < select first_value(salary) over(order by enroll_date range between unbounded preceding and '1 year'::interval following --- > select first_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing 1706c1706 < last_value(salary) over(order by enroll_date range between unbounded preceding and '1 year'::interval following), --- > last_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing), 1722c1722 < select first_value(salary) over(order by enroll_date range between unbounded preceding and '1 year'::interval following --- > select first_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing 1724c1724 < last_value(salary) over(order by enroll_date range between unbounded preceding and '1 year'::interval following --- > last_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing 1741c1741 < select first_value(salary) over(order by enroll_date range between unbounded preceding and '1 year'::interval following --- > select first_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing 1743c1743 < last_value(salary) over(order by enroll_date range between unbounded preceding and '1 year'::interval following --- > last_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing 1760c1760 < select first_value(salary) over(order by enroll_date range between unbounded preceding and '1 year'::interval following --- > select first_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing 1762c1762 < last_value(salary) over(order by enroll_date range between unbounded preceding and '1 year'::interval following --- > last_value(salary) ignore nulls over(order by enroll_date range between unbounded preceding and '1 year'::intervalfollowing 1781,1782c1781,1782 < first_value(y) over w, < last_value(y) over w --- > first_value(y) ignore nulls over w, > last_value(y) ignore nulls over w 1801,1802c1801,1802 < first_value(y) over w, < last_value(y) over w --- > first_value(y) ignore nulls over w, > last_value(y) ignore nulls over w 1821,1822c1821,1822 < first_value(y) over w, < last_value(y) over w --- > first_value(y) ignore nulls over w, > last_value(y) ignore nulls over w 1841,1842c1841,1842 < first_value(y) over w, < last_value(y) over w --- > first_value(y) ignore nulls over w, > last_value(y) ignore nulls over w 2001c2001 < select x, last_value(x) over (order by x::smallint range between current row and 2147450884 following) --- > select x, last_value(x) ignore nulls over (order by x::smallint range between current row and 2147450884 following) 2010c2010 < select x, last_value(x) over (order by x::smallint desc range between current row and 2147450885 following) --- > select x, last_value(x) ignore nulls over (order by x::smallint desc range between current row and 2147450885 following) 2019c2019 < select x, last_value(x) over (order by x range between current row and 4 following) --- > select x, last_value(x) ignore nulls over (order by x range between current row and 4 following) 2028c2028 < select x, last_value(x) over (order by x desc range between current row and 5 following) --- > select x, last_value(x) ignore nulls over (order by x desc range between current row and 5 following) 2037c2037 < select x, last_value(x) over (order by x range between current row and 4 following) --- > select x, last_value(x) ignore nulls over (order by x range between current row and 4 following) 2046c2046 < select x, last_value(x) over (order by x desc range between current row and 5 following) --- > select x, last_value(x) ignore nulls over (order by x desc range between current row and 5 following) 2073c2073 < select id, f_float4, first_value(id) over w, last_value(id) over w --- > select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2091c2091 < select id, f_float4, first_value(id) over w, last_value(id) over w --- > select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2109c2109 < select id, f_float4, first_value(id) over w, last_value(id) over w --- > select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2127c2127 < select id, f_float4, first_value(id) over w, last_value(id) over w --- > select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2145c2145 < select id, f_float4, first_value(id) over w, last_value(id) over w --- > select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2163c2163 < select id, f_float4, first_value(id) over w, last_value(id) over w --- > select id, f_float4, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2168c2168 < select id, f_float8, first_value(id) over w, last_value(id) over w --- > select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2186c2186 < select id, f_float8, first_value(id) over w, last_value(id) over w --- > select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2204c2204 < select id, f_float8, first_value(id) over w, last_value(id) over w --- > select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2222c2222 < select id, f_float8, first_value(id) over w, last_value(id) over w --- > select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2240c2240 < select id, f_float8, first_value(id) over w, last_value(id) over w --- > select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2258c2258 < select id, f_float8, first_value(id) over w, last_value(id) over w --- > select id, f_float8, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2263c2263 < select id, f_numeric, first_value(id) over w, last_value(id) over w --- > select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2281c2281 < select id, f_numeric, first_value(id) over w, last_value(id) over w --- > select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2299c2299 < select id, f_numeric, first_value(id) over w, last_value(id) over w --- > select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2307c2307 < select id, f_numeric, first_value(id) over w, last_value(id) over w --- > select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2325c2325 < select id, f_numeric, first_value(id) over w, last_value(id) over w --- > select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2343c2343 < select id, f_numeric, first_value(id) over w, last_value(id) over w --- > select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2361c2361 < select id, f_numeric, first_value(id) over w, last_value(id) over w --- > select id, f_numeric, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2388c2388 < select id, f_time, first_value(id) over w, last_value(id) over w --- > select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2408c2408 < select id, f_time, first_value(id) over w, last_value(id) over w --- > select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2428c2428 < select id, f_time, first_value(id) over w, last_value(id) over w --- > select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2433c2433 < select id, f_time, first_value(id) over w, last_value(id) over w --- > select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2453c2453 < select id, f_time, first_value(id) over w, last_value(id) over w --- > select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2473c2473 < select id, f_time, first_value(id) over w, last_value(id) over w --- > select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2493c2493 < select id, f_time, first_value(id) over w, last_value(id) over w --- > select id, f_time, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2499c2499 < select id, f_timetz, first_value(id) over w, last_value(id) over w --- > select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2519c2519 < select id, f_timetz, first_value(id) over w, last_value(id) over w --- > select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2539c2539 < select id, f_timetz, first_value(id) over w, last_value(id) over w --- > select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2544c2544 < select id, f_timetz, first_value(id) over w, last_value(id) over w --- > select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2564c2564 < select id, f_timetz, first_value(id) over w, last_value(id) over w --- > select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2584c2584 < select id, f_timetz, first_value(id) over w, last_value(id) over w --- > select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2604c2604 < select id, f_timetz, first_value(id) over w, last_value(id) over w --- > select id, f_timetz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2610c2610 < select id, f_interval, first_value(id) over w, last_value(id) over w --- > select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2630c2630 < select id, f_interval, first_value(id) over w, last_value(id) over w --- > select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2650c2650 < select id, f_interval, first_value(id) over w, last_value(id) over w --- > select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2655c2655 < select id, f_interval, first_value(id) over w, last_value(id) over w --- > select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2675c2675 < select id, f_interval, first_value(id) over w, last_value(id) over w --- > select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2695c2695 < select id, f_interval, first_value(id) over w, last_value(id) over w --- > select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2715c2715 < select id, f_interval, first_value(id) over w, last_value(id) over w --- > select id, f_interval, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2721c2721 < select id, f_timestamptz, first_value(id) over w, last_value(id) over w --- > select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2741c2741 < select id, f_timestamptz, first_value(id) over w, last_value(id) over w --- > select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2761c2761 < select id, f_timestamptz, first_value(id) over w, last_value(id) over w --- > select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2766c2766 < select id, f_timestamptz, first_value(id) over w, last_value(id) over w --- > select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2786c2786 < select id, f_timestamptz, first_value(id) over w, last_value(id) over w --- > select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2806c2806 < select id, f_timestamptz, first_value(id) over w, last_value(id) over w --- > select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2826c2826 < select id, f_timestamptz, first_value(id) over w, last_value(id) over w --- > select id, f_timestamptz, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2832c2832 < select id, f_timestamp, first_value(id) over w, last_value(id) over w --- > select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2852c2852 < select id, f_timestamp, first_value(id) over w, last_value(id) over w --- > select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2872c2872 < select id, f_timestamp, first_value(id) over w, last_value(id) over w --- > select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2877c2877 < select id, f_timestamp, first_value(id) over w, last_value(id) over w --- > select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2897c2897 < select id, f_timestamp, first_value(id) over w, last_value(id) over w --- > select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2917c2917 < select id, f_timestamp, first_value(id) over w, last_value(id) over w --- > select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 2937c2937 < select id, f_timestamp, first_value(id) over w, last_value(id) over w --- > select id, f_timestamp, first_value(id) ignore nulls over w, last_value(id) ignore nulls over w 3253,3255c3253,3255 < select first_value(salary) over(order by enroll_date groups between 1 preceding and 1 following), < lead(salary) over(order by enroll_date groups between 1 preceding and 1 following), < nth_value(salary, 1) over(order by enroll_date groups between 1 preceding and 1 following), --- > select first_value(salary) ignore nulls over(order by enroll_date groups between 1 preceding and 1 following), > lead(salary) ignore nulls over(order by enroll_date groups between 1 preceding and 1 following), > nth_value(salary, 1) ignore nulls over(order by enroll_date groups between 1 preceding and 1 following), 3271,3272c3271,3272 < select last_value(salary) over(order by enroll_date groups between 1 preceding and 1 following), < lag(salary) over(order by enroll_date groups between 1 preceding and 1 following), --- > select last_value(salary) ignore nulls over(order by enroll_date groups between 1 preceding and 1 following), > lag(salary) ignore nulls over(order by enroll_date groups between 1 preceding and 1 following), 3288c3288 < select first_value(salary) over(order by enroll_date groups between 1 following and 3 following --- > select first_value(salary) ignore nulls over(order by enroll_date groups between 1 following and 3 following 3290,3291c3290,3291 < lead(salary) over(order by enroll_date groups between 1 following and 3 following exclude ties), < nth_value(salary, 1) over(order by enroll_date groups between 1 following and 3 following --- > lead(salary) ignore nulls over(order by enroll_date groups between 1 following and 3 following exclude ties), > nth_value(salary, 1) ignore nulls over(order by enroll_date groups between 1 following and 3 following 3308c3308 < select last_value(salary) over(order by enroll_date groups between 1 following and 3 following --- > select last_value(salary) ignore nulls over(order by enroll_date groups between 1 following and 3 following 3310c3310 < lag(salary) over(order by enroll_date groups between 1 following and 3 following exclude group), --- > lag(salary) ignore nulls over(order by enroll_date groups between 1 following and 3 following exclude group),
Attached are the v17 patches for adding RESPECT/IGNORE NULLS options defined in the standard to some window functions. FROM FIRST/LAST options are not considered in the patch (yet). This time I split the patch into 6 patches for reviewer's convenience. Also each patch has a short commit message to explain the patch. 0001: parse and analysis 0002: rewriter 0003: planner 0004: executor 0005: documents 0006: tests Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Вложения
- v17-0001-Modify-parse-analysis-modules-to-accept-RESPECT-.patch
- v17-0002-Modify-get_windowfunc_expr_helper-to-handle-IGNO.patch
- v17-0003-Modify-eval_const_expressions_mutator-to-handle-.patch
- v17-0004-Modify-executor-and-window-functions-to-handle-I.patch
- v17-0005-Modify-documents-to-add-null-treatment-clause.patch
- v17-0006-Modify-window-function-regression-tests-to-test-.patch
Attached are the v18 patches for adding RESPECT/IGNORE NULLS options to some window functions. Recent changes to doc/src/sgml/func.sgml required v17 to be rebased. Other than that, nothing has been changed. Oliver, do you have any comments on the patches? > Attached are the v17 patches for adding RESPECT/IGNORE NULLS options > defined in the standard to some window functions. FROM FIRST/LAST > options are not considered in the patch (yet). > > This time I split the patch into 6 > patches for reviewer's convenience. Also each patch has a short commit > message to explain the patch. > > 0001: parse and analysis > 0002: rewriter > 0003: planner > 0004: executor > 0005: documents > 0006: tests Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Вложения
- v18-0001-Modify-parse-analysis-modules-to-accept-RESPECT-.patch
- v18-0002-Modify-get_windowfunc_expr_helper-to-handle-IGNO.patch
- v18-0003-Modify-eval_const_expressions_mutator-to-handle-.patch
- v18-0004-Modify-executor-and-window-functions-to-handle-I.patch
- v18-0005-Modify-documents-to-add-null-treatment-clause.patch
- v18-0006-Modify-window-function-regression-tests-to-test-.patch
On Sat, Aug 16, 2025 at 10:33 AM Tatsuo Ishii <ishii@postgresql.org> wrote:
Attached are the v18 patches for adding RESPECT/IGNORE NULLS options
to some window functions. Recent changes to doc/src/sgml/func.sgml
required v17 to be rebased. Other than that, nothing has been changed.
Oliver, do you have any comments on the patches?
Looks good, tried it on the nth_value test script from a bit ago - I added a 1 million rows test and it takes an average of 12 seconds on my i7.
>> Attached are the v18 patches for adding RESPECT/IGNORE NULLS options >> to some window functions. Recent changes to doc/src/sgml/func.sgml >> required v17 to be rebased. Other than that, nothing has been changed. >> >> Oliver, do you have any comments on the patches? >> > > Looks good, tried it on the nth_value test script from a bit ago - I added > a 1 million rows test and it takes an average of 12 seconds on my i7. Thanks. I have moved the CF entry from PG19-1 to PG19-2 as PG19-1 has been already closed on July 31. Hope this help CF bot to catch the v18 patches. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
>> Attached are the v18 patches for adding RESPECT/IGNORE NULLS options >> to some window functions. Recent changes to doc/src/sgml/func.sgml >> required v17 to be rebased. Other than that, nothing has been changed. >> >> Oliver, do you have any comments on the patches? >> > > Looks good, tried it on the nth_value test script from a bit ago - I added > a 1 million rows test and it takes an average of 12 seconds on my i7. I would like to push the patch by the end of this month or early in October if there's no objection. Comments/suggestions are welcome. -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Overall LGTM. Just a few small comments:
On Sep 12, 2025, at 17:53, Tatsuo Ishii <ishii@postgresql.org> wrote:
Comments/suggestions are welcome.
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
1 - 0001
```
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -98,6 +98,7 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
bool agg_star = (fn ? fn->agg_star : false);
bool agg_distinct = (fn ? fn->agg_distinct : false);
bool func_variadic = (fn ? fn->func_variadic : false);
+ int ignore_nulls = (fn ? fn->ignore_nulls : 0);
```
Should we use the constant NO_NULLTREATMENT here for 0?
2 - 0001
```
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -579,6 +579,17 @@ typedef struct GroupingFunc
* Collation information is irrelevant for the query jumbling, as is the
* internal state information of the node like "winstar" and "winagg".
*/
+
+/*
+ * Null Treatment options. If specified, initially set to PARSER_IGNORE_NULLS
+ * which is then converted to IGNORE_NULLS if the window function allows the
+ * null treatment clause.
+ */
+#define NO_NULLTREATMENT 0
+#define PARSER_IGNORE_NULLS 1
+#define PARSER_RESPECT_NULLS 2
+#define IGNORE_NULLS 3
+
typedef struct WindowFunc
{
Expr xpr;
@@ -602,6 +613,8 @@ typedef struct WindowFunc
bool winstar pg_node_attr(query_jumble_ignore);
/* is function a simple aggregate? */
bool winagg pg_node_attr(query_jumble_ignore);
+ /* ignore nulls. One of the Null Treatment options */
+ int ignore_nulls;
```
Maybe we can use “uint8” type for “ignore_nulls”. Because the previous two are both of type “bool”, an uint8 will just fit to the padding bytes, so that new field won’t add extra memory to the structure.
3 - 0004
```
winobj->markpos = -1;
winobj->seekpos = -1;
+
+ /* reset null map */
+ if (perfuncstate->winobj->ignore_nulls == IGNORE_NULLS)
+ memset(perfuncstate->winobj->notnull_info, 0,
+ NN_POS_TO_BYTES(perfuncstate->winobj->num_notnull_info));
}
```
Where in “if” and “memset()”, we can just use “winobj”.
4 - 0004
```
+ if (!HeapTupleIsValid(proctup))
+ elog(ERROR, "cache lookup failed for function %u", funcid);
+ procform = (Form_pg_proc) GETSTRUCT(proctup);
+ elog(ERROR, "function %s does not allow RESPECT/IGNORE NULLS",
+ NameStr(procform->proname));
```
“Procform” is assigned but not used.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
> Overall LGTM. Just a few small comments: > 1 - 0001 > ``` > --- a/src/backend/parser/parse_func.c > +++ b/src/backend/parser/parse_func.c > @@ -98,6 +98,7 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, > bool agg_star = (fn ? fn->agg_star : false); > bool agg_distinct = (fn ? fn->agg_distinct : false); > bool func_variadic = (fn ? fn->func_variadic : false); > + int ignore_nulls = (fn ? fn->ignore_nulls : 0); > ``` > > Should we use the constant NO_NULLTREATMENT here for 0? Good suggestion. Will fix. > 2 - 0001 > ``` > --- a/src/include/nodes/primnodes.h > +++ b/src/include/nodes/primnodes.h > @@ -579,6 +579,17 @@ typedef struct GroupingFunc > * Collation information is irrelevant for the query jumbling, as is the > * internal state information of the node like "winstar" and "winagg". > */ > + > +/* > + * Null Treatment options. If specified, initially set to PARSER_IGNORE_NULLS > + * which is then converted to IGNORE_NULLS if the window function allows the > + * null treatment clause. > + */ > +#define NO_NULLTREATMENT 0 > +#define PARSER_IGNORE_NULLS 1 > +#define PARSER_RESPECT_NULLS 2 > +#define IGNORE_NULLS 3 > + > typedef struct WindowFunc > { > Expr xpr; > @@ -602,6 +613,8 @@ typedef struct WindowFunc > bool winstar pg_node_attr(query_jumble_ignore); > /* is function a simple aggregate? */ > bool winagg pg_node_attr(query_jumble_ignore); > + /* ignore nulls. One of the Null Treatment options */ > + int ignore_nulls; > ``` > > Maybe we can use “uint8” type for “ignore_nulls”. Because the previous two are both of type “bool”, an uint8 will justfit to the padding bytes, so that new field won’t add extra memory to the structure. If we change the data type for ignore_nulls in WindowFunc, we may also want to change it elsewhere (FuncCall, WindowObjectData, WindowStatePerFuncData) for consistency? > 3 - 0004 > ``` > winobj->markpos = -1; > winobj->seekpos = -1; > + > + /* reset null map */ > + if (perfuncstate->winobj->ignore_nulls == IGNORE_NULLS) > + memset(perfuncstate->winobj->notnull_info, 0, > + NN_POS_TO_BYTES(perfuncstate->winobj->num_notnull_info)); > } > ``` > Where in “if” and “memset()”, we can just use “winobj”. Good catch. Will fix. > 4 - 0004 > ``` > + if (!HeapTupleIsValid(proctup)) > + elog(ERROR, "cache lookup failed for function %u", funcid); > + procform = (Form_pg_proc) GETSTRUCT(proctup); > + elog(ERROR, "function %s does not allow RESPECT/IGNORE NULLS", > + NameStr(procform->proname)); > ``` > > “Procform” is assigned but not used. I think procform is used in the following elog(ERROR, ...). Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Attached is the updated v19 patches. Mostly applied changes suggested by Chao. >> Overall LGTM. Just a few small comments: > >> 1 - 0001 >> ``` >> --- a/src/backend/parser/parse_func.c >> +++ b/src/backend/parser/parse_func.c >> @@ -98,6 +98,7 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, >> bool agg_star = (fn ? fn->agg_star : false); >> bool agg_distinct = (fn ? fn->agg_distinct : false); >> bool func_variadic = (fn ? fn->func_variadic : false); >> + int ignore_nulls = (fn ? fn->ignore_nulls : 0); >> ``` >> >> Should we use the constant NO_NULLTREATMENT here for 0? > > Good suggestion. Will fix. Done. >> 2 - 0001 >> ``` >> --- a/src/include/nodes/primnodes.h >> +++ b/src/include/nodes/primnodes.h >> @@ -579,6 +579,17 @@ typedef struct GroupingFunc >> * Collation information is irrelevant for the query jumbling, as is the >> * internal state information of the node like "winstar" and "winagg". >> */ >> + >> +/* >> + * Null Treatment options. If specified, initially set to PARSER_IGNORE_NULLS >> + * which is then converted to IGNORE_NULLS if the window function allows the >> + * null treatment clause. >> + */ >> +#define NO_NULLTREATMENT 0 >> +#define PARSER_IGNORE_NULLS 1 >> +#define PARSER_RESPECT_NULLS 2 >> +#define IGNORE_NULLS 3 >> + >> typedef struct WindowFunc >> { >> Expr xpr; >> @@ -602,6 +613,8 @@ typedef struct WindowFunc >> bool winstar pg_node_attr(query_jumble_ignore); >> /* is function a simple aggregate? */ >> bool winagg pg_node_attr(query_jumble_ignore); >> + /* ignore nulls. One of the Null Treatment options */ >> + int ignore_nulls; >> ``` >> >> Maybe we can use “uint8” type for “ignore_nulls”. Because the previous two are both of type “bool”, an uint8 will justfit to the padding bytes, so that new field won’t add extra memory to the structure. > > If we change the data type for ignore_nulls in WindowFunc, we may also > want to change it elsewhere (FuncCall, WindowObjectData, > WindowStatePerFuncData) for consistency? I tried to change all "int ignore_nulls;" to "uint8 ignore_nulls;" but gen_node_support.pl dislikes it and complains like: could not handle type "uint8" in struct "FuncCall" field "ignore_nulls" >> 3 - 0004 >> ``` >> winobj->markpos = -1; >> winobj->seekpos = -1; >> + >> + /* reset null map */ >> + if (perfuncstate->winobj->ignore_nulls == IGNORE_NULLS) >> + memset(perfuncstate->winobj->notnull_info, 0, >> + NN_POS_TO_BYTES(perfuncstate->winobj->num_notnull_info)); >> } >> ``` >> Where in “if” and “memset()”, we can just use “winobj”. > > Good catch. Will fix. Done. >> 4 - 0004 >> ``` >> + if (!HeapTupleIsValid(proctup)) >> + elog(ERROR, "cache lookup failed for function %u", funcid); >> + procform = (Form_pg_proc) GETSTRUCT(proctup); >> + elog(ERROR, "function %s does not allow RESPECT/IGNORE NULLS", >> + NameStr(procform->proname)); >> ``` >> >> “Procform” is assigned but not used. > > I think procform is used in the following elog(ERROR, ...). I added more tests for functions (rank(), dense_rank(), percent_rank(), cume_dist() and ntile()) that do not support RESPECT/IGNORE NULLS options to confirm that they throw errors if the options are given. Previously there was only test cases for row_number(). Also I have made small cosmetic changes to executor/nodeWindowAgg.c to make too long lines shorter. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Вложения
- v19-0001-Modify-parse-analysis-modules-to-accept-RESPECT-.patch
- v19-0002-Modify-get_windowfunc_expr_helper-to-handle-IGNO.patch
- v19-0003-Modify-eval_const_expressions_mutator-to-handle-.patch
- v19-0004-Modify-executor-and-window-functions-to-handle-I.patch
- v19-0005-Modify-documents-to-add-null-treatment-clause.patch
- v19-0006-Modify-window-function-regression-tests-to-test-.patch
On Wed, Sep 24, 2025 at 6:39 AM Tatsuo Ishii <ishii@postgresql.org> wrote:
I tried to change all "int ignore_nulls;" to "uint8 ignore_nulls;" but
gen_node_support.pl dislikes it and complains like:
could not handle type "uint8" in struct "FuncCall" field "ignore_nulls"
uint8 was missing in one place in that perl script. The attached patch silences it for uint8/uint16.
Вложения
Hi Oliver, > On Wed, Sep 24, 2025 at 6:39 AM Tatsuo Ishii <ishii@postgresql.org> wrote: > >> I tried to change all "int ignore_nulls;" to "uint8 ignore_nulls;" but >> gen_node_support.pl dislikes it and complains like: >> >> could not handle type "uint8" in struct "FuncCall" field "ignore_nulls" >> >> > uint8 was missing in one place in that perl script. The attached patch > silences it for uint8/uint16. Thank you for the patch. (I noticed int8 is also missing). I have looked into the commit 964d01ae90c3 which was made by Peter. I have quick read through the discussion to know why uint8/uint16 (and int8) are missing in gen_node_support.pl. Unfortunately I have no clear idea why these data types are missing in the script. Peter, Maybe you wanted to limit the data types that are actually used at that point? If so, probably we should only add uint8 support this time (uint8 is only needed to implement $Subject for now). What do you think? Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
> Thank you for the patch. (I noticed int8 is also missing). > > I have looked into the commit 964d01ae90c3 which was made by Peter. I > have quick read through the discussion to know why uint8/uint16 (and > int8) are missing in gen_node_support.pl. Unfortunately I have no > clear idea why these data types are missing in the script. > > Peter, > Maybe you wanted to limit the data types that are actually used at > that point? If so, probably we should only add uint8 support this time > (uint8 is only needed to implement $Subject for now). What do you > think? I decided not to include the fix to gen_node_support.pl for now and commit the patch without it. We could revisit it later on. So here is the commit message I would like to propose. For the technical part please look at the message. Non technical part: First of all the author is Oliver (no doubt). I would like to be listed as a co-author since I wrote the not null cache part. Next is reviewers. Actually the first effor to implement null treatment clause was back to 9.3 era (2013) at least. After that multiple trials to implemnt the feature happend but they had faded away. I think we don't need to include all of those who joined the old discussions as reviewers. So I started to check from the discussion: https://postgr.es/m/flat/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com because it's refered to by the commit fest entry. Oliver and others, I love to hear your comment! BTW, Oliver's last patch made the CF bot to misunderstand the patch contents, which was not actually the main patch. So I attach the same patch as v20. ---------------------------------------------------------------------- Add IGNORE NULLS/RESPECT NULLS option to Window functions. Add IGNORE NULLS/RESPECT NULLS option (null treatment clause) to lead, lag, first_value, last_value and nth_value window functions. If unspecified, the default is RESPECT NULLS which includes NULL values in any result calculation. IGNORE NULLS ignores NULL values. Built-in window functions are modified to call new API WinCheckAndInitializeNullTreatment() to indicate whether they accept IGNORE NULLS/RESPECT NULLS option or not (the API can be called by user defined window functions as well). If WinGetFuncArgInPartition's allowNullTreatment argument is true and IGNORE NULLS option is given, WinGetFuncArgInPartition() or WinGetFuncArgInFrame() will return evaluated function's argument expression on specified non NULL row (if it exists) in the partition or the frame. When IGNORE NULLS option is given, window functions need to visit and evaluate same rows over and over again to look for non null rows. To mitigate the issue, 2-bit not null information array is created while executing window functions to remember whether the row has been already evaluated to NULL or NOT NULL. If already evaluated, we could skip some the evaluation work, thus we could get better performance. Author: Oliver Ford <ojford@gmail.com> Co-authored-by: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: Andrew Gierth <andrew@tao11.riddles.org.uk> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Fetter <david@fetter.org> Reviewed-by: Vik Fearing <vik@postgresfriends.org> Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-by: Krasiyan Andreev <krasiyan@gmail.com> Reviewed-by: Chao Li <lic@highgo.com> Discussion: https://postgr.es/m/flat/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com ---------------------------------------------------------------------- Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Вложения
- v20-0001-Modify-parse-analysis-modules-to-accept-RESPECT-.patch
- v20-0002-Modify-get_windowfunc_expr_helper-to-handle-IGNO.patch
- v20-0003-Modify-eval_const_expressions_mutator-to-handle-.patch
- v20-0004-Modify-executor-and-window-functions-to-handle-I.patch
- v20-0005-Modify-documents-to-add-null-treatment-clause.patch
- v20-0006-Modify-window-function-regression-tests-to-test-.patch
On Thu, 2 Oct 2025, 13:16 Tatsuo Ishii, <ishii@postgresql.org> wrote:
> Thank you for the patch. (I noticed int8 is also missing).
>
> I have looked into the commit 964d01ae90c3 which was made by Peter. I
> have quick read through the discussion to know why uint8/uint16 (and
> int8) are missing in gen_node_support.pl. Unfortunately I have no
> clear idea why these data types are missing in the script.
>
> Peter,
> Maybe you wanted to limit the data types that are actually used at
> that point? If so, probably we should only add uint8 support this time
> (uint8 is only needed to implement $Subject for now). What do you
> think?
I decided not to include the fix to gen_node_support.pl for now and
commit the patch without it. We could revisit it later on.
So here is the commit message I would like to propose.
For the technical part please look at the message.
Non technical part:
First of all the author is Oliver (no doubt). I would like to be
listed as a co-author since I wrote the not null cache part.
Next is reviewers. Actually the first effor to implement null
treatment clause was back to 9.3 era (2013) at least. After that
multiple trials to implemnt the feature happend but they had faded
away. I think we don't need to include all of those who joined the old
discussions as reviewers. So I started to check from the discussion:
https://postgr.es/m/flat/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com
because it's refered to by the commit fest entry.
Oliver and others, I love to hear your comment!
Looks great, so glad this is finally going in.