Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options
От | Tatsuo Ishii |
---|---|
Тема | Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options |
Дата | |
Msg-id | 20250612.075235.1886649649521129924.ishii@postgresql.org обсуждение исходный текст |
Ответ на | Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options (Tatsuo Ishii <ishii@postgresql.org>) |
Список | pgsql-hackers |
> One thing I worry about the patch is, now the non-nulls array > optimization was removed. Since then I have been thinking about if > there could be other way to optimize searching for non null rows. Here is the v12 patch to implement the optimization on top of Oliver's v11 patch. Only src/backend/executor/nodeWindowAgg.c was modified (especially ignorenulls_getfuncarginframe). In the patch I created 2-bit not null information array, representing following status for each row: UNKNOWN: the row is not determined whether it's NULL or NOT yet. This is the initial value. NULL: the row has been determined to be NULL. NOT NULL: the row has been determined to be NOT NULL. In ignorenulls_getfuncarginframe: For the first time window function visits a row in a frame, the row is fetched using window_gettupleslot() and it is checked whether it is in the frame using row_is_in_frame(). If it's in the frame and the information in the array is UNKNOWN, ExecEvalExpr() is executed to find out if the expression on the function argument is NULL or not. And the result (NULL or NOT NULL) is stored in the array. If the information in the array is not UNKNOWN, we can skip calling ExecEvalExpr() because the information is already in the array. Note that I do not skip calling window_gettupleslot() and row_is_in_frame(), skip only calling ExecEvalExpr(), because whether a row is in a frame or not could be changing as the current row position moves while processing window functions. With this technique I observed around 40% speed up in my environment using the script attached, comparing with Oliver's v11 patch. v11: rows duration (msec) 1000 41.019 2000 148.957 3000 248.291 4000 442.478 5000 687.395 v12: rows duration (msec) 1000 27.515 2000 78.913 3000 174.737 4000 311.412 5000 482.156 The patch is now generated using the standard git format-patch. Also I have slightly adjusted the coding style so that it aligns with the one used in nodeWindowAgg.c, and ran pgindent. Note that I have not modified ignorenulls_getfuncarginpartition yet. I think we could optimize it using the not null info infrastructure as well. Will come up with it. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp for i in 1000 2000 3000 4000 5000 do echo "$i rows: " pos=`expr $i / 2` psql -a test <<EOF \timing explain analyze SELECT x, nth_value(x,$pos) IGNORE NULLS OVER w FROM generate_series(1,$i) g(x) WINDOW w AS (ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING); EOF done | egrep "rows:|Time:" | egrep -v "Planning|Execution"| sed -e 's/rows: *//' -e 's/Time: //' -e 's/ms//'
Вложения
В списке pgsql-hackers по дате отправления: