BUG #18399: Query plan optimization results in runtime error when hoisting cast from inside subquery
От | PG Bug reporting form |
---|---|
Тема | BUG #18399: Query plan optimization results in runtime error when hoisting cast from inside subquery |
Дата | |
Msg-id | 18399-17778030a1971177@postgresql.org обсуждение исходный текст |
Ответы |
Re: BUG #18399: Query plan optimization results in runtime error when hoisting cast from inside subquery
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18399 Logged by: Sawyer Knoblich Email address: scknoblich@gmail.com PostgreSQL version: 16.2 Operating system: Docker image on macOS Description: Hello, I have the following standalone query that operates on a dataset with a text column and attempts to find all rows where that column's text is an integer that is under a certain value: ``` with raw_data(id, value) as (values (1, '1900'), (2, '2100'), (3, 'abc')), integers as (select value::integer from raw_data where id in (select id from raw_data where value ~ '^\d+$')) select count(*) from integers where value < 2000; ``` In this query the "integers" CTE attempts to find all rows where the "value" column is able to be converted to an integer (using a subquery) and performs that conversion, and then the main query does a simple where + count to produce the required dataset. With this query my expectation was that the main query would only be operating on values that had already been converted to integers. Running the query as-is results in `[22P02] ERROR: invalid input syntax for type integer: "abc"`. From the query plan I can see that the optimizer has chosen to directly replace `q.value` with `value::integer` in the outermost where condition to be evaluated independently from the regex match, which ends up attempting the "abc"::integer cast before its row is able to be filtered out by the CTE's own condition: ``` Aggregate (cost=0.20..0.21 rows=1 width=8) CTE raw_data -> Values Scan on "*VALUES*" (cost=0.00..0.04 rows=3 width=36) -> Nested Loop Semi Join (cost=0.00..0.16 rows=1 width=0) Join Filter: (raw_data.id = raw_data_1.id) -> CTE Scan on raw_data (cost=0.00..0.08 rows=1 width=4) Filter: ((value)::integer < 2000) -> CTE Scan on raw_data raw_data_1 (cost=0.00..0.07 rows=1 width=4) Filter: (value ~ '^\d+$'::text) ``` I have also reproduced this behavior when using a different condition on the main query such as a simple null check, when selecting from a larger persisted table instead of the "raw_data" CTE used here for simplicity, and when using a subquery instead of the "integers" CTE. I did find that if I remove the subquery inside the CTE's where clause and instead apply the regex directly to the "value" column then I get the following query plan which succeeds with no issues: ``` with raw_data(id, value) as (values (1, '1900'), (2, '2100'), (3, 'abc')), integers as (select value::integer from raw_data where value ~ '^\d+$') select count(*) from integers where value < 2000; ---------- Aggregate (cost=0.07..0.08 rows=1 width=8) -> Values Scan on "*VALUES*" (cost=0.00..0.07 rows=1 width=0) Filter: ((column2 ~ '^\d+$'::text) AND ((column2)::integer < 2000)) ``` My expectation was that the query should behave as if the CTE (or subquery if I had used that instead) has fully executed before moving on to the main query and this behavior breaks that assumption (which may or may not be correct). I attempted to search the docs for anything related to this, either CTE/subquery execution order guarantees or hoisting of casts, but I couldn't find anything about it. If this turns out not to a bug and is just expected behavior with existing documentation could you please link me to the relevant page for future reference?
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Thomas MunroДата:
Сообщение: Re: Regression tests fail with musl libc because libpq.so can't be loaded
Следующее
От: "Hayato Kuroda (Fujitsu)"Дата:
Сообщение: RE: Potential data loss due to race condition during logical replication slot creation