Postgres_fdw join pushdown - wrong results with whole-row reference

Поиск
Список
Период
Сортировка
От Rushabh Lathia
Тема Postgres_fdw join pushdown - wrong results with whole-row reference
Дата
Msg-id CAGPqQf3sS_SuaDR1FuMhDCt1v8COaKvt1jLk2hwRBKrvjqXQ-Q@mail.gmail.com
обсуждение исходный текст
Ответы Re: Postgres_fdw join pushdown - wrong results with whole-row reference  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Список pgsql-hackers
Below query returns the wrong result when join getting pushdown to the remote
server.

(PFA fdw_setup.sql, to create objects for the test)

postgres=# select e, e.empno, d.deptno, d.dname from f_emp e left join f_dept d on e.deptno = d.deptno and e.sal > 3000 order by 2, 3 limit 10;                             e                             | empno | deptno |   dname    
-----------------------------------------------------------+-------+--------+------------
                                                           |  7369 |        | 
 (7499,ALLEN,SALESMAN,7698,1981-02-20,1600.00,300.00,30)   |  7499 |        | 
 (7521,WARD,SALESMAN,7698,1981-02-22,1250.00,500.00,30)    |  7521 |        | 
                                                           |  7566 |        | 
 (7654,MARTIN,SALESMAN,7698,1981-09-28,1250.00,1400.00,30) |  7654 |        | 
                                                           |  7698 |        | 
                                                           |  7782 |        | 
                                                           |  7788 |        | 
                                                           |  7839 |     10 | ACCOUNTING
 (7844,TURNER,SALESMAN,7698,1981-09-08,1500.00,0.00,30)    |  7844 |        | 
(10 rows)


Here, wholerow is coming as NULL even though with non-null empno. If we remove
limit clause from the query - that will not push the query to the remote side
and in such case getting correct output.

postgres=# select e, e.empno, d.deptno, d.dname from f_emp e left join f_dept d on e.deptno = d.deptno and e.sal > 3000 order by 2, 3;
                             e                             | empno | deptno |   dname    
-----------------------------------------------------------+-------+--------+------------
 (7369,SMITH,CLERK,7902,1980-12-17,800.00,,20)             |  7369 |        | 
 (7499,ALLEN,SALESMAN,7698,1981-02-20,1600.00,300.00,30)   |  7499 |        | 
 (7521,WARD,SALESMAN,7698,1981-02-22,1250.00,500.00,30)    |  7521 |        | 
 (7566,JONES,MANAGER,7839,1981-04-02,2975.00,,20)          |  7566 |        | 
 (7654,MARTIN,SALESMAN,7698,1981-09-28,1250.00,1400.00,30) |  7654 |        | 
 (7698,BLAKE,MANAGER,7839,1981-05-01,2850.00,,30)          |  7698 |        | 
 (7782,CLARK,MANAGER,7839,1981-06-09,2450.00,,10)          |  7782 |        | 
 (7788,SCOTT,ANALYST,7566,1987-04-19,3000.00,,20)          |  7788 |        | 
 (7839,KING,PRESIDENT,,1981-11-17,5000.00,,10)             |  7839 |     10 | ACCOUNTING
 (7844,TURNER,SALESMAN,7698,1981-09-08,1500.00,0.00,30)    |  7844 |        | 
 (7876,ADAMS,CLERK,7788,1987-05-23,1100.00,,20)            |  7876 |        | 
 (7900,JAMES,CLERK,7698,1981-12-03,950.00,,30)             |  7900 |        | 
 (7902,FORD,ANALYST,7566,1981-12-03,3000.00,,20)           |  7902 |        | 
 (7934,MILLER,CLERK,7782,1982-01-23,1300.00,,10)           |  7934 |        | 
(14 rows)

Explain verbose output for the query with LIMIT clause is:

postgres=# explain verbose select e, e.empno, d.deptno, d.dname from f_emp e left join f_dept d on e.deptno = d.deptno and e.sal > 3000 order by 2, 3 limit 10;
                                                                                                                                                              
        QUERY PLAN                                                                                                                                            
                           
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------
 Limit  (cost=100.00..136.86 rows=10 width=236)
   Output: e.*, e.empno, d.deptno, d.dname
   ->  Foreign Scan  (cost=100.00..2304.10 rows=598 width=236)
         Output: e.*, e.empno, d.deptno, d.dname
         Relations: (public.f_emp e) LEFT JOIN (public.f_dept d)
         Remote SQL: SELECT CASE WHEN r1.* IS NOT NULL THEN ROW(r1.empno, r1.ename, r1.job, r1.mgr, r1.hiredate, r1.sal, r1.comm, r1.deptno) END, r1.empno, r2
.deptno, r2.dname FROM (public.emp r1 LEFT JOIN public.dept r2 ON (((r1.sal > 3000::numeric)) AND ((r1.deptno = r2.deptno)))) ORDER BY r1.empno ASC NULLS LAST
, r2.deptno ASC NULLS LAST
(6 rows)

Further looking I found that here problem is because we converting wholerow
reference with ROW - and binding it with CASE clause.

So, in the above example reference to "r" is converted with
"CASE WHEN r1.* IS NOT NULL THEN ROW(r1.empno, r1.ename, r1.job, r1.mgr, r1.hiredate, r1.sal, r1.comm, r1.deptno) END"

Here r1.* IS NOT NULL is behaving strange, it return TRUE only when all the
elements in the wholerow is NOT NULL. 

Example with normal table (not postgres_fdw involded):

postgres=# select r, r.* is null as isnull, r.* is not null as isnotnull from emp r;
                             r                             | isnull | isnotnull 
-----------------------------------------------------------+--------+-----------
 (7369,SMITH,CLERK,7902,1980-12-17,800.00,,20)             | f      | f
 (7499,ALLEN,SALESMAN,7698,1981-02-20,1600.00,300.00,30)   | f      | t
 (7521,WARD,SALESMAN,7698,1981-02-22,1250.00,500.00,30)    | f      | t
 (7566,JONES,MANAGER,7839,1981-04-02,2975.00,,20)          | f      | f
 (7654,MARTIN,SALESMAN,7698,1981-09-28,1250.00,1400.00,30) | f      | t
 (7698,BLAKE,MANAGER,7839,1981-05-01,2850.00,,30)          | f      | f
 (7782,CLARK,MANAGER,7839,1981-06-09,2450.00,,10)          | f      | f
 (7788,SCOTT,ANALYST,7566,1987-04-19,3000.00,,20)          | f      | f
 (7839,KING,PRESIDENT,,1981-11-17,5000.00,,10)             | f      | f
 (7844,TURNER,SALESMAN,7698,1981-09-08,1500.00,0.00,30)    | f      | t
 (7876,ADAMS,CLERK,7788,1987-05-23,1100.00,,20)            | f      | f
 (7900,JAMES,CLERK,7698,1981-12-03,950.00,,30)             | f      | f
 (7902,FORD,ANALYST,7566,1981-12-03,3000.00,,20)           | f      | f
 (7934,MILLER,CLERK,7782,1982-01-23,1300.00,,10)           | f      | f
(14 rows)

Now I was under impression the IS NOT NULL should be always in inverse of
IS NULL, but clearly here its not the case with wholerow. But further looking at
the document its saying different thing for wholerow:


Note: If the expression is row-valued, then IS NULL is true when the row expression
itself is null or when all the row's fields are null, while IS NOT NULL is true
when the row expression itself is non-null and all the row's fields are non-null.
Because of this behavior, IS NULL and IS NOT NULL do not always return inverse
results for row-valued expressions, i.e., a row-valued expression that contains
both NULL and non-null values will return false for both tests. This definition
conforms to the SQL standard, and is a change from the inconsistent behavior
exhibited by PostgreSQL versions prior to 8.2.


And as above documentation clearly says that IS NULL and IS NOT NULL do not
always return inverse results for row-valued expressions. So need to change the
deparse logic into postgres_fdw - how ? May be to use IS NULL rather then IS
NOT NULL?

Input/thought?

Regards
Rushabh Lathia
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Reviewing freeze map code
Следующее
От: Ashutosh Bapat
Дата:
Сообщение: Re: Declarative partitioning