Обсуждение: query decorrelation in postgres

Поиск
Список
Период
Сортировка

query decorrelation in postgres

От
mahendra chavan
Дата:
Hello,<br /><br />I am a master's student in computer science at IIT Bombay. As part of my project, I need to get a
decorrelatedversion of a SQL query.  Please could anyone let me know if we have query decorrelation feature implemented
inpostgres ?<br /><br />Thanks,<br />Mahendra Chavan<br /><br /> 

Re: query decorrelation in postgres

От
Itagaki Takahiro
Дата:
mahendra chavan <mahcha@gmail.com> wrote:

> I am a master's student in computer science at IIT Bombay. As part of my
> project, I need to get a decorrelated version of a SQL query.  Please could
> anyone let me know if we have query decorrelation feature implemented in
> postgres ?

What do you mean by "query decorrelation"? Is it an addtional method for
query optimization? At least there is no word 'decorrelation' in
the postgres documentation.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: query decorrelation in postgres

От
mahendra chavan
Дата:
I am sorry for not elaborating on that. What I meant by de-correlation was optimizing a query to get rid of sub-queirs by using joins.

eg. In the TPC-H schema, a query to find out the names of suppliers who supply parts having size  < 100

Query with nested subqueries:

SELECT 
    S_NAME
FROM
    SUPPLIER
WHERE
    S_SUPPKEY
    IN (
        SELECT
            PS_SUPPKEY
        FROM
            PARTSUPP
        WHERE
            PS_PARTKEY
                IN (
                    SELECT
                        P_PARTKEY
                    FROM
                        PART
                    WHERE
                        P_SIZE < 100
                )



Query with joins without subqueries:


SELECT
    S_NAME
FROM
    SUPPLIER
INNER JOIN
    PARTSUPP
ON
    S_SUPPKEY = PS_SUPPKEY
INNER JOIN
    PART
ON
    P_PARTKEY = PS_PARTKEY
WHERE
    P_SIZE < 100



Thanks,
Mahendra
 


On Thu, Jul 23, 2009 at 9:02 PM, Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:

mahendra chavan <mahcha@gmail.com> wrote:

> I am a master's student in computer science at IIT Bombay. As part of my
> project, I need to get a decorrelated version of a SQL query.  Please could
> anyone let me know if we have query decorrelation feature implemented in
> postgres ?

What do you mean by "query decorrelation"? Is it an addtional method for
query optimization? At least there is no word 'decorrelation' in
the postgres documentation.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: query decorrelation in postgres

От
"Kevin Grittner"
Дата:
mahendra chavan <mahcha@gmail.com> wrote: 
> What I meant by de-correlation was optimizing a query to get rid of
> sub-queirs by using joins.
> 
> eg. In the TPC-H schema, a query to find out the names of suppliers
> who supply parts having size  < 100
> 
> *Query with nested subqueries:*
> 
> SELECT
>     S_NAME
> FROM
>     SUPPLIER
> WHERE
>     S_SUPPKEY
>     IN (
>         SELECT
>             PS_SUPPKEY
>         FROM
>             PARTSUPP
>         WHERE
>             PS_PARTKEY
>                 IN (
>                     SELECT
>                         P_PARTKEY
>                     FROM
>                         PART
>                     WHERE
>                         P_SIZE < 100
>                 )
> 
> 
> 
> *Query with joins without subqueries:*
> 
> 
> SELECT
>     S_NAME
> FROM
>     SUPPLIER
> INNER JOIN
>     PARTSUPP
> ON
>     S_SUPPKEY = PS_SUPPKEY
> INNER JOIN
>     PART
> ON
>     P_PARTKEY = PS_PARTKEY
> WHERE
>     P_SIZE < 100
> 
Those two queries aren't exactly identical, because you could get
duplicate rows in the second which would not be there in the first. 
Optimizations to "pull up" subqueries into a higher level FROM clause
as joins have been in PostgreSQL for as long as I've been using it,
but the ability to do the specific optimization you show there
(without the duplicates) was added in version 8.4 using "semi-joins". 
I don't think any syntax was added to explicitly write a query using
semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc.
To see how the planner chooses to execute a given query against a
particular schema which has a particular set of statistics about the
data distributions, use the EXPLAIN option.
http://www.postgresql.org/docs/8.4/interactive/sql-explain.html
-Kevin


Re: query decorrelation in postgres

От
mahendra chavan
Дата:

Thank you for you response.  I was looking for a query rewriting mechanism which would be outside the optimizer and will do this kind of transformations at the query level.

~Mahendra


On Fri, Jul 24, 2009 at 7:32 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
mahendra chavan <mahcha@gmail.com> wrote:
> What I meant by de-correlation was optimizing a query to get rid of
> sub-queirs by using joins.
>
> eg. In the TPC-H schema, a query to find out the names of suppliers
> who supply parts having size  < 100
>
> *Query with nested subqueries:*
>
> SELECT
>     S_NAME
> FROM
>     SUPPLIER
> WHERE
>     S_SUPPKEY
>     IN (
>         SELECT
>             PS_SUPPKEY
>         FROM
>             PARTSUPP
>         WHERE
>             PS_PARTKEY
>                 IN (
>                     SELECT
>                         P_PARTKEY
>                     FROM
>                         PART
>                     WHERE
>                         P_SIZE < 100
>                 )
>
>
>
> *Query with joins without subqueries:*
>
>
> SELECT
>     S_NAME
> FROM
>     SUPPLIER
> INNER JOIN
>     PARTSUPP
> ON
>     S_SUPPKEY = PS_SUPPKEY
> INNER JOIN
>     PART
> ON
>     P_PARTKEY = PS_PARTKEY
> WHERE
>     P_SIZE < 100
>

Those two queries aren't exactly identical, because you could get
duplicate rows in the second which would not be there in the first.
Optimizations to "pull up" subqueries into a higher level FROM clause
as joins have been in PostgreSQL for as long as I've been using it,
but the ability to do the specific optimization you show there
(without the duplicates) was added in version 8.4 using "semi-joins".
I don't think any syntax was added to explicitly write a query using
semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc.

To see how the planner chooses to execute a given query against a
particular schema which has a particular set of statistics about the
data distributions, use the EXPLAIN option.

http://www.postgresql.org/docs/8.4/interactive/sql-explain.html

-Kevin