Обсуждение: query decorrelation in postgres
Hello,<br /><br />I am a master's student in computer science at IIT Bombay. As part of my project, I need to get a decorrelatedversion of a SQL query. Please could anyone let me know if we have query decorrelation feature implemented inpostgres ?<br /><br />Thanks,<br />Mahendra Chavan<br /><br />
mahendra chavan <mahcha@gmail.com> wrote: > I am a master's student in computer science at IIT Bombay. As part of my > project, I need to get a decorrelated version of a SQL query. Please could > anyone let me know if we have query decorrelation feature implemented in > postgres ? What do you mean by "query decorrelation"? Is it an addtional method for query optimization? At least there is no word 'decorrelation' in the postgres documentation. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
I am sorry for not elaborating on that. What I meant by de-correlation was optimizing a query to get rid of sub-queirs by using joins.
eg. In the TPC-H schema, a query to find out the names of suppliers who supply parts having size < 100
Query with nested subqueries:
SELECT
S_NAME
FROM
SUPPLIER
WHERE
S_SUPPKEY
IN (
SELECT
PS_SUPPKEY
FROM
PARTSUPP
WHERE
PS_PARTKEY
IN (
SELECT
P_PARTKEY
FROM
PART
WHERE
P_SIZE < 100
)
Query with joins without subqueries:
SELECT
S_NAME
FROM
SUPPLIER
INNER JOIN
PARTSUPP
ON
S_SUPPKEY = PS_SUPPKEY
INNER JOIN
PART
ON
P_PARTKEY = PS_PARTKEY
WHERE
P_SIZE < 100
Thanks,
Mahendra
eg. In the TPC-H schema, a query to find out the names of suppliers who supply parts having size < 100
Query with nested subqueries:
SELECT
S_NAME
FROM
SUPPLIER
WHERE
S_SUPPKEY
IN (
SELECT
PS_SUPPKEY
FROM
PARTSUPP
WHERE
PS_PARTKEY
IN (
SELECT
P_PARTKEY
FROM
PART
WHERE
P_SIZE < 100
)
Query with joins without subqueries:
SELECT
S_NAME
FROM
SUPPLIER
INNER JOIN
PARTSUPP
ON
S_SUPPKEY = PS_SUPPKEY
INNER JOIN
PART
ON
P_PARTKEY = PS_PARTKEY
WHERE
P_SIZE < 100
Thanks,
Mahendra
On Thu, Jul 23, 2009 at 9:02 PM, Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
What do you mean by "query decorrelation"? Is it an addtional method for
mahendra chavan <mahcha@gmail.com> wrote:
> I am a master's student in computer science at IIT Bombay. As part of my
> project, I need to get a decorrelated version of a SQL query. Please could
> anyone let me know if we have query decorrelation feature implemented in
> postgres ?
query optimization? At least there is no word 'decorrelation' in
the postgres documentation.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
mahendra chavan <mahcha@gmail.com> wrote: > What I meant by de-correlation was optimizing a query to get rid of > sub-queirs by using joins. > > eg. In the TPC-H schema, a query to find out the names of suppliers > who supply parts having size < 100 > > *Query with nested subqueries:* > > SELECT > S_NAME > FROM > SUPPLIER > WHERE > S_SUPPKEY > IN ( > SELECT > PS_SUPPKEY > FROM > PARTSUPP > WHERE > PS_PARTKEY > IN ( > SELECT > P_PARTKEY > FROM > PART > WHERE > P_SIZE < 100 > ) > > > > *Query with joins without subqueries:* > > > SELECT > S_NAME > FROM > SUPPLIER > INNER JOIN > PARTSUPP > ON > S_SUPPKEY = PS_SUPPKEY > INNER JOIN > PART > ON > P_PARTKEY = PS_PARTKEY > WHERE > P_SIZE < 100 > Those two queries aren't exactly identical, because you could get duplicate rows in the second which would not be there in the first. Optimizations to "pull up" subqueries into a higher level FROM clause as joins have been in PostgreSQL for as long as I've been using it, but the ability to do the specific optimization you show there (without the duplicates) was added in version 8.4 using "semi-joins". I don't think any syntax was added to explicitly write a query using semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc. To see how the planner chooses to execute a given query against a particular schema which has a particular set of statistics about the data distributions, use the EXPLAIN option. http://www.postgresql.org/docs/8.4/interactive/sql-explain.html -Kevin
Thank you for you response. I was looking for a query rewriting mechanism which would be outside the optimizer and will do this kind of transformations at the query level.
~Mahendra
On Fri, Jul 24, 2009 at 7:32 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
mahendra chavan <mahcha@gmail.com> wrote:Those two queries aren't exactly identical, because you could get> What I meant by de-correlation was optimizing a query to get rid of
> sub-queirs by using joins.
>
> eg. In the TPC-H schema, a query to find out the names of suppliers
> who supply parts having size < 100
>
> *Query with nested subqueries:*
>
> SELECT
> S_NAME
> FROM
> SUPPLIER
> WHERE
> S_SUPPKEY
> IN (
> SELECT
> PS_SUPPKEY
> FROM
> PARTSUPP
> WHERE
> PS_PARTKEY
> IN (
> SELECT
> P_PARTKEY
> FROM
> PART
> WHERE
> P_SIZE < 100
> )
>
>
>
> *Query with joins without subqueries:*
>
>
> SELECT
> S_NAME
> FROM
> SUPPLIER
> INNER JOIN
> PARTSUPP
> ON
> S_SUPPKEY = PS_SUPPKEY
> INNER JOIN
> PART
> ON
> P_PARTKEY = PS_PARTKEY
> WHERE
> P_SIZE < 100
>
duplicate rows in the second which would not be there in the first.
Optimizations to "pull up" subqueries into a higher level FROM clause
as joins have been in PostgreSQL for as long as I've been using it,
but the ability to do the specific optimization you show there
(without the duplicates) was added in version 8.4 using "semi-joins".
I don't think any syntax was added to explicitly write a query using
semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc.
To see how the planner chooses to execute a given query against a
particular schema which has a particular set of statistics about the
data distributions, use the EXPLAIN option.
http://www.postgresql.org/docs/8.4/interactive/sql-explain.html
-Kevin