Обсуждение: Re: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

Поиск
Список
Период
Сортировка

Re: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

От
Bruce Momjian
Дата:
[Charset iso-8859-1 unsupported, filtering to ASCII...]
> > It currently convert to CNF so it can select the most restrictive
> > restriction and join, and use those first.  However, the CNF conversion
> > is a memory exploder for some queries, and we certainly need to have
> > another method to split up those queries into UNIONS.  I think we need
> > to code to identify those queries capable of being converted to UNIONS,
> > and do that before the query gets to the CNF section.  That would be
> > great, and David Hartwig has implemented a limited capability of doing
> > this, but we really need a general routine to do this with 100%
> > reliability.
>
> Well, if you're talking about a routine to generate a heuristic for CNF vs.
> DNF, it is possible to precalculate the query sizes for CNF and DNF
> rewrites...
>
> For conversion to CNF:
>
> At every node:
>
> if nodeType = AND then f(node) = f(left) + f(right)
> if nodeType = OR then f(node) = f(left) * f(right)
>
> f(root) = a reasonably (but not wonderful) metric
>
> For DNF just switch AND and OR in the above. You may want to compute both
> metrics and compare... take the smaller one and use that path.
>
> How to deal with other operators depends on their implementation...

[Moved to Hackers list.]

This is interesting.  Check CNF size and DNF size.  Choose smallest.
CNF uses existing code, DNF converts to UNIONs.  How do you return the
proper rows with/without proper duplicates?

i.e.

    SELECT * FROM tab1 WHERE x > 1 or x > 2

We need to return all rows where x > 1, even if some there are indentical
rows in tab1.

What I do in the index OR code is to test that rows in index matches
found in 2nd and 3rd index scans are false in earlier index scans.  I am
not sure how to do that with a UNION query, but it may be possible.

We currently have UNION and UNION ALL, and I think we may need a new
UNION type internally to prevent 2nd and 3rd queries from returning rows
returned by earlier UNION queries.

This is interesting.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

От
"Taral"
Дата:
> This is interesting.  Check CNF size and DNF size.  Choose smallest.
> CNF uses existing code, DNF converts to UNIONs.  How do you return the
> proper rows with/without proper duplicates?

Create a temporary oid hash? (for each table selected on, I guess)

Taral

Re: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

От
Bruce Momjian
Дата:
[Charset iso-8859-1 unsupported, filtering to ASCII...]
> > This is interesting.  Check CNF size and DNF size.  Choose smallest.
> > CNF uses existing code, DNF converts to UNIONs.  How do you return the
> > proper rows with/without proper duplicates?
>
> Create a temporary oid hash? (for each table selected on, I guess)
>
> Taral
>

What I did with indexes was to run the previous OR clause index
restrictions through the qualification code, and make sure it failed,
but I am not sure how that is going to work with a more complex WHERE
clause.  Perhaps I need to restrict this to just simple cases of
constants, which are easy to pick out an run through.  Doing this with
joins would be very hard, I think.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

От
"Taral"
Дата:
> > Create a temporary oid hash? (for each table selected on, I guess)
>
> What I did with indexes was to run the previous OR clause index
> restrictions through the qualification code, and make sure it failed,
> but I am not sure how that is going to work with a more complex WHERE
> clause.  Perhaps I need to restrict this to just simple cases of
> constants, which are easy to pick out an run through.  Doing this with
> joins would be very hard, I think.

Actually, I was thinking more of an index of returned rows... After each
subquery, the backend would check each row to see if it was already in the
index... Simple duplicate check, in other words. Of course, I don't know how
well this would behave with large tables being returned...

Anyone else have some ideas they want to throw in?

Taral


Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

От
jwieck@debis.com (Jan Wieck)
Дата:
>
> > > Create a temporary oid hash? (for each table selected on, I guess)
> >
> > What I did with indexes was to run the previous OR clause index
> > restrictions through the qualification code, and make sure it failed,
> > but I am not sure how that is going to work with a more complex WHERE
> > clause.  Perhaps I need to restrict this to just simple cases of
> > constants, which are easy to pick out an run through.  Doing this with
> > joins would be very hard, I think.
>
> Actually, I was thinking more of an index of returned rows... After each
> subquery, the backend would check each row to see if it was already in the
> index... Simple duplicate check, in other words. Of course, I don't know how
> well this would behave with large tables being returned...
>
> Anyone else have some ideas they want to throw in?
>
> Taral
>

    But what about unions of join queries? Which OID's then should
    be checked against which? And unions from view selects? There
    are no OID's at all after rewriting.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

От
Bruce Momjian
Дата:
[Charset iso-8859-1 unsupported, filtering to ASCII...]
> > > Create a temporary oid hash? (for each table selected on, I guess)
> >
> > What I did with indexes was to run the previous OR clause index
> > restrictions through the qualification code, and make sure it failed,
> > but I am not sure how that is going to work with a more complex WHERE
> > clause.  Perhaps I need to restrict this to just simple cases of
> > constants, which are easy to pick out an run through.  Doing this with
> > joins would be very hard, I think.
>
> Actually, I was thinking more of an index of returned rows... After each
> subquery, the backend would check each row to see if it was already in the
> index... Simple duplicate check, in other words. Of course, I don't know how
> well this would behave with large tables being returned...
>
> Anyone else have some ideas they want to throw in?

I certainly think we are heading in the direction for a good general
solution.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

От
Bruce Momjian
Дата:
> >
> > > > Create a temporary oid hash? (for each table selected on, I guess)
> > >
> > > What I did with indexes was to run the previous OR clause index
> > > restrictions through the qualification code, and make sure it failed,
> > > but I am not sure how that is going to work with a more complex WHERE
> > > clause.  Perhaps I need to restrict this to just simple cases of
> > > constants, which are easy to pick out an run through.  Doing this with
> > > joins would be very hard, I think.
> >
> > Actually, I was thinking more of an index of returned rows... After each
> > subquery, the backend would check each row to see if it was already in the
> > index... Simple duplicate check, in other words. Of course, I don't know how
> > well this would behave with large tables being returned...
> >
> > Anyone else have some ideas they want to throw in?
> >
> > Taral
> >
>
>     But what about unions of join queries? Which OID's then should
>     be checked against which? And unions from view selects? There
>     are no OID's at all after rewriting.

Yep, you can't just use oid's, I think.  Joins and specifiying a table
multiple times using a table alias would break this anyway.

CNF'ify only goes through the tables once, so we somehow need to
simulate this.  Perhaps we can restrict the kinds of queries used for
DNF so we can do this easily.

Another idea is that we rewrite queries such as:

    SELECT *
    FROM tab
    WHERE (a=1 AND b=2 AND c=3) OR
          (a=1 AND b=2 AND c=4) OR
          (a=1 AND b=2 AND c=5) OR
          (a=1 AND b=2 AND c=6)

into:

    SELECT *
    FROM tab
    WHERE (a=1 AND b=2) AND (c=3 OR c=4 OR c=5 OR c=6)

and we do this BEFORE calling cnfify().  How much does this do for us?

Seems this would not be too hard, and would be a good performer.


You could even convert:

    SELECT *
    FROM tab
    WHERE (a=1 AND b=2 AND c=3) OR
          (a=1 AND b=2 AND c=4) OR
          (a=1 AND b=52 AND c=5) OR
          (a=1 AND b=52 AND c=6)

into:

    SELECT *
    FROM tab
    WHERE ((a=1 AND b=2) AND (c=3 OR c=4)) OR
    WHERE ((a=1 AND b=52) AND (c=5 OR c=6))

This should work OK too.  Someone want to try this?  David, is this what
your code does?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

От
"Taral"
Дата:
> Another idea is that we rewrite queries such as:
>
>     SELECT *
>     FROM tab
>     WHERE (a=1 AND b=2 AND c=3) OR
>           (a=1 AND b=2 AND c=4) OR
>           (a=1 AND b=2 AND c=5) OR
>           (a=1 AND b=2 AND c=6)
>
> into:
>
>     SELECT *
>     FROM tab
>     WHERE (a=1 AND b=2) AND (c=3 OR c=4 OR c=5 OR c=6)

Very nice, but that's like trying to code factorization of numbers... not
pretty, and very CPU intensive on complex queries...

Taral


Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)

От
Bruce Momjian
Дата:
[Charset iso-8859-1 unsupported, filtering to ASCII...]
> > Another idea is that we rewrite queries such as:
> >
> >     SELECT *
> >     FROM tab
> >     WHERE (a=1 AND b=2 AND c=3) OR
> >           (a=1 AND b=2 AND c=4) OR
> >           (a=1 AND b=2 AND c=5) OR
> >           (a=1 AND b=2 AND c=6)
> >
> > into:
> >
> >     SELECT *
> >     FROM tab
> >     WHERE (a=1 AND b=2) AND (c=3 OR c=4 OR c=5 OR c=6)
>
> Very nice, but that's like trying to code factorization of numbers... not
> pretty, and very CPU intensive on complex queries...

Yes, but how large are the WHERE clauses going to be?  Considering the
cost of cnfify() and UNION, it seems like a clear win.  Is it general
enough to solve our problems?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026