Обсуждение: Direct XML interfaces to optimizer and even executor?

Поиск
Список
Период
Сортировка

Direct XML interfaces to optimizer and even executor?

От
Gunther Schadow
Дата:
Hi,

I posted this earlier to general but got zero response. So, I'm
directing it to a more specific group.

I'm wondering about alternative interfaces to a PostgreSQL data base
backend. Interfaces other than the SQL loanguage. I know one can map
many things to SQL, but let me think outside the box for just a
moment:

- Sending a parse tree in XML for processing by the optimizer.   This circumvents the SQL language and avoids the kinds
of  syntactic ideosyncrasies of SQL (e.g., where you put commas.)   This is fairly trivial, but of course the question
is,would   it be worth it?
 

- Sending an execution plan in XML directly to the executor.   This now circumvents the SQL parser and optimizer. I
knowthis   is in a way against the relational doxology and I don't take that   light-heartedly. However, isn't it true
thatmost optimizers   cannot deal very well with more than 6 joins? I may be wrong,   but I find myself spending quite
abit of time fighting with the   Oracle or PostgreSQL optimizer to convince it to choose the plan   that I want. There
isso much magic to it with hints and the   way you write SQL (where in relational theory the expressions are
equivalent,they make huge difference in what plan is being   generated.) So, it appears to me almost easier to just
senda   plan directly and have the system execute that plan.
 

- These direct interfaces could be a nice way to experiment with   new strategies without having to implement it on all
three  layers (SQL language, optimizer, and executor.)
 

You noticed I sneaked in XML as the interface, and that would be
neat because with XSLT it's so easy to manipulate. But I'm also
thinking about a Prolog binding or constraint logic programming
binding, that might be better optimizeable if it goes through a
more direct path than SQL.

I'd appreciate comments on this, even if you think it is a
worthless idea (of course would be grateful about hearing why.)

thanks,
-Gunther

-- 
Gunther Schadow, M.D., Ph.D.                    gschadow@regenstrief.org
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org



---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: Direct XML interfaces to optimizer and even executor?

От
alex@pilosoft.com
Дата:
On Thu, 30 May 2002, Gunther Schadow wrote:

> - Sending a parse tree in XML for processing by the optimizer.
>     This circumvents the SQL language and avoids the kinds of
>     syntactic ideosyncrasies of SQL (e.g., where you put commas.)
>     This is fairly trivial, but of course the question is, would
>     it be worth it?
_WHY_?  Instead of replacing SQL, learn SQL good enough so you won't need 
to. ;)

> - Sending an execution plan in XML directly to the executor.
>     This now circumvents the SQL parser and optimizer. I know this
>     is in a way against the relational doxology and I don't take that
>     light-heartedly. However, isn't it true that most optimizers
>     cannot deal very well with more than 6 joins? I may be wrong,
>     but I find myself spending quite a bit of time fighting with the
>     Oracle or PostgreSQL optimizer to convince it to choose the plan
>     that I want. There is so much magic to it with hints and the
>     way you write SQL (where in relational theory the expressions are
>     equivalent, they make huge difference in what plan is being
>     generated.) So, it appears to me almost easier to just send a
>     plan directly and have the system execute that plan.
_WHY_? Instead of replacing optimizer, either fix it so magic/hints are 
not needed or learn to write better queries ;)


PS: In my experience, optimizer is performing nearly-perfectly, given 
regular vacuum analyz'ing.

-alex



Re: Direct XML interfaces to optimizer and even executor?

От
Tom Lane
Дата:
Gunther Schadow <gunther@aurora.regenstrief.org> writes:
> - Sending a parse tree in XML for processing by the optimizer.
> - Sending an execution plan in XML directly to the executor.

Supporting this would imply freezing our internal representations of
parse and plan trees (maybe we could move some bits around a little,
but we couldn't make large changes without breaking the XML API).
Like Alex, I don't see any gain that's likely to be worth that price;
especially not when there are so many things I'm unhappy about in both
representations.  There are TODO items to revamp both representations
in major ways.
        regards, tom lane


Re: Direct XML interfaces to optimizer and even executor?

От
Gunther Schadow
Дата:
Tom Lane wrote:

> Gunther Schadow <gunther@aurora.regenstrief.org> writes:
> 
>>- Sending a parse tree in XML for processing by the optimizer.
>>- Sending an execution plan in XML directly to the executor.
>>
> 
> Supporting this would imply freezing our internal representations of
> parse and plan trees (maybe we could move some bits around a little,
> but we couldn't make large changes without breaking the XML API).
> Like Alex, I don't see any gain that's likely to be worth that price;
> especially not when there are so many things I'm unhappy about in both
> representations.  There are TODO items to revamp both representations
> in major ways.


I understand, thanks for responding.

Not to belabor that point too much, but the potential for change
would not necessarily be a live or die issue. Everybody would know
that the direct interface is not a standard and could change
drastically at any time. I guess the bigger problem might be with
documenting these changes other than in comments inside code.
You don't want to have people depend on something they shouldn't
and then get angry about you making changes.

I can understand Alex' lapidar response, but from what I read in the
various tuning guides, all SQL optimizers have their ideosynchrasies.
And the question I'm really still not sure about is whether in many
cases in applications you know the best access plan better than you
know how to convince the optimizer to picking it (all vacuuming etc.
granted.) Or, even worse, you may have to at runtime use application
level knowledge to choose an access plan (and then change your
SQL in cryptic ways to convey your desire to the optimizer.) Besides,
if the optimizer can change, the hacks one employs for convincing
it on the SQL level might change too.

For example, we are dealing with research queries to huge tables
(>200 million rows) of patient observations. The best choice of access
plan needs to consider various statistics that a simple histogram
won't cover. I.e., you may want to know what the likelyhood is for
any patient to have an observation of a certain kind and then use
that as your primary criterion to cut down on the size of your
result set as early as possible. For instance, one of our queries
might be "find all patients with a colorectal cancer and hemoglobin
greater than 15 mg/dL." It is extremely critical to first look for the
cancer patients (using an index) and then constrain those with the
hemoglobin, because just about every patient has hemoglobins whereas
cancer is much rarer. So, if you went through hemoglobin you'd end
up doing a complete table scan. Certainly, a simple histogram might
catch that, but there are confounding factors. Since we are searching
for patients, not for individual values, we want to use those
constraints first that have the highest selectivity for patients, not
necessarily for records in the observation table.

These and other issues are all nicely tweakable by SQL if you have
static queries. But if the queries can be in zillions of combinations,
the problem can't be solved by massaging every single SQL query.
(And yes, the problem of n-way joins with n > 6, 7, 8, etc. is very
much a possibility.)

So, Alex suggests to improve the optimizer. That's a thought, however,
it is a much heavier task to do that in general (abstract from the
kind of application domain knowledge) AND it is harder to do that
directly in the backend code than it would be if one could just
send an execution plan.)


Another use case for this that I am considering is that we are
developing a massive distributed querying facility that deal with
horizontal partitioning of data and wants to maintain privacy.
It would be great of course if postgersql had a distributed
database query interface, however, it is much more difficult a
task to implementing this so general that it could be part of the
standard postgresql code than it would be if it was on an experimental
client that directly feeds access plans into the server.

regards
-Gunther



-- 
Gunther Schadow, M.D., Ph.D.                    gschadow@regenstrief.org
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org




Re: Direct XML interfaces to optimizer and even executor?

От
Tom Lane
Дата:
Gunther Schadow <gunther@aurora.regenstrief.org> writes:
> These and other issues are all nicely tweakable by SQL if you have
> static queries. But if the queries can be in zillions of combinations,
> the problem can't be solved by massaging every single SQL query.
> (And yes, the problem of n-way joins with n > 6, 7, 8, etc. is very
> much a possibility.)

If the queries vary that much, it's tough to believe that you can invent
optimal plans for them more easily than the optimizer can.

But on a more global level: sure the optimizer has shortcomings --- but
I'd rather put my development effort into fixing those shortcomings than
into designing, writing, and maintaining an API that shortcircuits the
optimizer.  The costs of having such a feature are not small IMHO.  Nor
are the costs of using it on the application side small: you'd have to
write significant code to produce plans that are both nontrivial and
better than the optimizer can do.  And then maintain it in the face of
significant version-to-version changes in the API you're using (and in
the underlying facts of what the system can do).

ISTM you have essentially waved your hands and claimed you could write
a nearly-general-purpose application-side planner that will outperform
PG's planner.  I'd rather see *you* put that effort into helping fix
PG's planner ;-)
        regards, tom lane