Re: Direct XML interfaces to optimizer and even executor?
От | Gunther Schadow |
---|---|
Тема | Re: Direct XML interfaces to optimizer and even executor? |
Дата | |
Msg-id | 3CF6B49D.6000009@aurora.regenstrief.org обсуждение исходный текст |
Ответ на | Direct XML interfaces to optimizer and even executor? (Gunther Schadow <gunther@aurora.regenstrief.org>) |
Ответы |
Re: Direct XML interfaces to optimizer and even executor?
|
Список | pgsql-interfaces |
Tom Lane wrote: > Gunther Schadow <gunther@aurora.regenstrief.org> writes: > >>- Sending a parse tree in XML for processing by the optimizer. >>- Sending an execution plan in XML directly to the executor. >> > > Supporting this would imply freezing our internal representations of > parse and plan trees (maybe we could move some bits around a little, > but we couldn't make large changes without breaking the XML API). > Like Alex, I don't see any gain that's likely to be worth that price; > especially not when there are so many things I'm unhappy about in both > representations. There are TODO items to revamp both representations > in major ways. I understand, thanks for responding. Not to belabor that point too much, but the potential for change would not necessarily be a live or die issue. Everybody would know that the direct interface is not a standard and could change drastically at any time. I guess the bigger problem might be with documenting these changes other than in comments inside code. You don't want to have people depend on something they shouldn't and then get angry about you making changes. I can understand Alex' lapidar response, but from what I read in the various tuning guides, all SQL optimizers have their ideosynchrasies. And the question I'm really still not sure about is whether in many cases in applications you know the best access plan better than you know how to convince the optimizer to picking it (all vacuuming etc. granted.) Or, even worse, you may have to at runtime use application level knowledge to choose an access plan (and then change your SQL in cryptic ways to convey your desire to the optimizer.) Besides, if the optimizer can change, the hacks one employs for convincing it on the SQL level might change too. For example, we are dealing with research queries to huge tables (>200 million rows) of patient observations. The best choice of access plan needs to consider various statistics that a simple histogram won't cover. I.e., you may want to know what the likelyhood is for any patient to have an observation of a certain kind and then use that as your primary criterion to cut down on the size of your result set as early as possible. For instance, one of our queries might be "find all patients with a colorectal cancer and hemoglobin greater than 15 mg/dL." It is extremely critical to first look for the cancer patients (using an index) and then constrain those with the hemoglobin, because just about every patient has hemoglobins whereas cancer is much rarer. So, if you went through hemoglobin you'd end up doing a complete table scan. Certainly, a simple histogram might catch that, but there are confounding factors. Since we are searching for patients, not for individual values, we want to use those constraints first that have the highest selectivity for patients, not necessarily for records in the observation table. These and other issues are all nicely tweakable by SQL if you have static queries. But if the queries can be in zillions of combinations, the problem can't be solved by massaging every single SQL query. (And yes, the problem of n-way joins with n > 6, 7, 8, etc. is very much a possibility.) So, Alex suggests to improve the optimizer. That's a thought, however, it is a much heavier task to do that in general (abstract from the kind of application domain knowledge) AND it is harder to do that directly in the backend code than it would be if one could just send an execution plan.) Another use case for this that I am considering is that we are developing a massive distributed querying facility that deal with horizontal partitioning of data and wants to maintain privacy. It would be great of course if postgersql had a distributed database query interface, however, it is much more difficult a task to implementing this so general that it could be part of the standard postgresql code than it would be if it was on an experimental client that directly feeds access plans into the server. regards -Gunther -- Gunther Schadow, M.D., Ph.D. gschadow@regenstrief.org Medical Information Scientist Regenstrief Institute for Health Care Adjunct Assistant Professor Indiana University School of Medicine tel:1(317)630-7960 http://aurora.regenstrief.org
В списке pgsql-interfaces по дате отправления: