Re: PG 7.2b4 bug?
От | Don Baccus |
---|---|
Тема | Re: PG 7.2b4 bug? |
Дата | |
Msg-id | 3C1FBDAA.6080101@pacifier.com обсуждение исходный текст |
Ответ на | PG 7.2b4 bug? (Don Baccus <dhogaza@pacifier.com>) |
Список | pgsql-hackers |
Tom Lane wrote: > Don Baccus <dhogaza@pacifier.com> writes: > >>Maybe the behavior's implementation defined ... if not, I'd presume SQL3 >> states that a function in the above context is called either once per >>row or once per query, not sometimes one or sometimes the other. This is still bothering me so I decided to plunge into the standard myself. First of all... > > AFAICT, the relevant concept in SQL99 is whether a function is > "deterministic" or not: Actually this argument may well apply to the function all within the subselect or view, but I fail to see any language in the standard that suggests that this trumps the following declaration about the execution of a <query specification> (what many of us informally refer to as a "SELECT"): (from section 7.12, Query Specification) a) If T is not a grouped table, then Case: (I deleted Case i, which refers to standard aggregates like COUNT) ii) If the <select list> does not include a <set function specification> that contains a referenceto T, then each <value expression> is applied to each row of T yielding a table of M rows, where M is the cardinality of T ... (FYI a <set function specification> is a standard aggregate like COUNT, i.e. Case ii pertains to those queries that don't fall into Case i) ISTM that this quite clearly states that a subselect in a target list should be applied to each row to be returned in M. I don't see any waffle-room here. I would have to dig more deeply into the standard's view regarding VIEW semantics but I would assume it would knit together in a consistent manner. For instance, earlier we saw the following exchange between Stephen and Tom: Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > However, I'd think:> > "select (select nextval from test_seq) from multiple_rows;"> > should give you different valuesand doesn't, although> > "select (select nextval from test_seq where i IS NULL or i IS NOT NULL)> > from multiple_rows;"does give you different values. >> In the first case, the subselect is visibly not dependent on the outer> query, so it's evaluated only once; in the secondcase it has to be> re-evaluated for each row using that row's value of i. Note that the standard does not give you this freedom. It says the <value expression> (in this case the subselect, and yes subselects are valid <value expressions> in SQL3, at least in my draft) must be applied to each row. IMO this means that the optimizer can choose to evaluate the <value expression> once only if it knows for certain that multiple calls will return the same value. For example "my_cachable_function()", not "my_uncachable_function()" or "nextval()". Or IMO a view built using a non-cachable function. In other words it can only suppress evaluation if it can be certain that doing so doesn't change the result. Another nit-pick, the claim's not even strictly true. "i IS NULL OR i IS NOT NULL" can be folded to true, so the subselect's not "visibly dependent on i". In fact, it is quite visibly *not* dependent on the outer query. PG just isn't smart enough to fold the boolean expression into the known value "true". It's this kind of uncertainty that makes the current behavior so ... ugly. You get different answers depending on various optimization values, the complexity of the query, etc. ISTM that the standard is quite clearly worded to avoid this unpleasantness. ... > It looks to me like the spec does NOT attempt to nail down the behavior > of non-deterministic functions; in the places where they talk about > non-deterministic functions at all, it's mostly to forbid their use in > contexts where nondeterminism would affect the final result. Otherwise > the results are implementation-dependent. I've been looking at a few of the "non-deterministic" clauses in the General Rules, out of curiousity. They generally aren't involved with the execution or non-execution of expressions. Ordering of execution is in many cases non-deterministic and implementation-dependent. There are plenty of General Rules of this sort. We also have this big gaping black hole of non-determinism due to character set collation. In other words: select foo from bar order by foo; is non-deterministic (we don't know the order in which the rows will be returned) if foo is a character type. This can even be true within implementations, for instance in PG it changes with when you change locales (and have locale support enabled). However, it seems clear that: select foo, my_function() from bar order by foo; requires my_function() to be called for every row - we just can't depend on the order in which it will be applied to those rows in the case where foo is a character type. Of course, iscachable tells the optimizer that it's OK to just call it once but that's an extension outside SQL3's domain. Obviously if you run this query over and over again with the same collation order the "order by" is deterministic. The non-determinism is in respect to the portability of the query to implementationsbuilt on differing character sets. I'm just not seeing justification for claiming that Section 7.12 can be ignored if the subselect or view happens to contain a function that's not cachable. -- Don Baccus Portland, OR http://donb.photo.net, http://birdnotes.net, http://openacs.org
В списке pgsql-hackers по дате отправления: