Обсуждение: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch
The following bug has been logged on the website: Bug reference: 12869 Logged by: Regina Obe Email address: lr@pcorp.us PostgreSQL version: Unsupported/Unknown Operating system: Debian and Mingw-w64 Description: I think something changed in the past 7 days to cause our PostGIS 2.2/9.5 compile to fail. The PostGIS debian Jenkins bot pulls the latest snapshot of 9.5 and compiles our upcoming PostGIS 2.2 trunk against it every weekend. This weekend, it failed to compile. I'm not sure if it's something that needs changing in PostGIS code or some issue in 9.5 recent code changes. The error we get is this: /pg9.5w64/include/postgresql/internal -D_GNU_SOURCE -c -o lwgeom_geos.o lwgeom_geos.c lwgeom_geos.c: In function âerrorIfGeometryCollectionâ: lwgeom_geos.c:1730: error: âDBL_DIGâ undeclared (first use in this function) lwgeom_geos.c:1730: error: (Each undeclared identifier is reported only once lwgeom_geos.c:1730: error: for each function it appears in.) make[1]: *** [lwgeom_geos.o] Error 1 make[1]: Leaving directory `/var/lib/jenkins/workspace/postgis/regress_pgdev/branches/2.2/postgis' make: *** [all] Error 1 I also tried on my windows desktop pulling latest 9.5 snapshot and compiling under mingw-w64 and get similar error. lwgeom_geos.c: In function 'errorIfGeometryCollection': lwgeom_geos.c:1730:46: error: 'DBL_DIG' undeclared (first use in this function) hintwkt = lwgeom_to_wkt(lwgeom, WKT_SFSQL, DBL_DIG, &hintsz); Other versions of PostgreSQL are not affected. I have the issue in our bug tracker as well here: http://trac.osgeo.org/postgis/ticket/3079
lr@pcorp.us writes: > The PostGIS debian Jenkins bot pulls the latest snapshot of 9.5 and compiles > our upcoming PostGIS 2.2 trunk against it every weekend. This weekend, it > failed to compile. I'm not sure if it's something that needs changing in > PostGIS code or some issue in 9.5 recent code changes. > The error we get is this: > /pg9.5w64/include/postgresql/internal -D_GNU_SOURCE -c -o lwgeom_geos.o > lwgeom_geos.c > lwgeom_geos.c: In function âerrorIfGeometryCollectionâ: > lwgeom_geos.c:1730: error: âDBL_DIGâ undeclared (first use in this > function) According to the C and POSIX standards, DBL_DIG is defined by <float.h>. It sounds like you were indirectly depending on some Postgres header to #include that, and said header no longer does. I'm not sure what bit of refactoring might've had such an effect, but it doesn't much matter --- you really ought to #include <float.h> for yourself. regards, tom lane
> According to the C and POSIX standards, DBL_DIG is defined by <float.h>. > It sounds like you were indirectly depending on some Postgres header to > #include that, and said header no longer does. I'm not sure what bit > of refactoring might've had such an effect, but it doesn't much matter > --- you really ought to #include <float.h> for yourself. Tom, Thanks for the response. We were able to fix that issue by adding a <float.h> that was missing in one of our files. That unfortunately unearthed another failure when it got to regression testing which I have logged here: http://trac.osgeo.org/postgis/ticket/3080 It seems that under some conditions functions are being called multiple times causing additional NOTICES to be emitted. This is causing a failure in one of our raster tests. In prior versions they were emitted only once (I assume they were only called once). I've come up with a simple example that doesn't involve PostGIS to replicate the issue. CREATE OR REPLACE FUNCTION dummy_notice(variadic integer[]) RETURNS integer[] AS $$ BEGIN RAISE NOTICE 'This is a dummy notice'; RETURN $1; END; $$ language plpgsql IMMUTABLE STRICT; SELECT v[1] As v1, v[2] As v2, v[3] As v3 FROM (SELECT dummy_notice(1,2,3) As v) As t; In prior versions of PostgreSQL, this simple test would just emit: NOTICE: This is a dummy notice In PostgreSQL 9.5 as of when I had reported the original issue, it emits for every single column in the outter output. NOTICE: This is a dummy notice NOTICE: This is a dummy notice NOTICE: This is a dummy notice I presume that means the dummy_notice function is now being called 3 times for the above. This behavior started when I reported the float issue and is still doing that with latest PostgreSQL 9.5 snapshot. I'm not sure if this is an intended change or a side-effect of something else that needs checking. Thanks, Regina
On Sunday, March 22, 2015, Paragon Corporation <lr@pcorp.us> wrote: > > SELECT v[1] As v1, v[2] As v2, v[3] As v3 > FROM > (SELECT dummy_notice(1,2,3) As v) As t; > > I suspect Tom's optimization: http://git.postgresql.org/pg/commitdiff/f4abd0241de20d5d6a79b84992b9e88603d44134 is flattening the array returning function into the select-list of the outer query so it effectively reads: Select Dummy_notice(1,2,3)[1], dummy_notice(1,2,3)[2], etc... The flatten would work if the result could be cached...it is defined to be immutable...but that would not be reliable generally... The more complex PostGIS query simply appears to Simply be multiple nested functions but still with all constants. I guess the only easy way to not pass constants would be to use a CTE and use an embedded scalar subquery to pull from it. From the commit the lack of a from clause is the trigger so putting the cte in a from would remove the optimization instead of proving that the correct behavior is applied even if the arguments to the select-list function are not literals. Apologies in advance if this is a red herring... David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Sunday, March 22, 2015, Paragon Corporation <lr@pcorp.us> wrote: >> SELECT v[1] As v1, v[2] As v2, v[3] As v3 >> FROM >> (SELECT dummy_notice(1,2,3) As v) As t; > I suspect Tom's optimization: > http://git.postgresql.org/pg/commitdiff/f4abd0241de20d5d6a79b84992b9e88603d44134 > is flattening the array returning function into the select-list of the > outer query so it effectively reads: > Select Dummy_notice(1,2,3)[1], dummy_notice(1,2,3)[2], etc... Yeah. You could put "OFFSET 0" into the sub-select if you want to ensure it will not be flattened. > The flatten would work if the result could be cached...it is defined to > be immutable...but that would not be reliable generally... Note that "immutable" effectively means "having no side-effects of interest", so emitting a NOTICE from such a function is really cheating. Another solution would be to mark the function VOLATILE. regards, tom lane
On Sunday, March 22, 2015, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "David G. Johnston" <david.g.johnston@gmail.com <javascript:;>> writes: > > On Sunday, March 22, 2015, Paragon Corporation <lr@pcorp.us > <javascript:;>> wrote: > >> SELECT v[1] As v1, v[2] As v2, v[3] As v3 > >> FROM > >> (SELECT dummy_notice(1,2,3) As v) As t; > > > I suspect Tom's optimization: > > > http://git.postgresql.org/pg/commitdiff/f4abd0241de20d5d6a79b84992b9e88603d44134 > > is flattening the array returning function into the select-list of the > > outer query so it effectively reads: > > Select Dummy_notice(1,2,3)[1], dummy_notice(1,2,3)[2], etc... > > Yeah. You could put "OFFSET 0" into the sub-select if you want to ensure > it will not be flattened. This doesn't seem like a solution...if the flattened version of an all constants invocation cannot be run only once where it could if it was not flattened I would say the we've introduced a likely (and actual) performance regression that punishes current valid and idiomatic code. I haven't gone back and devised the entire reasoning for, and potential benefit of, the flattening but both this and likely functions returning composites are being negatively affected by this change. > > > The flatten would work if the result could be cached...it is defined to > > be immutable...but that would not be reliable generally... > > Note that "immutable" effectively means "having no side-effects of > interest", so emitting a NOTICE from such a function is really cheating. > Another solution would be to mark the VOLATILE. > The notice is not important here and suggesting mis-defining an immutable function as volatile is likewise an insufficient solution. Presuming this optimization is the cause it should, at first glance, either be fixed or reverted. I'm still confused on when immutable functions are only invoked once for a given set of arguments...since this seems like it should qualify. I presume it is only a row-based optimization then? David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > This doesn't seem like a solution...if the flattened version of an all > constants invocation cannot be run only once where it could if it was not > flattened I would say the we've introduced a likely (and actual) > performance regression that punishes current valid and idiomatic code. I > haven't gone back and devised the entire reasoning for, and potential > benefit of, the flattening but both this and likely functions returning > composites are being negatively affected by this change. Well, it improves some queries and perhaps punishes others, but I should think the overall effect would generally be a win. Even in the case given here, it's far from clear that allowing flattening is a performance loss; the end result would have been a query containing only constants at run time, which in most scenarios would be a Good Thing. As for claiming that this is broken and should be reverted: nyet. In the first place, there is certainly no PG documentation anywhere that promises single evaluation of a function written in the manner shown here. We do, on the other hand, say that OFFSET 0 creates an optimization fence; so I see nothing wrong with my recommendation. In the second place, this was a HEAD-only change, and we certainly do not promise than the planner never changes behavior in major version updates. regards, tom lane
On Sunday, March 22, 2015, Tom Lane <tgl@sss.pgh.pa.us <javascript:_e(%7B%7D,'cvml','tgl@sss.pgh.pa.us');>> wrote: > "David G. Johnston" <david.g.johnston@gmail.com> writes: > > This doesn't seem like a solution...if the flattened version of an all > > constants invocation cannot be run only once where it could if it was not > > flattened I would say the we've introduced a likely (and actual) > > performance regression that punishes current valid and idiomatic code. I > > haven't gone back and devised the entire reasoning for, and potential > > benefit of, the flattening but both this and likely functions returning > > composites are being negatively affected by this change. > > Well, it improves some queries and perhaps punishes others, but I should > think the overall effect would generally be a win. While I trust your judgement actual knowledge is nice. The fact that PostGIS is raising the complaint should make us rethink those assumptions; and nothing in their code or the minimal example strikes me as being improper. That said, I am surprised you didn't recommend that the function should be moved to a FROM clause - though I believe the code could still optimize that way without the offset 0. > Even in the case given > here, it's far from clear that allowing flattening is a performance loss; > the end result would have been a query containing only constants at run > time, which in most scenarios would be a Good Thing. The PostGIS example involves constructing geometries from a definition. I suspect that is not cheap, which suggests that maybe we could recommend increasing the cost of the function above the default and refusing to flatten a function if the cost is sufficiently high? > > As for claiming that this is broken and should be reverted: nyet. In the > first place, there is certainly no PG documentation anywhere that promises > single evaluation of a function written in the manner shown here. But what does the typical user expect? That should be the digested standard: whether it is documented should be secondary. It is not documented that the given query will be multiply evaluated either - and the fact that the query, as written, only include the function call once would suggest the user expect that it will only be evaluated once, > We do, on the other hand, say that OFFSET 0 creates an optimization fence; > so > I see nothing wrong with my recommendation. Given your dislike of planner hints I'm surprised you are so quick to defend this solution. A hint is exactly what this is and requires the user to know much more about the planner and postgresql internals than we shoud be happy doing. > In the second place, this was > a HEAD-only change, and we certainly do not promise than the planner never > changes behavior in major verions. We do promise to not introduce regressions whenever possible. That includes performance regressions. So we don't know where the gain/loss boundary is, yet we know of at least one common case that is being negatively affected, nor is the the suggested hack appealing. The appealing workaround, then, is to put the function call into a CTE and hope we don't decide to act on the fact that it is not documented to be an optimization fence. I cannot help but feel there is a patch here that allows us to negate the regression while keeping the benefits - but it's beyond my skills to prove that let alone write it. David J.
>While I trust your judgement actual knowledge is nice. The fact that = PostGIS is raising the complaint should make us rethink those = assumptions; and nothing in their code or the minimal > example strikes me as being improper. That said, I am surprised you = didn't recommend that the function should be moved to a FROM clause - = though I believe the code could still optimize > that way without the = offset 0. > The PostGIS example involves constructing geometries from a = definition. I suspect that is not cheap, which suggests that maybe we = could recommend increasing the cost of the function above > the default and refusing to flatten a function if the cost is = sufficiently high? Guys I tried increasing cost with the ST_Reclass function to 2000 and it = didn't help. The function is called for every single output. The = only thing that helps is setting this to VOLATILE as Tom suggests. I = guess we can do that as some sort of fix for these kinds of functions = though I'm not sure if that would cause other issues. > As for claiming that this is broken and should be reverted: nyet. In = the > first place, there is certainly no PG documentation anywhere that = promises > single evaluation of a function written in the manner shown here. =20 > But what does the typical user expect? That should be the digested = standard: whether it is documented should be secondary. It is not = documented that the given query will be multiply=20 > evaluated either - and the fact that the query, as written, only = include the function call once would suggest the user expect that it = will only be evaluated once, A good chunk of PostGIS functions are very costly and return a raster, = geometry, geography object which is then further acted on in queries. = As David noted, the general user -- partly because of observation I = guess assumed the behavior that a function such as that would only be = called once. I suppose if we were to change all such things to = VOLATILE, that would be fine, but I'm concerned about the vast number of = PostGIS users who further wrap the PostGIS functions and don't know to = do this. =20 > I cannot help but feel there is a patch here that allows us to negate = the regression while keeping the benefits - but it's beyond my skills to = prove that let alone write it. > David J. Does this change affect any function that takes a complex object and = then has that complex object return another complex object? If that is = the case, this would be disastrous for PostGIS users upgrading to 9.5 = unless we change all the functions we have that have this behavior to = VOLATILE. I would be okay with that as long as it doesn't cause = performance issues for older versions of PostgreSQL and doesn't cause = performance issues elsewhere. Thanks, Regina =20
"Paragon Corporation" <lr@pcorp.us> writes: > Guys I tried increasing cost with the ST_Reclass function to 2000 and it didn't help. The function is called for everysingle output. The only thing that helps is setting this to VOLATILE as Tom suggests. I guess we can do that assome sort of fix for these kinds of functions though I'm not sure if that would cause other issues. I think that would be a very bad idea; it would foreclose optimizations that you *do* want. Much better, if you are relying on single-evaluation behavior for a non-volatile function, is to use one of the documented optimization fences: either a CTE, or an OFFSET 0 in a sub-select. TBH, this particular example does not fill me with concern, because (a) it's obviously artificial, and (b) you'd really never notice if the function got evaluated 3 times not once, if you hadn't put in that NOTICE. I grant that there may be cases where you're worried about avoiding multiple evaluations *per row* over some large number of rows, but to discuss that type of problem we'd have to see what your coding habits are like for such cases. An immutable function being fed constants is not going to create that type of problem. regards, tom lane
Sorry for cross post. I've cc'd PostGIS dev group on this for input -- For PostGIS folks - link to related issue that started this conversation here: http://trac.osgeo.org/postgis/ticket/3080 >> "Paragon Corporation" <lr@pcorp.us> writes: >> Guys I tried increasing cost with the ST_Reclass function to 2000 and it didn't help. The function is called for every single output. The only thing that helps is setting this to VOLATILE as Tom suggests. I guess we >> can do that as some sort of fix for these kinds of functions though I'm not sure if that would cause other issues. > I think that would be a very bad idea; it would foreclose optimizations that you *do* want. Much better, > if you are relying on single-evaluation behavior for a non-volatile function, is to use one of the documented optimization fences: either a CTE, or an OFFSET 0 in a sub-select. The CTE / OFFSET solution requires the end-user to do this, which means a lot of people using PostGIS may suffer severe performance issues after upgrading to PostgreSQL 9.5 unless they change their code which is something I'd like to avoid. > TBH, this particular example does not fill me with concern, because > (a) it's obviously artificial, and (b) you'd really never notice if the function got evaluated 3 times not once, if you hadn't put in that NOTICE. > I grant that there may be cases where you're worried about avoiding multiple evaluations *per row* over some large number of rows, but to discuss that type of problem we'd have to see what your coding > habits are like for such cases. An immutable function being fed constants is not going to create that type of problem. > regards, tom lane The particular constant example I presented was simply to provide an example that exhibits this behavior without having to require PostGIS to replicate it. The more common use case (the ones I am really worried about are cases where we have a function that takes a large composite object and outputs another which then gets passed to another PostGIS function that cuts off small pieces. Here one that come to mind SELECT ST_GeometryN(newgeom,1) As geom1, ST_GeometryN(newgeom,2) As geom2 FROM ( SELECT ST_Simplify(geom,25) As newgeom FROM sometable) As foo; If I am not mistaken -- please correct me if I am wrong. The above example would cause ST_Simplify to be called for each ST_GeometryN. If that were the case, this would be a huge PostGIS performance killer for a lot of folks as this pattern is very common and we have a lot of expensive functions that do this kind of thing. -- PostGIS raster is filled with a ton of these. The ST_Reclass is just one example of these (and one that just happened to be tripped up by this change because it happened to have a NOTICE in it) -- things like ST_MapAlgebra that are very intensive functions that output often get used later by other functions concern me even more. Thanks, Regina
"Paragon Corporation" <lr@pcorp.us> writes: > The more common use case (the ones I am really worried about are cases where > we have a function that takes a large composite object and outputs another > which then gets passed to another PostGIS function that cuts off small > pieces. > Here one that come to mind > SELECT ST_GeometryN(newgeom,1) As geom1, ST_GeometryN(newgeom,2) As geom2 > FROM ( > SELECT ST_Simplify(geom,25) As newgeom > FROM sometable) As foo; > If I am not mistaken -- please correct me if I am wrong. The above example > would cause ST_Simplify to be called for each ST_GeometryN. Indeed, and that is exactly what happens in every PG release for time out of mind, if that's actually how you're coding it without using any method that would prevent flattening of the subquery. Check it with EXPLAIN VERBOSE if you doubt that. The patch I committed two weeks ago has no impact whatsoever on this example, because that's not a table-free subquery. I suspect that (a) you are using OFFSET 0 in the subquery, (b) you already have ST_Simplify marked as volatile, or (c) the performance hit from multiple evaluations isn't as drastic as you think. regards, tom lane
Tom, >> Here one that come to mind >> SELECT ST_GeometryN(newgeom,1) As geom1, ST_GeometryN(newgeom,2) As >> geom2 FROM ( SELECT ST_Simplify(geom,25) As newgeom FROM sometable) >> As foo; >> If I am not mistaken -- please correct me if I am wrong. The above >> example would cause ST_Simplify to be called for each ST_GeometryN. > Indeed, and that is exactly what happens in every PG release for time out of mind, if that's actually how you're coding it without using any method that would prevent flattening of the subquery. Check it with > EXPLAIN VERBOSE if you doubt that. The patch I committed two weeks ago has no impact whatsoever on this example, because that's not a table-free subquery. > I suspect that (a) you are using OFFSET 0 in the subquery, (b) you already have ST_Simplify marked as volatile, or (c) the performance hit from multiple evaluations isn't as drastic as you think. > regards, tom lane Sorry for the false alarm. Indeed you are right and the change is not as much of a change as I thought. This revised version of the query -- triggers a double call in older versions of PostgreSQL too and since this would be the more common case not much has changed in behavior. I think for raster work most people wrap these kind of calls in CTEs anyway. DROP TABLE IF EXISTS t2; CREATE TABLE t2(rast raster); INSERT INTO t2(rast) SELECT ST_SetValue( ST_SetValue( ST_AddBand( ST_MakeEmptyRaster(100, 100, 10, 10, 2, 2, 0, 0, 0), 1, '32BF', 1, 0 ), 1, 1, 1, 3.14159 ), 1, 10, 10, 2.71828 ) As rast; SELECT ST_Value(t.rast, 1, 1, 1), ST_Value(t.rast, 1, 10, 10), ST_BandNoDataValue(rast, 1) FROM ( SELECT ST_Reclass( rast, ROW(1, 'a-100]:50-1,(-100-1000]:150-50,(1000-10000]:254-150', '8BUI', 0) ) AS rast FROM t2 ) AS t; So I'll just put an OFFSET 0 on this regression query and call it a day. Thanks, Regina