Обсуждение: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

Поиск
Список
Период
Сортировка

BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
lr@pcorp.us
Дата:
The following bug has been logged on the website:

Bug reference:      12869
Logged by:          Regina Obe
Email address:      lr@pcorp.us
PostgreSQL version: Unsupported/Unknown
Operating system:   Debian and Mingw-w64
Description:

I think something changed in the past 7 days to cause our PostGIS 2.2/9.5
compile to fail.

The PostGIS debian Jenkins bot pulls the latest snapshot of 9.5 and compiles
our upcoming PostGIS 2.2 trunk against it every weekend.  This weekend, it
failed to compile.  I'm not sure if it's something that needs changing in
PostGIS code or some issue in 9.5 recent code changes.

The error we get is this:

/pg9.5w64/include/postgresql/internal -D_GNU_SOURCE   -c -o lwgeom_geos.o
lwgeom_geos.c
lwgeom_geos.c: In function ‘errorIfGeometryCollection’:
lwgeom_geos.c:1730: error: ‘DBL_DIG’ undeclared (first use in this
function)
lwgeom_geos.c:1730: error: (Each undeclared identifier is reported only
once
lwgeom_geos.c:1730: error: for each function it appears in.)
make[1]: *** [lwgeom_geos.o] Error 1
make[1]: Leaving directory
`/var/lib/jenkins/workspace/postgis/regress_pgdev/branches/2.2/postgis'
make: *** [all] Error 1

I also tried on my windows desktop pulling latest 9.5 snapshot and compiling
under mingw-w64 and get similar error.

lwgeom_geos.c: In function 'errorIfGeometryCollection':
lwgeom_geos.c:1730:46: error: 'DBL_DIG' undeclared (first use in this
function)
   hintwkt = lwgeom_to_wkt(lwgeom, WKT_SFSQL, DBL_DIG, &hintsz);

Other versions of PostgreSQL are not affected.


I have the issue in our bug tracker as well here:

http://trac.osgeo.org/postgis/ticket/3079

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
Tom Lane
Дата:
lr@pcorp.us writes:
> The PostGIS debian Jenkins bot pulls the latest snapshot of 9.5 and compiles
> our upcoming PostGIS 2.2 trunk against it every weekend.  This weekend, it
> failed to compile.  I'm not sure if it's something that needs changing in
> PostGIS code or some issue in 9.5 recent code changes.

> The error we get is this:

> /pg9.5w64/include/postgresql/internal -D_GNU_SOURCE   -c -o lwgeom_geos.o
> lwgeom_geos.c
> lwgeom_geos.c: In function ‘errorIfGeometryCollection’:
> lwgeom_geos.c:1730: error: ‘DBL_DIG’ undeclared (first use in this
> function)

According to the C and POSIX standards, DBL_DIG is defined by <float.h>.
It sounds like you were indirectly depending on some Postgres header to
#include that, and said header no longer does.  I'm not sure what bit
of refactoring might've had such an effect, but it doesn't much matter
--- you really ought to #include <float.h> for yourself.

            regards, tom lane

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
"Paragon Corporation"
Дата:
> According to the C and POSIX standards, DBL_DIG is defined by <float.h>.
> It sounds like you were indirectly depending on some Postgres header to
> #include that, and said header no longer does.  I'm not sure what bit
> of refactoring might've had such an effect, but it doesn't much matter
> --- you really ought to #include <float.h> for yourself.

Tom,

Thanks for the response.  We were able to fix that issue by adding a
<float.h> that was missing in one of our files.
That unfortunately unearthed another failure when it got to regression
testing which I have logged here:  http://trac.osgeo.org/postgis/ticket/3080

It seems that under some conditions functions are being called multiple
times causing additional NOTICES to be emitted.  This is causing a failure
in one of our raster tests.

In  prior versions they were emitted only once (I assume they were only
called once).  I've come up with a simple example that doesn't involve
PostGIS to replicate the issue.


CREATE OR REPLACE FUNCTION dummy_notice(variadic integer[]) RETURNS
integer[] AS
$$
BEGIN
    RAISE NOTICE 'This is a dummy notice';
    RETURN $1;
END;
$$
language plpgsql IMMUTABLE STRICT;


SELECT  v[1] As v1, v[2] As v2, v[3] As v3
FROM
    (SELECT dummy_notice(1,2,3) As v) As t;


In prior versions of PostgreSQL, this simple test would just emit:

NOTICE:  This is a dummy notice


In PostgreSQL 9.5 as of when I had reported the original issue, it emits for
every single column in the outter output.
NOTICE:  This is a dummy notice
NOTICE:  This is a dummy notice
NOTICE:  This is a dummy notice

 I presume that means the dummy_notice function is now being called 3 times
for the above.

This behavior started when I reported the float issue and  is still doing
that with latest PostgreSQL 9.5 snapshot.

I'm not sure if this is an intended change or a side-effect of something
else that needs checking.


Thanks,
Regina

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
"David G. Johnston"
Дата:
On Sunday, March 22, 2015, Paragon Corporation <lr@pcorp.us> wrote:

>
> SELECT  v[1] As v1, v[2] As v2, v[3] As v3
> FROM
>         (SELECT dummy_notice(1,2,3) As v) As t;
>
>
I suspect Tom's optimization:
http://git.postgresql.org/pg/commitdiff/f4abd0241de20d5d6a79b84992b9e88603d44134

is flattening the array returning function into the select-list of the
outer query so it effectively reads:

Select Dummy_notice(1,2,3)[1], dummy_notice(1,2,3)[2], etc...

The flatten would work if the result could be cached...it is defined to
be immutable...but that would not be reliable generally...

The more complex PostGIS query simply appears to Simply be multiple nested
functions but still with all constants.  I guess the only easy way to not
pass constants would be to use a CTE and use an embedded scalar subquery to
pull from it.  From the commit the lack of a from clause is the trigger so
putting the cte in a from would remove the optimization instead of proving
that the correct behavior is applied even if the arguments to the
select-list function are not literals.

Apologies in advance if this is a red herring...

David J.

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Sunday, March 22, 2015, Paragon Corporation <lr@pcorp.us> wrote:
>> SELECT  v[1] As v1, v[2] As v2, v[3] As v3
>> FROM
>> (SELECT dummy_notice(1,2,3) As v) As t;

> I suspect Tom's optimization:
> http://git.postgresql.org/pg/commitdiff/f4abd0241de20d5d6a79b84992b9e88603d44134
> is flattening the array returning function into the select-list of the
> outer query so it effectively reads:
> Select Dummy_notice(1,2,3)[1], dummy_notice(1,2,3)[2], etc...

Yeah.  You could put "OFFSET 0" into the sub-select if you want to ensure
it will not be flattened.

> The flatten would work if the result could be cached...it is defined to
> be immutable...but that would not be reliable generally...

Note that "immutable" effectively means "having no side-effects of
interest", so emitting a NOTICE from such a function is really cheating.
Another solution would be to mark the function VOLATILE.

            regards, tom lane

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
"David G. Johnston"
Дата:
On Sunday, March 22, 2015, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "David G. Johnston" <david.g.johnston@gmail.com <javascript:;>> writes:
> > On Sunday, March 22, 2015, Paragon Corporation <lr@pcorp.us
> <javascript:;>> wrote:
> >> SELECT  v[1] As v1, v[2] As v2, v[3] As v3
> >> FROM
> >> (SELECT dummy_notice(1,2,3) As v) As t;
>
> > I suspect Tom's optimization:
> >
> http://git.postgresql.org/pg/commitdiff/f4abd0241de20d5d6a79b84992b9e88603d44134
> > is flattening the array returning function into the select-list of the
> > outer query so it effectively reads:
> > Select Dummy_notice(1,2,3)[1], dummy_notice(1,2,3)[2], etc...
>
> Yeah.  You could put "OFFSET 0" into the sub-select if you want to ensure
> it will not be flattened.


This doesn't seem like a solution...if the flattened version of an all
constants invocation cannot be run only once where it could if it was not
flattened I would say the we've introduced a likely (and actual)
performance regression that punishes current valid and idiomatic code.  I
haven't gone back and devised the entire reasoning for, and potential
benefit of, the flattening but both this and likely functions returning
composites are being negatively affected by this change.


>
> > The flatten would work if the result could be cached...it is defined to
> > be immutable...but that would not be reliable generally...
>
> Note that "immutable" effectively means "having no side-effects of
> interest", so emitting a NOTICE from such a function is really cheating.
> Another solution would be to mark the VOLATILE.
>

The notice is not important here and suggesting mis-defining an
immutable function as volatile is likewise an insufficient solution.

Presuming this optimization is the cause it should, at first glance, either
be fixed or reverted.

I'm still confused on when immutable functions are only invoked once for a
given set of arguments...since this seems like it should qualify.  I
presume it is only a row-based optimization then?

David J.

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> This doesn't seem like a solution...if the flattened version of an all
> constants invocation cannot be run only once where it could if it was not
> flattened I would say the we've introduced a likely (and actual)
> performance regression that punishes current valid and idiomatic code.  I
> haven't gone back and devised the entire reasoning for, and potential
> benefit of, the flattening but both this and likely functions returning
> composites are being negatively affected by this change.

Well, it improves some queries and perhaps punishes others, but I should
think the overall effect would generally be a win.  Even in the case given
here, it's far from clear that allowing flattening is a performance loss;
the end result would have been a query containing only constants at run
time, which in most scenarios would be a Good Thing.

As for claiming that this is broken and should be reverted: nyet.  In the
first place, there is certainly no PG documentation anywhere that promises
single evaluation of a function written in the manner shown here.  We do,
on the other hand, say that OFFSET 0 creates an optimization fence; so
I see nothing wrong with my recommendation.  In the second place, this was
a HEAD-only change, and we certainly do not promise than the planner never
changes behavior in major version updates.

            regards, tom lane

BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
"David G. Johnston"
Дата:
On Sunday, March 22, 2015, Tom Lane <tgl@sss.pgh.pa.us
<javascript:_e(%7B%7D,'cvml','tgl@sss.pgh.pa.us');>> wrote:

> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > This doesn't seem like a solution...if the flattened version of an all
> > constants invocation cannot be run only once where it could if it was not
> > flattened I would say the we've introduced a likely (and actual)
> > performance regression that punishes current valid and idiomatic code.  I
> > haven't gone back and devised the entire reasoning for, and potential
> > benefit of, the flattening but both this and likely functions returning
> > composites are being negatively affected by this change.
>
> Well, it improves some queries and perhaps punishes others, but I should
> think the overall effect would generally be a win.


While I trust your judgement actual knowledge is nice.  The fact that
PostGIS is raising the complaint should make us rethink those assumptions;
and nothing in their code or the minimal example strikes me as being
improper.  That said, I am surprised you didn't recommend that the function
should be moved to a FROM clause - though I believe the code could still
optimize that way without the offset 0.


> Even in the case given
> here, it's far from clear that allowing flattening is a performance loss;
> the end result would have been a query containing only constants at run
> time, which in most scenarios would be a Good Thing.


The PostGIS example involves constructing geometries from a definition.  I
suspect that is not cheap, which suggests that maybe we could recommend
increasing the cost of the function above the default and refusing to
flatten a function if the cost is sufficiently high?


>
> As for claiming that this is broken and should be reverted: nyet.  In the
> first place, there is certainly no PG documentation anywhere that promises
> single evaluation of a function written in the manner shown here.


But what does the typical user expect?  That should be the digested
standard: whether it is documented should be secondary.  It is not
documented that the given query will be multiply evaluated either - and the
fact that the query, as written, only include the function call once would
suggest the user expect that it will only be evaluated once,


> We do, on the other hand, say that OFFSET 0 creates an optimization fence;
> so
> I see nothing wrong with my recommendation.


Given your dislike of planner hints I'm surprised you are so quick to
defend this solution.  A hint is exactly what this is and requires the user
to know much more about the planner and postgresql internals than we shoud
be happy doing.


> In the second place, this was
> a HEAD-only change, and we certainly do not promise than the planner never
> changes behavior in major verions.


We do promise to not introduce regressions whenever possible.  That
includes performance regressions.

So we don't know where the gain/loss boundary is, yet we know of at least
one common case that is being negatively affected, nor is the the suggested
hack appealing.

The appealing workaround, then, is to put the function call into a CTE and
hope we don't decide to act on the fact that it is not documented to be an
optimization fence.

I cannot help but feel there is a patch here that allows us to negate the
regression while keeping the benefits - but it's beyond my skills to prove
that let alone write it.

David J.

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
"Paragon Corporation"
Дата:
>While I trust your judgement actual knowledge is nice.  The fact that =
PostGIS is raising the complaint should make us rethink those =
assumptions; and nothing in their code or the minimal
> example strikes me as being improper.  That said, I am surprised you =
didn't recommend that the function should be moved to a FROM clause - =
though I believe the code could still optimize > that way without the =
offset 0.

> The PostGIS example involves constructing geometries from a =
definition.  I suspect that is not cheap, which suggests that maybe we =
could recommend increasing the cost of the function above
 > the default and refusing to flatten a function if the cost is =
sufficiently high?

Guys I tried increasing cost with the ST_Reclass function to 2000 and it =
didn't help.  The function is called for every single output.    The =
only thing that helps is setting this to VOLATILE as Tom suggests.  I =
guess we can do that as some sort of fix for these kinds of functions =
though I'm not sure if that would cause other issues.

> As for claiming that this is broken and should be reverted: nyet.  In =
the
> first place, there is certainly no PG documentation anywhere that =
promises
> single evaluation of a function written in the manner shown here. =20

> But what does the typical user expect?  That should be the digested =
standard: whether it is documented should be secondary.  It is not =
documented that the given query will be multiply=20

> evaluated either - and the fact that the query, as written, only =
include the function call once would suggest the user expect that it =
will only be evaluated once,

A good chunk of PostGIS functions are very costly and return a raster, =
geometry, geography object which is then further acted on in queries.  =
As David noted, the general user -- partly because of observation I =
guess assumed the behavior that a function such as that would only be =
called once.  I suppose if we were to change all such things to =
VOLATILE, that would be fine, but I'm concerned about the vast number of =
PostGIS users who further wrap the PostGIS functions and don't know to =
do this.
=20

> I cannot help but feel there is a patch here that allows us to negate =
the regression while keeping the benefits - but it's beyond my skills to =
prove that let alone write it.

> David J.


Does this change affect any function that takes a complex object and =
then has that complex object return another complex object?  If that is =
the case, this would be disastrous for PostGIS users upgrading to 9.5 =
unless we change all the functions we have that have this behavior to =
VOLATILE.  I would be okay with that as long as it doesn't cause =
performance issues for older versions of PostgreSQL and doesn't cause =
performance issues elsewhere.


Thanks,
Regina



=20

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
Tom Lane
Дата:
"Paragon Corporation" <lr@pcorp.us> writes:
> Guys I tried increasing cost with the ST_Reclass function to 2000 and it didn't help.  The function is called for
everysingle output.    The only thing that helps is setting this to VOLATILE as Tom suggests.  I guess we can do that
assome sort of fix for these kinds of functions though I'm not sure if that would cause other issues. 

I think that would be a very bad idea; it would foreclose optimizations
that you *do* want.  Much better, if you are relying on single-evaluation
behavior for a non-volatile function, is to use one of the documented
optimization fences: either a CTE, or an OFFSET 0 in a sub-select.

TBH, this particular example does not fill me with concern, because
(a) it's obviously artificial, and (b) you'd really never notice if the
function got evaluated 3 times not once, if you hadn't put in that NOTICE.
I grant that there may be cases where you're worried about avoiding
multiple evaluations *per row* over some large number of rows, but to
discuss that type of problem we'd have to see what your coding habits are
like for such cases.  An immutable function being fed constants is not
going to create that type of problem.

            regards, tom lane

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
"Paragon Corporation"
Дата:
Sorry for cross post.  I've cc'd PostGIS dev group on this for input -- For
PostGIS folks -  link to related issue that started this conversation here:
http://trac.osgeo.org/postgis/ticket/3080

>> "Paragon Corporation" <lr@pcorp.us> writes:
>>  Guys I tried increasing cost with the ST_Reclass function to 2000 and it
didn't help.  The function is called for every single output.    The only
thing that helps is setting this to VOLATILE as Tom suggests.  I guess we

>> can do that as some sort of fix for these kinds of functions though I'm
not sure if that would cause other issues.

> I think that would be a very bad idea; it would foreclose optimizations
that you *do* want.  Much better,
>  if you are relying on single-evaluation behavior for a non-volatile
function, is to use one of the documented optimization fences: either a CTE,
or an OFFSET 0 in a sub-select.

The CTE / OFFSET solution requires the end-user to do this, which means a
lot of people using PostGIS may suffer severe performance issues after
upgrading to PostgreSQL 9.5 unless they change their code which is something
I'd like to avoid.



> TBH, this particular example does not fill me with concern, because
> (a) it's obviously artificial, and (b) you'd really never notice if the
function got evaluated 3 times not once, if you hadn't put in that NOTICE.
> I grant that there may be cases where you're worried about avoiding
multiple evaluations *per row* over some large number of rows, but to
discuss that type of problem we'd have to see what your coding

>  habits are like for such cases.  An immutable function being fed
constants is not going to create that type of problem.

>            regards, tom lane

The particular constant example I presented was simply to provide an example
that exhibits this behavior without having to require PostGIS to replicate
it.

The more common use case (the ones I am really worried about are cases where
we have a function that takes a large composite object and outputs another
which then gets passed to another PostGIS function that cuts off small
pieces.

Here one  that come to mind

SELECT ST_GeometryN(newgeom,1) As geom1, ST_GeometryN(newgeom,2) As geom2
FROM (
SELECT ST_Simplify(geom,25)  As newgeom
FROM sometable) As foo;

If I am not mistaken -- please correct me if I am wrong.  The above example
would cause ST_Simplify to be called for each ST_GeometryN.  If that were
the case, this would be a huge PostGIS performance killer for a lot of folks
as this pattern is very common and we have a lot of expensive functions that
do this kind of thing.

-- PostGIS raster is filled with a ton of these.  The ST_Reclass is just one
example of these (and one that just happened to be tripped up by this change
because it happened to have a NOTICE in it) -- things like ST_MapAlgebra
that are very intensive functions that output often get used later by other
functions concern me even more.




Thanks,
Regina

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
Tom Lane
Дата:
"Paragon Corporation" <lr@pcorp.us> writes:
> The more common use case (the ones I am really worried about are cases where
> we have a function that takes a large composite object and outputs another
> which then gets passed to another PostGIS function that cuts off small
> pieces.

> Here one  that come to mind

> SELECT ST_GeometryN(newgeom,1) As geom1, ST_GeometryN(newgeom,2) As geom2
> FROM (
> SELECT ST_Simplify(geom,25)  As newgeom
> FROM sometable) As foo;

> If I am not mistaken -- please correct me if I am wrong.  The above example
> would cause ST_Simplify to be called for each ST_GeometryN.

Indeed, and that is exactly what happens in every PG release for time out
of mind, if that's actually how you're coding it without using any method
that would prevent flattening of the subquery.  Check it with EXPLAIN
VERBOSE if you doubt that.  The patch I committed two weeks ago has no
impact whatsoever on this example, because that's not a table-free subquery.

I suspect that (a) you are using OFFSET 0 in the subquery, (b) you already
have ST_Simplify marked as volatile, or (c) the performance hit from
multiple evaluations isn't as drastic as you think.

            regards, tom lane

Re: BUG #12869: PostGIS 2.2 can't compile against 9.5 dev branch

От
"Paragon Corporation"
Дата:
Tom,


>> Here one  that come to mind

>> SELECT ST_GeometryN(newgeom,1) As geom1, ST_GeometryN(newgeom,2) As
>> geom2 FROM ( SELECT ST_Simplify(geom,25)  As newgeom FROM sometable)
>> As foo;

>> If I am not mistaken -- please correct me if I am wrong.  The above
>> example would cause ST_Simplify to be called for each ST_GeometryN.

> Indeed, and that is exactly what happens in every PG release for time out
of mind, if that's actually how you're coding it without using any method
that would prevent flattening of the subquery.  Check it with

> EXPLAIN VERBOSE if you doubt that.  The patch I committed two weeks ago
has no impact whatsoever on this example, because that's not a table-free
subquery.

> I suspect that (a) you are using OFFSET 0 in the subquery, (b) you already
have ST_Simplify marked as volatile, or (c) the performance hit from
multiple evaluations isn't as drastic as you think.

> regards, tom lane

Sorry for the false alarm.  Indeed you are right and the change is not as
much of a change as I thought.

This revised version of the query -- triggers a double call in older
versions of PostgreSQL too and since this would be the more common case not
much has changed in behavior.  I think for raster work most people wrap
these kind of calls in CTEs anyway.

DROP TABLE IF EXISTS t2;
CREATE TABLE t2(rast raster);
INSERT INTO t2(rast)
SELECT ST_SetValue(
            ST_SetValue(
                ST_AddBand(
                    ST_MakeEmptyRaster(100, 100, 10, 10,
2, 2, 0, 0, 0),
                    1, '32BF', 1, 0
                ),
                1, 1, 1, 3.14159
            ),
            1, 10, 10, 2.71828
        ) As rast;

SELECT
    ST_Value(t.rast, 1, 1, 1),
    ST_Value(t.rast, 1, 10, 10),
    ST_BandNoDataValue(rast, 1)
FROM (
    SELECT ST_Reclass(
        rast,
        ROW(1,
'a-100]:50-1,(-100-1000]:150-50,(1000-10000]:254-150', '8BUI', 0)
    ) AS rast  FROM t2
) AS t;


So I'll just put an OFFSET 0 on this regression query and call it a day.

Thanks,
Regina