Обсуждение: When IMMUTABLE is not.

Поиск
Список
Период
Сортировка

When IMMUTABLE is not.

От
Yura Sokolov
Дата:
Good day, hackers.

I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure
function doesn't manipulate data.

In fact, SPI checks only direct function kind, but fails to check indirect call.

Attached immutable_not.sql creates 3 functions:

- `immutable_direct` is IMMUTABLE and tries to insert into table directly.
   PostgreSQL correctly detects and forbids this action.

- `volatile_direct` is VOLATILE and inserts into table directly.
   It is allowed and executed well.

- `immutable_indirect` is IMMUTABLE and calls `volatile_direct`.
   PostgreSQL failed to detect and prevent this DML manipulation.

Output:

select immutable_direct('immutable_direct'); psql:immutable_not.sql:28: 
ERROR:  INSERT is not allowed in a non-volatile function CONTEXT:  SQL 
statement "insert into xxx values(j)" PL/pgSQL function 
immutable_direct(character varying) line 3 at SQL statement select 
volatile_direct('volatile_direct'); volatile_direct ----------------- 
volatile_direct (1 row) select immutable_indirect('immutable_indirect'); 
immutable_indirect -------------------- immutable_indirect (1 row) 
select * from xxx;         i -------------------- volatile_direct 
immutable_indirect (2 rows) Attached forbid-non-volatile-mutations.diff 
add checks readonly function didn't made data manipulations. Output for 
patched version: select immutable_indirect('immutable_indirect'); 
psql:immutable_not.sql:32: ERROR:  Damn2! Update were done in a 
non-volatile function CONTEXT:  SQL statement "SELECT 
volatile_direct(j)" PL/pgSQL function immutable_indirect(character 
varying) line 3 at PERFORM I doubt check should be done this way. This 
check is necessary, but it should be FATAL instead of ERROR. And ERROR 
should be generated at same place, when it is generated for 
`immutable_direct`, but with check of "read_only" status through whole 
call stack instead of just direct function kind. ----- regards, Yura 
Sokolov Postgres Professional

Вложения

Re: When IMMUTABLE is not.

От
Yura Sokolov
Дата:
Sorry, previous message were smashed for some reason.

I'll try to repeat

I found, than declaration of function as IMMUTABLE/STABLE is not enough 
to be sure
function doesn't manipulate data.

In fact, SPI checks only direct function kind, but fails to check 
indirect call.

Attached immutable_not.sql creates 3 functions:

- `immutable_direct` is IMMUTABLE and tries to insert into table directly.
   PostgreSQL correctly detects and forbids this action.

- `volatile_direct` is VOLATILE and inserts into table directly.
   It is allowed and executed well.

- `immutable_indirect` is IMMUTABLE and calls `volatile_direct`.
   PostgreSQL failed to detect and prevent this DML manipulation.

Output:

select immutable_direct('immutable_direct');
psql:immutable_not.sql:28: ERROR:  INSERT is not allowed in a 
non-volatile function
CONTEXT:  SQL statement "insert into xxx values(j)"
PL/pgSQL function immutable_direct(character varying) line 3 at SQL 
statement

select volatile_direct('volatile_direct');
volatile_direct
-----------------
volatile_direct
(1 row)

select immutable_indirect('immutable_indirect');
immutable_indirect
--------------------
immutable_indirect
(1 row)

select * from xxx;
         i
--------------------
volatile_direct
immutable_indirect
(2 rows)

Attached forbid-non-volatile-mutations.diff add checks readonly function 
didn't made data manipulations.
Output for patched version:

select immutable_indirect('immutable_indirect');
psql:immutable_not.sql:32: ERROR:  Damn2! Update were done in a 
non-volatile function
CONTEXT:  SQL statement "SELECT volatile_direct(j)"
PL/pgSQL function immutable_indirect(character varying) line 3 at PERFORM

I doubt check should be done this way. This check is necessary, but it 
should be
FATAL instead of ERROR. And ERROR should be generated at same place, when
it is generated for `immutable_direct`, but with check of "read_only" 
status through
whole call stack instead of just direct function kind.

-----

regards,
Yura Sokolov
Postgres Professional





Re: When IMMUTABLE is not.

От
Laurenz Albe
Дата:
On Thu, 2023-06-15 at 13:22 +0300, Yura Sokolov wrote:
> Good day, hackers.
>
> I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure
> function doesn't manipulate data.
>
> [...]
>
> +                        errmsg("Damn1! Update were done in a non-volatile function")));

I think it is project policy to start error messages with a lower case character.

Yours,
Laurenz Albe



Re: When IMMUTABLE is not.

От
Tom Lane
Дата:
Yura Sokolov <y.sokolov@postgrespro.ru> writes:
> I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure
> function doesn't manipulate data.

Of course not.  It is the user's responsibility to mark functions
properly.  Trying to enforce that completely is a fool's errand;
you soon get into trying to solve the halting problem.

I don't like anything about the proposed patch.  It's necessarily
only a partial solution, and it probably breaks cases that are
perfectly safe in context.

            regards, tom lane



Re: When IMMUTABLE is not.

От
Yura Sokolov
Дата:
15.06.2023 16:21, Tom Lane wrote:
> Yura Sokolov <y.sokolov@postgrespro.ru> writes:
>> I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure
>> function doesn't manipulate data.
> Of course not.  It is the user's responsibility to mark functions
> properly.  Trying to enforce that completely is a fool's errand

https://github.com/postgres/postgres/commit/b2c4071299e02ed96d48d3c8e776de2fab36f88c.patch

https://github.com/postgres/postgres/commit/cdf8b56d5463815244467ea8f5ec6e72b6c65a6c.patch




Re: When IMMUTABLE is not.

От
chap@anastigmatix.net
Дата:
On 2023-06-15 09:21, Tom Lane wrote:
> Yura Sokolov <y.sokolov@postgrespro.ru> writes:
>> not enough to be sure function doesn't manipulate data.
> 
> Of course not.  It is the user's responsibility to mark functions
> properly.

And also, isn't it the case that IMMUTABLE should mark a function,
not merely that "doesn't manipulate data", but whose return value
doesn't depend in any way on data (outside its own arguments)?

The practice among PLs of choosing an SPI readonly flag based on
the IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of
peculiar heuristic, not something inherent in what that declaration
means to the optimizer. (And also influences what snapshot the
function is looking at, and therefore what it can see, which has
also struck me more as a tacked-on effect than something inherent
in the declaration's meaning.)

Regards,
-Chap



Re: When IMMUTABLE is not.

От
Yura Sokolov
Дата:
15.06.2023 16:58, chap@anastigmatix.net пишет:
> On 2023-06-15 09:21, Tom Lane wrote:
>> Yura Sokolov <y.sokolov@postgrespro.ru> writes:
>>> not enough to be sure function doesn't manipulate data.
>>
>> Of course not.  It is the user's responsibility to mark functions
>> properly.
>
> And also, isn't it the case that IMMUTABLE should mark a function,
> not merely that "doesn't manipulate data", but whose return value
> doesn't depend in any way on data (outside its own arguments)?
>
> The practice among PLs of choosing an SPI readonly flag based on
> the IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of
> peculiar heuristic, not something inherent in what that declaration
> means to the optimizer. (And also influences what snapshot the
> function is looking at, and therefore what it can see, which has
> also struck me more as a tacked-on effect than something inherent
> in the declaration's meaning.)

Documentation disagrees:

https://www.postgresql.org/docs/current/sql-createfunction.html#:~:text=IMMUTABLE%0ASTABLE%0AVOLATILE

 > |IMMUTABLE|indicates that the function cannot modify the database and 
always returns the same result when given the same argument values

 > |STABLE|indicates that the function cannot modify the database, and 
that within a single table scan it will consistently return the same 
result for the same argument values, but that its result could change 
across SQL statements.

 > |VOLATILE|indicates that the function value can change even within a 
single table scan, so no optimizations can be made... But note that any 
function that has side-effects must be classified volatile, even if its 
result is quite predictable, to prevent calls from being optimized away




Re: When IMMUTABLE is not.

От
Tom Lane
Дата:
chap@anastigmatix.net writes:
> And also, isn't it the case that IMMUTABLE should mark a function,
> not merely that "doesn't manipulate data", but whose return value
> doesn't depend in any way on data (outside its own arguments)?

Right.  We can't realistically enforce that either, so it's
up to the user.

> The practice among PLs of choosing an SPI readonly flag based on
> the IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of
> peculiar heuristic, not something inherent in what that declaration
> means to the optimizer. (And also influences what snapshot the
> function is looking at, and therefore what it can see, which has
> also struck me more as a tacked-on effect than something inherent
> in the declaration's meaning.)

Well, it is a bit odd at first sight, but these properties play
together well.  See

https://www.postgresql.org/docs/current/xfunc-volatility.html

            regards, tom lane



Re: When IMMUTABLE is not.

От
chap@anastigmatix.net
Дата:
On 2023-06-15 09:58, chap@anastigmatix.net wrote:
> also influences what snapshot the
> function is looking at, and therefore what it can see, which has
> also struck me more as a tacked-on effect than something inherent
> in the declaration's meaning.

I just re-read that and realized I should anticipate the obvious
response "but how can it matter what the function can see, if
it's IMMUTABLE and depends on no data?".

So, I ran into the effect while working on PL/Java, where the
code of a function isn't all found in pg_proc.prosrc; that just
indicates what code has to be fetched from sqlj.jar_entry.

So one could take a strict view that "no PL/Java function should
ever be marked IMMUTABLE" because every one depends on fetching
something (once, at least).

But on the other hand, it would seem punctilious to say that
f(int x, int y) { return x + y; } isn't IMMUTABLE, only because
it depends on a fetch /of its own implementation/, and overall
its behavior is better described by marking it IMMUTABLE.

Regards,
-Chap



Re: When IMMUTABLE is not.

От
"David G. Johnston"
Дата:
On Thursday, June 15, 2023, <chap@anastigmatix.net> wrote:

So one could take a strict view that "no PL/Java function should
ever be marked IMMUTABLE" because every one depends on fetching
something (once, at least).

The failure to find and execute the function code itself is not a failure mode that these markers need be concerned with.  Assuming one can execute the function an immutable function will give the same answer for the same input for all time.

David J.

Re: When IMMUTABLE is not.

От
chap@anastigmatix.net
Дата:
On 2023-06-15 10:19, David G. Johnston wrote:
> The failure to find and execute the function code itself is not a 
> failure
> mode that these markers need be concerned with.  Assuming one can 
> execute
> the function an immutable function will give the same answer for the 
> same
> input for all time.

That was the view I ultimately took, and just made PL/Java suppress that
SPI readonly flag when going to look for the function code.

Until that change, you could run into the not-uncommon situation
where you've just loaded a jar of new functions and try to use them
in the same transaction, and hey presto, the VOLATILE ones all work,
and the IMMUTABLE ones aren't there yet.

Regards,
-Chap



Re: When IMMUTABLE is not.

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> The failure to find and execute the function code itself is not a failure
> mode that these markers need be concerned with.  Assuming one can execute
> the function an immutable function will give the same answer for the same
> input for all time.

The viewpoint taken in the docs I mentioned is that an IMMUTABLE
marker is a promise from the user to the system about the behavior
of a function.  While the system does provide a few simple tools
to catch obvious errors and to make it easier to write functions
that obey such promises, it's mostly on the user to get it right.

In particular, we've never enforced that an immutable function can't
call non-immutable functions.  While that would seem like a good idea
in the abstract, we've intentionally not tried to do it.  (I'm pretty
sure there is more than one round of previous discussions of the point
in the archives, although locating relevant threads seems hard.)
One reason not to is that polymorphic functions have to be marked
with worst-case volatility labels.  There are plenty of examples of
functions that are stable for some input types and immutable for
others (array_to_string, for instance); but the marking system can't
represent that so we have to label them stable.  Enforcing that a
user-defined immutable function can't use such a function might
just break things for no gain.

            regards, tom lane



Re: When IMMUTABLE is not.

От
Isaac Morland
Дата:
On Thu, 15 Jun 2023 at 10:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

In particular, we've never enforced that an immutable function can't
call non-immutable functions.  While that would seem like a good idea
in the abstract, we've intentionally not tried to do it.  (I'm pretty
sure there is more than one round of previous discussions of the point
in the archives, although locating relevant threads seems hard.)
One reason not to is that polymorphic functions have to be marked
with worst-case volatility labels.  There are plenty of examples of
functions that are stable for some input types and immutable for
others (array_to_string, for instance); but the marking system can't
represent that so we have to label them stable.  Enforcing that a
user-defined immutable function can't use such a function might
just break things for no gain.

More sophisticated type systems (which I am *not* volunteering to graft onto Postgres) can handle some of this, but even Haskell has unsafePerformIO. The current policy is both wise and practical.

Re: When IMMUTABLE is not.

От
Yura Sokolov
Дата:
15.06.2023 17:49, Tom Lane пишет:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>> The failure to find and execute the function code itself is not a failure
>> mode that these markers need be concerned with.  Assuming one can execute
>> the function an immutable function will give the same answer for the same
>> input for all time.
> The viewpoint taken in the docs I mentioned is that an IMMUTABLE
> marker is a promise from the user to the system about the behavior
> of a function.  While the system does provide a few simple tools
> to catch obvious errors and to make it easier to write functions
> that obey such promises, it's mostly on the user to get it right.
>
> In particular, we've never enforced that an immutable function can't
> call non-immutable functions.  While that would seem like a good idea
> in the abstract, we've intentionally not tried to do it.  (I'm pretty
> sure there is more than one round of previous discussions of the point
> in the archives, although locating relevant threads seems hard.)
> One reason not to is that polymorphic functions have to be marked
> with worst-case volatility labels.  There are plenty of examples of
> functions that are stable for some input types and immutable for
> others (array_to_string, for instance); but the marking system can't
> represent that so we have to label them stable.  Enforcing that a
> user-defined immutable function can't use such a function might
> just break things for no gain.

"Stable vs Immutable" is much lesser problem compared to "ReadOnly vs 
Volatile".

Executing fairly read-only function more times than necessary (or less 
times),
doesn't modify data in unexpecting way.

But executing immutable/stable function, that occasionally modifies 
data, could
lead to different unexpected effects due to optimizer decided to call 
them more
or less times than query assumes.

Some vulnerabilities were present due to user defined functions used in 
index
definitions started to modify data. If "read-only" execution were forced 
in index
operations, those issues couldn't happen.

 > it's mostly on the user to get it right.

It is really bad premise. Users does strange things and aren't expected 
to be
professionals who really understand whole PostgreSQL internals.

And it is strange to hear it at the same time we don't allow users to do 
query hints
since "optimizer does better" :-D

Ok, I'd go and cool myself. Certainly I don't get some point.