Обсуждение: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

Поиск
Список
Период
Сортировка

Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
Hi,

I found that multiple sessions concurrently execute CREATE OR REPLACE FUNCTION
for a same function, the error "tuple concurrently updated" is raised. This is
an internal error output by elog, also the message is not user-friendly.

I've attached a patch to prevent this internal error by locking an exclusive
lock before the command and get the read tuple after acquiring the lock.
Also, if the function has been removed during the lock waiting, the new entry
is created.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Mon, 31 Mar 2025 20:00:57 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> Hi,
> 
> I found that multiple sessions concurrently execute CREATE OR REPLACE FUNCTION
> for a same function, the error "tuple concurrently updated" is raised. This is
> an internal error output by elog, also the message is not user-friendly.
> 
> I've attached a patch to prevent this internal error by locking an exclusive
> lock before the command and get the read tuple after acquiring the lock.
> Also, if the function has been removed during the lock waiting, the new entry
> is created.

I also found the same error is raised when concurrent ALTER FUNCTION commands are
executed. I've added a patch to fix this in the similar way.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Jim Jones
Дата:
Hi!

On 31.03.25 13:22, Yugo Nagata wrote:
> On Mon, 31 Mar 2025 20:00:57 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
>> Hi,
>>
>> I found that multiple sessions concurrently execute CREATE OR REPLACE FUNCTION
>> for a same function, the error "tuple concurrently updated" is raised. This is
>> an internal error output by elog, also the message is not user-friendly.
>>
>> I've attached a patch to prevent this internal error by locking an exclusive
>> lock before the command and get the read tuple after acquiring the lock.
>> Also, if the function has been removed during the lock waiting, the new entry
>> is created.
> I also found the same error is raised when concurrent ALTER FUNCTION commands are
> executed. I've added a patch to fix this in the similar way.
>
> Regards,
> Yugo Nagata


I just briefly tested this patch and it seems to work as expected for
CREATE OF REPLACE FUNCTION:

-- Session 1 (t1):

postgres=# BEGIN;
BEGIN
postgres=*# CREATE OR REPLACE FUNCTION f1()
RETURNS INT LANGUAGE plpgsql AS
$$ BEGIN RETURN 1; END;$$;
CREATE FUNCTION

-- Session 2 (t2)

postgres=# CREATE OR REPLACE FUNCTION f1()
RETURNS INT LANGUAGE plpgsql AS
$$ BEGIN RETURN 2; END;$$;

(wait)

-- Session 3 (t3)

postgres=# CREATE OR REPLACE FUNCTION f1()
RETURNS INT LANGUAGE plpgsql AS
$$ BEGIN RETURN 3; END;$$;

(wait)

-- Session 4 (t4)

postgres=# CREATE OR REPLACE FUNCTION f1()
RETURNS INT LANGUAGE plpgsql AS
$$ BEGIN RETURN 4; END;$$;
CREATE FUNCTION

(wait)

-- Session 1 (t5)

postgres=*# END;
COMMIT

at this point Sessions 2, 3, and 4 were released with: CREATE FUNCTION

-- Session 1 (t6)

postgres=# \sf f1
CREATE OR REPLACE FUNCTION public.f1()
 RETURNS integer
 LANGUAGE plpgsql
AS $function$ BEGIN RETURN 4; END;$function$

So... it no longer shows the error message:

ERROR:  tuple concurrently updated

I did the same for ALTER FUNCTION but I was unable to reproduce the
error your reported. Could you provide your script?


Best regards, Jim





Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
jian he
Дата:
On Mon, Mar 31, 2025 at 7:22 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> On Mon, 31 Mar 2025 20:00:57 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> > Hi,
> >
> > I found that multiple sessions concurrently execute CREATE OR REPLACE FUNCTION
> > for a same function, the error "tuple concurrently updated" is raised. This is
> > an internal error output by elog, also the message is not user-friendly.
> >
> > I've attached a patch to prevent this internal error by locking an exclusive
> > lock before the command and get the read tuple after acquiring the lock.
> > Also, if the function has been removed during the lock waiting, the new entry
> > is created.
>
> I also found the same error is raised when concurrent ALTER FUNCTION commands are
> executed. I've added a patch to fix this in the similar way.
>

hi.

+ /* Lock the function so nobody else can do anything with it. */
+ LockDatabaseObject(ProcedureRelationId, oldproc->oid, 0, AccessExclusiveLock);
+
+ /*
+ * It is possible that by the time we acquire the lock on function,
+ * concurrent DDL has removed it. We can test this by checking the
+ * existence of function. We get the tuple again to avoid the risk
+ * of function definition getting changed.
+ */
+ oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
+ PointerGetDatum(procedureName),
+ PointerGetDatum(parameterTypes),
+ ObjectIdGetDatum(procNamespace));

we already called LockDatabaseObject, concurrent DDL can
not do DROP FUNCTION or ALTER FUNCTION.
so no need to call SearchSysCacheCopy3 again?


@@ -553,11 +575,13 @@ ProcedureCreate(const char *procedureName,
  replaces[Anum_pg_proc_proowner - 1] = false;
  replaces[Anum_pg_proc_proacl - 1] = false;

+
+
  /* Okay, do it... */
no need to add these two new lines.



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Tue, 20 May 2025 17:30:35 +0200
Jim Jones <jim.jones@uni-muenster.de> wrote:

> I just briefly tested this patch and it seems to work as expected for
> CREATE OF REPLACE FUNCTION:

Thank you for reviewing the patch!
 
> I did the same for ALTER FUNCTION but I was unable to reproduce the
> error your reported. Could you provide your script?

I can see the error when two concurrent transactions issue
"alter function f() immutable".

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
Hi,


On Thu, 22 May 2025 10:25:58 +0800
jian he <jian.universality@gmail.com> wrote:

Thank you for looking into it.

> + /* Lock the function so nobody else can do anything with it. */
> + LockDatabaseObject(ProcedureRelationId, oldproc->oid, 0, AccessExclusiveLock);
> +
> + /*
> + * It is possible that by the time we acquire the lock on function,
> + * concurrent DDL has removed it. We can test this by checking the
> + * existence of function. We get the tuple again to avoid the risk
> + * of function definition getting changed.
> + */
> + oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
> + PointerGetDatum(procedureName),
> + PointerGetDatum(parameterTypes),
> + ObjectIdGetDatum(procNamespace));
> 
> we already called LockDatabaseObject, concurrent DDL can
> not do DROP FUNCTION or ALTER FUNCTION.
> so no need to call SearchSysCacheCopy3 again?

The function may be dropped *before* we call LockDatabaseObject.
SearchSysCacheCopy3 is called for check this. 
Plese see AlterPublication() as a similar code example.

> 
> @@ -553,11 +575,13 @@ ProcedureCreate(const char *procedureName,
>   replaces[Anum_pg_proc_proowner - 1] = false;
>   replaces[Anum_pg_proc_proacl - 1] = false;
> 
> +
> +
>   /* Okay, do it... */
> no need to add these two new lines.

I'll remove the lines. Thanks.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Fri, 23 May 2025 10:37:42 +0800
jian he <jian.universality@gmail.com> wrote:

> On Thu, May 22, 2025 at 10:25 AM jian he <jian.universality@gmail.com> wrote:
> >
> hi.
> earlier, i didn't check patch 0002.
> 
> i think in AlterFunction add
>     /* Lock the function so nobody else can do anything with it. */
>     LockDatabaseObject(ProcedureRelationId, funcOid, 0, AccessExclusiveLock);
> 
> right after
> funcOid = LookupFuncWithArgs(stmt->objtype, stmt->func, false);
> 
> should be fine.

Thank you. That makes sense because we can reduce redundant call of SearchSysCacheCopy1
and HeapTupleIsValid. I've attached a updated patch.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
jian he
Дата:
On Tue, May 27, 2025 at 1:35 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> > + /* Lock the function so nobody else can do anything with it. */
> > + LockDatabaseObject(ProcedureRelationId, oldproc->oid, 0, AccessExclusiveLock);
> > +
> > + /*
> > + * It is possible that by the time we acquire the lock on function,
> > + * concurrent DDL has removed it. We can test this by checking the
> > + * existence of function. We get the tuple again to avoid the risk
> > + * of function definition getting changed.
> > + */
> > + oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
> > + PointerGetDatum(procedureName),
> > + PointerGetDatum(parameterTypes),
> > + ObjectIdGetDatum(procNamespace));
> >
> > we already called LockDatabaseObject, concurrent DDL can
> > not do DROP FUNCTION or ALTER FUNCTION.
> > so no need to call SearchSysCacheCopy3 again?
>
> The function may be dropped *before* we call LockDatabaseObject.
> SearchSysCacheCopy3 is called for check this.
> Plese see AlterPublication() as a similar code example.
>

I am wondering, can we do it the following way for v2-0001?


    /* Check for pre-existing definition */
    oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
                                 PointerGetDatum(procedureName),
                                 PointerGetDatum(parameterTypes),
                                 ObjectIdGetDatum(procNamespace));
    if (HeapTupleIsValid(oldtup))
    {
        /* There is one; okay to replace it? */
        Form_pg_proc oldproc = (Form_pg_proc) GETSTRUCT(oldtup);
        if (!replace)
            ereport(ERROR,
                    (errcode(ERRCODE_DUPLICATE_FUNCTION),
                     errmsg("function \"%s\" already exists with same
argument types",
                            procedureName)));
        /*
         * It is possible that by the time we acquire the lock on function,
         * concurrent DDL has removed it. We can test this by checking the
         * existence of function. We get the tuple again to avoid the risk
         * of function definition getting changed.
         */
        if (!ConditionalLockDatabaseObject(ProcedureRelationId,
oldproc->oid, 0, AccessExclusiveLock))
            oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
                                        PointerGetDatum(procedureName),
                                        PointerGetDatum(parameterTypes),
                                        ObjectIdGetDatum(procNamespace));
    }



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Tue, 27 May 2025 10:03:58 +0800
jian he <jian.universality@gmail.com> wrote:

> On Tue, May 27, 2025 at 1:35 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> >
> > > + /* Lock the function so nobody else can do anything with it. */
> > > + LockDatabaseObject(ProcedureRelationId, oldproc->oid, 0, AccessExclusiveLock);
> > > +
> > > + /*
> > > + * It is possible that by the time we acquire the lock on function,
> > > + * concurrent DDL has removed it. We can test this by checking the
> > > + * existence of function. We get the tuple again to avoid the risk
> > > + * of function definition getting changed.
> > > + */
> > > + oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
> > > + PointerGetDatum(procedureName),
> > > + PointerGetDatum(parameterTypes),
> > > + ObjectIdGetDatum(procNamespace));
> > >
> > > we already called LockDatabaseObject, concurrent DDL can
> > > not do DROP FUNCTION or ALTER FUNCTION.
> > > so no need to call SearchSysCacheCopy3 again?
> >
> > The function may be dropped *before* we call LockDatabaseObject.
> > SearchSysCacheCopy3 is called for check this.
> > Plese see AlterPublication() as a similar code example.
> >
> 
> I am wondering, can we do it the following way for v2-0001?
> 
> 
>     /* Check for pre-existing definition */
>     oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
>                                  PointerGetDatum(procedureName),
>                                  PointerGetDatum(parameterTypes),
>                                  ObjectIdGetDatum(procNamespace));
>     if (HeapTupleIsValid(oldtup))
>     {
>         /* There is one; okay to replace it? */
>         Form_pg_proc oldproc = (Form_pg_proc) GETSTRUCT(oldtup);
>         if (!replace)
>             ereport(ERROR,
>                     (errcode(ERRCODE_DUPLICATE_FUNCTION),
>                      errmsg("function \"%s\" already exists with same
> argument types",
>                             procedureName)));
>         /*
>          * It is possible that by the time we acquire the lock on function,
>          * concurrent DDL has removed it. We can test this by checking the
>          * existence of function. We get the tuple again to avoid the risk
>          * of function definition getting changed.
>          */
>         if (!ConditionalLockDatabaseObject(ProcedureRelationId,
> oldproc->oid, 0, AccessExclusiveLock))
>             oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
>                                         PointerGetDatum(procedureName),
>                                         PointerGetDatum(parameterTypes),
>                                         ObjectIdGetDatum(procNamespace));
>     }

No. This cannot prevent the error "ERROR: tuple concurrently updated"
because it doesn't wait for end of the concurrently running session
if the lock cannot be acquired.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Alexander Lakhin
Дата:
Hello Yugo,

31.03.2025 14:22, Yugo Nagata wrote:
I found that multiple sessions concurrently execute CREATE OR REPLACE FUNCTION
for a same function, the error "tuple concurrently updated" is raised. This is
an internal error output by elog, also the message is not user-friendly.
I also found the same error is raised when concurrent ALTER FUNCTION commands are
executed. I've added a patch to fix this in the similar way.

FWIW, the same error is raised also with concurrent GRANT/REVOKE on a
database:
https://www.postgresql.org/message-id/18dcfb7f-5deb-4487-ae22-a2c16839519a%40gmail.com

Maybe you would also find relevant this thread:
https://www.postgresql.org/message-id/flat/ZiYjn0eVc7pxVY45%40ip-10-97-1-34.eu-west-3.compute.internal

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Jim Jones
Дата:
Hi Yugo

On 26.05.25 18:39, Yugo Nagata wrote:
> I can see the error when two concurrent transactions issue
> "alter function f() immutable".


I might have missed something in my last tests... I could now reproduce
the behaviour you mentioned.

I've tested v2 and it works as described. CREATE OR REPLACE FUNCTION and
ALTER TABLE no longer raise an error after the lock by the concurrent
transaction was freed.

One quick question in v2-002:

     tup = SearchSysCacheCopy1(PROCOID, ObjectIdGetDatum(funcOid));
-    if (!HeapTupleIsValid(tup)) /* should not happen */
-        elog(ERROR, "cache lookup failed for function %u", funcOid);
+    if (!HeapTupleIsValid(tup))
+        ereport(ERROR,
+                errcode(ERRCODE_UNDEFINED_OBJECT),
+                errmsg("function \"%s\" does not exist",
+                       NameListToString(stmt->func->objname)));


Is it really ok to change this error message here? Did the addition of
LockDatabaseObject change the semantics of the previous message? Other
similar parts of the code still report "cache lookup failed for function
x". I don't have a strong opinion here, but perhaps we should keep these
messages consistent at least throughout the file?

Thanks!

Best, Jim




Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Tue, 27 May 2025 08:33:42 +0200
Jim Jones <jim.jones@uni-muenster.de> wrote:

> Hi Yugo
> 
> On 26.05.25 18:39, Yugo Nagata wrote:
> > I can see the error when two concurrent transactions issue
> > "alter function f() immutable".
> 
> 
> I might have missed something in my last tests... I could now reproduce
> the behaviour you mentioned.
> 
> I've tested v2 and it works as described. CREATE OR REPLACE FUNCTION and
> ALTER TABLE no longer raise an error after the lock by the concurrent
> transaction was freed.
> 
> One quick question in v2-002:
> 
>      tup = SearchSysCacheCopy1(PROCOID, ObjectIdGetDatum(funcOid));
> -    if (!HeapTupleIsValid(tup)) /* should not happen */
> -        elog(ERROR, "cache lookup failed for function %u", funcOid);
> +    if (!HeapTupleIsValid(tup))
> +        ereport(ERROR,
> +                errcode(ERRCODE_UNDEFINED_OBJECT),
> +                errmsg("function \"%s\" does not exist",
> +                       NameListToString(stmt->func->objname)));
> 
> 
> Is it really ok to change this error message here? Did the addition of
> LockDatabaseObject change the semantics of the previous message? 

Yes. AcceptInvalidationMessages() is called in LockDatabaseObject() after wait,
and this enables the detection of object deletion during the wait.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Tue, 27 May 2025 09:00:01 +0300
Alexander Lakhin <exclusion@gmail.com> wrote:

> Hello Yugo,
> 
> 31.03.2025 14:22, Yugo Nagata wrote:
> >> I found that multiple sessions concurrently execute CREATE OR REPLACE FUNCTION
> >> for a same function, the error "tuple concurrently updated" is raised. This is
> >> an internal error output by elog, also the message is not user-friendly.
> > I also found the same error is raised when concurrent ALTER FUNCTION commands are
> > executed. I've added a patch to fix this in the similar way.
> 
> FWIW, the same error is raised also with concurrent GRANT/REVOKE on a
> database:
> https://www.postgresql.org/message-id/18dcfb7f-5deb-4487-ae22-a2c16839519a%40gmail.com
> 
> Maybe you would also find relevant this thread:
> https://www.postgresql.org/message-id/flat/ZiYjn0eVc7pxVY45%40ip-10-97-1-34.eu-west-3.compute.internal

Thank you for sharing the information.

I know there are other scenarios where the same is raises and I agree that
it would be better to consider a more global solution instead of addressing
each of them. However, I am not sure that improving the error message for
each case doesn't not make sense.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Tue, 3 Jun 2025 17:39:50 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Tue, 27 May 2025 09:00:01 +0300
> Alexander Lakhin <exclusion@gmail.com> wrote:

> I know there are other scenarios where the same is raises and I agree that
> it would be better to consider a more global solution instead of addressing
> each of them. However, I am not sure that improving the error message for
> each case doesn't not make sense.

To address the remaining cases where DDL commands fail with the internal
error "ERROR:  tuple concurrently updated" due to insufficient locking,
I would like to propose improving the error reporting to produce a more
appropriate and user-facing error message. This should make it easier for
users to understand the cause of the failure.

Patch 0003 improves the error message shown when concurrent updates to a
system catalog tuple occur, producing output like:

 ERROR:  operation failed due to a concurrent command
 DETAIL: Another command modified the same object in a concurrent session.

Patches 0001 and 0002 are unchanged from v2, except for updated commit messages.
I believe these patches are still useful, as they allow the operation to complete
successfully after waiting, or to behave appropriately when the target function
is dropped by another session during the wait.

Best regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Thu, 5 Jun 2025 16:26:08 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Tue, 3 Jun 2025 17:39:50 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
> > On Tue, 27 May 2025 09:00:01 +0300
> > Alexander Lakhin <exclusion@gmail.com> wrote:
> 
> > I know there are other scenarios where the same is raises and I agree that
> > it would be better to consider a more global solution instead of addressing
> > each of them. However, I am not sure that improving the error message for
> > each case doesn't not make sense.
> 
> To address the remaining cases where DDL commands fail with the internal
> error "ERROR:  tuple concurrently updated" due to insufficient locking,
> I would like to propose improving the error reporting to produce a more
> appropriate and user-facing error message. This should make it easier for
> users to understand the cause of the failure.
> 
> Patch 0003 improves the error message shown when concurrent updates to a
> system catalog tuple occur, producing output like:
> 
>  ERROR:  operation failed due to a concurrent command
>  DETAIL: Another command modified the same object in a concurrent session.
> 
> Patches 0001 and 0002 are unchanged from v2, except for updated commit messages.
> I believe these patches are still useful, as they allow the operation to complete
> successfully after waiting, or to behave appropriately when the target function
> is dropped by another session during the wait.

I found that the error "tuple concurrently updated" was expected as the results
of injection_points test , so I've fixed it so that the new message is expected
instead.

I've attached updated patches.

Best regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Daniil Davydov
Дата:
Hi,

On Thu, Jun 5, 2025 at 5:21 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> I've attached updated patches.
>

I have some comments on v4-0001 patch :
1)
heap_freetuple should be called for every tuple that we get from
SearchSysCacheCopy3.
But if tuple is valid after the first SearchSysCacheCopy3, we
overwrite the old pointer (by the second SearchSysCacheCopy3 call) and
forget to free it.
I suggest adding heap_freetuple call before the second SearchSysCacheCopy3 call.

2)
+        Form_pg_proc oldproc = (Form_pg_proc) GETSTRUCT(oldtup);
+        Datum           proargnames;
+        bool            isnull;
+        const char *dropcmd;
Strange alignment. I guess you should keep the same alignment as in
deleted declarations.

3)
This patch fixes postgres behavior if I first create a function and
then try to CREATE OR REPLACE it in concurrent transactions.
But if the function doesn't exist and I try to call CREATE OR REPLACE
in concurrent transactions, I will get an error.
I wrote about it in this thread [1] and Tom Lane said that this
behavior is kinda expected.
Just in case, I decided to mention it here anyway - perhaps you will
have other thoughts on this matter.

[1]
https://www.postgresql.org/message-id/flat/CAJDiXghv2JF5zbLyyybokWKM%2B-GYsTG%2Bhw7xseLNgJOJwf0%2B8w%40mail.gmail.com

--
Best regards,
Daniil Davydov



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Fri, 27 Jun 2025 18:53:02 +0700
Daniil Davydov <3danissimo@gmail.com> wrote:

> Hi,
> 
> On Thu, Jun 5, 2025 at 5:21 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> >
> > I've attached updated patches.
> >
> 
> I have some comments on v4-0001 patch :

Thank you for your comments!

> 1)
> heap_freetuple should be called for every tuple that we get from
> SearchSysCacheCopy3.
> But if tuple is valid after the first SearchSysCacheCopy3, we
> overwrite the old pointer (by the second SearchSysCacheCopy3 call) and
> forget to free it.
> I suggest adding heap_freetuple call before the second SearchSysCacheCopy3 call.

Good catches. Fixed.

> 2)
> +        Form_pg_proc oldproc = (Form_pg_proc) GETSTRUCT(oldtup);
> +        Datum           proargnames;
> +        bool            isnull;
> +        const char *dropcmd;
> Strange alignment. I guess you should keep the same alignment as in
> deleted declarations.

Fixed.

I've attached patches including these fixes.

> 3)
> This patch fixes postgres behavior if I first create a function and
> then try to CREATE OR REPLACE it in concurrent transactions.
> But if the function doesn't exist and I try to call CREATE OR REPLACE
> in concurrent transactions, I will get an error.
> I wrote about it in this thread [1] and Tom Lane said that this
> behavior is kinda expected.
> Just in case, I decided to mention it here anyway - perhaps you will
> have other thoughts on this matter.
> 
> [1]
https://www.postgresql.org/message-id/flat/CAJDiXghv2JF5zbLyyybokWKM%2B-GYsTG%2Bhw7xseLNgJOJwf0%2B8w%40mail.gmail.com

I agree with Tom Lane that the behavior is expected, although it would be better
if the error message were more user-friendly. We could improve it by checking the
unique constraint before calling index_insert in CatalogIndexInsert.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Daniil Davydov
Дата:
Hi,

On Mon, Jun 30, 2025 at 3:47 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> On Fri, 27 Jun 2025 18:53:02 +0700
> Daniil Davydov <3danissimo@gmail.com> wrote:
> > This patch fixes postgres behavior if I first create a function and
> > then try to CREATE OR REPLACE it in concurrent transactions.
> > But if the function doesn't exist and I try to call CREATE OR REPLACE
> > in concurrent transactions, I will get an error.
> > I wrote about it in this thread [1] and Tom Lane said that this
> > behavior is kinda expected.
> > Just in case, I decided to mention it here anyway - perhaps you will
> > have other thoughts on this matter.
> >
> > [1]
https://www.postgresql.org/message-id/flat/CAJDiXghv2JF5zbLyyybokWKM%2B-GYsTG%2Bhw7xseLNgJOJwf0%2B8w%40mail.gmail.com
>
> I agree with Tom Lane that the behavior is expected, although it would be better
> if the error message were more user-friendly. We could improve it by checking the
> unique constraint before calling index_insert in CatalogIndexInsert.
>

As far as I understand, unique constraint checking is specific for
each index access method.
Thus, to implement the proposed idea, you will have to create a
separate callback for check_unique function.
It doesn't seem like a very neat solution, but there aren't many other
options left.

I would suggest intercepting the error (via PG_CATCH), and if it has
the ERRCODE_UNIQUE_VIOLATION code, change the error message (more
precisely, throw another error with the desired message).
If we caught an error during the CatalogTupleInsert call, we can be
sure that the problem is in concurrent execution, because before the
insertion, we checked that such a tuple does not exist.

What do you think? And in general, are you going to fix this behavior
within this thread?

P.S.
Thank you for updating the patch.

--
Best regards,
Daniil Davydov



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Daniil Davydov
Дата:
Hi,

On Tue, Jul 1, 2025 at 5:47 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> On Mon, 30 Jun 2025 18:32:47 +0700
> Daniil Davydov <3danissimo@gmail.com> wrote:
>
> > On Mon, Jun 30, 2025 at 3:47 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > >
> > > I agree with Tom Lane that the behavior is expected, although it would be better
> > > if the error message were more user-friendly. We could improve it by checking the
> > > unique constraint before calling index_insert in CatalogIndexInsert.
> > >
> >
> > As far as I understand, unique constraint checking is specific for
> > each index access method.
> > Thus, to implement the proposed idea, you will have to create a
> > separate callback for check_unique function.
> > It doesn't seem like a very neat solution, but there aren't many other
> > options left.
>
> I believe check_exclusion_or_unique_constraint() can be used independently of
> a specific index access method.
>
> > I would suggest intercepting the error (via PG_CATCH), and if it has
> > the ERRCODE_UNIQUE_VIOLATION code, change the error message (more
> > precisely, throw another error with the desired message).
> > If we caught an error during the CatalogTupleInsert call, we can be
> > sure that the problem is in concurrent execution, because before the
> > insertion, we checked that such a tuple does not exist.
> >
> > What do you think? And in general, are you going to fix this behavior
> > within this thread?
>
> Initially, I wasn't planning to do so, but I gave it a try and wrote a
> patch to fix the issue based on my idea.

Thanks for the patch! Some comments on it :
1)
I found two typos :
+    if (HeapTupleIsHeapOenly(heapTuple) && !onlySummarized)
and
+            sartisfied = check_unique_constraint(heapRelation,

2)
CatalogIndexInsert is kinda "popular" function. It can be called in
different situations, not in all of which a violation of unique
constraint means an error due to competitiveness.

For example, with this patch such a query : "CREATE TYPE mood AS ENUM
('happy', 'sad', 'happy');"
Will throw this error : "operation failed due to a concurrent command"
Of course, it isn't true.

That is why I suggested handling unique violations exactly inside
ProcedureCreate - the only place where we can be sure about reasons of
error.

--
Best regards,
Daniil Davydov



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Thu, 3 Jul 2025 23:18:12 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Tue, 1 Jul 2025 18:56:11 +0700
> Daniil Davydov <3danissimo@gmail.com> wrote:

> > For example, with this patch such a query : "CREATE TYPE mood AS ENUM
> > ('happy', 'sad', 'happy');"
> > Will throw this error : "operation failed due to a concurrent command"
> > Of course, it isn't true
> 
> You're right ― this error is not caused by a concurrent command.
> However, I believe the error message in cases like creating an ENUM type with
> duplicate labels could be improved to explain the issue more clearly, rather
> than just reporting it as a unique constraint violation.

I have submitted a patch addressing this in a separate thread [1].

[1] https://www.postgresql.org/message-id/20250704000402.37e605ab0c59c300965a17ee%40sraoss.co.jp

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Daniil Davydov
Дата:
Hi,

On Thu, Jul 3, 2025 at 9:18 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> On Tue, 1 Jul 2025 18:56:11 +0700
> Daniil Davydov <3danissimo@gmail.com> wrote:
>
> > CatalogIndexInsert is kinda "popular" function. It can be called in
> > different situations, not in all of which a violation of unique
> > constraint means an error due to competitiveness.
> >
> > For example, with this patch such a query : "CREATE TYPE mood AS ENUM
> > ('happy', 'sad', 'happy');"
> > Will throw this error : "operation failed due to a concurrent command"
> > Of course, it isn't true
>
> You're right — this error is not caused by a concurrent command.
> However, I believe the error message in cases like creating an ENUM type with
> duplicate labels could be improved to explain the issue more clearly, rather
> than just reporting it as a unique constraint violation.
>
> In any case, a unique constraint violation in a system catalog is not necessarily
> due to concurrent DDL. Therefore, the error message shouldn't suggest that as the
> only cause. Instead, it should clearly report the constraint violation as the primary
> issue, and mention concurrent DDL as just one possible explanation in HINT.
>
> I've updated the patch accordingly to reflect this direction in the error message.
>
>  ERROR:  operation failed due to duplicate key object
>  DETAIL:  Key (proname, proargtypes, pronamespace)=(fnc, , 2200) already exists in unique index
pg_proc_proname_args_nsp_index.
>  HINT:  Another command might have created a object with the same key in a concurrent session.
>
> However, as a result, the message ends up being similar to the current one raised
> by the btree code, so the overall improvement in user-friendliness might be limited.
>

Thanks for updating the patch!
+1 for adding such a hint for this error.

> > That is why I suggested handling unique violations exactly inside
> > ProcedureCreate - the only place where we can be sure about reasons of
> > error.
>
> If we were to fix the error message outside of CatalogIndexInsert, we would need to
> modify CatalogTupleInsert, CatalogTupleUpdate, and related functions to allow them to
> report the failure appropriately.
>
> You suggested using PG_TRY/PG_CATCH, but these do not suppress the error message from
> the btree code, so this approach seems not to fully address the issue.
>
> Moreover, the places affected are not limited to ProcedureCreate, for example,
> concurrent CREATE TABLE commands can also lead to the same situation, and possibly
> other commands as well.

Actually, we can suppress errors from btree (by flushing error context
and creating another), but it doesn't look like the best design
decision.
It was an idea for one concrete error fix. Anyway,I like the
correction you suggested better.


--
Best regards,
Daniil Davydov



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Fri, 4 Jul 2025 10:48:26 +0700
Daniil Davydov <3danissimo@gmail.com> wrote:

> Hi,
> 
> On Thu, Jul 3, 2025 at 9:18 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> >
> > On Tue, 1 Jul 2025 18:56:11 +0700
> > Daniil Davydov <3danissimo@gmail.com> wrote:
> >
> > > CatalogIndexInsert is kinda "popular" function. It can be called in
> > > different situations, not in all of which a violation of unique
> > > constraint means an error due to competitiveness.
> > >
> > > For example, with this patch such a query : "CREATE TYPE mood AS ENUM
> > > ('happy', 'sad', 'happy');"
> > > Will throw this error : "operation failed due to a concurrent command"
> > > Of course, it isn't true
> >
> > You're right — this error is not caused by a concurrent command.
> > However, I believe the error message in cases like creating an ENUM type with
> > duplicate labels could be improved to explain the issue more clearly, rather
> > than just reporting it as a unique constraint violation.
> >
> > In any case, a unique constraint violation in a system catalog is not necessarily
> > due to concurrent DDL. Therefore, the error message shouldn't suggest that as the
> > only cause. Instead, it should clearly report the constraint violation as the primary
> > issue, and mention concurrent DDL as just one possible explanation in HINT.
> >
> > I've updated the patch accordingly to reflect this direction in the error message.
> >
> >  ERROR:  operation failed due to duplicate key object
> >  DETAIL:  Key (proname, proargtypes, pronamespace)=(fnc, , 2200) already exists in unique index
pg_proc_proname_args_nsp_index.
> >  HINT:  Another command might have created a object with the same key in a concurrent session.
> >
> > However, as a result, the message ends up being similar to the current one raised
> > by the btree code, so the overall improvement in user-friendliness might be limited.
> >
> 
> Thanks for updating the patch!
> +1 for adding such a hint for this error.
> 
> > > That is why I suggested handling unique violations exactly inside
> > > ProcedureCreate - the only place where we can be sure about reasons of
> > > error.
> >
> > If we were to fix the error message outside of CatalogIndexInsert, we would need to
> > modify CatalogTupleInsert, CatalogTupleUpdate, and related functions to allow them to
> > report the failure appropriately.
> >
> > You suggested using PG_TRY/PG_CATCH, but these do not suppress the error message from
> > the btree code, so this approach seems not to fully address the issue.
> >
> > Moreover, the places affected are not limited to ProcedureCreate, for example,
> > concurrent CREATE TABLE commands can also lead to the same situation, and possibly
> > other commands as well.
> 
> Actually, we can suppress errors from btree (by flushing error context
> and creating another),

Right. That was my misunderstanding.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>



Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Fri, 4 Jul 2025 14:58:05 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Fri, 4 Jul 2025 10:48:26 +0700
> Daniil Davydov <3danissimo@gmail.com> wrote:
> 
> > Hi,
> > 
> > On Thu, Jul 3, 2025 at 9:18 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > >
> > > On Tue, 1 Jul 2025 18:56:11 +0700
> > > Daniil Davydov <3danissimo@gmail.com> wrote:
> > >
> > > > CatalogIndexInsert is kinda "popular" function. It can be called in
> > > > different situations, not in all of which a violation of unique
> > > > constraint means an error due to competitiveness.
> > > >
> > > > For example, with this patch such a query : "CREATE TYPE mood AS ENUM
> > > > ('happy', 'sad', 'happy');"
> > > > Will throw this error : "operation failed due to a concurrent command"
> > > > Of course, it isn't true
> > >
> > > You're right — this error is not caused by a concurrent command.
> > > However, I believe the error message in cases like creating an ENUM type with
> > > duplicate labels could be improved to explain the issue more clearly, rather
> > > than just reporting it as a unique constraint violation.
> > >
> > > In any case, a unique constraint violation in a system catalog is not necessarily
> > > due to concurrent DDL. Therefore, the error message shouldn't suggest that as the
> > > only cause. Instead, it should clearly report the constraint violation as the primary
> > > issue, and mention concurrent DDL as just one possible explanation in HINT.
> > >
> > > I've updated the patch accordingly to reflect this direction in the error message.
> > >
> > >  ERROR:  operation failed due to duplicate key object
> > >  DETAIL:  Key (proname, proargtypes, pronamespace)=(fnc, , 2200) already exists in unique index
pg_proc_proname_args_nsp_index.
> > >  HINT:  Another command might have created a object with the same key in a concurrent session.
> > >
> > > However, as a result, the message ends up being similar to the current one raised
> > > by the btree code, so the overall improvement in user-friendliness might be limited.
> > >
> > 
> > Thanks for updating the patch!
> > +1 for adding such a hint for this error.

I've attached updated patches since I found some test failed.

Regards,
Yugo Nagata


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Thu, 17 Jul 2025 14:09:14 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Fri, 4 Jul 2025 14:58:05 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
> > On Fri, 4 Jul 2025 10:48:26 +0700
> > Daniil Davydov <3danissimo@gmail.com> wrote:
> > 
> > > Hi,
> > > 
> > > On Thu, Jul 3, 2025 at 9:18 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > > >
> > > > On Tue, 1 Jul 2025 18:56:11 +0700
> > > > Daniil Davydov <3danissimo@gmail.com> wrote:
> > > >
> > > > > CatalogIndexInsert is kinda "popular" function. It can be called in
> > > > > different situations, not in all of which a violation of unique
> > > > > constraint means an error due to competitiveness.
> > > > >
> > > > > For example, with this patch such a query : "CREATE TYPE mood AS ENUM
> > > > > ('happy', 'sad', 'happy');"
> > > > > Will throw this error : "operation failed due to a concurrent command"
> > > > > Of course, it isn't true
> > > >
> > > > You're right — this error is not caused by a concurrent command.
> > > > However, I believe the error message in cases like creating an ENUM type with
> > > > duplicate labels could be improved to explain the issue more clearly, rather
> > > > than just reporting it as a unique constraint violation.
> > > >
> > > > In any case, a unique constraint violation in a system catalog is not necessarily
> > > > due to concurrent DDL. Therefore, the error message shouldn't suggest that as the
> > > > only cause. Instead, it should clearly report the constraint violation as the primary
> > > > issue, and mention concurrent DDL as just one possible explanation in HINT.
> > > >
> > > > I've updated the patch accordingly to reflect this direction in the error message.
> > > >
> > > >  ERROR:  operation failed due to duplicate key object
> > > >  DETAIL:  Key (proname, proargtypes, pronamespace)=(fnc, , 2200) already exists in unique index
pg_proc_proname_args_nsp_index.
> > > >  HINT:  Another command might have created a object with the same key in a concurrent session.
> > > >
> > > > However, as a result, the message ends up being similar to the current one raised
> > > > by the btree code, so the overall improvement in user-friendliness might be limited.
> > > >
> > > 
> > > Thanks for updating the patch!
> > > +1 for adding such a hint for this error.

I've fixed the following cfbot failure:

 SUMMARY: AddressSanitizer: heap-use-after-free /tmp/cirrus-ci-build/src/backend/catalog/pg_proc.c:406 in
ProcedureCreate

This was caused by accessing oldproc->oid after oldtup had already been freed.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Yugo Nagata
Дата:
On Wed, 20 Aug 2025 17:01:56 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Thu, 17 Jul 2025 14:09:14 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
> > On Fri, 4 Jul 2025 14:58:05 +0900
> > Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > 
> > > On Fri, 4 Jul 2025 10:48:26 +0700
> > > Daniil Davydov <3danissimo@gmail.com> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > On Thu, Jul 3, 2025 at 9:18 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > > > >
> > > > > On Tue, 1 Jul 2025 18:56:11 +0700
> > > > > Daniil Davydov <3danissimo@gmail.com> wrote:
> > > > >
> > > > > > CatalogIndexInsert is kinda "popular" function. It can be called in
> > > > > > different situations, not in all of which a violation of unique
> > > > > > constraint means an error due to competitiveness.
> > > > > >
> > > > > > For example, with this patch such a query : "CREATE TYPE mood AS ENUM
> > > > > > ('happy', 'sad', 'happy');"
> > > > > > Will throw this error : "operation failed due to a concurrent command"
> > > > > > Of course, it isn't true
> > > > >
> > > > > You're right — this error is not caused by a concurrent command.
> > > > > However, I believe the error message in cases like creating an ENUM type with
> > > > > duplicate labels could be improved to explain the issue more clearly, rather
> > > > > than just reporting it as a unique constraint violation.
> > > > >
> > > > > In any case, a unique constraint violation in a system catalog is not necessarily
> > > > > due to concurrent DDL. Therefore, the error message shouldn't suggest that as the
> > > > > only cause. Instead, it should clearly report the constraint violation as the primary
> > > > > issue, and mention concurrent DDL as just one possible explanation in HINT.
> > > > >
> > > > > I've updated the patch accordingly to reflect this direction in the error message.
> > > > >
> > > > >  ERROR:  operation failed due to duplicate key object
> > > > >  DETAIL:  Key (proname, proargtypes, pronamespace)=(fnc, , 2200) already exists in unique index
pg_proc_proname_args_nsp_index.
> > > > >  HINT:  Another command might have created a object with the same key in a concurrent session.
> > > > >
> > > > > However, as a result, the message ends up being similar to the current one raised
> > > > > by the btree code, so the overall improvement in user-friendliness might be limited.
> > > > >
> > > > 
> > > > Thanks for updating the patch!
> > > > +1 for adding such a hint for this error.

I've attached rebase patches.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

От
Daniil Davydov
Дата:
Hi,

On Tue, Sep 30, 2025 at 9:02 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> I've attached rebase patches.
>

It seems redundant to me that in CatalogIndexInsert we first check the
unique constraint, and then pass the UNIQUE_CHECK_YES parameter
to the index_insert function call below. As far as I understand, after
check_unique_constraint we can be sure that everything is okay with
inserted values.  Am I missing something?

Also, why should we add "IsCatalogRelation(heapRelation)" check inside
the CatalogIndexInsert function? We know for sure that a given table is a
catalog relation.

BTW, what do you think about adding an isolation test for a concurrent
"CREATE OR REPLACE FUNCTION" command? This is one of the
'problems' we're struggling with, but I don't see this case  being explicitly
tested anywhere.

All other changes look good to me.

--
Best regards,
Daniil Davydov