Discussion: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION
Hi,

I found that when multiple sessions concurrently execute CREATE OR REPLACE FUNCTION for the same function, the error "tuple concurrently updated" is raised. This is an internal error output by elog, and the message is not user-friendly.

I've attached a patch that prevents this internal error by taking an exclusive lock before the command modifies the function and re-reading the tuple after the lock is acquired. Also, if the function has been removed while waiting for the lock, a new entry is created.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
On Mon, 31 Mar 2025 20:00:57 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> Hi,
>
> I found that when multiple sessions concurrently execute CREATE OR REPLACE
> FUNCTION for the same function, the error "tuple concurrently updated" is
> raised. This is an internal error output by elog, and the message is not
> user-friendly.
>
> I've attached a patch that prevents this internal error by taking an
> exclusive lock before the command modifies the function and re-reading the
> tuple after the lock is acquired. Also, if the function has been removed
> while waiting for the lock, a new entry is created.

I also found that the same error is raised when concurrent ALTER FUNCTION commands are executed. I've added a patch to fix this in a similar way.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
Hi!

On 31.03.25 13:22, Yugo Nagata wrote:
> I also found that the same error is raised when concurrent ALTER FUNCTION
> commands are executed. I've added a patch to fix this in a similar way.

I just briefly tested this patch and it seems to work as expected for CREATE OR REPLACE FUNCTION:

-- Session 1 (t1):
postgres=# BEGIN;
BEGIN
postgres=*# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 1; END;$$;
CREATE FUNCTION

-- Session 2 (t2)
postgres=# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 2; END;$$;
(wait)

-- Session 3 (t3)
postgres=# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 3; END;$$;
(wait)

-- Session 4 (t4)
postgres=# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 4; END;$$;
(wait)

-- Session 1 (t5)
postgres=*# END;
COMMIT

At this point Sessions 2, 3, and 4 were released with: CREATE FUNCTION

-- Session 1 (t6)
postgres=# \sf f1
CREATE OR REPLACE FUNCTION public.f1()
 RETURNS integer
 LANGUAGE plpgsql
AS $function$ BEGIN RETURN 4; END;$function$

So... it no longer shows the error message:

ERROR:  tuple concurrently updated

I did the same for ALTER FUNCTION but I was unable to reproduce the error you reported. Could you provide your script?

Best regards, Jim
On Mon, Mar 31, 2025 at 7:22 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> I also found that the same error is raised when concurrent ALTER FUNCTION
> commands are executed. I've added a patch to fix this in a similar way.

Hi.

+ /* Lock the function so nobody else can do anything with it. */
+ LockDatabaseObject(ProcedureRelationId, oldproc->oid, 0, AccessExclusiveLock);
+
+ /*
+  * It is possible that by the time we acquire the lock on function,
+  * concurrent DDL has removed it. We can test this by checking the
+  * existence of function. We get the tuple again to avoid the risk
+  * of function definition getting changed.
+  */
+ oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
+                              PointerGetDatum(procedureName),
+                              PointerGetDatum(parameterTypes),
+                              ObjectIdGetDatum(procNamespace));

We already called LockDatabaseObject, so concurrent DDL cannot do DROP FUNCTION or ALTER FUNCTION. Is there still any need to call SearchSysCacheCopy3 again?

@@ -553,11 +575,13 @@ ProcedureCreate(const char *procedureName,
     replaces[Anum_pg_proc_proowner - 1] = false;
     replaces[Anum_pg_proc_proacl - 1] = false;

+
+
     /* Okay, do it... */

There is no need to add these two new lines.
On Tue, 20 May 2025 17:30:35 +0200
Jim Jones <jim.jones@uni-muenster.de> wrote:

> I just briefly tested this patch and it seems to work as expected for
> CREATE OR REPLACE FUNCTION:

Thank you for reviewing the patch!

> I did the same for ALTER FUNCTION but I was unable to reproduce the
> error you reported. Could you provide your script?

I can see the error when two concurrent transactions issue "alter function f() immutable".

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
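[For reference, a minimal two-session reproduction of the pre-patch failure, assuming a function f() already exists. The second session blocks on the pg_proc row and then fails once the first transaction commits:]

    -- Session 1
    postgres=# BEGIN;
    BEGIN
    postgres=*# ALTER FUNCTION f() IMMUTABLE;
    ALTER FUNCTION

    -- Session 2 (blocks, waiting for session 1)
    postgres=# ALTER FUNCTION f() IMMUTABLE;

    -- Session 1
    postgres=*# COMMIT;
    COMMIT

    -- Session 2 then fails with the internal error:
    ERROR:  tuple concurrently updated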
Hi,

On Thu, 22 May 2025 10:25:58 +0800
jian he <jian.universality@gmail.com> wrote:

Thank you for looking into it.

> + /* Lock the function so nobody else can do anything with it. */
> + LockDatabaseObject(ProcedureRelationId, oldproc->oid, 0, AccessExclusiveLock);
> +
> + /*
> +  * It is possible that by the time we acquire the lock on function,
> +  * concurrent DDL has removed it. We can test this by checking the
> +  * existence of function. We get the tuple again to avoid the risk
> +  * of function definition getting changed.
> +  */
> + oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
> +                              PointerGetDatum(procedureName),
> +                              PointerGetDatum(parameterTypes),
> +                              ObjectIdGetDatum(procNamespace));
>
> We already called LockDatabaseObject, so concurrent DDL cannot do
> DROP FUNCTION or ALTER FUNCTION. Is there still any need to call
> SearchSysCacheCopy3 again?

The function may be dropped *before* we call LockDatabaseObject. SearchSysCacheCopy3 is called to check for this. Please see AlterPublication() as a similar code example.

> @@ -553,11 +575,13 @@ ProcedureCreate(const char *procedureName,
>      replaces[Anum_pg_proc_proowner - 1] = false;
>      replaces[Anum_pg_proc_proacl - 1] = false;
>
> +
> +
>      /* Okay, do it... */
>
> There is no need to add these two new lines.

I'll remove the lines. Thanks.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 23 May 2025 10:37:42 +0800
jian he <jian.universality@gmail.com> wrote:

> On Thu, May 22, 2025 at 10:25 AM jian he <jian.universality@gmail.com> wrote:
>
> Hi.
> Earlier, I didn't check patch 0002.
>
> I think that in AlterFunction it should be fine to add
>
>     /* Lock the function so nobody else can do anything with it. */
>     LockDatabaseObject(ProcedureRelationId, funcOid, 0, AccessExclusiveLock);
>
> right after
>
>     funcOid = LookupFuncWithArgs(stmt->objtype, stmt->func, false);

Thank you. That makes sense, because we can avoid a redundant call of SearchSysCacheCopy1 and HeapTupleIsValid. I've attached an updated patch.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
On Tue, May 27, 2025 at 1:35 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> > We already called LockDatabaseObject, so concurrent DDL cannot do
> > DROP FUNCTION or ALTER FUNCTION. Is there still any need to call
> > SearchSysCacheCopy3 again?
>
> The function may be dropped *before* we call LockDatabaseObject.
> SearchSysCacheCopy3 is called to check for this.
> Please see AlterPublication() as a similar code example.

I am wondering, can we do it the following way for v2-0001?

    /* Check for pre-existing definition */
    oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
                                 PointerGetDatum(procedureName),
                                 PointerGetDatum(parameterTypes),
                                 ObjectIdGetDatum(procNamespace));

    if (HeapTupleIsValid(oldtup))
    {
        /* There is one; okay to replace it? */
        Form_pg_proc oldproc = (Form_pg_proc) GETSTRUCT(oldtup);

        if (!replace)
            ereport(ERROR,
                    (errcode(ERRCODE_DUPLICATE_FUNCTION),
                     errmsg("function \"%s\" already exists with same argument types",
                            procedureName)));

        /*
         * It is possible that by the time we acquire the lock on function,
         * concurrent DDL has removed it. We can test this by checking the
         * existence of function. We get the tuple again to avoid the risk
         * of function definition getting changed.
         */
        if (!ConditionalLockDatabaseObject(ProcedureRelationId,
                                           oldproc->oid, 0, AccessExclusiveLock))
            oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
                                         PointerGetDatum(procedureName),
                                         PointerGetDatum(parameterTypes),
                                         ObjectIdGetDatum(procNamespace));
    }
On Tue, 27 May 2025 10:03:58 +0800
jian he <jian.universality@gmail.com> wrote:

> I am wondering, can we do it the following way for v2-0001?
>
> [ code quoted in full upthread: take ConditionalLockDatabaseObject, and
>   re-run SearchSysCacheCopy3 only when the lock is not immediately
>   available ]

No. This cannot prevent the error "ERROR: tuple concurrently updated", because when the lock cannot be acquired immediately it does not wait for the concurrently running session to end.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
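[To make the distinction concrete, here is a minimal sketch of the blocking pattern the patch relies on, with variable names taken from the hunks quoted upthread. LockDatabaseObject() waits for the conflicting transaction and then runs AcceptInvalidationMessages(), so the second syscache lookup sees the outcome of the concurrent command; ConditionalLockDatabaseObject() instead returns false immediately, leaving the later catalog update to race and fail:]

    /* Blocks until a concurrent ALTER/DROP on the function commits or aborts. */
    LockDatabaseObject(ProcedureRelationId, oldproc->oid, 0, AccessExclusiveLock);

    /*
     * LockDatabaseObject() accepted invalidation messages after the wait, so
     * this lookup reflects a concurrent drop: it returns an invalid tuple if
     * the function is gone, and the caller can create a new entry instead of
     * updating a dead one.
     */
    oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
                                 PointerGetDatum(procedureName),
                                 PointerGetDatum(parameterTypes),
                                 ObjectIdGetDatum(procNamespace));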
Hello Yugo,
31.03.2025 14:22, Yugo Nagata wrote:
> I found that when multiple sessions concurrently execute CREATE OR REPLACE
> FUNCTION for the same function, the error "tuple concurrently updated" is
> raised. This is an internal error output by elog, and the message is not
> user-friendly.
>
> I also found that the same error is raised when concurrent ALTER FUNCTION
> commands are executed. I've added a patch to fix this in a similar way.
FWIW, the same error is also raised with concurrent GRANT/REVOKE on a
database:
https://www.postgresql.org/message-id/18dcfb7f-5deb-4487-ae22-a2c16839519a%40gmail.com
Maybe you would also find this thread relevant:
https://www.postgresql.org/message-id/flat/ZiYjn0eVc7pxVY45%40ip-10-97-1-34.eu-west-3.compute.internal
Best regards,
Alexander Lakhin
Neon (https://neon.tech)
Hi Yugo

On 26.05.25 18:39, Yugo Nagata wrote:
> I can see the error when two concurrent transactions issue
> "alter function f() immutable".

I might have missed something in my last tests... I could now reproduce the behaviour you mentioned.

I've tested v2 and it works as described. CREATE OR REPLACE FUNCTION and ALTER FUNCTION no longer raise an error after the lock held by the concurrent transaction was released.

One quick question on v2-0002:

     tup = SearchSysCacheCopy1(PROCOID, ObjectIdGetDatum(funcOid));
-    if (!HeapTupleIsValid(tup)) /* should not happen */
-        elog(ERROR, "cache lookup failed for function %u", funcOid);
+    if (!HeapTupleIsValid(tup))
+        ereport(ERROR,
+                errcode(ERRCODE_UNDEFINED_OBJECT),
+                errmsg("function \"%s\" does not exist",
+                       NameListToString(stmt->func->objname)));

Is it really ok to change this error message here? Did the addition of LockDatabaseObject change the semantics of the previous message? Other similar parts of the code still report "cache lookup failed for function x". I don't have a strong opinion here, but perhaps we should keep these messages consistent, at least throughout the file?

Thanks!

Best, Jim
On Tue, 27 May 2025 08:33:42 +0200
Jim Jones <jim.jones@uni-muenster.de> wrote:

> Is it really ok to change this error message here? Did the addition of
> LockDatabaseObject change the semantics of the previous message?

Yes. AcceptInvalidationMessages() is called in LockDatabaseObject() after the wait, and this makes it possible to detect that the object was deleted during the wait. So a failed lookup here is no longer a "should not happen" case but a user-visible condition.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
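[Putting the quoted hunks together, the AlterFunction() flow after the v2-0002 patch looks roughly like this -- a sketch reconstructed from the hunks discussed above, not the patch verbatim:]

    funcOid = LookupFuncWithArgs(stmt->objtype, stmt->func, false);

    /* Lock the function so nobody else can do anything with it. */
    LockDatabaseObject(ProcedureRelationId, funcOid, 0, AccessExclusiveLock);

    /*
     * If we had to wait above, AcceptInvalidationMessages() has run, so a
     * concurrent DROP FUNCTION now shows up as a failed lookup here -- hence
     * the user-facing "does not exist" error instead of an internal one.
     */
    tup = SearchSysCacheCopy1(PROCOID, ObjectIdGetDatum(funcOid));
    if (!HeapTupleIsValid(tup))
        ereport(ERROR,
                errcode(ERRCODE_UNDEFINED_OBJECT),
                errmsg("function \"%s\" does not exist",
                       NameListToString(stmt->func->objname)));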
On Tue, 27 May 2025 09:00:01 +0300
Alexander Lakhin <exclusion@gmail.com> wrote:

> FWIW, the same error is also raised with concurrent GRANT/REVOKE on a
> database:
> https://www.postgresql.org/message-id/18dcfb7f-5deb-4487-ae22-a2c16839519a%40gmail.com
>
> Maybe you would also find this thread relevant:
> https://www.postgresql.org/message-id/flat/ZiYjn0eVc7pxVY45%40ip-10-97-1-34.eu-west-3.compute.internal

Thank you for sharing the information.

I know there are other scenarios where the same error is raised, and I agree that it would be better to consider a more global solution instead of addressing each of them. However, I don't think that makes improving the error message for each case pointless.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
On Tue, 3 Jun 2025 17:39:50 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> I know there are other scenarios where the same error is raised, and I
> agree that it would be better to consider a more global solution instead
> of addressing each of them. However, I don't think that makes improving
> the error message for each case pointless.

To address the remaining cases where DDL commands fail with the internal error "ERROR: tuple concurrently updated" due to insufficient locking, I would like to propose improving the error reporting to produce a more appropriate, user-facing error message. This should make it easier for users to understand the cause of the failure.

Patch 0003 improves the error message shown when concurrent updates to a system catalog tuple occur, producing output like:

 ERROR:  operation failed due to a concurrent command
 DETAIL:  Another command modified the same object in a concurrent session.

Patches 0001 and 0002 are unchanged from v2, except for updated commit messages. I believe these patches are still useful, as they allow the operation to complete successfully after waiting, or to behave appropriately when the target function is dropped by another session during the wait.

Best regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
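[For illustration, the new report could be raised at the point where the concurrent catalog update is detected, along these lines. The error code shown is an assumption for this sketch -- the elog() it replaces had none, and the actual patch may choose differently:]

    /* Hypothetical replacement for elog(ERROR, "tuple concurrently updated") */
    ereport(ERROR,
            (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),    /* assumed errcode */
             errmsg("operation failed due to a concurrent command"),
             errdetail("Another command modified the same object in a concurrent session.")));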
On Thu, 5 Jun 2025 16:26:08 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> Patch 0003 improves the error message shown when concurrent updates to a
> system catalog tuple occur, producing output like:
>
>  ERROR:  operation failed due to a concurrent command
>  DETAIL:  Another command modified the same object in a concurrent session.

I found that the error "tuple concurrently updated" was expected in the results of the injection_points test, so I've fixed the test so that the new message is expected instead.

I've attached updated patches.

Best regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
Hi,

On Thu, Jun 5, 2025 at 5:21 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> I've attached updated patches.

I have some comments on the v4-0001 patch:

1)
heap_freetuple should be called for every tuple that we get from SearchSysCacheCopy3. But if the tuple is valid after the first SearchSysCacheCopy3, we overwrite the old pointer (by the second SearchSysCacheCopy3 call) and forget to free it. I suggest adding a heap_freetuple call before the second SearchSysCacheCopy3 call; see the sketch after this message.

2)
+        Form_pg_proc oldproc = (Form_pg_proc) GETSTRUCT(oldtup);
+        Datum        proargnames;
+        bool        isnull;
+        const char *dropcmd;

Strange alignment. I guess you should keep the same alignment as in the deleted declarations.

3)
This patch fixes postgres behavior if I first create a function and then try to CREATE OR REPLACE it in concurrent transactions. But if the function doesn't exist and I try to call CREATE OR REPLACE in concurrent transactions, I will get an error. I wrote about it in this thread [1], and Tom Lane said that this behavior is kinda expected. Just in case, I decided to mention it here anyway -- perhaps you will have other thoughts on this matter.

[1] https://www.postgresql.org/message-id/flat/CAJDiXghv2JF5zbLyyybokWKM%2B-GYsTG%2Bhw7xseLNgJOJwf0%2B8w%40mail.gmail.com

--
Best regards,
Daniil Davydov
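[A minimal sketch of the leak described in point (1) and the suggested fix, using the variable names from the quoted patch:]

    /* First lookup: check for a pre-existing definition. */
    oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
                                 PointerGetDatum(procedureName),
                                 PointerGetDatum(parameterTypes),
                                 ObjectIdGetDatum(procNamespace));
    if (HeapTupleIsValid(oldtup))
    {
        Form_pg_proc oldproc = (Form_pg_proc) GETSTRUCT(oldtup);

        /* (replace-permission checks elided) */

        LockDatabaseObject(ProcedureRelationId, oldproc->oid, 0, AccessExclusiveLock);

        /*
         * Suggested fix: free the first copy before the pointer is
         * overwritten below. Note that oldproc points into oldtup, so it
         * must not be dereferenced after this.
         */
        heap_freetuple(oldtup);

        oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
                                     PointerGetDatum(procedureName),
                                     PointerGetDatum(parameterTypes),
                                     ObjectIdGetDatum(procNamespace));
    }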
On Fri, 27 Jun 2025 18:53:02 +0700
Daniil Davydov <3danissimo@gmail.com> wrote:

> I have some comments on the v4-0001 patch:

Thank you for your comments!

> 1)
> heap_freetuple should be called for every tuple that we get from
> SearchSysCacheCopy3. [...] I suggest adding a heap_freetuple call before
> the second SearchSysCacheCopy3 call.

Good catches. Fixed.

> 2)
> Strange alignment. I guess you should keep the same alignment as in the
> deleted declarations.

Fixed.

I've attached patches including these fixes.

> 3)
> This patch fixes postgres behavior if I first create a function and then
> try to CREATE OR REPLACE it in concurrent transactions. But if the
> function doesn't exist and I try to call CREATE OR REPLACE in concurrent
> transactions, I will get an error. [...]
>
> [1] https://www.postgresql.org/message-id/flat/CAJDiXghv2JF5zbLyyybokWKM%2B-GYsTG%2Bhw7xseLNgJOJwf0%2B8w%40mail.gmail.com

I agree with Tom Lane that the behavior is expected, although it would be better if the error message were more user-friendly. We could improve it by checking the unique constraint before calling index_insert in CatalogIndexInsert.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
Hi,

On Mon, Jun 30, 2025 at 3:47 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> I agree with Tom Lane that the behavior is expected, although it would be
> better if the error message were more user-friendly. We could improve it
> by checking the unique constraint before calling index_insert in
> CatalogIndexInsert.

As far as I understand, unique constraint checking is specific to each index access method. Thus, to implement the proposed idea, you would have to create a separate callback for the check_unique function. That doesn't seem like a very neat solution, but there aren't many other options left.

I would suggest intercepting the error (via PG_CATCH), and if it has the ERRCODE_UNIQUE_VIOLATION code, changing the error message (more precisely, throwing another error with the desired message). If we caught an error during the CatalogTupleInsert call, we can be sure that the problem is concurrent execution, because before the insertion we checked that such a tuple does not exist. A sketch of this idea follows below.

What do you think? And in general, are you going to fix this behavior within this thread?

P.S. Thank you for updating the patch.

--
Best regards,
Daniil Davydov
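[A rough sketch of the interception described above, following the usual PG_TRY/PG_CATCH idiom. The memory context handling is standard, but the replacement message and its wording are assumptions for illustration:]

    MemoryContext oldcontext = CurrentMemoryContext;

    PG_TRY();
    {
        CatalogTupleInsert(rel, tup);   /* may hit the unique index */
    }
    PG_CATCH();
    {
        ErrorData  *edata;

        /* Error data must be copied out while in the caller's context. */
        MemoryContextSwitchTo(oldcontext);
        edata = CopyErrorData();
        FlushErrorState();

        if (edata->sqlerrcode == ERRCODE_UNIQUE_VIOLATION)
            ereport(ERROR,
                    (errcode(ERRCODE_DUPLICATE_FUNCTION),
                     errmsg("function \"%s\" was concurrently created by another session",
                            procedureName)));  /* hypothetical wording */
        else
            ReThrowError(edata);
    }
    PG_END_TRY();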
Hi,

On Tue, Jul 1, 2025 at 5:47 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> > As far as I understand, unique constraint checking is specific to each
> > index access method. Thus, to implement the proposed idea, you would
> > have to create a separate callback for the check_unique function.
>
> I believe check_exclusion_or_unique_constraint() can be used independently
> of a specific index access method.
>
> > I would suggest intercepting the error (via PG_CATCH), and if it has the
> > ERRCODE_UNIQUE_VIOLATION code, changing the error message (more
> > precisely, throwing another error with the desired message).
> >
> > What do you think? And in general, are you going to fix this behavior
> > within this thread?
>
> Initially, I wasn't planning to do so, but I gave it a try and wrote a
> patch to fix the issue based on my idea.

Thanks for the patch! Some comments on it:

1)
I found two typos:

+        if (HeapTupleIsHeapOenly(heapTuple) && !onlySummarized)

and

+        sartisfied = check_unique_constraint(heapRelation,

2)
CatalogIndexInsert is kind of a "popular" function. It can be called in different situations, and not all of them are ones where a unique constraint violation means an error due to concurrency. For example, with this patch the query "CREATE TYPE mood AS ENUM ('happy', 'sad', 'happy');" will throw this error: "operation failed due to a concurrent command". Of course, that isn't true. That is why I suggested handling unique violations exactly inside ProcedureCreate -- the only place where we can be sure about the reason for the error.

--
Best regards,
Daniil Davydov
On Thu, 3 Jul 2025 23:18:12 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> You're right -- this error is not caused by a concurrent command.
> However, I believe the error message in cases like creating an ENUM type
> with duplicate labels could be improved to explain the issue more clearly,
> rather than just reporting it as a unique constraint violation.

I have submitted a patch addressing this in a separate thread [1].

[1] https://www.postgresql.org/message-id/20250704000402.37e605ab0c59c300965a17ee%40sraoss.co.jp

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Hi,

On Thu, Jul 3, 2025 at 9:18 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> You're right -- this error is not caused by a concurrent command.
> However, I believe the error message in cases like creating an ENUM type
> with duplicate labels could be improved to explain the issue more clearly,
> rather than just reporting it as a unique constraint violation.
>
> In any case, a unique constraint violation in a system catalog is not
> necessarily due to concurrent DDL. Therefore, the error message shouldn't
> suggest that as the only cause. Instead, it should clearly report the
> constraint violation as the primary issue, and mention concurrent DDL as
> just one possible explanation in a HINT.
>
> I've updated the patch accordingly to reflect this direction in the error
> message.
>
>  ERROR:  operation failed due to duplicate key object
>  DETAIL:  Key (proname, proargtypes, pronamespace)=(fnc, , 2200) already exists in unique index pg_proc_proname_args_nsp_index.
>  HINT:  Another command might have created a object with the same key in a concurrent session.
>
> However, as a result, the message ends up being similar to the current one
> raised by the btree code, so the overall improvement in user-friendliness
> might be limited.

Thanks for updating the patch! +1 for adding such a hint for this error.

> > That is why I suggested handling unique violations exactly inside
> > ProcedureCreate -- the only place where we can be sure about the reason
> > for the error.
>
> If we were to fix the error message outside of CatalogIndexInsert, we
> would need to modify CatalogTupleInsert, CatalogTupleUpdate, and related
> functions to allow them to report the failure appropriately.
>
> You suggested using PG_TRY/PG_CATCH, but these do not suppress the error
> message from the btree code, so this approach seems not to fully address
> the issue.
>
> Moreover, the places affected are not limited to ProcedureCreate; for
> example, concurrent CREATE TABLE commands can also lead to the same
> situation, and possibly other commands as well.

Actually, we can suppress errors from btree (by flushing the error state and raising another error), but it doesn't look like the best design decision. It was an idea for one concrete error fix. Anyway, I like the correction you suggested better.

--
Best regards,
Daniil Davydov
On Fri, 4 Jul 2025 10:48:26 +0700
Daniil Davydov <3danissimo@gmail.com> wrote:

> Actually, we can suppress errors from btree (by flushing the error state
> and raising another error),

Right. That was my misunderstanding.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 4 Jul 2025 14:58:05 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> > Thanks for updating the patch!
> > +1 for adding such a hint for this error.

I've attached updated patches, since I found that some tests failed.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
On Thu, 17 Jul 2025 14:09:14 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> I've attached updated patches, since I found that some tests failed.

I've fixed the following cfbot failure:

SUMMARY: AddressSanitizer: heap-use-after-free /tmp/cirrus-ci-build/src/backend/catalog/pg_proc.c:406 in ProcedureCreate

This was caused by accessing oldproc->oid after oldtup had already been freed.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
On Wed, 20 Aug 2025 17:01:56 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> I've fixed the following cfbot failure:
>
> SUMMARY: AddressSanitizer: heap-use-after-free /tmp/cirrus-ci-build/src/backend/catalog/pg_proc.c:406 in ProcedureCreate

I've attached rebased patches.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
Hi,

On Tue, Sep 30, 2025 at 9:02 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> I've attached rebased patches.

It seems redundant to me that in CatalogIndexInsert we first check the unique constraint and then still pass the UNIQUE_CHECK_YES parameter to the index_insert call below. As far as I understand, after check_unique_constraint we can be sure that everything is okay with the inserted values. Am I missing something?

Also, why should we add an "IsCatalogRelation(heapRelation)" check inside the CatalogIndexInsert function? We know for sure that the given table is a catalog relation.

BTW, what do you think about adding an isolation test for a concurrent "CREATE OR REPLACE FUNCTION" command? This is one of the 'problems' we're struggling with, but I don't see this case being explicitly tested anywhere. A sketch of such a spec is below.

All other changes look good to me.

--
Best regards,
Daniil Davydov
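[For illustration, an isolation spec for the concurrent CREATE OR REPLACE FUNCTION case might look roughly like this. The function, file, and step names are made up; with the patch applied, the expected output would show s2's step waiting and then completing without an error:]

    setup
    {
      CREATE OR REPLACE FUNCTION corf() RETURNS int LANGUAGE sql AS 'SELECT 1';
    }

    teardown
    {
      DROP FUNCTION IF EXISTS corf();
    }

    session s1
    step s1_begin   { BEGIN; }
    step s1_replace { CREATE OR REPLACE FUNCTION corf() RETURNS int LANGUAGE sql AS 'SELECT 2'; }
    step s1_commit  { COMMIT; }

    session s2
    step s2_replace { CREATE OR REPLACE FUNCTION corf() RETURNS int LANGUAGE sql AS 'SELECT 3'; }

    permutation s1_begin s1_replace s2_replace s1_commit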