Обсуждение: pg_get_expr locking

Поиск
Список
Период
Сортировка

pg_get_expr locking

От
Peter Eisentraut
Дата:
The function pg_get_expr(), which is used in various system views and 
information schema views, does not appear to lock the table passed as 
the second argument, and so appears to be liable to fail if there is a 
concurrent drop of the table.  There is a (probable) case of this being 
discussed at [0].  I also see various mentions of this issue in the 
commit logs, mostly related to pg_dump.

Is there a reason there is no locking?  Performance?

What workaround should we use if there are conflicts created by 
concurrent regression tests?  Just move the tests around a bit until the 
issue goes away?


[0]: 
https://www.postgresql.org/message-id/flat/9ec24d7b-633d-463a-84c6-7acff769c9e8@eisentraut.org



Re: pg_get_expr locking

От
Tom Lane
Дата:
Peter Eisentraut <peter@eisentraut.org> writes:
> The function pg_get_expr(), which is used in various system views and 
> information schema views, does not appear to lock the table passed as 
> the second argument, and so appears to be liable to fail if there is a 
> concurrent drop of the table.  There is a (probable) case of this being 
> discussed at [0].  I also see various mentions of this issue in the 
> commit logs, mostly related to pg_dump.

> Is there a reason there is no locking?  Performance?

I think we have a general rule that you shouldn't be able to take
locks on relations you have no privileges for, so pg_get_expr would
need to verify privileges if it locked the rel.  At least for
pg_dump's purposes, that cure would be about as bad as the disease.

> What workaround should we use if there are conflicts created by 
> concurrent regression tests?  Just move the tests around a bit until the 
> issue goes away?

Why would a test be applying pg_get_expr to a table it doesn't
control?

            regards, tom lane



Re: pg_get_expr locking

От
Peter Eisentraut
Дата:
On 07.02.24 16:26, Tom Lane wrote:
>> What workaround should we use if there are conflicts created by
>> concurrent regression tests?  Just move the tests around a bit until the
>> issue goes away?
> 
> Why would a test be applying pg_get_expr to a table it doesn't
> control?

I think the situation is that one test (domain) runs pg_get_expr as part 
of an information_schema view, while at the same time another test 
(alter_table) drops a table that the pg_get_expr is just processing.




Re: pg_get_expr locking

От
Tom Lane
Дата:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 07.02.24 16:26, Tom Lane wrote:
>> Why would a test be applying pg_get_expr to a table it doesn't
>> control?

> I think the situation is that one test (domain) runs pg_get_expr as part 
> of an information_schema view, while at the same time another test 
> (alter_table) drops a table that the pg_get_expr is just processing.

The test case that's failing is, IIUC,

+SELECT * FROM information_schema.domain_constraints
+  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
+  ORDER BY constraint_name;

I see no use of pg_get_expr() in the domain_constraints view:

CREATE VIEW domain_constraints AS
    SELECT CAST(current_database() AS sql_identifier) AS constraint_catalog,
           CAST(rs.nspname AS sql_identifier) AS constraint_schema,
           CAST(con.conname AS sql_identifier) AS constraint_name,
           CAST(current_database() AS sql_identifier) AS domain_catalog,
           CAST(n.nspname AS sql_identifier) AS domain_schema,
           CAST(t.typname AS sql_identifier) AS domain_name,
           CAST(CASE WHEN condeferrable THEN 'YES' ELSE 'NO' END
             AS yes_or_no) AS is_deferrable,
           CAST(CASE WHEN condeferred THEN 'YES' ELSE 'NO' END
             AS yes_or_no) AS initially_deferred
    FROM pg_namespace rs, pg_namespace n, pg_constraint con, pg_type t
    WHERE rs.oid = con.connamespace
          AND n.oid = t.typnamespace
          AND t.oid = con.contypid
          AND (pg_has_role(t.typowner, 'USAGE')
               OR has_type_privilege(t.oid, 'USAGE'));

I'm a little suspicious that the failure is actually coming from
somewhere down inside has_type_privilege(), but I traced through
that quickly and don't see how it could reach such an error.
In any case I thought we'd hardened all those functions in 403ac226d.
So I'm still pretty mystified.  Have you had any luck in making
the failure reproducible?

            regards, tom lane



Re: pg_get_expr locking

От
Tom Lane
Дата:
I wrote:
> Peter Eisentraut <peter@eisentraut.org> writes:
>> I think the situation is that one test (domain) runs pg_get_expr as part 
>> of an information_schema view, while at the same time another test 
>> (alter_table) drops a table that the pg_get_expr is just processing.

> The test case that's failing is, IIUC,

> +SELECT * FROM information_schema.domain_constraints
> +  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
> +  ORDER BY constraint_name;

Oh, scratch that: there are two confusingly lookalike queries
in the patch.  The one that is failing is

SELECT * FROM information_schema.check_constraints
  WHERE (constraint_schema, constraint_name)
        IN (SELECT constraint_schema, constraint_name
            FROM information_schema.domain_constraints
            WHERE domain_name IN ('con', 'dom', 'pos_int', 'things'))
  ORDER BY constraint_name;

and we have trouble because the evaluation of pg_get_expr in
check_constraints is done before the semijoin that would restrict
it to just the desired objects.

After looking at the code I'm less worried about the
permissions-checking angle than I was before, because I see
that pg_get_expr already takes a transient AccessShareLock
on the rel, down inside set_relation_column_names.  This is
not ideal from a permissions standpoint perhaps, but it's
probably OK considering we've done that for a long time.
We just need to hold that lock a little while longer.

I propose the attached as a reasonably localized fix.
We could imagine a more aggressive refactoring that would
allow passing down the Relation instead of re-opening it
in set_relation_column_names, but I doubt it's worth the
trouble.

            regards, tom lane

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index b625f471a8..644966721f 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -352,8 +352,7 @@ static char *pg_get_partkeydef_worker(Oid relid, int prettyFlags,
                                       bool attrsOnly, bool missing_ok);
 static char *pg_get_constraintdef_worker(Oid constraintId, bool fullCommand,
                                          int prettyFlags, bool missing_ok);
-static text *pg_get_expr_worker(text *expr, Oid relid, const char *relname,
-                                int prettyFlags);
+static text *pg_get_expr_worker(text *expr, Oid relid, int prettyFlags);
 static int    print_function_arguments(StringInfo buf, HeapTuple proctup,
                                      bool print_table_args, bool print_defaults);
 static void print_function_rettype(StringInfo buf, HeapTuple proctup);
@@ -2615,6 +2622,11 @@ decompile_column_index_array(Datum column_index_array, Oid relId,
  * partial indexes, column default expressions, etc.  We also support
  * Var-free expressions, for which the OID can be InvalidOid.
  *
+ * If the OID is nonzero but not actually valid, don't throw an error,
+ * just return NULL.  This is a bit questionable, but it's what we've
+ * done historically, and it can help avoid unwanted failures when
+ * examining catalog entries for just-deleted relations.
+ *
  * We expect this function to work, or throw a reasonably clean error,
  * for any node tree that can appear in a catalog pg_node_tree column.
  * Query trees, such as those appearing in pg_rewrite.ev_action, are
@@ -2627,29 +2639,16 @@ pg_get_expr(PG_FUNCTION_ARGS)
 {
     text       *expr = PG_GETARG_TEXT_PP(0);
     Oid            relid = PG_GETARG_OID(1);
+    text       *result;
     int            prettyFlags;
-    char       *relname;

     prettyFlags = PRETTYFLAG_INDENT;

-    if (OidIsValid(relid))
-    {
-        /* Get the name for the relation */
-        relname = get_rel_name(relid);
-
-        /*
-         * If the OID isn't actually valid, don't throw an error, just return
-         * NULL.  This is a bit questionable, but it's what we've done
-         * historically, and it can help avoid unwanted failures when
-         * examining catalog entries for just-deleted relations.
-         */
-        if (relname == NULL)
-            PG_RETURN_NULL();
-    }
+    result = pg_get_expr_worker(expr, relid, prettyFlags);
+    if (result)
+        PG_RETURN_TEXT_P(result);
     else
-        relname = NULL;
-
-    PG_RETURN_TEXT_P(pg_get_expr_worker(expr, relid, relname, prettyFlags));
+        PG_RETURN_NULL();
 }

 Datum
@@ -2658,33 +2657,27 @@ pg_get_expr_ext(PG_FUNCTION_ARGS)
     text       *expr = PG_GETARG_TEXT_PP(0);
     Oid            relid = PG_GETARG_OID(1);
     bool        pretty = PG_GETARG_BOOL(2);
+    text       *result;
     int            prettyFlags;
-    char       *relname;

     prettyFlags = GET_PRETTY_FLAGS(pretty);

-    if (OidIsValid(relid))
-    {
-        /* Get the name for the relation */
-        relname = get_rel_name(relid);
-        /* See notes above */
-        if (relname == NULL)
-            PG_RETURN_NULL();
-    }
+    result = pg_get_expr_worker(expr, relid, prettyFlags);
+    if (result)
+        PG_RETURN_TEXT_P(result);
     else
-        relname = NULL;
-
-    PG_RETURN_TEXT_P(pg_get_expr_worker(expr, relid, relname, prettyFlags));
+        PG_RETURN_NULL();
 }

 static text *
-pg_get_expr_worker(text *expr, Oid relid, const char *relname, int prettyFlags)
+pg_get_expr_worker(text *expr, Oid relid, int prettyFlags)
 {
     Node       *node;
     Node       *tst;
     Relids        relids;
     List       *context;
     char       *exprstr;
+    Relation    rel = NULL;
     char       *str;

     /* Convert input pg_node_tree (really TEXT) object to C string */
@@ -2729,9 +2722,19 @@ pg_get_expr_worker(text *expr, Oid relid, const char *relname, int prettyFlags)
                      errmsg("expression contains variables")));
     }

-    /* Prepare deparse context if needed */
+    /*
+     * Prepare deparse context if needed.  If we are deparsing with a relid,
+     * we need to transiently open and lock the rel, to make sure it won't go
+     * away underneath us.  (set_relation_column_names would lock it anyway,
+     * so this isn't really introducing any new behavior.)
+     */
     if (OidIsValid(relid))
-        context = deparse_context_for(relname, relid);
+    {
+        rel = try_relation_open(relid, AccessShareLock);
+        if (rel == NULL)
+            return NULL;
+        context = deparse_context_for(RelationGetRelationName(rel), relid);
+    }
     else
         context = NIL;

@@ -2739,6 +2742,9 @@ pg_get_expr_worker(text *expr, Oid relid, const char *relname, int prettyFlags)
     str = deparse_expression_pretty(node, context, false, false,
                                     prettyFlags, 0);

+    if (rel != NULL)
+        relation_close(rel, AccessShareLock);
+
     return string_to_text(str);
 }