pg_dump versus enum types, round N+1

Поиск
Список
Период
Сортировка
От Tom Lane
Тема pg_dump versus enum types, round N+1
Дата
Msg-id 1548468.1711220438@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: pg_dump versus enum types, round N+1  (Andrew Dunstan <andrew@dunslane.net>)
Список pgsql-hackers
I have a patch in the queue [1] that among other things tries to
reduce the number of XIDs consumed during pg_upgrade by making
pg_restore group its commands into batches of a thousand or so
per transaction.  This had been passing tests, so I was quite
surprised when the cfbot started to show it as falling over.
Investigation showed that it is now failing because 6185c9737
added these objects to the regression tests and didn't drop them:

CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple');
CREATE DOMAIN rgb AS rainbow CHECK (VALUE IN ('red', 'green', 'blue'));

In binary-upgrade mode, pg_dump dumps the enum type like this:

CREATE TYPE public.rainbow AS ENUM (
);
ALTER TYPE public.rainbow ADD VALUE 'red';
ALTER TYPE public.rainbow ADD VALUE 'orange';
ALTER TYPE public.rainbow ADD VALUE 'yellow';
ALTER TYPE public.rainbow ADD VALUE 'green';
ALTER TYPE public.rainbow ADD VALUE 'blue';
ALTER TYPE public.rainbow ADD VALUE 'purple';

and then, if we're within the same transaction, creation of the domain
falls over with

ERROR:  unsafe use of new value "red" of enum type rainbow
HINT:  New enum values must be committed before they can be used.

So I'm glad we found that sooner not later, but something needs
to be done about it if [1] is to get committed.  It doesn't seem
particularly hard to fix though: we just have to track the enum
type OIDs made in the current transaction, using largely the same
approach as is already used in pg_enum.c to track enum value OIDs.
enum.sql contains a comment opining that this is too expensive,
but I don't see why it is as long as we don't bother to track
commands that are nested within subtransactions.  That would be a bit
complicated to do correctly, but pg_dump/pg_restore doesn't need it.

Hence, I propose the attached.

            regards, tom lane

[1] https://commitfest.postgresql.org/47/4713/

diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index d2fe5ca2c8..7768bc97c3 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -36,17 +36,35 @@
 Oid            binary_upgrade_next_pg_enum_oid = InvalidOid;

 /*
- * Hash table of enum value OIDs created during the current transaction by
- * AddEnumLabel.  We disallow using these values until the transaction is
+ * We keep two transaction-lifespan hash tables, one containing the OIDs
+ * of enum types made in the current transaction, and one containing the
+ * OIDs of enum values created during the current transaction by
+ * AddEnumLabel (but only if their enum type is not in the first hash).
+ *
+ * We disallow using enum values in the second hash until the transaction is
  * committed; otherwise, they might get into indexes where we can't clean
  * them up, and then if the transaction rolls back we have a broken index.
  * (See comments for check_safe_enum_use() in enum.c.)  Values created by
  * EnumValuesCreate are *not* entered into the table; we assume those are
  * created during CREATE TYPE, so they can't go away unless the enum type
  * itself does.
+ *
+ * The motivation for treating enum values as safe if their type OID is
+ * in the first hash is to allow CREATE TYPE AS ENUM; ALTER TYPE ADD VALUE;
+ * followed by a use of the value in the same transaction.  This pattern
+ * is really just as safe as creating the value during CREATE TYPE.
+ * We need to support this because pg_dump in binary upgrade mode produces
+ * commands like that.  But currently we only support it when the commands
+ * are at the outermost transaction level, which is as much as we need for
+ * pg_dump.  We could track subtransaction nesting of the commands to
+ * analyze things more precisely, but for now we don't bother.
  */
-static HTAB *uncommitted_enums = NULL;
+static HTAB *uncommitted_enum_types = NULL;
+static HTAB *uncommitted_enum_values = NULL;

+static void init_uncommitted_enum_types(void);
+static void init_uncommitted_enum_values(void);
+static bool EnumTypeUncommitted(Oid typ_id);
 static void RenumberEnumType(Relation pg_enum, HeapTuple *existing, int nelems);
 static int    sort_order_cmp(const void *p1, const void *p2);

@@ -56,6 +74,11 @@ static int    sort_order_cmp(const void *p1, const void *p2);
  *        Create an entry in pg_enum for each of the supplied enum values.
  *
  * vals is a list of String values.
+ *
+ * We assume that this is called only by CREATE TYPE AS ENUM, and that it
+ * will be called even if the vals list is empty.  So we can enter the
+ * enum type's OID into uncommitted_enum_types here, rather than needing
+ * another entry point to do it.
  */
 void
 EnumValuesCreate(Oid enumTypeOid, List *vals)
@@ -70,6 +93,18 @@ EnumValuesCreate(Oid enumTypeOid, List *vals)
     CatalogIndexState indstate;
     TupleTableSlot **slot;

+    /*
+     * Remember the type OID as being made in the current transaction, but not
+     * if we're in a subtransaction.
+     */
+    if (GetCurrentTransactionNestLevel() == 1)
+    {
+        if (uncommitted_enum_types == NULL)
+            init_uncommitted_enum_types();
+        (void) hash_search(uncommitted_enum_types, &enumTypeOid,
+                           HASH_ENTER, NULL);
+    }
+
     num_elems = list_length(vals);

     /*
@@ -211,20 +246,37 @@ EnumValuesDelete(Oid enumTypeOid)
 }

 /*
- * Initialize the uncommitted enum table for this transaction.
+ * Initialize the uncommitted enum types table for this transaction.
+ */
+static void
+init_uncommitted_enum_types(void)
+{
+    HASHCTL        hash_ctl;
+
+    hash_ctl.keysize = sizeof(Oid);
+    hash_ctl.entrysize = sizeof(Oid);
+    hash_ctl.hcxt = TopTransactionContext;
+    uncommitted_enum_types = hash_create("Uncommitted enum types",
+                                         32,
+                                         &hash_ctl,
+                                         HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+}
+
+/*
+ * Initialize the uncommitted enum values table for this transaction.
  */
 static void
-init_uncommitted_enums(void)
+init_uncommitted_enum_values(void)
 {
     HASHCTL        hash_ctl;

     hash_ctl.keysize = sizeof(Oid);
     hash_ctl.entrysize = sizeof(Oid);
     hash_ctl.hcxt = TopTransactionContext;
-    uncommitted_enums = hash_create("Uncommitted enums",
-                                    32,
-                                    &hash_ctl,
-                                    HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+    uncommitted_enum_values = hash_create("Uncommitted enum values",
+                                          32,
+                                          &hash_ctl,
+                                          HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 }

 /*
@@ -520,12 +572,27 @@ restart:

     table_close(pg_enum, RowExclusiveLock);

-    /* Set up the uncommitted enum table if not already done in this tx */
-    if (uncommitted_enums == NULL)
-        init_uncommitted_enums();
+    /*
+     * If the enum type itself is uncommitted, we need not enter the new enum
+     * value into uncommitted_enum_values, because the type won't survive if
+     * the value doesn't.  (This is basically the same reasoning as for values
+     * made directly by CREATE TYPE AS ENUM.)  However, apply this rule only
+     * when we are not inside a subtransaction; if we're more deeply nested
+     * than the CREATE TYPE then the conclusion doesn't hold.  We could expend
+     * more effort to track the subtransaction level of CREATE TYPE, but for
+     * now we're only concerned about making the world safe for pg_dump in
+     * binary upgrade mode, and that won't use subtransactions.
+     */
+    if (GetCurrentTransactionNestLevel() == 1 &&
+        EnumTypeUncommitted(enumTypeOid))
+        return;
+
+    /* Set up the uncommitted values table if not already done in this tx */
+    if (uncommitted_enum_values == NULL)
+        init_uncommitted_enum_values();

     /* Add the new value to the table */
-    (void) hash_search(uncommitted_enums, &newOid, HASH_ENTER, NULL);
+    (void) hash_search(uncommitted_enum_values, &newOid, HASH_ENTER, NULL);
 }


@@ -614,19 +681,37 @@ RenameEnumLabel(Oid enumTypeOid,


 /*
- * Test if the given enum value is in the table of uncommitted enums.
+ * Test if the given type OID is in the table of uncommitted enum types.
+ */
+static bool
+EnumTypeUncommitted(Oid typ_id)
+{
+    bool        found;
+
+    /* If we've made no uncommitted types table, it's not in the table */
+    if (uncommitted_enum_types == NULL)
+        return false;
+
+    /* Else, is it in the table? */
+    (void) hash_search(uncommitted_enum_types, &typ_id, HASH_FIND, &found);
+    return found;
+}
+
+
+/*
+ * Test if the given enum value is in the table of uncommitted enum values.
  */
 bool
 EnumUncommitted(Oid enum_id)
 {
     bool        found;

-    /* If we've made no uncommitted table, all values are safe */
-    if (uncommitted_enums == NULL)
+    /* If we've made no uncommitted values table, it's not in the table */
+    if (uncommitted_enum_values == NULL)
         return false;

     /* Else, is it in the table? */
-    (void) hash_search(uncommitted_enums, &enum_id, HASH_FIND, &found);
+    (void) hash_search(uncommitted_enum_values, &enum_id, HASH_FIND, &found);
     return found;
 }

@@ -638,11 +723,12 @@ void
 AtEOXact_Enum(void)
 {
     /*
-     * Reset the uncommitted table, as all our enum values are now committed.
-     * The memory will go away automatically when TopTransactionContext is
-     * freed; it's sufficient to clear our pointer.
+     * Reset the uncommitted tables, as all our tuples are now committed. The
+     * memory will go away automatically when TopTransactionContext is freed;
+     * it's sufficient to clear our pointers.
      */
-    uncommitted_enums = NULL;
+    uncommitted_enum_types = NULL;
+    uncommitted_enum_values = NULL;
 }


@@ -723,15 +809,15 @@ sort_order_cmp(const void *p1, const void *p2)
 Size
 EstimateUncommittedEnumsSpace(void)
 {
-    size_t        entries;
+    size_t        entries = 0;

-    if (uncommitted_enums)
-        entries = hash_get_num_entries(uncommitted_enums);
-    else
-        entries = 0;
+    if (uncommitted_enum_types)
+        entries += hash_get_num_entries(uncommitted_enum_types);
+    if (uncommitted_enum_values)
+        entries += hash_get_num_entries(uncommitted_enum_values);

-    /* Add one for the terminator. */
-    return sizeof(Oid) * (entries + 1);
+    /* Add two for the terminators. */
+    return sizeof(Oid) * (entries + 2);
 }

 void
@@ -740,30 +826,44 @@ SerializeUncommittedEnums(void *space, Size size)
     Oid           *serialized = (Oid *) space;

     /*
-     * Make sure the hash table hasn't changed in size since the caller
+     * Make sure the hash tables haven't changed in size since the caller
      * reserved the space.
      */
     Assert(size == EstimateUncommittedEnumsSpace());

-    /* Write out all the values from the hash table, if there is one. */
-    if (uncommitted_enums)
+    /* Write out all the OIDs from the types hash table, if there is one. */
+    if (uncommitted_enum_types)
     {
         HASH_SEQ_STATUS status;
         Oid           *value;

-        hash_seq_init(&status, uncommitted_enums);
+        hash_seq_init(&status, uncommitted_enum_types);
         while ((value = (Oid *) hash_seq_search(&status)))
             *serialized++ = *value;
     }

     /* Write out the terminator. */
-    *serialized = InvalidOid;
+    *serialized++ = InvalidOid;
+
+    /* Write out all the OIDs from the values hash table, if there is one. */
+    if (uncommitted_enum_values)
+    {
+        HASH_SEQ_STATUS status;
+        Oid           *value;
+
+        hash_seq_init(&status, uncommitted_enum_values);
+        while ((value = (Oid *) hash_seq_search(&status)))
+            *serialized++ = *value;
+    }
+
+    /* Write out the terminator. */
+    *serialized++ = InvalidOid;

     /*
      * Make sure the amount of space we actually used matches what was
      * estimated.
      */
-    Assert((char *) (serialized + 1) == ((char *) space) + size);
+    Assert((char *) serialized == ((char *) space) + size);
 }

 void
@@ -771,20 +871,33 @@ RestoreUncommittedEnums(void *space)
 {
     Oid           *serialized = (Oid *) space;

-    Assert(!uncommitted_enums);
+    Assert(!uncommitted_enum_types);
+    Assert(!uncommitted_enum_values);

     /*
-     * As a special case, if the list is empty then don't even bother to
-     * create the hash table.  This is the usual case, since enum alteration
-     * is expected to be rare.
+     * If either list is empty then don't even bother to create that hash
+     * table.  This is the common case, since most transactions don't create
+     * or alter enums.
      */
-    if (!OidIsValid(*serialized))
-        return;
-
-    /* Read all the values into a new hash table. */
-    init_uncommitted_enums();
-    do
+    if (OidIsValid(*serialized))
     {
-        hash_search(uncommitted_enums, serialized++, HASH_ENTER, NULL);
-    } while (OidIsValid(*serialized));
+        /* Read all the types into a new hash table. */
+        init_uncommitted_enum_types();
+        do
+        {
+            (void) hash_search(uncommitted_enum_types, serialized++,
+                               HASH_ENTER, NULL);
+        } while (OidIsValid(*serialized));
+    }
+    serialized++;
+    if (OidIsValid(*serialized))
+    {
+        /* Read all the values into a new hash table. */
+        init_uncommitted_enum_values();
+        do
+        {
+            (void) hash_search(uncommitted_enum_values, serialized++,
+                               HASH_ENTER, NULL);
+        } while (OidIsValid(*serialized));
+    }
 }
diff --git a/src/backend/utils/adt/enum.c b/src/backend/utils/adt/enum.c
index f649ff2c56..814c7fb4e3 100644
--- a/src/backend/utils/adt/enum.c
+++ b/src/backend/utils/adt/enum.c
@@ -49,11 +49,12 @@ static ArrayType *enum_range_internal(Oid enumtypoid, Oid lower, Oid upper);
  * We don't implement that fully right now, but we do allow free use of enum
  * values created during CREATE TYPE AS ENUM, which are surely of the same
  * lifespan as the enum type.  (This case is required by "pg_restore -1".)
- * Values added by ALTER TYPE ADD VALUE are currently restricted, but could
- * be allowed if the enum type could be proven to have been created earlier
- * in the same transaction.  (Note that comparing tuple xmins would not work
- * for that, because the type tuple might have been updated in the current
- * transaction.  Subtransactions also create hazards to be accounted for.)
+ * Values added by ALTER TYPE ADD VALUE are also allowed if the enum type
+ * is known to have been created earlier in the same transaction.  (Note that
+ * we have to track that explicitly; comparing tuple xmins is insufficient,
+ * because the type tuple might have been updated in the current transaction.
+ * Subtransactions also create hazards to be accounted for; currently,
+ * pg_enum.c only handles ADD VALUE at the outermost transaction level.)
  *
  * This function needs to be called (directly or indirectly) in any of the
  * functions below that could return an enum value to SQL operations.
@@ -81,10 +82,10 @@ check_safe_enum_use(HeapTuple enumval_tup)
         return;

     /*
-     * Check if the enum value is uncommitted.  If not, it's safe, because it
-     * was made during CREATE TYPE AS ENUM and can't be shorter-lived than its
-     * owning type.  (This'd also be false for values made by other
-     * transactions; but the previous tests should have handled all of those.)
+     * Check if the enum value is listed as uncommitted.  If not, it's safe,
+     * because it can't be shorter-lived than its owning type.  (This'd also
+     * be false for values made by other transactions; but the previous tests
+     * should have handled all of those.)
      */
     if (!EnumUncommitted(en->oid))
         return;
diff --git a/src/test/regress/expected/enum.out b/src/test/regress/expected/enum.out
index 01159688e5..1d09c208bc 100644
--- a/src/test/regress/expected/enum.out
+++ b/src/test/regress/expected/enum.out
@@ -684,16 +684,18 @@ select enum_range(null::bogon);
 (1 row)

 ROLLBACK;
--- ideally, we'd allow this usage; but it requires keeping track of whether
--- the enum type was created in the current transaction, which is expensive
+-- we must allow this usage to support pg_dump in binary upgrade mode
 BEGIN;
 CREATE TYPE bogus AS ENUM('good');
 ALTER TYPE bogus RENAME TO bogon;
 ALTER TYPE bogon ADD VALUE 'bad';
 ALTER TYPE bogon ADD VALUE 'ugly';
-select enum_range(null::bogon);  -- fails
-ERROR:  unsafe use of new value "bad" of enum type bogon
-HINT:  New enum values must be committed before they can be used.
+select enum_range(null::bogon);
+   enum_range
+-----------------
+ {good,bad,ugly}
+(1 row)
+
 ROLLBACK;
 --
 -- Cleanup
diff --git a/src/test/regress/sql/enum.sql b/src/test/regress/sql/enum.sql
index 93171379f2..ecc4878a67 100644
--- a/src/test/regress/sql/enum.sql
+++ b/src/test/regress/sql/enum.sql
@@ -323,14 +323,13 @@ ALTER TYPE bogus RENAME TO bogon;
 select enum_range(null::bogon);
 ROLLBACK;

--- ideally, we'd allow this usage; but it requires keeping track of whether
--- the enum type was created in the current transaction, which is expensive
+-- we must allow this usage to support pg_dump in binary upgrade mode
 BEGIN;
 CREATE TYPE bogus AS ENUM('good');
 ALTER TYPE bogus RENAME TO bogon;
 ALTER TYPE bogon ADD VALUE 'bad';
 ALTER TYPE bogon ADD VALUE 'ugly';
-select enum_range(null::bogon);  -- fails
+select enum_range(null::bogon);
 ROLLBACK;

 --

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dmitry Dolgov
Дата:
Сообщение: Re: pg_stat_statements and "IN" conditions
Следующее
От: James Coleman
Дата:
Сообщение: Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers