Обсуждение: pg_subscription.subslotname is wrongly marked NOT NULL

Поиск
Список
Период
Сортировка

pg_subscription.subslotname is wrongly marked NOT NULL

От
Tom Lane
Дата:
In all branches back to v10, initdb marks pg_subscription.subslotname
as NOT NULL:

# \d pg_subscription
             Table "pg_catalog.pg_subscription"
     Column      |  Type   | Collation | Nullable | Default 
-----------------+---------+-----------+----------+---------
 oid             | oid     |           | not null | 
 subdbid         | oid     |           | not null | 
 subname         | name    |           | not null | 
 subowner        | oid     |           | not null | 
 subenabled      | boolean |           | not null | 
 subbinary       | boolean |           | not null | 
 subconninfo     | text    | C         | not null | 
 subslotname     | name    |           | not null | 
 subsynccommit   | text    | C         | not null | 
 subpublications | text[]  | C         | not null | 

Nonetheless, CREATE/ALTER SUBSCRIPTION blithely set it to null
when slot_name = NONE is specified.

This apparently causes few ill effects, unless somebody decides
to JIT-compile deconstruction of pg_subscription tuples.  Which
is why all of Andres' JIT-enabled buildfarm animals are unhappy
with 9de77b545 --- quite unintentionally, that commit added a
test case that exposed the problem.

What would we like to do about this?  Removing the NOT NULL
marking wouldn't be too hard in HEAD, but telling users to
fix it manually in the back branches seems like a mess.

On the whole it seems like changing the code to use some other
representation of slot_name = NONE, like say an empty string,
would be less of a mess.

It's also a bit annoying that we have no mechanized checks that
would catch this inconsistency.  If JIT is going to be absolutely
dependent on NOT NULL markings being accurate, we can't really
have such a laissez-faire attitude to C code getting it wrong.

Thoughts?

            regards, tom lane



Re: pg_subscription.subslotname is wrongly marked NOT NULL

От
Tom Lane
Дата:
I wrote:
> In all branches back to v10, initdb marks pg_subscription.subslotname
> as NOT NULL: ...
> Nonetheless, CREATE/ALTER SUBSCRIPTION blithely set it to null
> when slot_name = NONE is specified.

> What would we like to do about this?  Removing the NOT NULL
> marking wouldn't be too hard in HEAD, but telling users to
> fix it manually in the back branches seems like a mess.

After further thought, it seems like changing the definition that
subslotname is null for "NONE" is unworkable, because client-side
code might be depending on that.  (pg_dump certainly is; we could
change that, but other code might have the same expectation.)

What I propose we do is

(1) Fix the NOT NULL marking in HEAD and v13.  We could perhaps
alter it in older branches as well, but we cannot force initdb
so such a change would only affect newly-initdb'd installations.

(2) In pre-v13 branches, hack the JIT tuple deconstruction code
to be specifically aware that it can't trust attnotnull for
pg_subscription.subslotname.  Yeah, it's ugly, but at least it's
not ugly going forwards.

I haven't looked to see where or how we might do (2), but I assume
it's possible.

> It's also a bit annoying that we have no mechanized checks that
> would catch this inconsistency.  If JIT is going to be absolutely
> dependent on NOT NULL markings being accurate, we can't really
> have such a laissez-faire attitude to C code getting it wrong.

It seems like at least in assert-enabled builds, we'd better have
a cross-check for that.  I'm not sure where's the best place.

            regards, tom lane



Re: pg_subscription.subslotname is wrongly marked NOT NULL

От
Tom Lane
Дата:
I wrote:
> (2) In pre-v13 branches, hack the JIT tuple deconstruction code
> to be specifically aware that it can't trust attnotnull for
> pg_subscription.subslotname.  Yeah, it's ugly, but at least it's
> not ugly going forwards.

Concretely, as attached for v12.

            regards, tom lane

diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index 835aea83e9..4b2144e1a7 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -22,11 +22,24 @@

 #include "access/htup_details.h"
 #include "access/tupdesc_details.h"
+#include "catalog/pg_subscription.h"
 #include "executor/tuptable.h"
 #include "jit/llvmjit.h"
 #include "jit/llvmjit_emit.h"


+/*
+ * Through an embarrassing oversight, pre-v13 installations may have
+ * pg_subscription.subslotname marked as attnotnull, which it should not be.
+ * To avoid possible crashes, use this macro instead of testing attnotnull
+ * directly.
+ */
+#define ATTNOTNULL(att) \
+    ((att)->attnotnull && \
+     ((att)->attrelid != SubscriptionRelationId || \
+      (att)->attnum != Anum_pg_subscription_subslotname))
+
+
 /*
  * Create a function that deforms a tuple of type desc up to natts columns.
  */
@@ -121,7 +134,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
          * combination of attisdropped && attnotnull combination shouldn't
          * exist.
          */
-        if (att->attnotnull &&
+        if (ATTNOTNULL(att) &&
             !att->atthasmissing &&
             !att->attisdropped)
             guaranteed_column_number = attnum;
@@ -419,7 +432,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
          * into account, because if they're present the heaptuple's natts
          * would have indicated that a slot_getmissingattrs() is needed.
          */
-        if (!att->attnotnull)
+        if (!ATTNOTNULL(att))
         {
             LLVMBasicBlockRef b_ifnotnull;
             LLVMBasicBlockRef b_ifnull;
@@ -586,6 +599,9 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
          * to generate better code. Without that LLVM can't figure out that
          * the offset might be constant due to the jumps for previously
          * decoded columns.
+         *
+         * Note: these two tests on attnotnull don't need the ATTNOTNULL hack,
+         * because they are harmless on pg_subscription anyway.
          */
         if (attguaranteedalign)
         {
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index e7add9d2b8..3ba1e5dcdd 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -147,6 +147,13 @@ DROP SUBSCRIPTION regress_testsub;
 ERROR:  DROP SUBSCRIPTION cannot run inside a transaction block
 COMMIT;
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+\dRs+
+                                                      List of subscriptions
+      Name       |           Owner            | Enabled |     Publication     | Synchronous commit |
Conninfo           

+-----------------+----------------------------+---------+---------------------+--------------------+------------------------------
+ regress_testsub | regress_subscription_user2 | f       | {testpub2,testpub3} | local              |
dbname=regress_doesnotexist2
+(1 row)
+
 -- now it works
 BEGIN;
 DROP SUBSCRIPTION regress_testsub;
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 9e234ab8b3..1bc58887f7 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -109,6 +109,8 @@ COMMIT;

 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);

+\dRs+
+
 -- now it works
 BEGIN;
 DROP SUBSCRIPTION regress_testsub;

Re: pg_subscription.subslotname is wrongly marked NOT NULL

От
Tom Lane
Дата:
I wrote:
>> It's also a bit annoying that we have no mechanized checks that
>> would catch this inconsistency.  If JIT is going to be absolutely
>> dependent on NOT NULL markings being accurate, we can't really
>> have such a laissez-faire attitude to C code getting it wrong.

> It seems like at least in assert-enabled builds, we'd better have
> a cross-check for that.  I'm not sure where's the best place.

I concluded that we should put this into CatalogTupleInsert and
CatalogTupleUpdate.  The bootstrap data path already has a check
(see InsertOneNull()), and so does the executor, so we only need
to worry about tuples that're built manually by catalog manipulation
code.  I think all of that goes through these functions.  Hence,
as attached.

... and apparently, I should have done this task first, because
damn if it didn't immediately expose another bug of the same ilk.
pg_subscription_rel.srsublsn also needs to be marked nullable.

            regards, tom lane

diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c
index d63fcf58cf..fe277f3ad3 100644
--- a/src/backend/catalog/indexing.c
+++ b/src/backend/catalog/indexing.c
@@ -167,6 +167,43 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
     ExecDropSingleTupleTableSlot(slot);
 }

+/*
+ * Subroutine to verify that catalog constraints are honored.
+ *
+ * Tuples inserted via CatalogTupleInsert/CatalogTupleUpdate are generally
+ * "hand made", so that it's possible that they fail to satisfy constraints
+ * that would be checked if they were being inserted by the executor.  That's
+ * a coding error, so we only bother to check for it in assert-enabled builds.
+ */
+#ifdef USE_ASSERT_CHECKING
+
+static void
+CatalogTupleCheckConstraints(Relation heapRel, HeapTuple tup)
+{
+    /*
+     * Currently, the only constraints implemented for system catalogs are
+     * attnotnull constraints.
+     */
+    if (HeapTupleHasNulls(tup))
+    {
+        TupleDesc    tupdesc = RelationGetDescr(heapRel);
+        bits8       *bp = tup->t_data->t_bits;
+
+        for (int attnum = 0; attnum < tupdesc->natts; attnum++)
+        {
+            Form_pg_attribute thisatt = TupleDescAttr(tupdesc, attnum);
+
+            Assert(!(thisatt->attnotnull && att_isnull(attnum, bp)));
+        }
+    }
+}
+
+#else                            /* !USE_ASSERT_CHECKING */
+
+#define CatalogTupleCheckConstraints(heapRel, tup)  ((void) 0)
+
+#endif                            /* USE_ASSERT_CHECKING */
+
 /*
  * CatalogTupleInsert - do heap and indexing work for a new catalog tuple
  *
@@ -184,6 +221,8 @@ CatalogTupleInsert(Relation heapRel, HeapTuple tup)
 {
     CatalogIndexState indstate;

+    CatalogTupleCheckConstraints(heapRel, tup);
+
     indstate = CatalogOpenIndexes(heapRel);

     simple_heap_insert(heapRel, tup);
@@ -204,6 +243,8 @@ void
 CatalogTupleInsertWithInfo(Relation heapRel, HeapTuple tup,
                            CatalogIndexState indstate)
 {
+    CatalogTupleCheckConstraints(heapRel, tup);
+
     simple_heap_insert(heapRel, tup);

     CatalogIndexInsert(indstate, tup);
@@ -225,6 +266,8 @@ CatalogTupleUpdate(Relation heapRel, ItemPointer otid, HeapTuple tup)
 {
     CatalogIndexState indstate;

+    CatalogTupleCheckConstraints(heapRel, tup);
+
     indstate = CatalogOpenIndexes(heapRel);

     simple_heap_update(heapRel, otid, tup);
@@ -245,6 +288,8 @@ void
 CatalogTupleUpdateWithInfo(Relation heapRel, ItemPointer otid, HeapTuple tup,
                            CatalogIndexState indstate)
 {
+    CatalogTupleCheckConstraints(heapRel, tup);
+
     simple_heap_update(heapRel, otid, tup);

     CatalogIndexInsert(indstate, tup);

Re: pg_subscription.subslotname is wrongly marked NOT NULL

От
Tom Lane
Дата:
I wrote:
> pg_subscription_rel.srsublsn also needs to be marked nullable.

Not only is it wrongly marked attnotnull, but two of the three places
that read it are doing so unsafely (ie, as though it *were*
non-nullable).  So I think we'd better fix it as attached.

            regards, tom lane

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 80c183b235..13c871d2a8 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7631,7 +7631,9 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
        <structfield>srsublsn</structfield> <type>pg_lsn</type>
       </para>
       <para>
-       End LSN for <literal>s</literal> and <literal>r</literal> states.
+       Remote LSN of the state change used for synchronization coordination
+       when in <literal>s</literal> or <literal>r</literal> states,
+       otherwise null
       </para></entry>
      </row>
     </tbody>
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index e6afb3203e..90bf5cf0c6 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -452,13 +452,20 @@ GetSubscriptionRelations(Oid subid)
     {
         Form_pg_subscription_rel subrel;
         SubscriptionRelState *relstate;
+        Datum        d;
+        bool        isnull;

         subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);

         relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState));
         relstate->relid = subrel->srrelid;
         relstate->state = subrel->srsubstate;
-        relstate->lsn = subrel->srsublsn;
+        d = SysCacheGetAttr(SUBSCRIPTIONRELMAP, tup,
+                            Anum_pg_subscription_rel_srsublsn, &isnull);
+        if (isnull)
+            relstate->lsn = InvalidXLogRecPtr;
+        else
+            relstate->lsn = DatumGetLSN(d);

         res = lappend(res, relstate);
     }
@@ -504,13 +511,20 @@ GetSubscriptionNotReadyRelations(Oid subid)
     {
         Form_pg_subscription_rel subrel;
         SubscriptionRelState *relstate;
+        Datum        d;
+        bool        isnull;

         subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);

         relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState));
         relstate->relid = subrel->srrelid;
         relstate->state = subrel->srsubstate;
-        relstate->lsn = subrel->srsublsn;
+        d = SysCacheGetAttr(SUBSCRIPTIONRELMAP, tup,
+                            Anum_pg_subscription_rel_srsublsn, &isnull);
+        if (isnull)
+            relstate->lsn = InvalidXLogRecPtr;
+        else
+            relstate->lsn = DatumGetLSN(d);

         res = lappend(res, relstate);
     }
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index c44df04c72..f384f4e7fa 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -33,8 +33,18 @@ CATALOG(pg_subscription_rel,6102,SubscriptionRelRelationId)
     Oid            srsubid;        /* Oid of subscription */
     Oid            srrelid;        /* Oid of relation */
     char        srsubstate;        /* state of the relation in subscription */
-    XLogRecPtr    srsublsn;        /* remote lsn of the state change used for
-                                 * synchronization coordination */
+
+    /*
+     * Although srsublsn is a fixed-width type, it is allowed to be NULL, so
+     * we prevent direct C code access to it just as for a varlena field.
+     */
+#ifdef CATALOG_VARLEN            /* variable-length fields start here */
+
+    XLogRecPtr    srsublsn BKI_FORCE_NULL;    /* remote LSN of the state change
+                                             * used for synchronization
+                                             * coordination, or NULL if not
+                                             * valid */
+#endif
 } FormData_pg_subscription_rel;

 typedef FormData_pg_subscription_rel *Form_pg_subscription_rel;

Re: pg_subscription.subslotname is wrongly marked NOT NULL

От
Tom Lane
Дата:
Mopping this up ... the attached patch against v12 shows the portions
of 72eab84a5 and 0fa0b487b that I'm thinking of putting into v10-v12.

The doc changes, which just clarify that subslotname and srsublsn can
be null, should be uncontroversial.  The changes in pg_subscription.c
prevent it from accessing data that might not be there.  99.999% of
the time, that doesn't matter; we'd copy garbage into
SubscriptionRelState.lsn, but the callers shouldn't look at that field
in states where it's not valid.  However, it's possible that the code
could access data off the end of the heap page, and at least in theory
that could lead to a SIGSEGV.

What I'm not quite sure about is whether to add the BKI_FORCE_NULL
annotations to the headers or not.  There are some pros and cons:

* While Catalog.pm has had support for BKI_FORCE_NULL for quite some
time, we never used it in anger before yesterday.  It's easy to
check that it works, but I wonder whether anybody has third-party
analysis tools that look at the catalog headers and would get broken
because they didn't cover this.

* If we change these markings, then our own testing in the buildfarm
etc. will not reflect the state of affairs seen in many/most actual
v10-v12 installations.  The scope of code where it'd matter seems
pretty tiny, so I don't think there's a big risk, but there's more
than zero risk.  (In any case, I would not push this part until all
the buildfarm JIT critters have reported happiness with 798b4faef,
as that's the one specific spot where it surely does matter.)

* On the other side of the ledger, if we don't fix these markings
we cannot back-patch the additional assertions I proposed at [1].

I'm kind of leaning to committing this as shown and back-patching
the patch at [1], but certainly a case could be made in the other
direction.  Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/298837.1595196283%40sss.pgh.pa.us

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 76ed2fbea8..582f6f66c6 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6786,8 +6786,9 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
       <entry><structfield>subslotname</structfield></entry>
       <entry><type>name</type></entry>
       <entry></entry>
-      <entry>Name of the replication slot in the upstream database. Also used
-       for local replication origin name.</entry>
+      <entry>Name of the replication slot in the upstream database (also used
+       for the local replication origin name);
+       null represents <literal>NONE</literal></entry>
      </row>

      <row>
@@ -6869,7 +6870,9 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
       <entry><type>pg_lsn</type></entry>
       <entry></entry>
       <entry>
-       End LSN for <literal>s</literal> and <literal>r</literal> states.
+       Remote LSN of the state change used for synchronization coordination
+       when in <literal>s</literal> or <literal>r</literal> states,
+       otherwise null
       </entry>
      </row>
     </tbody>
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index afee2838cc..a3a8b4e599 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -457,13 +457,20 @@ GetSubscriptionRelations(Oid subid)
     {
         Form_pg_subscription_rel subrel;
         SubscriptionRelState *relstate;
+        Datum        d;
+        bool        isnull;

         subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);

         relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState));
         relstate->relid = subrel->srrelid;
         relstate->state = subrel->srsubstate;
-        relstate->lsn = subrel->srsublsn;
+        d = SysCacheGetAttr(SUBSCRIPTIONRELMAP, tup,
+                            Anum_pg_subscription_rel_srsublsn, &isnull);
+        if (isnull)
+            relstate->lsn = InvalidXLogRecPtr;
+        else
+            relstate->lsn = DatumGetLSN(d);

         res = lappend(res, relstate);
     }
@@ -509,13 +516,20 @@ GetSubscriptionNotReadyRelations(Oid subid)
     {
         Form_pg_subscription_rel subrel;
         SubscriptionRelState *relstate;
+        Datum        d;
+        bool        isnull;

         subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);

         relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState));
         relstate->relid = subrel->srrelid;
         relstate->state = subrel->srsubstate;
-        relstate->lsn = subrel->srsublsn;
+        d = SysCacheGetAttr(SUBSCRIPTIONRELMAP, tup,
+                            Anum_pg_subscription_rel_srsublsn, &isnull);
+        if (isnull)
+            relstate->lsn = InvalidXLogRecPtr;
+        else
+            relstate->lsn = DatumGetLSN(d);

         res = lappend(res, relstate);
     }
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index 3cb13d897e..796c8794ca 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -53,7 +53,7 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
     text        subconninfo BKI_FORCE_NOT_NULL;

     /* Slot name on publisher */
-    NameData    subslotname;
+    NameData    subslotname BKI_FORCE_NULL;

     /* Synchronous commit setting for worker */
     text        subsynccommit BKI_FORCE_NOT_NULL;
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index f7df814a18..6f5607d604 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -34,8 +34,18 @@ CATALOG(pg_subscription_rel,6102,SubscriptionRelRelationId)
     Oid            srsubid;        /* Oid of subscription */
     Oid            srrelid;        /* Oid of relation */
     char        srsubstate;        /* state of the relation in subscription */
-    XLogRecPtr    srsublsn;        /* remote lsn of the state change used for
-                                 * synchronization coordination */
+
+    /*
+     * Although srsublsn is a fixed-width type, it is allowed to be NULL, so
+     * we prevent direct C code access to it just as for a varlena field.
+     */
+#ifdef CATALOG_VARLEN            /* variable-length fields start here */
+
+    XLogRecPtr    srsublsn BKI_FORCE_NULL;    /* remote LSN of the state change
+                                             * used for synchronization
+                                             * coordination, or NULL if not
+                                             * valid */
+#endif
 } FormData_pg_subscription_rel;

 typedef FormData_pg_subscription_rel *Form_pg_subscription_rel;

Re: pg_subscription.subslotname is wrongly marked NOT NULL

От
Tom Lane
Дата:
I wrote:
> * On the other side of the ledger, if we don't fix these markings
> we cannot back-patch the additional assertions I proposed at [1].

> I'm kind of leaning to committing this as shown and back-patching
> the patch at [1], but certainly a case could be made in the other
> direction.  Thoughts?

After further thought about that I realized that the assertion patch
could be kluged in the same way as we did in llvmjit_deform.c, and
that that would really be the only safe way to do it pre-v13.
Otherwise the assertions would trip in pre-existing databases,
which would not be nice.

So what I've done is to back-patch the assertions that way, and
*not* apply BKI_FORCE_NULL in the back branches.  The possible
downsides of doing that seem to outweigh the upside of making
the catalog state cleaner in new installations.

            regards, tom lane