Recovering from detoast-related catcache invalidations

From: Tom Lane
Subject: Recovering from detoast-related catcache invalidations
Date:
Msg-id: 1393953.1698353013@sss.pgh.pa.us
Responses: Re: Recovering from detoast-related catcache invalidations  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
In bug #18163 [1], Alexander proved the misgivings I had in [2]
about catcache detoasting being possibly unsafe:

>> BTW, while nosing around I found what seems like a very nasty related
>> bug.  Suppose that a catalog tuple being loaded into syscache contains
>> some toasted fields.  CatalogCacheCreateEntry will flatten the tuple,
>> involving fetches from toast tables that will certainly cause
>> AcceptInvalidationMessages calls.  What if one of those should have
>> invalidated this tuple?  We will not notice, because it's not in
>> the hashtable yet.  When we do add it, we will mark it not-dead,
>> meaning that the stale entry looks fine and could persist for a long
>> while.
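
To spell that out against the code: the detoasting step in
CatalogCacheCreateEntry currently amounts to

    if (HeapTupleHasExternal(ntp))
    {
        /*
         * toast_flatten_tuple reads the TOAST table, which can run
         * AcceptInvalidationMessages().  If one of those messages
         * invalidates the very row we are loading, nothing notices:
         * there is no catcache entry yet to mark dead, and the entry
         * we then build goes into the hashtable not-dead.
         */
        dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
    }
    else
        dtp = ntp;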

Attached is a POC patch for fixing this.  The idea is that if we get
an invalidation while trying to detoast a catalog tuple, we should
go around and re-read the tuple a second time to get an up-to-date
version (or, possibly, discover that it no longer exists).  In the
case of SearchCatCacheList, we have to drop and reload the entire
catcache list, but fortunately not a lot of new code is needed.
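
In outline, the miss path becomes (see the patch below for the details):

    do
    {
        /* re-copy the scan keys: systable_beginscan scribbles on them */
        memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
        ...
        scandesc = systable_beginscan(relation, cache->cc_indexoid, ...);
        stale = false;
        while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
        {
            ct = CatalogCacheCreateEntry(cache, ntp, arguments,
                                         hashValue, hashIndex,
                                         false);
            if (ct == NULL)     /* detoast saw an inval arrive; retry */
            {
                stale = true;
                break;
            }
            /* ... bump refcount, etc ... */
        }
        systable_endscan(scandesc);
    } while (stale);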

The detection of "get an invalidation" could be refined: what I did
here is to check for any advance of SharedInvalidMessageCounter,
which clearly will have a significant number of false positives.
However, the only way I see to make that a lot better is to
temporarily create a placeholder catcache entry (probably a negative
one) with the same keys, and then see if it got marked dead.
This seems a little expensive, plus I'm afraid that it'd be actively
wrong in the recursive-lookup cases that the existing comment in
SearchCatCacheMiss is talking about (that is, the catcache entry
might mislead any recursive lookup that happens).

Moreover, if we did do something like that then the new code
paths would be essentially untested.  As the patch stands, the
reload path seems to get taken 10 to 20 times during a
"make installcheck-parallel" run of the core regression tests
(out of about 150 times that catcache detoasting is required).
Probably all of those are false-positive cases, but at least
they're exercising the logic.

So I'm inclined to leave it like this, but perhaps somebody
else will have a different opinion.

(BTW, there's a fair amount of existing catcache.c code that
will need to be indented another tab stop, but in the interests
of keeping the patch legible I didn't do that yet.)

Comments?

            regards, tom lane

[1] https://www.postgresql.org/message-id/18163-859bad19a43edcf6%40postgresql.org
[2] https://www.postgresql.org/message-id/1389919.1697144487%40sss.pgh.pa.us

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 1aacb736c2..8255162a72 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1319,6 +1319,7 @@ SearchCatCacheMiss(CatCache *cache,
     SysScanDesc scandesc;
     HeapTuple    ntp;
     CatCTup    *ct;
+    bool        stale;
     Datum        arguments[CATCACHE_MAXKEYS];

     /* Initialize local parameter array */
@@ -1327,16 +1328,6 @@ SearchCatCacheMiss(CatCache *cache,
     arguments[2] = v3;
     arguments[3] = v4;

-    /*
-     * Ok, need to make a lookup in the relation, copy the scankey and fill
-     * out any per-call fields.
-     */
-    memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-    cur_skey[0].sk_argument = v1;
-    cur_skey[1].sk_argument = v2;
-    cur_skey[2].sk_argument = v3;
-    cur_skey[3].sk_argument = v4;
-
     /*
      * Tuple was not found in cache, so we have to try to retrieve it directly
      * from the relation.  If found, we will add it to the cache; if not
@@ -1351,9 +1342,28 @@ SearchCatCacheMiss(CatCache *cache,
      * will eventually age out of the cache, so there's no functional problem.
      * This case is rare enough that it's not worth expending extra cycles to
      * detect.
+     *
+     * Another case, which we *must* handle, is that the tuple could become
+     * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+     * AcceptInvalidationMessages can run during TOAST table access).  We do
+     * not want to return already-stale catcache entries, so we loop around
+     * and do the table scan again if that happens.
      */
     relation = table_open(cache->cc_reloid, AccessShareLock);

+    do
+    {
+        /*
+         * Ok, need to make a lookup in the relation, copy the scankey and
+         * fill out any per-call fields.  (We must re-do this when retrying,
+         * because systable_beginscan scribbles on the scankey.)
+         */
+        memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+        cur_skey[0].sk_argument = v1;
+        cur_skey[1].sk_argument = v2;
+        cur_skey[2].sk_argument = v3;
+        cur_skey[3].sk_argument = v4;
+
     scandesc = systable_beginscan(relation,
                                   cache->cc_indexoid,
                                   IndexScanOK(cache, cur_skey),
@@ -1362,12 +1372,19 @@ SearchCatCacheMiss(CatCache *cache,
                                   cur_skey);

     ct = NULL;
+    stale = false;

     while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
     {
         ct = CatalogCacheCreateEntry(cache, ntp, arguments,
                                      hashValue, hashIndex,
                                      false);
+        /* upon failure, we must start the scan over */
+        if (ct == NULL)
+        {
+            stale = true;
+            break;
+        }
         /* immediately set the refcount to 1 */
         ResourceOwnerEnlargeCatCacheRefs(CurrentResourceOwner);
         ct->refcount++;
@@ -1376,6 +1393,7 @@ SearchCatCacheMiss(CatCache *cache,
     }

     systable_endscan(scandesc);
+    } while (stale);

     table_close(relation, AccessShareLock);

@@ -1398,6 +1416,9 @@ SearchCatCacheMiss(CatCache *cache,
                                      hashValue, hashIndex,
                                      true);

+        /* Creating a negative cache entry shouldn't fail */
+        Assert(ct != NULL);
+
         CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
                    cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
         CACHE_elog(DEBUG2, "SearchCatCache(%s): put neg entry in bucket %d",
@@ -1607,6 +1628,7 @@ SearchCatCacheList(CatCache *cache,
      */
     ResourceOwnerEnlargeCatCacheListRefs(CurrentResourceOwner);

+    /* ctlist must be valid throughout the PG_TRY block */
     ctlist = NIL;

     PG_TRY();
@@ -1614,19 +1636,23 @@ SearchCatCacheList(CatCache *cache,
         ScanKeyData cur_skey[CATCACHE_MAXKEYS];
         Relation    relation;
         SysScanDesc scandesc;
-
-        /*
-         * Ok, need to make a lookup in the relation, copy the scankey and
-         * fill out any per-call fields.
-         */
-        memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
-        cur_skey[0].sk_argument = v1;
-        cur_skey[1].sk_argument = v2;
-        cur_skey[2].sk_argument = v3;
-        cur_skey[3].sk_argument = v4;
+        bool        stale;

         relation = table_open(cache->cc_reloid, AccessShareLock);

+        do
+        {
+            /*
+             * Ok, need to make a lookup in the relation, copy the scankey and
+             * fill out any per-call fields.  (We must re-do this when
+             * retrying, because systable_beginscan scribbles on the scankey.)
+             */
+            memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
+            cur_skey[0].sk_argument = v1;
+            cur_skey[1].sk_argument = v2;
+            cur_skey[2].sk_argument = v3;
+            cur_skey[3].sk_argument = v4;
+
         scandesc = systable_beginscan(relation,
                                       cache->cc_indexoid,
                                       IndexScanOK(cache, cur_skey),
@@ -1637,6 +1663,8 @@ SearchCatCacheList(CatCache *cache,
         /* The list will be ordered iff we are doing an index scan */
         ordered = (scandesc->irel != NULL);

+        stale = false;
+
         while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
         {
             uint32        hashValue;
@@ -1682,6 +1710,30 @@ SearchCatCacheList(CatCache *cache,
                 ct = CatalogCacheCreateEntry(cache, ntp, arguments,
                                              hashValue, hashIndex,
                                              false);
+                /* upon failure, we must start the scan over */
+                if (ct == NULL)
+                {
+                    /*
+                     * Release refcounts on any items we already had.  We dare
+                     * not try to free them if they're now unreferenced, since
+                     * an error while doing that would result in the PG_CATCH
+                     * below doing extra refcount decrements.  Besides, we'll
+                     * likely re-adopt those items in the next iteration, so
+                     * it's not worth complicating matters to try to get rid
+                     * of them.
+                     */
+                    foreach(ctlist_item, ctlist)
+                    {
+                        ct = (CatCTup *) lfirst(ctlist_item);
+                        Assert(ct->c_list == NULL);
+                        Assert(ct->refcount > 0);
+                        ct->refcount--;
+                    }
+                    /* Reset ctlist in preparation for new try */
+                    ctlist = NIL;
+                    stale = true;
+                    break;
+                }
             }

             /* Careful here: add entry to ctlist, then bump its refcount */
@@ -1691,6 +1743,7 @@ SearchCatCacheList(CatCache *cache,
         }

         systable_endscan(scandesc);
+        } while (stale);

         table_close(relation, AccessShareLock);

@@ -1797,6 +1850,10 @@ ReleaseCatCacheList(CatCList *list)
  * CatalogCacheCreateEntry
  *        Create a new CatCTup entry, copying the given HeapTuple and other
  *        supplied data into it.  The new entry initially has refcount 0.
+ *
+ * Returns NULL if we attempt to detoast the tuple and observe that it
+ * became stale.  (This cannot happen for a negative entry.)  Caller must
+ * retry the tuple lookup in that case.
  */
 static CatCTup *
 CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
@@ -1820,9 +1877,24 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
          * entry, and also protects us against the possibility of the toast
          * tuples being freed before we attempt to fetch them, in case of
          * something using a slightly stale catcache entry.
+         *
+         * If we see SharedInvalidMessageCounter advance during detoasting,
+         * assume that the tuple we are interested in might have been
+         * invalidated, and report failure.  (Since we don't yet have a
+         * catcache entry representing the tuple, it's impossible to tell for
+         * sure, and we have to be conservative.)
          */
         if (HeapTupleHasExternal(ntp))
+        {
+            uint64        inval_count = SharedInvalidMessageCounter;
+
             dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
+            if (inval_count != SharedInvalidMessageCounter)
+            {
+                heap_freetuple(dtp);
+                return NULL;
+            }
+        }
         else
             dtp = ntp;

