Обсуждение: Timing of relcache inval at parallel worker init

Поиск
Список
Период
Сортировка

Timing of relcache inval at parallel worker init

От
Noah Misch
Дата:
While reviewing what became commit fe4d022, I was surprised at the sequence of
relfilenode values that RelationInitPhysicalAddr() computed for pg_class,
during ParallelWorkerMain(), when running the last command of this recipe:

  begin;
  cluster pg_class using pg_class_oid_index;
  set force_parallel_mode = 'regress';
  values (1);

There's $OLD_NODE (relfilenode in the committed relation map) and $NEW_NODE
(relfilenode in this transaction's active_local_updates).  The worker performs
RelationInitPhysicalAddr(pg_class) four times:

1) $OLD_NODE in BackgroundWorkerInitializeConnectionByOid().
2) $OLD_NODE in RelationCacheInvalidate() directly.
3) $OLD_NODE in RelationReloadNailed(), indirectly via RelationCacheInvalidate().
4) $NEW_NODE indirectly as part of the executor running the query.

I did expect $OLD_NODE in (1), since ParallelWorkerMain() calls
BackgroundWorkerInitializeConnectionByOid() before
StartParallelWorkerTransaction().  I expected $NEW_NODE in (2) and (3); that
didn't happen, because ParallelWorkerMain() calls InvalidateSystemCaches()
before RestoreRelationMap().  Let's move InvalidateSystemCaches() later.
Invalidation should follow any worker initialization step that changes the
results of relcache validation; otherwise, we'd need to ensure the
InvalidateSystemCaches() will not validate any relcache entry.  Invalidation
should precede any step that reads from a cache; otherwise, we'd need to redo
that step after inval.  (Currently, no step reads from a cache.)  Many steps,
e.g. AttachSerializableXact(), have no effect on relcache validation, so it's
arbitrary whether they happen before or after inval.  I'm putting inval as
late as possible, because I think it's easier to confirm that a step doesn't
read from a cache than to confirm that a step doesn't affect relcache
validation.  An also-reasonable alternative would be to move inval and its
prerequisites as early as possible.

For reasons described in the attached commit message, this doesn't have
user-visible consequences today.  Innocent-looking relcache.c changes might
upheave that, so I'm proposing this on robustness grounds.  No need to
back-patch.

Вложения

Re: Timing of relcache inval at parallel worker init

От
Kyotaro Horiguchi
Дата:
At Sat, 17 Oct 2020 04:53:06 -0700, Noah Misch <noah@leadboat.com> wrote in 
> While reviewing what became commit fe4d022, I was surprised at the sequence of
> relfilenode values that RelationInitPhysicalAddr() computed for pg_class,
> during ParallelWorkerMain(), when running the last command of this recipe:
> 
>   begin;
>   cluster pg_class using pg_class_oid_index;
>   set force_parallel_mode = 'regress';
>   values (1);
> 
> There's $OLD_NODE (relfilenode in the committed relation map) and $NEW_NODE
> (relfilenode in this transaction's active_local_updates).  The worker performs
> RelationInitPhysicalAddr(pg_class) four times:
> 
> 1) $OLD_NODE in BackgroundWorkerInitializeConnectionByOid().
> 2) $OLD_NODE in RelationCacheInvalidate() directly.
> 3) $OLD_NODE in RelationReloadNailed(), indirectly via RelationCacheInvalidate().
> 4) $NEW_NODE indirectly as part of the executor running the query.
> 
> I did expect $OLD_NODE in (1), since ParallelWorkerMain() calls
> BackgroundWorkerInitializeConnectionByOid() before
> StartParallelWorkerTransaction().  I expected $NEW_NODE in (2) and (3); that
> didn't happen, because ParallelWorkerMain() calls InvalidateSystemCaches()
> before RestoreRelationMap().  Let's move InvalidateSystemCaches() later.
> Invalidation should follow any worker initialization step that changes the
> results of relcache validation; otherwise, we'd need to ensure the
> InvalidateSystemCaches() will not validate any relcache entry.  Invalidation
> should precede any step that reads from a cache; otherwise, we'd need to redo
> that step after inval.  (Currently, no step reads from a cache.)  Many steps,
> e.g. AttachSerializableXact(), have no effect on relcache validation, so it's
> arbitrary whether they happen before or after inval.  I'm putting inval as

I agree to both the discussions.

> late as possible, because I think it's easier to confirm that a step doesn't
> read from a cache than to confirm that a step doesn't affect relcache
> validation.  An also-reasonable alternative would be to move inval and its
> prerequisites as early as possible.

The steps became moved before the invalidation by this patch seems to
be in the lower layer than snapshot, so it seems to be reasonable.

> For reasons described in the attached commit message, this doesn't have
> user-visible consequences today.  Innocent-looking relcache.c changes might
> upheave that, so I'm proposing this on robustness grounds.  No need to
> back-patch.

I'm not sure about the necessity but lower-to-upper initialization
order is neat. I agree about not back-patching.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Timing of relcache inval at parallel worker init

От
Robert Haas
Дата:
On Sat, Oct 17, 2020 at 7:53 AM Noah Misch <noah@leadboat.com> wrote:
> Let's move InvalidateSystemCaches() later.
> Invalidation should follow any worker initialization step that changes the
> results of relcache validation; otherwise, we'd need to ensure the
> InvalidateSystemCaches() will not validate any relcache entry.

The thinking behind the current placement was this: just before the
call to InvalidateSystemCaches(), we restore the active and
transaction snapshots. I think that we must now flush the caches
before anyone does any more lookups; otherwise, they might get wrong
answers. So, putting this code later makes the assumption that no such
lookups happen meanwhile. That feels a little risky to me; at the very
least, it should be clearly spelled out in the comments that no system
cache lookups can happen in the functions we call in the interim.
Would it be obvious to a future developer that e.g.
RestoreEnumBlacklist cannot perform any syscache lookups? It doesn't
seem so to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Timing of relcache inval at parallel worker init

От
Noah Misch
Дата:
On Wed, Oct 21, 2020 at 11:31:54AM -0400, Robert Haas wrote:
> On Sat, Oct 17, 2020 at 7:53 AM Noah Misch <noah@leadboat.com> wrote:
> > Let's move InvalidateSystemCaches() later.
> > Invalidation should follow any worker initialization step that changes the
> > results of relcache validation; otherwise, we'd need to ensure the
> > InvalidateSystemCaches() will not validate any relcache entry.
> 
> The thinking behind the current placement was this: just before the
> call to InvalidateSystemCaches(), we restore the active and
> transaction snapshots. I think that we must now flush the caches
> before anyone does any more lookups; otherwise, they might get wrong
> answers. So, putting this code later makes the assumption that no such
> lookups happen meanwhile. That feels a little risky to me; at the very
> least, it should be clearly spelled out in the comments that no system
> cache lookups can happen in the functions we call in the interim.

My comment edits did attempt that.  I could enlarge those comments, at the
risk of putting undue weight on the topic.  One could also arrange for an
assertion failure if something takes a snapshot in the unwelcome period,
between StartParallelWorkerTransaction() and InvalidateSystemCaches().
Looking closer, we have live bugs from lookups during RestoreGUCState():

$ echo "begin; create user alice; set session authorization alice; set force_parallel_mode = regress; select 1" | psql
-X
BEGIN
CREATE ROLE
SET
SET
ERROR:  role "alice" does not exist
CONTEXT:  while setting parameter "session_authorization" to "alice"

$ echo "begin; create text search configuration foo (copy=english); set default_text_search_config = foo; set
force_parallel_mode= regress; select 1" | psql -X
 
BEGIN
CREATE TEXT SEARCH CONFIGURATION
SET
SET
ERROR:  invalid value for parameter "default_text_search_config": "public.foo"
CONTEXT:  while setting parameter "default_text_search_config" to "public.foo"

I gather $SUBJECT is the tip of an iceberg, so I'm withdrawing the patch and
abandoning the topic.