Hi,
Andres Freund a écrit :
> On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
> > We found an issue in pg_upgrade on a cluster with a third-party
> > background worker. The upgrade goes fine, but the new cluster is then in
> > an inconsistent state. The background worker comes from the PoWA
> > extension but the issue does not appear to related to this particular
> > code.
>
> Well, it does imply that that backgrounder did something, as the pure
> existence of a bgworker shouldn't affect
>
> anything. Presumably the issue is that the bgworker actually does
> transactional writes, which causes problems because the xids /
> multixacts from early during pg_upgrade won't actually be valid after we
> do pg_resetxlog etc.
>
>
> > As a solution, it seems that, for similar reasons that we restrict
> > socket access to prevent accidental connections (from commit
> > f763b77193), we should also prevent background workers to start at this
> > step.
>
> I think that'd have quite the potential for negative impact - imagine
> extensions that refuse to be loaded outside of shared_preload_libraries
> (e.g. because they need to allocate shared memory) but that are required
> during the course of pg_upgrade (e.g. because it's a tableam, a PL or
> such). Those libraries will then tried to be loaded during the upgrade
> (due to the _PG_init() hook being called when functions from the
> extension are needed, e.g. the tableam or PL handler).
>
> Nor is it clear to me that the only way this would be problematic is
> with shared_preload_libraries. A library in local_preload_libraries, or
> a demand loaded library can trigger bgworkers (or database writes in
> some other form) as well.
Thank you for those insights!
> I wonder if we could
>
> a) set default_transaction_read_only to true, and explicitly change it
> in the sessions that need that.
> b) when in binary upgrade mode / -b, error out on all wal writes in
> sessions that don't explicitly set a session-level GUC to allow
> writes.
Solution "a" appears to be enough to solve the problem described in my
initial email. See attached patch implementing this.
Cheers,
Denis