Обсуждение: Patch to avoid orphaned dependencies
Hi,
This new thread is a follow-up of [1].
Problem description:
We have occasionally observed objects having an orphaned dependency, the most common case we have seen (if not the only one) is functions not linked to any namespaces.
A patch has been initially proposed to fix this particular (function-to-namespace) dependency (see [1]), but there could be much more scenarios (like the function-to-datatype one highlighted by Gilles in [1] that could lead to a function having an invalid parameter datatype).
As Tom said there are dozens more cases that would need to be considered, and a global approach to avoid those race conditions should be considered instead.
The attached patch is avoiding those race conditions globally by changing the dependency mechanism: we are using a dirty snapshot any time we’re about to create a pg_depend or pg_shdepend entry.
That way we can check if there is in-flight transactions that are affecting the dependency: if that’s the case, an error is being reported.
This approach has been chosen over another one that would have make use of the locking machinery.
The reason for this choice is to avoid possible slow down of typical DDL command, risk of deadlock, number of locks taken by transaction...
Implementation overview:
- A new catalog snapshot is added: DirtyCatalogSnapshot.
- This catalog snapshot is a dirty one to be able to look for in-flight dependencies.
- Its usage is controlled by a new UseDirtyCatalogSnapshot variable.
- Any time this variable is being set to true, then the next call to GetNonHistoricCatalogSnapshot() is returning the dirty snapshot.
- This snapshot is being used to check for in-flight dependencies and also to get the objects description to generate the error messages.
Testing:
Test 1
Session1:
postgres=# create schema tobeorph;
CREATE SCHEMA
postgres=# create table tobeorph.bdt (a int);
CREATE TABLE
postgres=# begin;
BEGIN
postgres=*# CREATE OR REPLACE FUNCTION tobeorph.bdttime() RETURNS TIMESTAMP AS $$
DECLARE outTS TIMESTAMP;
BEGIN perform pg_sleep(10); RETURN CURRENT_TIMESTAMP;
END;
$$ LANGUAGE plpgsql volatile;
CREATE FUNCTION
Session 1 does not commit, then session 2:
postgres=# drop schema tobeorph;
ERROR: cannot drop schema tobeorph because other objects depend on it
DETAIL: table tobeorph.bdt depends on schema tobeorph
function tobeorph.bdttime() (not yet committed) depends on schema tobeorph
HINT: DROP and DROP CASCADE won't work when there are uncommitted dependencies.
Test 2
Session 1:
postgres=# create schema toinsert;
CREATE SCHEMA
postgres=# begin;
BEGIN
postgres=*# drop schema toinsert;
DROP SCHEMA
Session 1 does not commit, then session 2:
postgres=# CREATE OR REPLACE FUNCTION toinsert.bdttime() RETURNS TIMESTAMP AS $$
DECLARE outTS TIMESTAMP;
BEGIN perform pg_sleep(10); RETURN CURRENT_TIMESTAMP;
END;
$$ LANGUAGE plpgsql volatile;
ERROR: cannot create function toinsert.bdttime() because it depends of other objects uncommitted dependencies
DETAIL: function toinsert.bdttime() depends on schema toinsert (dependency not yet committed)
HINT: CREATE won't work as long as there is uncommitted dependencies.
Test3
Session1:
psql -U toorph postgres
psql (14devel)
Type "help" for help.
postgres=> begin;
BEGIN
postgres=*> CREATE OR REPLACE FUNCTION bdttime() RETURNS TIMESTAMP AS $$
DECLARE outTS TIMESTAMP;
BEGIN perform pg_sleep(10); RETURN CURRENT_TIMESTAMP;
END;
$$ LANGUAGE plpgsql volatile;
CREATE FUNCTION
Session 1 does not commit, then session 2:
postgres=# drop owned by toorph;
ERROR: cannot drop objects owned by role toorph because other uncommitted objects depend on it
DETAIL: function public.bdttime() (not yet committed) depends on role toorph
HINT: Commit or rollback function public.bdttime() creation.
I'm creating a new commitfest entry for this patch.
Thanks
Bertrand
Вложения
Hello Bertrand, Le mardi 4 mai 2021, 11:55:43 CEST Drouvot, Bertrand a écrit : > > Implementation overview: > > * A new catalog snapshot is added: DirtyCatalogSnapshot. > * This catalog snapshot is a dirty one to be able to look for > in-flight dependencies. > * Its usage is controlled by a new UseDirtyCatalogSnapshot variable. > * Any time this variable is being set to true, then the next call to > GetNonHistoricCatalogSnapshot() is returning the dirty snapshot. > * This snapshot is being used to check for in-flight dependencies and > also to get the objects description to generate the error messages. > I quickly tested the patch, it behaves as advertised, and passes tests. Isolation tests should be added to demonstrate the issues it is solving. However, I am bit wary of this behaviour of setting the DirtyCatalogSnapshot global variable which is then reset after each snapshot acquisition: I'm having trouble understanding all the implications of that, if it would be possible to introduce an unforeseen snapshot before the one we actually want to be dirty. I don't want to derail this thread, but couldn't predicate locks on the pg_depend index pages corresponding to the dropped object / referenced objects work as a different approach ? I'm not familiar enough with them so maybe there is some fundamental misunderstanding on my end. -- Ronan Dunklau
Hi Ronan, On 9/17/21 10:09 AM, Ronan Dunklau wrote: > Hello Bertrand, > > Le mardi 4 mai 2021, 11:55:43 CEST Drouvot, Bertrand a écrit : >> Implementation overview: >> >> * A new catalog snapshot is added: DirtyCatalogSnapshot. >> * This catalog snapshot is a dirty one to be able to look for >> in-flight dependencies. >> * Its usage is controlled by a new UseDirtyCatalogSnapshot variable. >> * Any time this variable is being set to true, then the next call to >> GetNonHistoricCatalogSnapshot() is returning the dirty snapshot. >> * This snapshot is being used to check for in-flight dependencies and >> also to get the objects description to generate the error messages. >> > I quickly tested the patch, it behaves as advertised, and passes tests. Thanks for looking at it! > > Isolation tests should be added to demonstrate the issues it is solving. Good point. I'll have a look. > > However, I am bit wary of this behaviour of setting the DirtyCatalogSnapshot > global variable which is then reset after each snapshot acquisition: I'm > having trouble understanding all the implications of that, if it would be > possible to introduce an unforeseen snapshot before the one we actually want > to be dirty. I don't think that could be possible as long as: - this is a per backend variable - we pay attention where we set it to true But i might be missing something. Do you have any corner cases in mind? > I don't want to derail this thread, but couldn't predicate locks on the > pg_depend index pages corresponding to the dropped object / referenced objects > work as a different approach ? I'm fine to have a look at another approach if needed, but does it mean we are not happy with the current approach proposal? Thanks Bertrand
> On 20 Sep 2021, at 12:50, Drouvot, Bertrand <bdrouvot@amazon.com> wrote: > > Hi Ronan, > > On 9/17/21 10:09 AM, Ronan Dunklau wrote: >> Hello Bertrand, >> >> Le mardi 4 mai 2021, 11:55:43 CEST Drouvot, Bertrand a écrit : >>> Implementation overview: >>> >>> * A new catalog snapshot is added: DirtyCatalogSnapshot. >>> * This catalog snapshot is a dirty one to be able to look for >>> in-flight dependencies. >>> * Its usage is controlled by a new UseDirtyCatalogSnapshot variable. >>> * Any time this variable is being set to true, then the next call to >>> GetNonHistoricCatalogSnapshot() is returning the dirty snapshot. >>> * This snapshot is being used to check for in-flight dependencies and >>> also to get the objects description to generate the error messages. >>> >> I quickly tested the patch, it behaves as advertised, and passes tests. > > Thanks for looking at it! > >> >> Isolation tests should be added to demonstrate the issues it is solving. > > Good point. I'll have a look. > >> >> However, I am bit wary of this behaviour of setting the DirtyCatalogSnapshot >> global variable which is then reset after each snapshot acquisition: I'm >> having trouble understanding all the implications of that, if it would be >> possible to introduce an unforeseen snapshot before the one we actually want >> to be dirty. > > I don't think that could be possible as long as: > > - this is a per backend variable > > - we pay attention where we set it to true > > But i might be missing something. > > Do you have any corner cases in mind? > >> I don't want to derail this thread, but couldn't predicate locks on the >> pg_depend index pages corresponding to the dropped object / referenced objects >> work as a different approach ? > > I'm fine to have a look at another approach if needed, but does it mean we are not happy with the current approach proposal? This patch fails to apply as a whole, with the parts applying showing quite large offsets. Have you had the chance to look at the isolation test asked for above? -- Daniel Gustafsson https://vmware.com/
Hi, On 11/17/21 2:25 PM, Daniel Gustafsson wrote: > > This patch fails to apply as a whole, with the parts applying showing quite > large offsets. Thanks for the warning, please find attached a rebase of it. > Have you had the chance to look at the isolation test asked for > above? Not yet, but I'll look at it for sure. Thanks Bertrand
Вложения
Hi, On 11/23/21 4:22 PM, Drouvot, Bertrand wrote: > Hi, > > On 11/17/21 2:25 PM, Daniel Gustafsson wrote: >> >> This patch fails to apply as a whole, with the parts applying showing >> quite >> large offsets. > Thanks for the warning, please find attached a rebase of it. >> Have you had the chance to look at the isolation test asked for >> above? > > Not yet, but I'll look at it for sure. > Please find enclosed v1-0003-orphaned-dependencies.patch, that contains: - a mandatory rebase - a few isolation tests added in src/test/modules/test_dependencies (but I'm not sure at all that's the right place to add them, is it?) Thanks Bertrand
Вложения
Hi, On 2021-12-17 14:19:18 +0100, Drouvot, Bertrand wrote: > Please find enclosed v1-0003-orphaned-dependencies.patch, that contains: > > - a mandatory rebase > > - a few isolation tests added in src/test/modules/test_dependencies (but I'm > not sure at all that's the right place to add them, is it?) This fails on windows w/ msvc: https://cirrus-ci.com/task/5368174125252608?logs=configure#L102 https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157904#L12 Greetings, Andres Freund
Hi, On 12/31/21 7:03 AM, Andres Freund wrote: > Hi, > > On 2021-12-17 14:19:18 +0100, Drouvot, Bertrand wrote: >> Please find enclosed v1-0003-orphaned-dependencies.patch, that contains: >> >> - a mandatory rebase >> >> - a few isolation tests added in src/test/modules/test_dependencies (but I'm >> not sure at all that's the right place to add them, is it?) > This fails on windows w/ msvc: > > https://cirrus-ci.com/task/5368174125252608?logs=configure#L102 > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157904#L12 Thanks Andres for the warning. Please find enclosed v1-0004-orphaned-dependencies.patch that addresses the issue. Thanks Bertrand
Вложения
+
+
+ dirtySnapshot = GetCatalogSnapshot(RelationGetRelid(*depRel));
Bertand, do you think this has any chance of making it into v15? If not, are you alright with adjusting the commitfest entry to v16 and moving it to the next commitfest? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes: > Bertand, do you think this has any chance of making it into v15? If not, > are you alright with adjusting the commitfest entry to v16 and moving it to > the next commitfest? I looked this over briefly, and IMO it should have no chance of being committed in anything like this form. The lesser problem is that (as already noted) the reliance on a global variable that changes catalog lookup semantics is just unbelievably scary. An example of the possible consequences here is that a syscache entry could get made while that's set, containing a row that we should not be able to see yet, and indeed might never get committed at all. Perhaps that could be addressed by abandoning the patch's ambition to tell you the details of an uncommitted object (which would have race conditions anyway), so that *only* reads of pg_[sh]depend itself need be dirty. The bigger problem is that it fails to close the race condition that it's intending to solve. This patch will catch a race like this: Session doing DROP Session doing CREATE DROP begins CREATE makes a dependent catalog entry DROP scans for dependent objects CREATE commits DROP removes catalog entry DROP commits However, it will not catch this slightly different timing: Session doing DROP Session doing CREATE DROP begins DROP scans for dependent objects CREATE makes a dependent catalog entry CREATE commits DROP removes catalog entry DROP commits So I don't see that we've moved the goalposts very far at all. Realistically, if we want to prevent this type of problem, then all creation DDL will have to take a lock on each referenced object that'd conflict with a lock taken by DROP. This might not be out of reach: I think we do already take such locks while dropping objects. The reference-side lock could be taken by the recordDependency mechanism itself, ensuring that we don't miss anything; and that would also allow us to not bother taking such a lock on pinned objects, which'd greatly cut the cost (though not to zero). regards, tom lane
On Wed, Mar 23, 2022 at 12:49:06PM -0400, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> Bertand, do you think this has any chance of making it into v15? If not, >> are you alright with adjusting the commitfest entry to v16 and moving it to >> the next commitfest? > > I looked this over briefly, and IMO it should have no chance of being > committed in anything like this form. I marked the commitfest entry as waiting-on-author, set the target version to v16, and moved it to the next commitfest. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
This entry has been waiting on author input for a while (our current threshold is roughly two weeks), so I've marked it Returned with Feedback. Once you think the patchset is ready for review again, you (or any interested party) can resurrect the patch entry by visiting https://commitfest.postgresql.org/38/3106/ and changing the status to "Needs Review", and then changing the status again to "Move to next CF". (Don't forget the second step; hopefully we will have streamlined this in the near future!) Thanks, --Jacob
Hi, On Wed, Mar 23, 2022 at 12:49:06PM -0400, Tom Lane wrote: > Realistically, if we want to prevent this type of problem, then all > creation DDL will have to take a lock on each referenced object that'd > conflict with a lock taken by DROP. This might not be out of reach: > I think we do already take such locks while dropping objects. The > reference-side lock could be taken by the recordDependency mechanism > itself, ensuring that we don't miss anything; and that would also > allow us to not bother taking such a lock on pinned objects, which'd > greatly cut the cost (though not to zero). Thanks for the idea (and sorry for the delay replying to it)! I had a look at it and just created a new thread [1] based on your proposal. [1]: https://www.postgresql.org/message-id/flat/ZiYjn0eVc7pxVY45%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com