Обсуждение: [BUG] pg_dump does not properly deal with BEGIN ATOMIC function
Hi,
What appears to be a pg_dump/pg_restore bug was observed with the new
BEGIN ATOMIC function body syntax introduced in Postgres 14.
Dependencies inside a BEGIN ATOMIC function cannot be resolved
if those dependencies are dumped after the function body. The repro
case is when a primary key constraint is used in a ON CONFLICT ON CONSTRAINT
used within the function.
With the attached repro, pg_restore fails with
pg_restore: error: could not execute query: ERROR: constraint "a_pkey" for table "a" does not exist
Command was: CREATE FUNCTION public.a_f(c1_in text, c2 integer DEFAULT 60) RETURNS void
I am not sure if the answer if to dump functions later on in the process.
Would appreciate some feedback on this issue.
Regards,
Sami Imseih
Amazon Web Services (AWS)
Вложения
"Imseih (AWS), Sami" <simseih@amazon.com> writes: > With the attached repro, pg_restore fails with > pg_restore: error: could not execute query: ERROR: constraint "a_pkey" for table "a" does not exist > Command was: CREATE FUNCTION public.a_f(c1_in text, c2 integer DEFAULT 60) RETURNS void Hmph. The other thing worth noticing is that pg_dump prints a warning: pg_dump: warning: could not resolve dependency loop among these items: or with -v: pg_dump: warning: could not resolve dependency loop among these items: pg_dump: FUNCTION a_f (ID 218 OID 40664) pg_dump: CONSTRAINT a_pkey (ID 4131 OID 40663) pg_dump: POST-DATA BOUNDARY (ID 4281) pg_dump: TABLE DATA a (ID 4278 OID 40657) pg_dump: PRE-DATA BOUNDARY (ID 4280) So it's lacking a rule to tell it what to do in this case, and the default is the wrong way around. I think we need to fix it in about the same way as the equivalent case for matviews, which leads to the attached barely-tested patch. BTW, now that I see a case the default printout here seems completely ridiculous. I think we need to do pg_log_warning("could not resolve dependency loop among these items:"); for (i = 0; i < nLoop; i++) { char buf[1024]; describeDumpableObject(loop[i], buf, sizeof(buf)); - pg_log_info(" %s", buf); + pg_log_warning(" %s", buf); } but I didn't actually change that in the attached. regards, tom lane diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 3af97a6039..689a3acff5 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -6085,6 +6085,7 @@ getAggregates(Archive *fout, int *numAggs) agginfo[i].aggfn.argtypes, agginfo[i].aggfn.nargs); } + agginfo[i].aggfn.postponed_def = false; /* might get set during sort */ /* Decide whether we want to dump it */ selectDumpableObject(&(agginfo[i].aggfn.dobj), fout); @@ -6283,6 +6284,7 @@ getFuncs(Archive *fout, int *numFuncs) parseOidArray(PQgetvalue(res, i, i_proargtypes), finfo[i].argtypes, finfo[i].nargs); } + finfo[i].postponed_def = false; /* might get set during sort */ /* Decide whether we want to dump it */ selectDumpableObject(&(finfo[i].dobj), fout); @@ -12168,7 +12170,8 @@ dumpFunc(Archive *fout, const FuncInfo *finfo) .namespace = finfo->dobj.namespace->dobj.name, .owner = finfo->rolname, .description = keyword, - .section = SECTION_PRE_DATA, + .section = finfo->postponed_def ? + SECTION_POST_DATA : SECTION_PRE_DATA, .createStmt = q->data, .dropStmt = delqry->data)); diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h index ed6ce41ad7..bc8f2ec36d 100644 --- a/src/bin/pg_dump/pg_dump.h +++ b/src/bin/pg_dump/pg_dump.h @@ -227,6 +227,7 @@ typedef struct _funcInfo int nargs; Oid *argtypes; Oid prorettype; + bool postponed_def; /* function must be postponed into post-data */ } FuncInfo; /* AggInfo is a superset of FuncInfo */ diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c index 745578d855..759be027c8 100644 --- a/src/bin/pg_dump/pg_dump_sort.c +++ b/src/bin/pg_dump/pg_dump_sort.c @@ -868,6 +868,28 @@ repairMatViewBoundaryMultiLoop(DumpableObject *boundaryobj, } } +/* + * If a function is involved in a multi-object loop, we can't currently fix + * that by splitting it into two DumpableObjects. As a stopgap, we try to fix + * it by dropping the constraint that the function be dumped in the pre-data + * section. This is sufficient to handle cases where a function depends on + * some unique index, as can happen if it has a GROUP BY for example. + */ +static void +repairFunctionBoundaryMultiLoop(DumpableObject *boundaryobj, + DumpableObject *nextobj) +{ + /* remove boundary's dependency on object after it in loop */ + removeObjectDependency(boundaryobj, nextobj->dumpId); + /* if that object is a function, mark it as postponed into post-data */ + if (nextobj->objType == DO_FUNC) + { + FuncInfo *nextinfo = (FuncInfo *) nextobj; + + nextinfo->postponed_def = true; + } +} + /* * Because we make tables depend on their CHECK constraints, while there * will be an automatic dependency in the other direction, we need to break @@ -1062,6 +1084,28 @@ repairDependencyLoop(DumpableObject **loop, } } + /* Indirect loop involving function and data boundary */ + if (nLoop > 2) + { + for (i = 0; i < nLoop; i++) + { + if (loop[i]->objType == DO_FUNC) + { + for (j = 0; j < nLoop; j++) + { + if (loop[j]->objType == DO_PRE_DATA_BOUNDARY) + { + DumpableObject *nextobj; + + nextobj = (j < nLoop - 1) ? loop[j + 1] : loop[0]; + repairFunctionBoundaryMultiLoop(loop[j], nextobj); + return; + } + } + } + } + } + /* Table and CHECK constraint */ if (nLoop == 2 && loop[0]->objType == DO_TABLE &&
or with -v:
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump: FUNCTION a_f (ID 218 OID 40664)
pg_dump: CONSTRAINT a_pkey (ID 4131 OID 40663)
pg_dump: POST-DATA BOUNDARY (ID 4281)
pg_dump: TABLE DATA a (ID 4278 OID 40657)
pg_dump: PRE-DATA BOUNDARY (ID 4280)
...
BTW, now that I see a case the default printout here seems
completely ridiculous. I think we need to do
pg_log_warning("could not resolve dependency loop among these items:");
for (i = 0; i < nLoop; i++)
{
char buf[1024];
describeDumpableObject(loop[i], buf, sizeof(buf));
- pg_log_info(" %s", buf);
+ pg_log_warning(" %s", buf);
}
> So it's lacking a rule to tell it what to do in this case, and the > default is the wrong way around. I think we need to fix it in > about the same way as the equivalent case for matviews, which > leads to the attached barely-tested patch. Thanks for the patch! A test on the initially reported use case and some other cases show it does the expected. Some minor comments I have: 1/ + agginfo[i].aggfn.postponed_def = false; /* might get set during sort */ This is probably not needed as it seems that we can only get into this situation when function dependencies are tracked. This is either the argument or results types of a function which are already handled correctly, or when the function body is examined as is the case with BEGIN ATOMIC. 2/ Instead of + * section. This is sufficient to handle cases where a function depends on + * some unique index, as can happen if it has a GROUP BY for example. + */ The following description makes more sense. + * section. This is sufficient to handle cases where a function depends on + * some constraints, as can happen if a BEGIN ATOMIC function + * references a constraint directly. 3/ The docs in https://www.postgresql.org/docs/current/ddl-depend.html should be updated. The entire section after "For user-defined functions.... " there is no mention of BEGIN ATOMIC as one way that the function body can be examined for dependencies. This could be tracked in a separate doc update patch. What do you think? > BTW, now that I see a case the default printout here seems > completely ridiculous. I think we need to do > describeDumpableObject(loop[i], buf, sizeof(buf)); > - pg_log_info(" %s", buf); > + pg_log_warning(" %s", buf); > } Not sure I like this more than what is there now. The current indentation in " pg_dump: " makes it obvious that these lines are details for the warning message. Additional "warning" messages will be confusing. Regards, Sami Imseih Amazon Web Services (AWS)
Kirk Wolak <wolakk@gmail.com> writes: > On Fri, Jun 2, 2023 at 8:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, now that I see a case the default printout here seems >> completely ridiculous. I think we need to do >> - pg_log_info(" %s", buf); >> + pg_log_warning(" %s", buf); > If I comprehend the suggestion, it will label each line with a warning. > Which implies I have 6 Warnings. Right, I'd forgotten that pg_log_warning() will interpose "warning:". Attached are two more-carefully-thought-out suggestions. The easy way is to use pg_log_warning_detail(), which produces output like pg_dump: warning: could not resolve dependency loop among these items: pg_dump: detail: FUNCTION a_f (ID 216 OID 40532) pg_dump: detail: CONSTRAINT a_pkey (ID 3466 OID 40531) pg_dump: detail: POST-DATA BOUNDARY (ID 3612) pg_dump: detail: TABLE DATA a (ID 3610 OID 40525) pg_dump: detail: PRE-DATA BOUNDARY (ID 3611) Alternatively, we could assemble the details by hand, as in the second patch, producing pg_dump: warning: could not resolve dependency loop among these items: FUNCTION a_f (ID 216 OID 40532) CONSTRAINT a_pkey (ID 3466 OID 40531) POST-DATA BOUNDARY (ID 3612) TABLE DATA a (ID 3610 OID 40525) PRE-DATA BOUNDARY (ID 3611) I'm not really sure which of these I like better. The first one is a much simpler code change, and there is some value in labeling the output like that. The second patch's output seems less cluttered, but it's committing a modularity sin by embedding formatting knowledge at the caller level. Thoughts? BTW, there is a similar abuse of pg_log_info just a few lines above this, and probably others elsewhere. I won't bother writing patches for other places till we have agreement on what the output ought to look like. regards, tom lane diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c index 745578d855..3c66b8bf1a 100644 --- a/src/bin/pg_dump/pg_dump_sort.c +++ b/src/bin/pg_dump/pg_dump_sort.c @@ -1253,7 +1253,7 @@ repairDependencyLoop(DumpableObject **loop, char buf[1024]; describeDumpableObject(loop[i], buf, sizeof(buf)); - pg_log_info(" %s", buf); + pg_log_warning_detail(" %s", buf); } if (nLoop > 1) diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c index 745578d855..f0eef849d3 100644 --- a/src/bin/pg_dump/pg_dump_sort.c +++ b/src/bin/pg_dump/pg_dump_sort.c @@ -971,6 +971,7 @@ static void repairDependencyLoop(DumpableObject **loop, int nLoop) { + PQExpBuffer msgbuf; int i, j; @@ -1247,14 +1248,17 @@ repairDependencyLoop(DumpableObject **loop, * If we can't find a principled way to break the loop, complain and break * it in an arbitrary fashion. */ - pg_log_warning("could not resolve dependency loop among these items:"); + msgbuf = createPQExpBuffer(); for (i = 0; i < nLoop; i++) { char buf[1024]; describeDumpableObject(loop[i], buf, sizeof(buf)); - pg_log_info(" %s", buf); + appendPQExpBuffer(msgbuf, "\n %s", buf); } + pg_log_warning("could not resolve dependency loop among these items:%s", + msgbuf->data); + destroyPQExpBuffer(msgbuf); if (nLoop > 1) removeObjectDependency(loop[0], loop[1]->dumpId);
Kirk Wolak <wolakk@gmail.com> writes:
> On Fri, Jun 2, 2023 at 8:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If I comprehend the suggestion, it will label each line with a warning.
> Which implies I have 6 Warnings.
Right, I'd forgotten that pg_log_warning() will interpose "warning:".
Attached are two more-carefully-thought-out suggestions. The easy
way is to use pg_log_warning_detail(), which produces output like
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump: detail: FUNCTION a_f (ID 216 OID 40532)
pg_dump: detail: CONSTRAINT a_pkey (ID 3466 OID 40531)
pg_dump: detail: POST-DATA BOUNDARY (ID 3612)
pg_dump: detail: TABLE DATA a (ID 3610 OID 40525)
pg_dump: detail: PRE-DATA BOUNDARY (ID 3611)
Alternatively, we could assemble the details by hand, as in the
second patch, producing
pg_dump: warning: could not resolve dependency loop among these items:
FUNCTION a_f (ID 216 OID 40532)
CONSTRAINT a_pkey (ID 3466 OID 40531)
POST-DATA BOUNDARY (ID 3612)
TABLE DATA a (ID 3610 OID 40525)
PRE-DATA BOUNDARY (ID 3611)
I'm not really sure which of these I like better. The first one
is a much simpler code change, and there is some value in labeling
the output like that. The second patch's output seems less cluttered,
but it's committing a modularity sin by embedding formatting knowledge
at the caller level. Thoughts?
Kirk...
"Imseih (AWS), Sami" <simseih@amazon.com> writes: > Some minor comments I have: > 1/ > + agginfo[i].aggfn.postponed_def = false; /* might get set during sort */ > This is probably not needed as it seems that we can only > get into this situation when function dependencies are tracked. That's just to keep from leaving an undefined field in the DumpableObject struct. You are very likely right that nothing would examine that field today, but that seems like a poor excuse for not initializing it. > 2/ > The following description makes more sense. > + * section. This is sufficient to handle cases where a function depends on > + * some constraints, as can happen if a BEGIN ATOMIC function > + * references a constraint directly. I did not think this was an improvement. For one thing, I doubt that BEGIN ATOMIC is essential to cause the problem. I didn't prove the point by making a test case, but a "RETURN expression" function could probably trip over it too by putting a sub-SELECT with GROUP BY into the expression. Also, your wording promises more about what cases we can handle than I think is justified. > 3/ > The docs in https://www.postgresql.org/docs/current/ddl-depend.html > should be updated. Right. In general, I thought that the new-style-SQL-functions patch was very slipshod about updating the docs, and omitted touching many places where it would be appropriate to mention the new style, or even flat-out replace examples with new syntax. I did something about this particular point, but perhaps someone would like to look around and work on that topic? regards, tom lane
Kirk Wolak <wolakk@gmail.com> writes: > On Sat, Jun 3, 2023 at 2:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm not really sure which of these I like better. The first one >> is a much simpler code change, and there is some value in labeling >> the output like that. The second patch's output seems less cluttered, >> but it's committing a modularity sin by embedding formatting knowledge >> at the caller level. Thoughts? > Honestly the double space in front of the strings with either the Original > version, > or the "detail:" version is great. > While I get the "Less Cluttered" version.. It "detaches" it a bit too much > from the lead in, for me. Done with pg_log_warning_detail. I ended up taking out the two spaces, as that still felt like a modularity violation. Also, although the extra space looks alright in English, I'm less sure about how it'd look in another language where "warning:" and "detail:" get translated to strings of other lengths. So the new output (before 016107478 fixed it) is pg_dump: warning: could not resolve dependency loop among these items: pg_dump: detail: FUNCTION a_f (ID 216 OID 40532) pg_dump: detail: CONSTRAINT a_pkey (ID 3466 OID 40531) pg_dump: detail: POST-DATA BOUNDARY (ID 3612) pg_dump: detail: TABLE DATA a (ID 3610 OID 40525) pg_dump: detail: PRE-DATA BOUNDARY (ID 3611) regards, tom lane
-- New database
------------------------------------
CREATE DATABASE ba;
------------------------------------
-- Connect
------------------------------------
-- Connect to new database...
------------------------------------
-- Setup schemas
------------------------------------
CREATE SCHEMA data; -- I don't have public running, so create a schema.
-- All of my UPSERT view and function plumbing is tucked away here:
CREATE SCHEMA types_plus;
------------------------------------
-- Define table
------------------------------------
DROP TABLE IF EXISTS data.test_event;
CREATE TABLE IF NOT EXISTS data.test_event (
id uuid NOT NULL DEFAULT NULL PRIMARY KEY,
ts_dts timestamp NOT NULL DEFAULT 'epoch',
who text NOT NULL DEFAULT NULL,
what text NOT NULL DEFAULT NULL
);
-- PK is created by default as test_event_pkey, used in ON CONFLICT later.
------------------------------------
-- Create view, get type for free
------------------------------------
CREATE VIEW types_plus.test_event_v1 AS
SELECT
id,
ts_dts,
who,
what
FROM data.test_event;
-- Create a function to accept an array of rows formatted as test_event_v1 for UPSERT into test_event.
DROP FUNCTION IF EXISTS types_plus.insert_test_event_v1 (types_plus.test_event_v1[]);
CREATE OR REPLACE FUNCTION types_plus.insert_test_event_v1 (data_in types_plus.test_event_v1[])
RETURNS void
LANGUAGE SQL
BEGIN ATOMIC
INSERT INTO data.test_event (
id,
ts_dts,
who,
what)
SELECT
rows_in.id,
rows_in.ts_dts,
rows_in.who,
rows_in.what
FROM unnest(data_in) as rows_in
ON CONFLICT ON CONSTRAINT test_event_pkey DO UPDATE SET
id = EXCLUDED.id,
ts_dts = EXCLUDED.ts_dts,
who = EXCLUDED.who,
what = EXCLUDED.what;
END;
pg_dump: FUNCTION insert_test_event_v1 (ID 224 OID 1061258)
pg_dump: CONSTRAINT test_event_pkey (ID 3441 OID 1061253)
pg_dump: POST-DATA BOUNDARY (ID 3584)
pg_dump: TABLE DATA test_event (ID 3582 OID 1061246)
pg_dump: PRE-DATA BOUNDARY (ID 3583)
1. CREATE TABLE foo
2. CREATE PK foo_pk and other constraints.
3. CREATE VIEW foo_v1 (I could use CREATE TYPE here, for my purposes, but prefer CREATE VIEW.)
4. CREATE FUNCTION insert_foo_v1 (foo_v1[])
Morris de Oryx <morrisdeoryx@gmail.com> writes: > Can anyone suggest a path forward for me with the upgrade to PG 15? Apply this patch: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ca9e79274938d8ede07d9990c2f6f5107553b524 or more likely, pester RDS to do so sooner than the next quarterly releases. regards, tom lane
Morris de Oryx <morrisdeoryx@gmail.com> writes:
> Can anyone suggest a path forward for me with the upgrade to PG 15?
Apply this patch:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ca9e79274938d8ede07d9990c2f6f5107553b524
or more likely, pester RDS to do so sooner than the next quarterly
releases.
regards, tom lane
Kirk Wolak <wolakk@gmail.com> writes:
.. to strings of other lengths. So the new output (before 016107478
fixed it) is
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump: detail: FUNCTION a_f (ID 216 OID 40532)
pg_dump: detail: CONSTRAINT a_pkey (ID 3466 OID 40531)
pg_dump: detail: POST-DATA BOUNDARY (ID 3612)
pg_dump: detail: TABLE DATA a (ID 3610 OID 40525)
pg_dump: detail: PRE-DATA BOUNDARY (ID 3611)
regards, tom lane
On Sun, Jun 4, 2023 at 1:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:Kirk Wolak <wolakk@gmail.com> writes:
.. to strings of other lengths. So the new output (before 016107478
fixed it) is
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump: detail: FUNCTION a_f (ID 216 OID 40532)
pg_dump: detail: CONSTRAINT a_pkey (ID 3466 OID 40531)
pg_dump: detail: POST-DATA BOUNDARY (ID 3612)
pg_dump: detail: TABLE DATA a (ID 3610 OID 40525)
pg_dump: detail: PRE-DATA BOUNDARY (ID 3611)
regards, tom lane+1
Another suggestion for AWS/RDS: Expose all of the logs in the upgrade tool chain. If I'd had all of the logs at the start of this, I'd have been able to track down the issue myself quite quickly. Setting up that simple case database took me less than an hour today. Without the logs, it's been impossible (until the RDS patch a month ago) and difficult (now) to get a sense of what's happening.Thank youOn Mon, Jun 5, 2023 at 5:19 PM Kirk Wolak <wolakk@gmail.com> wrote:On Sun, Jun 4, 2023 at 1:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:Kirk Wolak <wolakk@gmail.com> writes:
.. to strings of other lengths. So the new output (before 016107478
fixed it) is
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump: detail: FUNCTION a_f (ID 216 OID 40532)
pg_dump: detail: CONSTRAINT a_pkey (ID 3466 OID 40531)
pg_dump: detail: POST-DATA BOUNDARY (ID 3612)
pg_dump: detail: TABLE DATA a (ID 3610 OID 40525)
pg_dump: detail: PRE-DATA BOUNDARY (ID 3611)
regards, tom lane+1
On 2023-Jun-13, Morris de Oryx wrote: > Quick follow-up: I've heard back from AWS regarding applying Tom Lane's > patch. Nope. RDS releases numbered versions, nothing else. Sounds like a reasonable policy to me. > As Postgres is now at 15.8/15.3 in the wild and on 15.7/15.3 on RDS, > I'm guessing that the patch won't be available until 14.9/15.4. > > Am I right in thinking that this patch will be integrated into 14.9/15.4, Yes. The commits got into Postgres on June 4th, and 14.8 and 15.3 where stamped on May 8th. So the fixes will be in 14.9 and 15.4 in August, per https://www.postgresql.org/developer/roadmap/ > if they are released? No "if" about this, unless everybody here is hit by ICBMs. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Reminds me to say a big thank you to everyone involved in and contributing to Postgres development for making error messages which are so good. For a programmer, error text is a primary UI. Most Postgres errors and log messages are clear and sufficient. Even when they're a bit obscure, they alway seem to be on topic, and enough to get you on the right track.I assume that we've all used programs and operating systems that emit more....runic...errors.On Mon, Jun 5, 2023 at 6:03 PM Morris de Oryx <morrisdeoryx@gmail.com> wrote:Another suggestion for AWS/RDS: Expose all of the logs in the upgrade tool chain. If I'd had all of the logs at the start of this, I'd have been able to track down the issue myself quite quickly. Setting up that simple case database took me less than an hour today. Without the logs, it's been impossible (until the RDS patch a month ago) and difficult (now) to get a sense of what's happening.Thank youOn Mon, Jun 5, 2023 at 5:19 PM Kirk Wolak <wolakk@gmail.com> wrote:On Sun, Jun 4, 2023 at 1:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:Kirk Wolak <wolakk@gmail.com> writes:
.. to strings of other lengths. So the new output (before 016107478
fixed it) is
pg_dump: warning: could not resolve dependency loop among these items:
pg_dump: detail: FUNCTION a_f (ID 216 OID 40532)
pg_dump: detail: CONSTRAINT a_pkey (ID 3466 OID 40531)
pg_dump: detail: POST-DATA BOUNDARY (ID 3612)
pg_dump: detail: TABLE DATA a (ID 3610 OID 40525)
pg_dump: detail: PRE-DATA BOUNDARY (ID 3611)
regards, tom lane+1