Обсуждение: Suggestion for --truncate-tables to pg_restore
Hi, I've had problems using pg_restore --data-only when restoring individual schemas (which contain data which has had bad things done to it). --clean does not work well because of dependent objects in other schemas. Patch to the docs attached (before I go and do any real coding.) Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Вложения
On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: > I've had problems using pg_restore --data-only when > restoring individual schemas (which contain data which > has had bad things done to it). --clean does not work > well because of dependent objects in other schemas. Before doing any more work I want to report on the discussions that took place at the code sprint at Postgres Open in Chicago. Because I'm going to add in additional thoughts I've had and to avoid mis-representing anybody's opinion I'll not mention who said what. Feel free to step forward and claim Ingenious Ideas as your own. Likewise I apologize if lack of attribution makes it more difficult to discern (my) uninformed drivel from intelligent insight. ---- First, the problem: Begin with the following structure: CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT); CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo; Then, by accident, somebody does: UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT; So, you want to restore the data into schemaA.foo. But schemaA.foo has (bad) data in it that must first be removed. It would seem that using pg_restore --clean -n schemaA -t foo my_pg_dump_backup would solve the problem, it would drop schemaA.foo, recreate it, and then restore the data. But this does not work. schemaA.foo does not drop because it's got a dependent database object, schemaB.bar. Of course there are manual work-arounds. One of these is truncating schemaA.foo and then doing a pg_restore with --data-only. The manual work-arounds become increasingly burdensome as you need to restore more tables. The case that motivated me was an attempt to restore the data in an entire schema, one which contained a significant number of tables. So, the idea here is to be able to do a data-only restore, first truncating the data in the tables being restored to remove the existing corrupted data. The proposal is to add a --truncate-tables option to pg_restore. ---- There were some comments on syntax. I proposed to use -u as a short option. This was thought confusing, given it's use in other Unix command line programs (mysql). Since there's no obvious short option, forget it. Just have a long option. Another choice is to avoid introducing yet another option and instead overload --clean so that when doing a --data-only restore --clean truncates tables and otherwise --clean retains the existing behavior of dropping and re-creating the restored objects. (I tested pg_restore with 9.1 and when --data-only is used --clean is ignored, it does not even produce a warning. This is arguably a bug.) ---- More serious objections were raised regarding semantics. What if, instead, the initial structure looked like: CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT); CREATE TABLE schemaB.bar (id INT CONSTRAINT "bar_on_foo" REFERENCES foo, moredata INT); With a case like this, in most real-world situations, you'd have to use pg_restore with --disable-triggers if you wanted to use --data-only and --truncate-tables. The possibility of foreign key referential integrity corruption is obvious. Aside: Unless you're restoring databases in their entirety the pg_restore --disable-triggers option makes it easy to introduce foreign key referential integrity corruption. In fact, since pg_restore does not wrap it's operations in one big transaction, it's easy to attempt restoration of a portion of a database, have part of the process succeed and part of it fail (due to either schema or data dependencies), and be left off worse than before you started. The pg_restore docs might benefit from a big fat warning regarding attempts to restore less than an entire database. So, the discussion went, pg_restore is just another application and introducing more options which could lead to corruption of referential integrity is a bad idea. But pg_restore should not be thought of as just another front-end. It should be thought of as a data recovery tool. Recovering some data and being left with referential integrity problems is better than having no data. This is true even if, due to different users owning different schemas and so forth, nobody knows exactly what might be broken. Yes, but we can do better. (The unstated sub-text being that we don't want to introduce an inferior feature which will then need to be supported forever.) How could we do better: Here I will record only the ideas related to restore, although there was some mention of dump as well. There has apparently been some discussion of writing a foreign data wrapper which would operate on a database dump. This might (in ways that are not immediately obvious to me) address this issue. The restore process could, based on what table data needs restoration, look at foreign key dependencies and produce a list of the tables which all must be restored into order to ensure foreign key referential integrity. In the case of restoration into a empty database the foreign key dependences must be calculated from the dump. (An "easy" way to do this would be to create all the database objects in some temporary place and query the system catalogs to produce the dependency graph.) In the case of restoration into an existing database the foreign key dependences should come from the database into which the data is to be restored. (This is necessary to capture dependences which may have been introduced after the dump was taken.) The above applies to data-only restoration. When restoring the database schema meta-information (object definition) a similar graph of database object dependences must be produced and used to determine what needs to be restored. But when doing a partial data-only restore there is more to data integrity than just foreign key referential integrity. Other constraints and triggers ensure other sorts of data integrity rules. It is not enough to leave triggers turned on when restoring data. Data not restored may validate against restored data in triggers fired only on manipulation of the un-restored table content. The only solution I can see is to also include in the computed set of tables which require restoration those tables having triggers which reference any of the restored data. Just how far should pg_restore go in attempting to preserve data integrity? ---- Two things are clear: The current table and schema oriented options for backing up and restoring portions of databases are flawed with respectto data integrity. Life is complicated. Where should I go from here? I am not now in a position to pursue anything more complicated than completing the code to add a --truncate-tables option to pg_restore. Should I finish this and send in a patch? Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote: > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: > > > I've had problems using pg_restore --data-only when > > restoring individual schemas (which contain data which > > has had bad things done to it). --clean does not work > > well because of dependent objects in other schemas. Since there wasn't much more to do I've gone ahead and written the patch. Works for me. Against git master. Passes regression tests, but there's no regression tests for pg_restore so this does not say much. Since there's no regression tests I've not written one. Since this is a real patch for application I've given it a new name (it's not a v2). Truncate done right before COPY, since that's what the parallel restores do. Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Whoops. Do over. Sent the wrong file. On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote: > On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote: > > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: > > > > > I've had problems using pg_restore --data-only when > > > restoring individual schemas (which contain data which > > > has had bad things done to it). --clean does not work > > > well because of dependent objects in other schemas. > > Since there wasn't much more to do I've gone ahead > and written the patch. Works for me. > > Against git master. > Passes regression tests, but there's no regression > tests for pg_restore so this does not say much. > Since there's no regression tests I've not written one. > > Since this is a real patch for application I've given > it a new name (it's not a v2). > > Truncate done right before COPY, since that's what > the parallel restores do. Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Вложения
Attached is version 2. The sgml did not build. On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote: > Whoops. Do over. Sent the wrong file. > > On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote: > > On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote: > > > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: > > > > > > > I've had problems using pg_restore --data-only when > > > > restoring individual schemas (which contain data which > > > > has had bad things done to it). --clean does not work > > > > well because of dependent objects in other schemas. > > > > Since there wasn't much more to do I've gone ahead > > and written the patch. Works for me. > > > > Against git master. > > Passes regression tests, but there's no regression > > tests for pg_restore so this does not say much. > > Since there's no regression tests I've not written one. > > > > Since this is a real patch for application I've given > > it a new name (it's not a v2). > > > > Truncate done right before COPY, since that's what > > the parallel restores do. > > > Karl <kop@meme.com> > Free Software: "You don't pay back, you pay forward." > -- Robert A. Heinlein > > ------quoted attachment------ > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Вложения
Hi, Attached is version 3. The convention seems to be to leave the operator at the end of the line when breaking long lines, so do that. Add extra () -- make operator precedence explicit and have indentation reflect operator precedence. On 09/23/2012 08:52:07 PM, Karl O. Pinc wrote: > On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote: > > On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote: > > > On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote: > > > > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: > > > > > > > > > I've had problems using pg_restore --data-only when > > > > > restoring individual schemas (which contain data which > > > > > has had bad things done to it). --clean does not work > > > > > well because of dependent objects in other schemas. > > > > > > Since there wasn't much more to do I've gone ahead > > > and written the patch. Works for me. > > > > > > Against git master. > > > Passes regression tests, but there's no regression > > > tests for pg_restore so this does not say much. > > > Since there's no regression tests I've not written one. > > > > > > Since this is a real patch for application I've given > > > it a new name (it's not a v2). > > > > > > Truncate done right before COPY, since that's what > > > the parallel restores do. Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Вложения
Hi, Attached is version 4. Version 3 no longer built against head. On 10/16/2012 09:48:06 PM, Karl O. Pinc wrote: > > On 09/23/2012 08:52:07 PM, Karl O. Pinc wrote: > > > On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote: > > > > On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote: > > > > On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote: > > > > > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: > > > > > > > > > > > I've had problems using pg_restore --data-only when > > > > > > restoring individual schemas (which contain data which > > > > > > has had bad things done to it). --clean does not work > > > > > > well because of dependent objects in other schemas. > > > > > > > > Since there wasn't much more to do I've gone ahead > > > > and written the patch. Works for me. > > > > > > > > Against git master. > > > > Passes regression tests, but there's no regression > > > > tests for pg_restore so this does not say much. > > > > Since there's no regression tests I've not written one. > > > > > > > > Since this is a real patch for application I've given > > > > it a new name (it's not a v2). > > > > > > > > Truncate done right before COPY, since that's what > > > > the parallel restores do. > > > Karl <kop@meme.com> > Free Software: "You don't pay back, you pay forward." > -- Robert A. Heinlein > > ------quoted attachment------ > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Вложения
Hi Karl, I signed on to review this patch for the current CF. Most of the background for the patch seems to be in the message below, so I'm going to respond to this one first. On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop@meme.com> wrote: > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: > >> I've had problems using pg_restore --data-only when >> restoring individual schemas (which contain data which >> has had bad things done to it). --clean does not work >> well because of dependent objects in other schemas. OK. > ---- > > First, the problem: > > Begin with the following structure: > > CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT); > > CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo; > > Then, by accident, somebody does: > > UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT; > > So, you want to restore the data into schemaA.foo. > But schemaA.foo has (bad) data in it that must first > be removed. It would seem that using > > pg_restore --clean -n schemaA -t foo my_pg_dump_backup > > would solve the problem, it would drop schemaA.foo, > recreate it, and then restore the data. But this does > not work. schemaA.foo does not drop because it's > got a dependent database object, schemaB.bar. Right. > Of course there are manual work-arounds. One of these > is truncating schemaA.foo and then doing a pg_restore > with --data-only. Simply doing TRUNCATE manually was the first thought that occurred to me when I saw your example. > The manual work-arounds become > increasingly burdensome as you need to restore more > tables. The case that motivated me was an attempt > to restore the data in an entire schema, one which > contained a significant number of tables. TBH, I didn't find the example above particularly compelling for demonstrating the need for this feature. If you've just got one table with dependent views which needs to be restored, it's pretty easy to manually TRUNCATE and have pg_restore --data-only reload the table. (And easy enough to combine the truncate and restore into a single transaction in case anything goes wrong, if need be.) But I'm willing to grant that this proposed feature is potentially as useful as existing restore-jiggering options like --disable-triggers. And I guess I could see that if you're really stuck having to perform a --data-only restore of many tables, this feature could come in handy. > So, the idea here is to be able to do a data-only > restore, first truncating the data in the tables > being restored to remove the existing corrupted data. > > The proposal is to add a --truncate-tables option > to pg_restore. > > ---- > > There were some comments on syntax. > > I proposed to use -u as a short option. This was > thought confusing, given it's use in other > Unix command line programs (mysql). Since there's > no obvious short option, forget it. Just have > a long option. Agreed. > Another choice is to avoid introducing yet another > option and instead overload --clean so that when > doing a --data-only restore --clean truncates tables > and otherwise --clean retains the existing behavior of > dropping and re-creating the restored objects. I like the --truncate-tables flag idea better than overloading --clean, since it makes the purpose immediately apparent. > (I tested pg_restore with 9.1 and when --data-only is > used --clean is ignored, it does not even produce a warning. > This is arguably a bug.) +1 for having pg_restore bail out with an error if the user specifies --data-only --clean. By the same token, --clean and --truncate-tables together should also raise a "not allowed" error. > ---- > > More serious objections were raised regarding semantics. > > What if, instead, the initial structure looked like: > > CREATE TABLE schemaA.foo > (id PRIMARY KEY, data INT); > > CREATE TABLE schemaB.bar > (id INT CONSTRAINT "bar_on_foo" REFERENCES foo > , moredata INT); > > With a case like this, in most real-world situations, you'd > have to use pg_restore with --disable-triggers if you wanted > to use --data-only and --truncate-tables. The possibility of > foreign key referential integrity corruption is obvious. Why would --disable-triggers be used in this example? I don't think you could use --truncate-tables to restore only table "foo", because you would get: ERROR: cannot truncate a table referenced in a foreign key constraint (At least, I think so, not having tried with the actual patch.) You could use --disable-triggers when restoring "bar". Though if you're just enabling that option for performance purposes, and are unable to guarantee that the restore will leave the tables in a consistent state, well then it seems like you shouldn't use the option. > Aside: Unless you're restoring databases in their entirety > the pg_restore --disable-triggers option makes it easy to > introduce foreign key referential integrity corruption. Yup, and I think the docs could do more to warn users about --disable-triggers in particular. And I see you've submitted a separate patch along those lines. > In fact, since pg_restore does not wrap it's operations > in one big transaction, it's easy to attempt restoration > of a portion of a database, have part of the process > succeed and part of it fail (due to either schema > or data dependencies), and be left off worse > than before you started. That's what the --single-transaction option is for. > So, the discussion went, pg_restore is just another > application and introducing more options > which could lead to corruption of referential integrity is > a bad idea. I do agree that increasing the ways for pg_restore to be a foot-gun is a Bad Idea. > But pg_restore should not be thought of as just another > front-end. It should be thought of as a data recovery > tool. I don't totally agree that charter for pg_restore should be a "data recovery tool" (i.e. general purpose), but for the sake of this patch we can leave that aside. > Recovering some data and being left with referential > integrity problems is better than having no data. Well, this is really a judgment call, and I have a hunch that many admins would actually choose "none of the above". And I think this point gets to the crux of whether the --truncate-tables option will be useful as-is. For your first example, --truncate-tables seems to have some use, so that the admin isn't forced to recreate various views which may be dependent on the table. (Though it's not too difficult to work around this case today.) For the second example involving FKs, I'm a little bit more hesitant about endorsing the use of --truncate-tables combined with --disable-triggers to potentially allow bogus FKs. I know this is possible to some extent today using the --disable-triggers option, but it makes me nervous to be adding a mode to pg_restore if it's primarily designed to work together with --disable-triggers as a larger foot-gun. > This > is true even if, due to different users owning different > schemas and so forth, nobody knows exactly what > might be broken. > > Yes, but we can do better. (The unstated sub-text being that > we don't want to introduce an inferior feature which > will then need to be supported forever.) > > How could we do better: > > Here I will record only the ideas related to restore, > although there was some mention of dump as well. > > There has apparently been some discussion of writing > a foreign data wrapper which would operate on a database > dump. This might (in ways that are not immediately > obvious to me) address this issue. That's an interesting idea. If you could SELECT directly out of a dump file via FDW, you could handle the restore process purely in SQL. But not directly relevant to this patch. > The restore process could, based on what table data needs > restoration, look at foreign key dependencies and produce a > list of the tables which all must be restored into order to > ensure foreign key referential integrity. One mode of operation of pg_restore is outputting to a file or pipe for subsequent processing, which of course wouldn't work with this idea of having the restore be dependent on the target database structure. > [snip more discussion of pg_restore possibly reordering objects and ensuring integrity] For the purposes of actually completing a review of the patch in question, I'm going to avoid further blue-sky speculation here. Just a few initial comments about the doc portion of the patch: + This option is only relevant when performing a data-only If we are going to restrict the --truncate-tables option to be used with --data-only, "only allowed" may be more clear than "only relevant". + <para> + The <option>--disable-triggers</option> will almost always + always need to be used in conjunction with this option to + disable check constraints on foreign keys. + </para> For the first example you posted, of a view dependent on a table which needed to be restored, this advice would not be accurate. IMO it's a little dangerous advising users to "almost always" use a foot-gun like --disable-triggers. I'm out of time for today, and I haven't had a chance to actually try out the patch, but I wanted to send off my thoughts so far. I should have a chance for another look later this week. Josh
Hi Josh, On 11/20/2012 11:53:23 PM, Josh Kupershmidt wrote: > Hi Karl, > I signed on to review this patch for the current CF. I noticed. Thanks very much. > On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop@meme.com> wrote: > > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: > > First, the problem: > > > > Begin with the following structure: > > > > CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT); > > > > CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo; > > > > Then, by accident, somebody does: > > > > UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT; > > > > So, you want to restore the data into schemaA.foo. > > But schemaA.foo has (bad) data in it that must first > > be removed. It would seem that using > > > > pg_restore --clean -n schemaA -t foo my_pg_dump_backup > > > > would solve the problem, it would drop schemaA.foo, > > recreate it, and then restore the data. But this does > > not work. schemaA.foo does not drop because it's > > got a dependent database object, schemaB.bar. > TBH, I didn't find the example above particularly compelling for > demonstrating the need for this feature. If you've just got one table > with dependent views which needs to be restored, it's pretty easy to > manually TRUNCATE and have pg_restore --data-only reload the table. > (And easy enough to combine the truncate and restore into a single > transaction in case anything goes wrong, if need be.) I was not clear in stating the problem. (But see below regarding foreign keys.) The dependent view was an example, but there can also be REFERENCES constraints and trigger related constraints involving other tables in other schemas. The easiest work-around I can think of here is destroying all the triggers and constraints, either everything in the whole db or doing some work to be more selective, truncating all the schema's tables. doing a data-only restore of the schema, and then pg_restore --data-only, and then re-creating all the triggers and constraints. > > But I'm willing to grant that this proposed feature is potentially as > useful as existing restore-jiggering options like --disable-triggers. > And I guess I could see that if you're really stuck having to perform > a --data-only restore of many tables, this feature could come in > handy. I think so. See above. > > > So, the idea here is to be able to do a data-only > > restore, first truncating the data in the tables > > being restored to remove the existing corrupted data. > > > > The proposal is to add a --truncate-tables option > > to pg_restore. > > > > ---- > > (I tested pg_restore with 9.1 and when --data-only is > > used --clean is ignored, it does not even produce a warning. > > This is arguably a bug.) > > +1 for having pg_restore bail out with an error if the user specifies > --data-only --clean. By the same token, --clean and --truncate-tables > together should also raise a "not allowed" error. OT: After looking at the code I found a number of "conflicting" option combinations are not tested for or rejected. I can't recall what they are now. The right way to fix these would be to send in a separate patch for each "conflict" all attached to one email/commitfest item? Or one patch that just gets adjusted until it's right? > > > ---- > > > > More serious objections were raised regarding semantics. > > > > What if, instead, the initial structure looked like: > > > > CREATE TABLE schemaA.foo > > (id PRIMARY KEY, data INT); > > > > CREATE TABLE schemaB.bar > > (id INT CONSTRAINT "bar_on_foo" REFERENCES foo > > , moredata INT); > > > > With a case like this, in most real-world situations, you'd > > have to use pg_restore with --disable-triggers if you wanted > > to use --data-only and --truncate-tables. The possibility of > > foreign key referential integrity corruption is obvious. > > Why would --disable-triggers be used in this example? I don't think > you could use --truncate-tables to restore only table "foo", because > you would get: > > ERROR: cannot truncate a table referenced in a foreign key > constraint > > (At least, I think so, not having tried with the actual patch.) You > could use --disable-triggers when restoring "bar". I tried it and you're quite right. (I thought I'd tried this before, but clearly I did not -- proving the utility of the review process. :-) My assumption was that since triggers were turned off that constraints, being triggers, would be off as well and therefore tables with foreign keys could be truncated. Obviously not, since the I get the above error. I just tried it. --disable-triggers does not disable constraints. > > > Recovering some data and being left with referential > > integrity problems is better than having no data. > > Well, this is really a judgment call, and I have a hunch that many > admins would actually choose "none of the above". And I think this > point gets to the crux of whether the --truncate-tables option will > be > useful as-is. Well, yes. "None of the above" is best. :) In my case I had munged the data in the one important schema and wanted to restore. The setting is an academic one and a lot of cruft gets left laying around in the other schemas, some of which consist entirely of cruft. Responsibility for the non-main schema content is distributed. In the interest of getting things working again quickly I wished to restore the important schema quickly, and didn't want to have to sort through the cruft ahead of time. In other words, I was entirely willing to break things and pick up the pieces afterwords. > > For your first example, --truncate-tables seems to have some use, so > that the admin isn't forced to recreate various views which may be > dependent on the table. (Though it's not too difficult to work around > this case today.) As an aside: I never have an issue with this, having planned ahead. I'm curious what the not-too-difficult work-around is that you have in mind. I'm not coming up with a tidy way to, e.g, pull a lot of views out of a dump. > For the second example involving FKs, I'm a little bit more hesitant > about endorsing the use of --truncate-tables combined with > --disable-triggers to potentially allow bogus FKs. I know this is > possible to some extent today using the --disable-triggers option, > but > it makes me nervous to be adding a mode to pg_restore if it's > primarily designed to work together with --disable-triggers as a > larger foot-gun. This is the crux of the issue, and where I was thinking of taking this patch. I very often am of the mindset that foreign keys are no more or less special than other more complex data integrity rules enforced with triggers. (I suppose others might say the same regards to integrity enforced at the application level.) This naturally inclines me to think that one more way, beyond --disable-triggers, to break integrity is no big deal. But I quite see your point. Is it possible to get resolution on this issue before either of us do any more work in the direction of foreign keys? An additional foot-gun, --disable-constraints, seems like the natural progression in this direction. Constraints, unlike triggers (?), can, in theory, be fired at any time to check data content so perhaps providing a way to test existing constraints against db content would be a way to mitigate the foot-gun-ness and drive an after-the-fact data cleanup process. --disable-constraints seems like an entirely separate patch so maybe we can stop the FK related issue right here? (Although I would appreciate feedback regards whether such an option might be accepted, at minimum I'd like to get this out of my brain.) > > Just a few initial comments about the doc portion of the patch: > > + This option is only relevant when performing a data-only > > If we are going to restrict the --truncate-tables option to be used > with --data-only, "only allowed" may be more clear than "only > relevant". Make sense to me. My intent here was to use the language used elsewhere in the docs but I'm happy to go in a better direction. (And perhaps later submit more patches to move other parts of the docs in that direction.) > > + <para> > + The <option>--disable-triggers</option> will almost always > + always need to be used in conjunction with this option to > + disable check constraints on foreign keys. > + </para> > > For the first example you posted, of a view dependent on a table > which > needed to be restored, this advice would not be accurate. IMO it's a > little dangerous advising users to "almost always" use a foot-gun > like > --disable-triggers. Sorta brings us back to the sticking point above.... And sounds like, at the moment at least, this paragraph can be deleted. > I'm out of time for today, and I haven't had a chance to actually try > out the patch, but I wanted to send off my thoughts so far. I should > have a chance for another look later this week. Thanks for the work. I'll hold off on submitting any revisions to the patch for the moment. One thing you'll want to pay attention to is the point in the restore process at which the truncation is done. In the current version each table is truncated immediately before being copied. It might or might not be better to do all the truncation up-front, in the fashion of --clean. I would appreciate some guidance on this. In case it's helpful I'm attaching two files I used to test the foreign key issue. Regards, Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Вложения
On Wed, Nov 21, 2012 at 5:48 AM, Karl O. Pinc <kop@meme.com> wrote: >> On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop@meme.com> wrote: >> > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: > OT: > After looking at the code I found a number of "conflicting" > option combinations are not tested for or rejected. I can't > recall what they are now. The right way to fix these would be > to send in a separate patch for each "conflict" all attached > to one email/commitfest item? Or one patch that just gets > adjusted until it's right? Typically I think it's easiest for the reviewer+committer to consider a bunch of such similar changes altogether in one patch, rather than in a handful of separate patches, though that's just my own preference. >> > ---- >> > >> > More serious objections were raised regarding semantics. >> > >> > What if, instead, the initial structure looked like: >> > >> > CREATE TABLE schemaA.foo >> > (id PRIMARY KEY, data INT); >> > >> > CREATE TABLE schemaB.bar >> > (id INT CONSTRAINT "bar_on_foo" REFERENCES foo >> > , moredata INT); >> > >> > With a case like this, in most real-world situations, you'd >> > have to use pg_restore with --disable-triggers if you wanted >> > to use --data-only and --truncate-tables. The possibility of >> > foreign key referential integrity corruption is obvious. >> >> Why would --disable-triggers be used in this example? I don't think >> you could use --truncate-tables to restore only table "foo", because >> you would get: >> >> ERROR: cannot truncate a table referenced in a foreign key >> constraint >> >> (At least, I think so, not having tried with the actual patch.) You >> could use --disable-triggers when restoring "bar". > > I tried it and you're quite right. (I thought I'd tried this > before, but clearly I did not -- proving the utility of the review > process. :-) My assumption was that since triggers > were turned off that constraints, being triggers, would be off as > well and therefore tables with foreign keys could be truncated. > Obviously not, since the I get the above error. > > I just tried it. --disable-triggers does not disable constraints. Just to be clear, I believe the problem in this example is that --disable-triggers does not disable any foreign key constraints of other tables when you are restoring a single table. So with: pg_restore -1 --truncate-tables --disable-triggers --table=foo \ --data-only my.dump ... you would get the complaint ERROR: cannot truncate a table referenced in a foreign key constraint which is talking about bar's referencing foo, and there was no ALTER TABLE bar DISABLE TRIGGER ALL done, since "bar" isn't being restored. I don't have a quibble with this existing behavior -- you are instructing pg_restore to only mess with "bar", after all. See below, though, for a comparison of --clean and --truncate-tables when restoring "foo" and "bar" together. >> For your first example, --truncate-tables seems to have some use, so >> that the admin isn't forced to recreate various views which may be >> dependent on the table. (Though it's not too difficult to work around >> this case today.) > > As an aside: I never have an issue with this, having planned ahead. > I'm curious what the not-too-difficult work-around is that you have > in mind. I'm not coming up with a tidy way to, e.g, pull a lot > of views out of a dump. Well, for the first example, there was one table and only one view which depended on the table, so manually editing the list file like so: pg_restore --list --file=my.dump > output.list # manually edit file "output.list", select only entries pertaining # to thetable and dependent view pg_restore -1 --clean --use-list=output.list ... isn't too arduous, though it would become so if you had more dependent views to worry about. I'm willing to count this use-case as a usability win for --truncate-tables, since with that option the restore can be boiled down to a short and sweet: pg_restore -1 --data-only --truncate-tables --table=mytable my.dump ... Though note this may not prove practical for large tables, since --data-only leaves constraints and indexes intact during the restore, and can be massively slower compared to adding the constraints only after the data has been COPYed in, as pg_restore otherwise does. >> For the second example involving FKs, I'm a little bit more hesitant >> about endorsing the use of --truncate-tables combined with >> --disable-triggers to potentially allow bogus FKs. I know this is >> possible to some extent today using the --disable-triggers option, >> but >> it makes me nervous to be adding a mode to pg_restore if it's >> primarily designed to work together with --disable-triggers as a >> larger foot-gun. > > This is the crux of the issue, and where I was thinking of > taking this patch. I very often am of the mindset that > foreign keys are no more or less special than other > more complex data integrity rules enforced with triggers. > (I suppose others might say the same regards to integrity > enforced at the application level.) This naturally > inclines me to think that one more way, beyond > --disable-triggers, to break integrity is no big deal. > > But I quite see your point. Is it possible to get > resolution on this issue before either of us do any > more work in the direction of foreign keys? I think the patch has some promise with at least one use-case (view(s) dependent on table which needs to be reloaded, as discussed above). With the other use-case we have been discussing, of reloading a table referenced by other table(s)'s FKs, whether --truncate-tables is helpful is less clear to me, at least in the patch's current state. (See also bottom.) > An additional foot-gun, --disable-constraints, > seems like the natural progression in this direction. > Constraints, unlike triggers (?), can, in theory, > be fired at any time to check data content so perhaps > providing a way to test existing constraints against > db content would be a way to mitigate the foot-gun-ness > and drive an after-the-fact data cleanup process. > > --disable-constraints seems like an entirely separate > patch so maybe we can stop the FK related issue right > here? (Although I would appreciate feedback regards > whether such an option might be accepted, at minimum > I'd like to get this out of my brain.) I'm not sure I follow exactly how you envision --disable-constraints would work, but it does seem a separate issue from the --truncate-tables option at hand. > One thing you'll want to pay attention to is the point > in the restore process at which the truncation is done. > In the current version each table is truncated immediately > before being copied. It might or might not be better to do > all the truncation up-front, in the fashion of --clean. > I would appreciate some guidance on this. IMO --truncate-tables is, at a minimum, going to need to truncate tables up-front and in the appropriate order to avoid the annoying "ERROR: cannot truncate a table referenced in a foreign key constraint", at least when avoiding that error is possible. Let's go back to your example of two tables, with "bar" referencing "foo". This existing --clean functionality works fine: # edit list to only include lines mentioning "foo" and "bar" pg_restore -1 --clean --use-list=output.list ... But this won't work, since pg_restore attempts to truncate and restore foo separately from bar: # edit list to only include lines mentioning "foo" and "bar" pg_restore --truncate-tables --data-only -1 --use-list=output.list... i.e. will run into "ERROR: cannot truncate a table referenced in a foreign key constraint". > In case it's helpful I'm attaching two files > I used to test the foreign key issue. Thanks, I do appreciate seeing testcases scripts like this, since it can neatly demonstrate the intended use-case of the feature, and help bring to light anything that's missing. I tried your testcase, and got: pg_restore: [archiver (db)] could not execute query: ERROR: cannot truncate a table referenced in a foreign key constraint DETAIL: Table "bar" references "foo". as discussed above. If you'd like to advertise this feature as being handy for reloading a table referenced by other FKs, I'd be interested to see a testcase demonstrating this use, along with any changes to the patch (e.g. moving TRUNCATEs to the start) which might be needed. Josh
On Wed, Nov 21, 2012 at 12:53 AM, Josh Kupershmidt <schmiddy@gmail.com> wrote: > TBH, I didn't find the example above particularly compelling for > demonstrating the need for this feature. If you've just got one table > with dependent views which needs to be restored, it's pretty easy to > manually TRUNCATE and have pg_restore --data-only reload the table. > (And easy enough to combine the truncate and restore into a single > transaction in case anything goes wrong, if need be.) > > But I'm willing to grant that this proposed feature is potentially as > useful as existing restore-jiggering options like --disable-triggers. > And I guess I could see that if you're really stuck having to perform > a --data-only restore of many tables, this feature could come in > handy. I think I would come down on the other side of this. We've never really been able to get --clean work properly in all scenarios, and it seems likely that a similar fate will befall this option. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/26/2012 12:06:56 PM, Robert Haas wrote: > On Wed, Nov 21, 2012 at 12:53 AM, Josh Kupershmidt > <schmiddy@gmail.com> wrote: > > TBH, I didn't find the example above particularly compelling for > > demonstrating the need for this feature. If you've just got one > table > > with dependent views which needs to be restored, it's pretty easy > to > > manually TRUNCATE and have pg_restore --data-only reload the table. > > (And easy enough to combine the truncate and restore into a single > > transaction in case anything goes wrong, if need be.) > > > > But I'm willing to grant that this proposed feature is potentially > as > > useful as existing restore-jiggering options like > --disable-triggers. > > And I guess I could see that if you're really stuck having to > perform > > a --data-only restore of many tables, this feature could come in > > handy. > > I think I would come down on the other side of this. We've never > really been able to get --clean work properly in all scenarios, and > it > seems likely that a similar fate will befall this option. Where I would like to go with this is to first introduce, as a new patch, an ALTER TABLE option to disable a constraint. Something like ALTER TABLE foo UNVALIDATE CONSTRAINT "constraintname"; This would mark the constraint NOT VALID, as if the constraint were created with the NOT VALID option. After a constraint is UNVALIDATEd the existing ALTER TABLE foo VALIDATE CONSTRAINT "constraintname"; feature would turn the constraint on and check the data. With UNVALIDATE CONSTRAINT, pg_restore could first turn off all the constraints concerning tables to be restored, truncate the tables, restore the data, turn the constraints back on and re-validate the constraints. No need to worry about ordering based on a FK referential dependency graph or loops in such a graph (due to DEFERRABLE INITIALLY DEFERRED). This approach would allow the content of a table or tables to be restored regardless of dependent objects or FK references and preserve FK referential integrity. Right? I need some guidance here from someone who knows more than I do. There would likely need to be a pg_restore option like --disable-constraints to invoke this functionality, but that can be worked out later. Likewise, I see an update and a delete trigger in pg_triggers associated with the referenced table in REFERENCES constraints, but no trigger for truncate. So making a constraint NOT VALID may not be as easy as it seems. I don't know what the problems are with --clean but I would like to know if this appears a promising approach. If so I can pursue it, although I make no promises. (I sent in the --disable-triggers patch because it seemed easy and I'm not sure where a larger project fits into my life.) Regards, Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein P.S. An outstanding question regards --truncate-tables is whether it should drop indexes before truncate and re-create them after restore. Sounds like it should.
On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop@meme.com> wrote: > Where I would like to go with this is to first introduce, > as a new patch, an ALTER TABLE option to disable a > constraint. Something like > > ALTER TABLE foo UNVALIDATE CONSTRAINT "constraintname"; This doesn't really make sense, because constraints that are not validated are still enforced for new rows. This thus wouldn't save anything performance-wise. We should perhaps have two more states: not enforced but blindly assumed true, and not enforced and not assumed true either. But currently, we don't. > I don't know what the problems are with --clean > but I would like to know if this appears > a promising approach. If so I can pursue it, > although I make no promises. (I sent in > the --disable-triggers patch because it seemed > easy and I'm not sure where a larger project fits > into my life.) I'm not really sure what the issues were any more; but I think they may have had to do with dependencies between different objects messing things up, which I think is likely to be a problem for this proposal as well. > P.S. An outstanding question regards --truncate-tables > is whether it should drop indexes before truncate > and re-create them after restore. Sounds like it should. Well, that would improve performance, but it also makes the behavior of object significantly different from what one might expect from the name. One of the problems here is that there seem to be a number of slightly-different things that one might want to do, and it's not exactly clear what all of them are, or whether a reasonable number of options can cater to all of them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop@meme.com> wrote: >> P.S. An outstanding question regards --truncate-tables >> is whether it should drop indexes before truncate >> and re-create them after restore. Sounds like it should. > > Well, that would improve performance, but it also makes the behavior > of object significantly different from what one might expect from the > name. One of the problems here is that there seem to be a number of > slightly-different things that one might want to do, and it's not > exactly clear what all of them are, or whether a reasonable number of > options can cater to all of them. Another problem: attempting to drop a unique constraint or primary key (if we're counting these as indexes to be dropped and recreated, which they should be if the goal is reasonable restore performance) which is referenced by another table's foreign key will cause: ERROR: cannot drop constraint xxx on table yyy because other objectsdepend on it and as discussed upthread, it would be impolite for pg_restore to presume it should monkey with dropping+recreating other tables' constraints to work around this problem, not to mention impossible when pg_restore is not connected to the target database. It is a common administrative task to selectively restore some existing tables' contents from a backup, and IIRC was the impetus for this patch. Instead of adding a bunch of options to pg_restore, perhaps a separate tool specific to this task would be the way to go. It could handle the minutiae of truncating, dropping and recreating constraints and indexes of the target tables, and dealing with FKs sensibly, without worrying about conflicts with existing pg_restore options and behavior. Josh
On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote: > On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas <robertmhaas@gmail.com> > wrote: > > On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop@meme.com> wrote: > >> P.S. An outstanding question regards --truncate-tables > >> is whether it should drop indexes before truncate > >> and re-create them after restore. Sounds like it should. > > > > Well, that would improve performance, but it also makes the > behavior > > of object significantly different from what one might expect from > the > > name. One of the problems here is that there seem to be a number > of > > slightly-different things that one might want to do, and it's not > > exactly clear what all of them are, or whether a reasonable number > of > > options can cater to all of them. > > Another problem: attempting to drop a unique constraint or primary > key > (if we're counting these as indexes to be dropped and recreated, > which > they should be if the goal is reasonable restore performance) which > is > referenced by another table's foreign key will cause: > ERROR: cannot drop constraint xxx on table yyy > because other objects depend on it > > and as discussed upthread, it would be impolite for pg_restore to > presume it should monkey with dropping+recreating other tables' > constraints to work around this problem, not to mention impossible > when pg_restore is not connected to the target database. I'm thinking impossible because it's impossible to know what the existing FKs are without a db connection. Impossible is a problem. You may have another reason why it's impossible. > It is a common administrative task to selectively restore some > existing tables' contents from a backup, and IIRC was the impetus for > this patch. Yes. (And aside from listing tables individually it'd be nice to restore tables per schema.) It's also a bit surprising that restoring table content is so hard/unsupported, given a db of more than minimal complexity. > Instead of adding a bunch of options to pg_restore, > perhaps a separate tool specific to this task would be the way to go. > It could handle the minutiae of truncating, dropping and recreating > constraints and indexes of the target tables, and dealing with FKs > sensibly, without worrying about conflicts with existing pg_restore > options and behavior. Per above, the tool would then either require a db connection or at least a dump which contains the system catalogs. I'm afraid I don't have a clear picture of what such a tool would look like, if it does not look a lot like pg_restore. I would like to have such a tool. I'm not certain how much I'd be able to contribute toward making one. Meanwhile it sounds like the --truncate-tables patch is looking less and less desirable. I'm ready for rejection, but will soldier on in the interest of not wasting other people work on this, if given direction to move forward. Regards, Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
On 11/26/2012 09:30:48 PM, Karl O. Pinc wrote: > On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote: > > It is a common administrative task to selectively restore some > > existing tables' contents from a backup, and IIRC was the impetus > for > > this patch. > > Yes. (And aside from listing tables individually it'd be nice > to restore tables per schema.) As long as I'm daydreaming it'd be nice to be able to restore a table, data and schema, and have available the various combinations of: new table name, different owner, different schema, different db. Without having to edit a file by hand. Of course I've not done the brain work involved in figuring out just what this would mean in terms of related objects like triggers, constraints, indexes and so forth. But then who doesn't want a pony? :-) Regards, Regards, Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Sorry for the delay in following up here. On Mon, Nov 26, 2012 at 8:30 PM, Karl O. Pinc <kop@meme.com> wrote: > On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote: >> On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas <robertmhaas@gmail.com> >> wrote: >> > On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop@meme.com> wrote: >> >> P.S. An outstanding question regards --truncate-tables >> >> is whether it should drop indexes before truncate >> >> and re-create them after restore. Sounds like it should. >> > >> > Well, that would improve performance, but it also makes the >> behavior >> > of object significantly different from what one might expect from >> the >> > name. One of the problems here is that there seem to be a number >> of >> > slightly-different things that one might want to do, and it's not >> > exactly clear what all of them are, or whether a reasonable number >> of >> > options can cater to all of them. >> >> Another problem: attempting to drop a unique constraint or primary >> key >> (if we're counting these as indexes to be dropped and recreated, >> which >> they should be if the goal is reasonable restore performance) which >> is >> referenced by another table's foreign key will cause: >> ERROR: cannot drop constraint xxx on table yyy >> because other objects depend on it >> >> and as discussed upthread, it would be impolite for pg_restore to >> presume it should monkey with dropping+recreating other tables' >> constraints to work around this problem, not to mention impossible >> when pg_restore is not connected to the target database. > > I'm thinking impossible because it's impossible to know > what the existing FKs are without a db connection. Impossible is > a problem. You may have another reason why it's impossible. Yes, that's what I meant. > Meanwhile it sounds like the --truncate-tables patch > is looking less and less desirable. I'm ready for > rejection, but will soldier on in the interest of > not wasting other people work on this, if given > direction to move forward. Well, as far as I was able to tell, the use-case where this patch worked without trouble was limited to restoring a table, or schema with table(s), that:a.) has some view(s) dependent on itb.) has no other tables with FK references to it, so that we don'trun into: ERROR: cannot truncate a table referenced in a foreign key constraintc.) is not so large that it takes forever for datato be restored with indexes and constraints left intactd.) and whose admin does not want to use --clean plus a list-file which limits pg_restore to the table and its views I was initially hoping that the patch would be more useful for restoring a table with FKs pointing to it, but it seems the only reliable way to do this kind of selective restore with pg_restore is with --clean and editing the list-file. Editing the list-file is certainly tedious and prone to manual error, but I'm not sure this particular patch has a wide enough use-case to alleviate that pain significantly. Josh
On 12/04/2012 09:26:47 PM, Josh Kupershmidt wrote: > Sorry for the delay in following up here. No problem at all. > Well, as far as I was able to tell, the use-case where this patch > worked without trouble was limited to restoring a table, or schema > with table(s), that: > a.) has some view(s) dependent on it > b.) has no other tables with FK references to it, so that we don't > run into: > ERROR: cannot truncate a table referenced in a foreign key > constraint > c.) is not so large that it takes forever for data to be restored > with indexes and constraints left intact > d.) and whose admin does not want to use --clean plus a list-file > which limits pg_restore to the table and its views > > I was initially hoping that the patch would be more useful for > restoring a table with FKs pointing to it, but it seems the only > reliable way to do this kind of selective restore with pg_restore is > with --clean and editing the list-file. Editing the list-file is > certainly tedious and prone to manual error, but I'm not sure this > particular patch has a wide enough use-case to alleviate that pain > significantly. I think there must be a reliable way to restore tables with FKs pointing to them, but getting pg_restore to do it seems perilous; at least given your expectations for the behavior of pg_restore in the context of the existing option set. As with you, I am not inclined to add another option to pg_restore unless it's really useful. (Pg_restore already has gobs of options, to the point where it's getting sticky.) I don't think this patch meets the utility bar. It might be able to be re-worked into something useful, or it might need to evolve into some sort of new restore utility per your thoughts. For now better to reject it until the right/comprehensive way to proceed is figured out. Thanks for all your work. I hope that this has at least moved the discussion forward and not been a waste of everybody's time. I would like to see a "better" way of restoring tables that are FK reference targets. Regards, Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Hi there, Refering to https://www.postgresql.org/message-id/1352742344.21373.4@mofo I'm running into situations where I'd need to bulk transfer of data tables across servers, but a drop and recreate schema isn't feasible as we are running different permissions etc. on the two databases. Thus my question: Anybody working on getting this into pg_restore proper, and any advice on getting this feature incorporated? HEndrik