Обсуждение: TRUNCATE SERIALIZABLE and frozen COPY
For 9.2 we discussed having COPY setting tuples as frozen. Various details apply. Earlier threads: "RFC: Making TRUNCATE more "MVCC-safe" "COPY with hints, rebirth" I was unhappy with changing the behaviour of TRUNCATE, and still am. So the proposal here is to have a specific modifier on TRUNCATE command that makes it MVCC safe by throwing a serialization error. That new behaviour should be requestable by adding the SERIALIZABLE keyword. i.e. TRUNCATE foo SERIALIZABLE; This then allows a new style of TRUNCATE, yet without modiying behaviour of earlier programs (ugh!). (Once we have this, somebody that cares more than me may wish to discuss deprecation of existing TRUNCATE behaviour in favour of this new behaviour. I am not suggesting that here, nor do I even see a reason for that at present.) As soon as we have TRUNCATE SERIALIZABLE we can then enable normal COPY to set tuples as frozen when 1) no earlier snapshots exist in the current transaction - since they might see frozen tuples loaded in a later command and thus cause MVCC violation 2) table has been truncated in this subtransaction by a TRUNCATE SERIALIZABLE That then means the optimization would be available for normal data loads/reloads. This looks like a fairly neat way to allow MVCC-aware TRUNCATE for those that want it, without affecting existing code and yet speeding up large loads. (Note that I am not suggesting any change to existing CREATE TABLE behaviour. If you want this optimization, just add in TRUNCATE SERIALIZABLE). Thoughts? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 7, 2012 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > For 9.2 we discussed having COPY setting tuples as frozen. Various > details apply. > Earlier threads: > "RFC: Making TRUNCATE more "MVCC-safe" > "COPY with hints, rebirth" > > I was unhappy with changing the behaviour of TRUNCATE, and still am. > So the proposal here is to have a specific modifier on TRUNCATE > command that makes it MVCC safe by throwing a serialization error. I don't think I understand the proposal. Under what circumstances would it throw a serialization error? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 November 2012 17:07, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Nov 7, 2012 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> For 9.2 we discussed having COPY setting tuples as frozen. Various >> details apply. >> Earlier threads: >> "RFC: Making TRUNCATE more "MVCC-safe" >> "COPY with hints, rebirth" >> >> I was unhappy with changing the behaviour of TRUNCATE, and still am. >> So the proposal here is to have a specific modifier on TRUNCATE >> command that makes it MVCC safe by throwing a serialization error. > > I don't think I understand the proposal. Under what circumstances > would it throw a serialization error? If you ask for TRUNCATE SERIALIZABLE then if someone asks for data in the table and has a snapshot that can see earlier data then it will throw a serializable error. So its a new kind of TRUNCATE that is MVCC safe. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11/08/2012 08:51 PM, Simon Riggs wrote: > On 8 November 2012 17:07, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Nov 7, 2012 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> For 9.2 we discussed having COPY setting tuples as frozen. Various >>> details apply. >>> Earlier threads: >>> "RFC: Making TRUNCATE more "MVCC-safe" >>> "COPY with hints, rebirth" >>> >>> I was unhappy with changing the behaviour of TRUNCATE, and still am. >>> So the proposal here is to have a specific modifier on TRUNCATE >>> command that makes it MVCC safe by throwing a serialization error. >> I don't think I understand the proposal. Under what circumstances >> would it throw a serialization error? > If you ask for TRUNCATE SERIALIZABLE then if someone asks for data in > the table and has a snapshot that can see earlier data then it will > throw a serializable error. So its a new kind of TRUNCATE that is MVCC > safe. Can't we make it so that the reader with earlier snapshot sees the data from the pre-truncation file ? and we unlink the base file(s) only once nobody has a snapshot the can see it ? or are there some subtler problems (I was under impression that we already did this as described above) ? ---------------- Hannu
On 8 November 2012 23:20, Hannu Krosing <hannu@2ndquadrant.com> wrote: > On 11/08/2012 08:51 PM, Simon Riggs wrote: >> >> On 8 November 2012 17:07, Robert Haas <robertmhaas@gmail.com> wrote: >>> >>> On Wed, Nov 7, 2012 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> >>> wrote: >>>> >>>> For 9.2 we discussed having COPY setting tuples as frozen. Various >>>> details apply. >>>> Earlier threads: >>>> "RFC: Making TRUNCATE more "MVCC-safe" >>>> "COPY with hints, rebirth" >>>> >>>> I was unhappy with changing the behaviour of TRUNCATE, and still am. >>>> So the proposal here is to have a specific modifier on TRUNCATE >>>> command that makes it MVCC safe by throwing a serialization error. >>> >>> I don't think I understand the proposal. Under what circumstances >>> would it throw a serialization error? >> >> If you ask for TRUNCATE SERIALIZABLE then if someone asks for data in >> the table and has a snapshot that can see earlier data then it will >> throw a serializable error. So its a new kind of TRUNCATE that is MVCC >> safe. > > Can't we make it so that the reader with earlier snapshot sees the data from > the pre-truncation file ? We could... but that would require keeping a history of relfilenodes for an object to allow for more than one TRUNCATE event. Tracking all of that would be hard and I don't personally think its worth that effort. > and we unlink the base file(s) only once nobody has a snapshot the can see > it ? DELETE does that if that's the semantics you want. > or are there some subtler problems (I was under impression that we already > did this as described above) ? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11/09/2012 09:34 AM, Simon Riggs wrote: > On 8 November 2012 23:20, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> On 11/08/2012 08:51 PM, Simon Riggs wrote: >>> On 8 November 2012 17:07, Robert Haas <robertmhaas@gmail.com> wrote: >>>> On Wed, Nov 7, 2012 at 10:34 AM, Simon Riggs <simon@2ndquadrant.com> >>>> wrote: >>>>> For 9.2 we discussed having COPY setting tuples as frozen. Various >>>>> details apply. >>>>> Earlier threads: >>>>> "RFC: Making TRUNCATE more "MVCC-safe" >>>>> "COPY with hints, rebirth" >>>>> >>>>> I was unhappy with changing the behaviour of TRUNCATE, and still am. >>>>> So the proposal here is to have a specific modifier on TRUNCATE >>>>> command that makes it MVCC safe by throwing a serialization error. >>>> I don't think I understand the proposal. Under what circumstances >>>> would it throw a serialization error? >>> If you ask for TRUNCATE SERIALIZABLE then if someone asks for data in >>> the table and has a snapshot that can see earlier data then it will >>> throw a serializable error. So its a new kind of TRUNCATE that is MVCC >>> safe. >> Can't we make it so that the reader with earlier snapshot sees the data from >> the pre-truncation file ? > We could... but that would require keeping a history of relfilenodes > for an object to allow for more than one TRUNCATE event. MVCC does keep history of old relfilenodes. I thought we were able to read old MVCC versions in pg_class for this if the snapshot required data matching older pg_class record. > Tracking all > of that would be hard and I don't personally think its worth that > effort. > >> and we unlink the base file(s) only once nobody has a snapshot the can see >> it ? > DELETE does that if that's the semantics you want. > >> or are there some subtler problems (I was under impression that we already >> did this as described above) ?
On Wed, Nov 7, 2012 at 5:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > I was unhappy with changing the behaviour of TRUNCATE, and still am. > So the proposal here is to have a specific modifier on TRUNCATE > command that makes it MVCC safe by throwing a serialization error. > That new behaviour should be requestable by adding the SERIALIZABLE > keyword. > i.e. TRUNCATE foo SERIALIZABLE; > This then allows a new style of TRUNCATE, yet without modiying > behaviour of earlier programs (ugh!). Personally I think the behavior should be dictated by the *reader*. The one doing the truncation may not know about the consistency requirements of particular readers. Especially when you do the truncate via pg_restore or some other generic tool. And indeed different readers may have different consistency requirements. So I'd prefer if it were a GUC variable; readers that accept relaxed consistency can set truncate_conflict=off Granted, making this part of the TRUNCATE statement does give you table-level granularity. But if a reader can already handle serialization conflicts, it doesn't really matter which table they came from. But making TRUNCATE behave in a MVCC-safe way, like proposed by Hannu, would be much better. Regards, Marti
On 9 November 2012 10:28, Marti Raudsepp <marti@juffo.org> wrote: > On Wed, Nov 7, 2012 at 5:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I was unhappy with changing the behaviour of TRUNCATE, and still am. >> So the proposal here is to have a specific modifier on TRUNCATE >> command that makes it MVCC safe by throwing a serialization error. >> That new behaviour should be requestable by adding the SERIALIZABLE >> keyword. >> i.e. TRUNCATE foo SERIALIZABLE; >> This then allows a new style of TRUNCATE, yet without modiying >> behaviour of earlier programs (ugh!). > > Personally I think the behavior should be dictated by the *reader*. > The one doing the truncation may not know about the consistency > requirements of particular readers. Especially when you do the > truncate via pg_restore or some other generic tool. And indeed > different readers may have different consistency requirements. > > So I'd prefer if it were a GUC variable; readers that accept relaxed > consistency can set truncate_conflict=off I proposed something similar earlier, but Robert said he didn't like that. The way you've described it here makes more sense, whereas my parameter name made it seem more arbitrary. So I think that looks like the way to go. I'm trying to use this as a way to optimize COPY, so we'd need to make a Seq Scan return zero rows if the truncatexid is seen as running by the snapshot, which is the current behaviour. That seems easy enough. > Granted, making this part of the TRUNCATE statement does give you > table-level granularity. But if a reader can already handle > serialization conflicts, it doesn't really matter which table they > came from. > But making TRUNCATE behave in a MVCC-safe way, like proposed by Hannu, > would be much better. Maybe, but one of the reasons for having a separate TRUNCATE command rather than DELETE is the immediately removal of space. Changing TRUNCATE so it suddenly holds on to space for longer will force us to create a new command that acts like the old TRUNCATE. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 9, 2012 at 8:22 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Personally I think the behavior should be dictated by the *reader*. >> The one doing the truncation may not know about the consistency >> requirements of particular readers. Especially when you do the >> truncate via pg_restore or some other generic tool. And indeed >> different readers may have different consistency requirements. >> >> So I'd prefer if it were a GUC variable; readers that accept relaxed >> consistency can set truncate_conflict=off > > I proposed something similar earlier, but Robert said he didn't like > that. The way you've described it here makes more sense, whereas my > parameter name made it seem more arbitrary. So I think that looks like > the way to go. Hmm, I don't remember saying I didn't like that. Maybe I disliked something about a particular proposed implementation? Actually, I don't really see the need for this to be customizable at all. I have to believe that there is vanishingly little application code that would care about this change in semantics, so why not just change the behavior and call it good? I think the question that hasn't really been adequately answered is: where and how are we going to track conflicts? Your previous patch involved storing an XID in pg_class, but I think we both found that a bit grotty - it'd probably need special handling for wraparound, and I think we came up with some related cases that couldn't be handled in the same way without adding a bunch more XIDs to various places. I don't really like the idea of having XIDs floating around in the system catalogs - it seems like a recipe for bugs, not to mention that storing ephemeral data in a persistent table seems like a mismatch. What I've been wondering since this last came up is whether we could use some variant of the SIREAD locks Kevin introduced for SSI to handle this case - essentially have the transaction doing the TRUNCATE make an entry in the lock table that will force a serialization failure for any backend which accesses the table with a snapshot that can't see the truncating transaction's XID. The lock table entry would need some kind of deferred clean-up, so it doesn't go away until the locker's XID precedes RecentGlobalXmin. Of course, an extra lock table probe for every table access will be unacceptable from a concurrency perspective, but we could probably optimize most of them away by only checking the lock table if the pg_class row's own xmin is new enough that the other backend's MVCC snapshot can't see it. A recent update to pg_class doesn't imply the existing of a lock, but the absence of any recent update to pg_class does imply that no lock can exist. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 9 November 2012 14:01, Robert Haas <robertmhaas@gmail.com> wrote: > I think the question that hasn't really been adequately answered is: > where and how are we going to track conflicts? Your previous patch > involved storing an XID in pg_class, but I think we both found that a > bit grotty - it'd probably need special handling for wraparound, and I > think we came up with some related cases that couldn't be handled in > the same way without adding a bunch more XIDs to various places. I > don't really like the idea of having XIDs floating around in the > system catalogs - it seems like a recipe for bugs, not to mention that > storing ephemeral data in a persistent table seems like a mismatch. Yes, the xid only needs to be transient, not in pg_class. > What I've been wondering since this last came up is whether we could > use some variant of the SIREAD locks Kevin introduced for SSI to > handle this case - essentially have the transaction doing the TRUNCATE > make an entry in the lock table that will force a serialization > failure for any backend which accesses the table with a snapshot that > can't see the truncating transaction's XID. The lock table entry > would need some kind of deferred clean-up, so it doesn't go away until > the locker's XID precedes RecentGlobalXmin. Of course, an extra lock > table probe for every table access will be unacceptable from a > concurrency perspective, but we could probably optimize most of them > away by only checking the lock table if the pg_class row's own xmin is > new enough that the other backend's MVCC snapshot can't see it. A > recent update to pg_class doesn't imply the existing of a lock, but > the absence of any recent update to pg_class does imply that no lock > can exist. I think the xid should still live in relcache, per the patch, but should live in a transient place (and not pg_class). We need a fast lookup structure that is expandable to accommodate arbitrary numbers of truncates. Shared hash table, with some form of overflow mechanism. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas wrote: > What I've been wondering since this last came up is whether we > could use some variant of the SIREAD locks Kevin introduced for SSI > to handle this case - essentially have the transaction doing the > TRUNCATE make an entry in the lock table that will force a > serialization failure for any backend which accesses the table with > a snapshot that can't see the truncating transaction's XID. It seems to me that the goal would be to make this semantically idential to the behavior users would see if an unqualified DELETE were run against the table rather than a TRUNCATE. To wit: (1) Any attempt to read from the truncated table would not block. The pg_class version included in the transaction's snapshot would determine which heap and indexes were accessed. If the reading transaction were SERIALIZABLE, it would generate a read-write conflict out to the truncating transaction. (2) Any attempt to write to the truncated table would block until the end of the transaction which is doing the truncation. If the truncating transaction rolls back, it proceeds normally against the old data. Otherwise: If the transaction is READ COMMITTED, follow the pg_class update links. At more strict isolation levels, generate a write conflict error. I'm not sure where any new use of the predicate locking system would come into play in that, other than properly handling read-write conflicts when both transactions were SERIALIZABLE. This seems like a subset of the issues which one might want to address by making DDL statement behave in a more strictly MVCC fashion. Does it make sense to pick those off one at a time, or should something like this be done only in the context of an overall plan to deal with all of it? -Kevin
On Fri, Nov 9, 2012 at 4:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > We need a fast lookup structure that is expandable to accommodate > arbitrary numbers of truncates. Shared hash table, with some form of > overflow mechanism. Surely you only need to remember the last completed truncate for each relation? The latest one also invalidates any snapshots before earlier truncates. Regards, Marti
On 9 November 2012 14:55, Marti Raudsepp <marti@juffo.org> wrote: > On Fri, Nov 9, 2012 at 4:24 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> We need a fast lookup structure that is expandable to accommodate >> arbitrary numbers of truncates. Shared hash table, with some form of >> overflow mechanism. > > Surely you only need to remember the last completed truncate for each > relation? Yes > The latest one also invalidates any snapshots before earlier > truncates. 1 per table, arbirary number of tables -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 9 November 2012 14:22, Kevin Grittner <kgrittn@mail.com> wrote: > Robert Haas wrote: > >> What I've been wondering since this last came up is whether we >> could use some variant of the SIREAD locks Kevin introduced for SSI >> to handle this case - essentially have the transaction doing the >> TRUNCATE make an entry in the lock table that will force a >> serialization failure for any backend which accesses the table with >> a snapshot that can't see the truncating transaction's XID. > > It seems to me that the goal would be to make this semantically > idential to the behavior users would see if an unqualified DELETE > were run against the table rather than a TRUNCATE. Unqualified DELETE already runs that way. TRUNCATE is a different command for a reason. Making TRUNCATE like something we already have seems not very useful to me, not least because it breaks existing applications. > This seems like a subset of the issues which one might want to > address by making DDL statement behave in a more strictly MVCC > fashion. Does it make sense to pick those off one at a time, or > should something like this be done only in the context of an overall > plan to deal with all of it? TRUNCATE is not DDL, plus I have no interest in this other than speeding up COPY. Scope creep just kills features. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 9, 2012 at 8:22 AM, Kevin Grittner <kgrittn@mail.com> wrote: > Robert Haas wrote: > >> What I've been wondering since this last came up is whether we >> could use some variant of the SIREAD locks Kevin introduced for SSI >> to handle this case - essentially have the transaction doing the >> TRUNCATE make an entry in the lock table that will force a >> serialization failure for any backend which accesses the table with >> a snapshot that can't see the truncating transaction's XID. > > It seems to me that the goal would be to make this semantically > idential to the behavior users would see if an unqualified DELETE > were run against the table rather than a TRUNCATE. To wit: but, triggers would not fire, right? merlin
Merlin Moncure wrote: > Kevin Grittner <kgrittn@mail.com> wrote: >> Robert Haas wrote: >> It seems to me that the goal would be to make this semantically >> idential to the behavior users would see if an unqualified DELETE >> were run against the table rather than a TRUNCATE. > > but, triggers would not fire, right? Right. Perhaps "identical" was too strong a word. I was referring to the aspect under consideration here -- making it "serializable" in line with other MVCC operations. If we're not talking about making conflicts with other transactions behave just the same as an unqualified DELETE from a user perspective, I'm not sure what the goal is, exactly. Obviously we would be keeping the guts of the implementation the same (swapping in a new, empty heap). -Kevin
On 9 November 2012 15:34, Kevin Grittner <kgrittn@mail.com> wrote: > If we're not talking about making conflicts with other transactions > behave just the same as an unqualified DELETE from a user > perspective, I'm not sure what the goal is, exactly. Reasonable question. My goal is to allow COPY to load frozen tuples without causing MVCC violations. Altering TRUNCATE so it behaves perfectly from an MVCC/Serializable perspective is a much bigger, and completely different goal, as well as something I don't see as desirable anyway for at least 2 good reasons, as explained. IMHO if people want MVCC/Serializable semantics, use DELETE, possibly spending time to make unqualified DELETE do some fancy TRUNCATE-like tricks with relfilenodes. Forcing a tightly scoped proposal into a much wider one will just kill this and leave it blocked. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 9 November 2012 15:46, Simon Riggs <simon@2ndquadrant.com> wrote: > Altering TRUNCATE so it behaves perfectly from an MVCC/Serializable > perspective is a much bigger, and completely different goal, as well > as something I don't see as desirable anyway for at least 2 good > reasons, as explained. IMHO if people want MVCC/Serializable > semantics, use DELETE, possibly spending time to make unqualified > DELETE do some fancy TRUNCATE-like tricks with relfilenodes. We spent a lot of time in 9.2 making TRUNCATE/reload of a table "just work", rather than implementing a REPLACE command. ISTM strange to throw away all that effort, changing behaviour of TRUNCATE and thus forcing the need for a REPLACE command after all. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs wrote: >> This seems like a subset of the issues which one might want to >> address by making DDL statement behave in a more strictly MVCC >> fashion. Does it make sense to pick those off one at a time, or >> should something like this be done only in the context of an >> overall plan to deal with all of it? > > TRUNCATE is not DDL You're right, I should have said utility commands. > I have no interest in this other than speeding up COPY. I would love to have that! > Scope creep just kills features. Well, I wasn't saying it should all be *done* at the same time, but this is not the only utility command which could benefit from such an effort, and if each one is done with no consideration of what it takes for them all to be done, we could wind up with something that doesn't hang together very coherently. Per perhaps this one could serve as a "pilot", to identify issues and help develop such a plan. -Kevin
Simon Riggs wrote: > My goal is to allow COPY to load frozen tuples without causing MVCC > violations. OK. That wasn't initially clear to me. > Forcing a tightly scoped proposal into a much wider one will just > kill this and leave it blocked. The goal is important enough and narrow enough to merit the approach you're talking about, IMV, at least. Sorry I initially misunderstood what you were going for. -Kevin
Simon Riggs <simon@2ndQuadrant.com> writes: > On 9 November 2012 15:34, Kevin Grittner <kgrittn@mail.com> wrote: >> If we're not talking about making conflicts with other transactions >> behave just the same as an unqualified DELETE from a user >> perspective, I'm not sure what the goal is, exactly. > Reasonable question. > My goal is to allow COPY to load frozen tuples without causing MVCC violations. If that's the goal, I question why you're insisting on touching TRUNCATE's behavior. We already have the principle that "TRUNCATE is like DELETE except not concurrent-safe". Why not just invent a non-concurrent-safe option to COPY that loads prefrozen tuples into a new heap, and call it good? There will be visibility oddness from that definition, sure, but AFAICS there will be visibility oddness from what you're talking about too. You'll just have expended a very great deal of effort to make the weirdness a bit different. Even if the TRUNCATE part of it were perfectly clean, the "load prefrozen tuples" part won't be --- so I'm not seeing the value of changing TRUNCATE. regards, tom lane
On Fri, Nov 9, 2012 at 9:22 AM, Kevin Grittner <kgrittn@mail.com> wrote: > (1) Any attempt to read from the truncated table would not block. The > pg_class version included in the transaction's snapshot would > determine which heap and indexes were accessed. Well, the thing is, you can't actually do this. When the transaction commits, we truncate the main forks of the old heap and index and remove all of the supplemental forks. The main forks are finally removed for good at the next checkpoint cycle. To make this work, we'd have to keep around the old heap and index until there were no longer any MVCC snapshots that could see them. That might be useful as an option, but it would need a bunch of additional mechanism, and it doesn't seem desirable as a default behavior because it could defer disk space reclamation indefinitely. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Nov 9, 2012 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> My goal is to allow COPY to load frozen tuples without causing MVCC violations. > > If that's the goal, I question why you're insisting on touching > TRUNCATE's behavior. We already have the principle that "TRUNCATE is > like DELETE except not concurrent-safe". Why not just invent a > non-concurrent-safe option to COPY that loads prefrozen tuples into a > new heap, and call it good? There will be visibility oddness from that > definition, sure, but AFAICS there will be visibility oddness from what > you're talking about too. You'll just have expended a very great deal > of effort to make the weirdness a bit different. Even if the TRUNCATE > part of it were perfectly clean, the "load prefrozen tuples" part won't > be --- so I'm not seeing the value of changing TRUNCATE. I don't object to the idea of giving COPY a way to load prefrozen tuples, but I think you might be missing the point here otherwise. Right now, if you CREATE or TRUNCATE a table, copy a bunch of data into it, and then commit, another transaction that took a snapshot before your commit can subsequently look at that table and it will NOT see your newly-loaded data. What it will see instead is an empty table. This is, of course, wrong: it ought to fail with a serialization error. It is very possible that the table has never been empty at the conclusion of a completed transaction: it might have contained data before the TRUNCATE, and it might again contain data by the time the truncating transaction commits. Yet, we see it as empty, which is not MVCC-compliant. If we were to make COPY pre-freeze the data when the table was created or truncated in the same transaction, it would alter the behavior in this situation, and from an application perspective, only this situation. Now, instead of seeing the table as empty, you'd see the new contents. This is also not MVCC-compliant, and I guess the concern when we have talked about this topic before is that changing from wrong behavior to another, not-backward-compatible wrong behavior might not be the friendliest thing to do. We could decide we don't care and just break it. Or we could try to make it through a serialization error, as Simon is proposing here, which seems like the tidiest solution. Or we could keep the old heap around until there are no more snapshots that can need it, which is a bit scary since we'd be eating double disk-space in the meantime, but it would certainly be useful to some users, I think. Just having an option to preload frozen tuples dodges all of these issues by throwing our hands up in the air, but it does have the advantage of being more general. Even if we do that I'm not sure it would be a bad thing to try to solve this issue in a somewhat more principled way, but it would surely reduce the urgency. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Just having an option to preload frozen tuples dodges all of these > issues by throwing our hands up in the air, but it does have the > advantage of being more general. Even if we do that I'm not sure it > would be a bad thing to try to solve this issue in a somewhat more > principled way, but it would surely reduce the urgency. Yeah. ISTM the whole point of TRUNCATE is "I don't care about serializability for this operation, give me efficiency instead". So I see nothing wrong with a (non-default) option for COPY that similarly trades away some semantic guarantees for efficiency's sake. There are an awful lot of bulk-load scenarios where people will gladly take that trade, and are not very interested in halfway points either. regards, tom lane
On 9 November 2012 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 9 November 2012 15:34, Kevin Grittner <kgrittn@mail.com> wrote: >>> If we're not talking about making conflicts with other transactions >>> behave just the same as an unqualified DELETE from a user >>> perspective, I'm not sure what the goal is, exactly. > >> Reasonable question. > >> My goal is to allow COPY to load frozen tuples without causing MVCC violations. > > If that's the goal, I question why you're insisting on touching > TRUNCATE's behavior. We already have the principle that "TRUNCATE is > like DELETE except not concurrent-safe". Why not just invent a > non-concurrent-safe option to COPY that loads prefrozen tuples into a > new heap, and call it good? There will be visibility oddness from that > definition, sure, but AFAICS there will be visibility oddness from what > you're talking about too. You'll just have expended a very great deal > of effort to make the weirdness a bit different. Even if the TRUNCATE > part of it were perfectly clean, the "load prefrozen tuples" part won't > be --- so I'm not seeing the value of changing TRUNCATE. This is wonderful thought and I wish I'd thought of it. My digression via truncate now annoys me. Yes, there are objections and I've read what Robert has said. An explicit new option is perfectly entitled to introduce new behaviour and won't cause a problem with existing applications. I personally don't care about serializable stuff here, and nor do most others. They just want a way to load new data quickly. It is important to me that we do things in well principled ways, but the task at hand is data loading not perfect visibility guarantees. So what we're talking about here is a new mode for COPY, that when requested will pre-freeze tuples when loading into a newly created/truncated table. If the table isn't newly created/truncated then we'll just ignore it and continue. I see no need to throw an error, since that will just cause annoying usability issues. COPY FREEZE here we come, with extensive docs to explain the trade-off the user is accepting. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 9, 2012 at 3:31 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > So what we're talking about here is a new mode for COPY, that when > requested will pre-freeze tuples when loading into a newly > created/truncated table. If the table isn't newly created/truncated > then we'll just ignore it and continue. I see no need to throw an > error, since that will just cause annoying usability issues. Actually, why not just have it work always? If people want to load frozen tuples into a table that's not newly created/truncated, why not let them? Sure, there could be MVCC violations, but as long as the behavior is opt-in, who cares? I think it'd be useful to a lot of people. If we want to reduce (not eliminate) the potential MVCC issues, which I think would be a good idea, we could take AccessExclusiveLock on the table when COPY (FREEZE) is used. Someone using an old snapshot but accessing the table for the first time after AEL is released could still see MVCC anomalies, but at least it would rule out things changing in mid-query, which is the case that I think would be most problematic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Nov 9, 2012 at 3:31 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> So what we're talking about here is a new mode for COPY, that when >> requested will pre-freeze tuples when loading into a newly >> created/truncated table. If the table isn't newly created/truncated >> then we'll just ignore it and continue. I see no need to throw an >> error, since that will just cause annoying usability issues. > Actually, why not just have it work always? If people want to load > frozen tuples into a table that's not newly created/truncated, why not > let them? Sure, there could be MVCC violations, but as long as the > behavior is opt-in, who cares? I think it'd be useful to a lot of > people. I thought about that too, but there's a big problem. It wouldn't be just MVCC that would be broken, but transactional integrity: if the COPY fails partway through, the already-loaded rows still look valid. The new-file requirement provides a way to roll them back. I'm willing to have an option that compromises MVCC semantics transiently, but giving up transactional integrity seems a bit much. regards, tom lane
On Mon, Nov 12, 2012 at 11:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Nov 9, 2012 at 3:31 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> So what we're talking about here is a new mode for COPY, that when >>> requested will pre-freeze tuples when loading into a newly >>> created/truncated table. If the table isn't newly created/truncated >>> then we'll just ignore it and continue. I see no need to throw an >>> error, since that will just cause annoying usability issues. > >> Actually, why not just have it work always? If people want to load >> frozen tuples into a table that's not newly created/truncated, why not >> let them? Sure, there could be MVCC violations, but as long as the >> behavior is opt-in, who cares? I think it'd be useful to a lot of >> people. > > I thought about that too, but there's a big problem. It wouldn't be > just MVCC that would be broken, but transactional integrity: if the > COPY fails partway through, the already-loaded rows still look valid. > The new-file requirement provides a way to roll them back. > > I'm willing to have an option that compromises MVCC semantics > transiently, but giving up transactional integrity seems a bit much. Hmm, good point. There might be some way around that, but figuring it out is probably material for a separate patch. But I guess that raises the question - should COPY (FREEZE) silently ignore the option for not-new relfilenodes, or should it error out? Simon proposed the former, but I'm wondering if the latter would be better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12 November 2012 16:22, Robert Haas <robertmhaas@gmail.com> wrote: > But I guess that raises the question - should COPY (FREEZE) silently > ignore the option for not-new relfilenodes, or should it error out? > Simon proposed the former, but I'm wondering if the latter would be > better. It's got some complex pre-conditions, so having scripts fail because you mis-specified FREEZE would be annoying. The option indicates "I accept the potential MVCC violation", not "it will always freeze". If there is a better name... -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services