Обсуждение: [PATCH] Transaction traceability - txid_status(bigint)
Hi all
--
Following on from
bigint txids vs 'xid' type, new txid_recent(bigint) => xid
here's a patch that implements a txid_status(bigint) function to report the commit status of a function.
If an application is disconnected while a COMMIT request is in flight,
the backend crashes mid-commit, etc, then an application may not be
sure whether or not a commit completed successfully or was rolled
back. While two-phase commit solves this it does so at a considerable
overhead, so introduce a lighter alternative.
txid_status(bigint) lets an application determine the status of a
a commit based on an xid-with-epoch as returned by txid_current()
or similar. Status may be committed, aborted, in-progress (including
prepared xacts) or null if the xact is too old for its commit status
to still be retained because it has passed the wrap-around epoch
boundary.
Applications must call txid_current() in their transactions to make
much use of this since PostgreSQL does not automatically report an xid
to the client when one is assigned.
A future protocol enhancement to report txid assignment would be very useful, but quite separate to this.
Вложения
On 20 August 2016 at 21:24, Craig Ringer <craig@2ndquadrant.com> wrote:
Hi allFollowing on frombigint txids vs 'xid' type, new txid_recent(bigint) => xid
Ahem. Forgot to squash in a fixup commit. Updated patch of txid_status(bigint) attachd.
A related patch follows, adding a new txid_current_ifassigned( bigint) function as suggested by Jim Nasby. It's usefully connected to txid_status() and might as well be added at the same time.
Since it builds on the same history I've also attached an updated version of txid_recent(bigint) now called txid_convert_ifrecent(bigint), per the above-linked thread.
Finally, and not intended for commit, is a useful test function I wrote to cause extremely rapid xid wraparound, bundled up into a src/test/regress test case. txid_incinerate() can jump the server about UINT32/2 xids in ~2 seconds if fsync is off, making it handy for testing. Posting so others can use it for their own test needs later and because it's useful for testing these patches that touch on the xid epoch.
Вложения
On Sat, Aug 20, 2016 at 9:46 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > Ahem. Forgot to squash in a fixup commit. Updated patch of > txid_status(bigint) attachd. > > A related patch follows, adding a new txid_current_ifassigned(bigint) > function as suggested by Jim Nasby. It's usefully connected to txid_status() > and might as well be added at the same time. > > Since it builds on the same history I've also attached an updated version of > txid_recent(bigint) now called txid_convert_ifrecent(bigint), per the > above-linked thread. > > Finally, and not intended for commit, is a useful test function I wrote to > cause extremely rapid xid wraparound, bundled up into a src/test/regress > test case. txid_incinerate() can jump the server about UINT32/2 xids in ~2 > seconds if fsync is off, making it handy for testing. Posting so others can > use it for their own test needs later and because it's useful for testing > these patches that touch on the xid epoch. I think you should use underscores to separate all of the words instead of only some of them. Also, note that the corresponding internal function is GetTopTransactionIdIfAny(), which might suggest txid_current_if_any() rather than txid_current_if_assigned(), but you could argue that your naming is better... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com> wrote:
-- I think you should use underscores to separate all of the words
instead of only some of them.
Right. Will fix.
Thanks for taking a look.
Also, note that the corresponding internal function is
GetTopTransactionIdIfAny(), which might suggest txid_current_if_any()
rather than txid_current_if_assigned(), but you could argue that your
naming is better.
Yeah, I do argue that in this case. Not a hugely strong opinion, but we refer to txid_current() in the docs as:
"get current transaction ID, assigning a new one if the current transaction does not have one"
so I thought it'd be worth being consistent with that. It's not like txid_current() mirrors the name of GetTopTransactionId() after all ;) and I think it's more important to be consistent with what the user sees than with the code.
On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com> wrote:
I think you should use underscores to separate all of the words
instead of only some of them.
ifassigned => if_assigned
ifrecent=> if_recent
Updated patch series attached. As before, 0-4 intended for commit, 5 just because it'll be handy to have around for people doing wraparound related testing.
Вложения
On 23/08/16 02:55, Craig Ringer wrote: > On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com > <mailto:robertmhaas@gmail.com>> wrote: > > > > I think you should use underscores to separate all of the words > instead of only some of them. > > > ifassigned => if_assigned > > ifrecent=> if_recent > > Updated patch series attached. As before, 0-4 intended for commit, 5 > just because it'll be handy to have around for people doing wraparound > related testing. I guess you mean 0-3 for commit and 4 is just handy? From the point of code this patch seems good to me. I do wonder about the 3rd patch though. I wonder if it would not be better to have the opposite function instead - converting xid to txid as that will always work and does not have to have the NULL case and would be simpler in terms of code. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
<p dir="ltr">On 23 Aug 2016 16:02, "Petr Jelinek" <<a href="mailto:petr@2ndquadrant.com">petr@2ndquadrant.com</a>>wrote:<br /> ><br /> > On 23/08/16 02:55, Craig Ringerwrote:<br /> >><br /> >> On 23 August 2016 at 01:03, Robert Haas <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a><br/> >> <mailto:<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>>wrote:<br /> >><br /> >><br /> >><br/> >> I think you should use underscores to separate all of the words<br /> >> instead ofonly some of them.<br /> >><br /> >><br /> >> ifassigned => if_assigned<br /> >><br /> >>ifrecent=> if_recent<br /> >><br /> >> Updated patch series attached. As before, 0-4 intended forcommit, 5<br /> >> just because it'll be handy to have around for people doing wraparound<br /> >> relatedtesting.<br /> ><br /> ><br /> > I guess you mean 0-3 for commit and 4 is just handy?<p dir="ltr">Er. Right.1-3. 4 just as handy test/tool.<p dir="ltr">1 most important and useful. Then 2. Then 3.<p dir="ltr">> From thepoint of code this patch seems good to me.<p dir="ltr">Thanks.<p dir="ltr">> I do wonder about the 3rd patch though.I wonder if it would not be better to have the opposite function instead - converting xid to txid as that will alwayswork and does not have to have the NULL case and would be simpler in terms of code.<p dir="ltr">Yeah, but it wouldn'tsolve the need to take txid_current() output and do stuff with it other than ordinal comparison. Like pass to committs functions and others that take xid. If we extend all funcs that take xid to take bigint instead, they just get touse the same epoch logic in them, complete with some way to deal with wrapped xids sensibly. It has to be done somewhere.Though it's prettier if hidden from the user.<p dir="ltr">More importantly imo, txid => bigint has to assumethe current epoch. We have no way to make sure the user doesn't try to use something already wrapped.<p dir="ltr">Idon't mind if everyone decides it's better to make xid go away and use bigint everywhere user facing. Or evena new bigxid type. More work than I can really afford but can manage; shouldn't block #1 and #2 though as they alreadyuse bigint.
On 23/08/16 11:27, Craig Ringer wrote: > On 23 Aug 2016 16:02, "Petr Jelinek" <petr@2ndquadrant.com > <mailto:petr@2ndquadrant.com>> wrote: >> >> On 23/08/16 02:55, Craig Ringer wrote: >>> >>> On 23 August 2016 at 01:03, Robert Haas <robertmhaas@gmail.com > <mailto:robertmhaas@gmail.com> >>> <mailto:robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>>> wrote: >>> >>> >>> >>> I think you should use underscores to separate all of the words >>> instead of only some of them. >>> >>> >>> ifassigned => if_assigned >>> >>> ifrecent=> if_recent >>> >>> Updated patch series attached. As before, 0-4 intended for commit, 5 >>> just because it'll be handy to have around for people doing wraparound >>> related testing. >> >> >> I guess you mean 0-3 for commit and 4 is just handy? > > Er. Right. 1-3. 4 just as handy test/tool. > > 1 most important and useful. Then 2. Then 3. > >> From the point of code this patch seems good to me. > > Thanks. > >> I do wonder about the 3rd patch though. I wonder if it would not be > better to have the opposite function instead - converting xid to txid as > that will always work and does not have to have the NULL case and would > be simpler in terms of code. > > Yeah, but it wouldn't solve the need to take txid_current() output and > do stuff with it other than ordinal comparison. Like pass to commit ts > functions and others that take xid. If we extend all funcs that take xid > to take bigint instead, they just get to use the same epoch logic in > them, complete with some way to deal with wrapped xids sensibly. It has > to be done somewhere. Though it's prettier if hidden from the user. > > More importantly imo, txid => bigint has to assume the current epoch. We > have no way to make sure the user doesn't try to use something already > wrapped. > Okay, fair points. > I don't mind if everyone decides it's better to make xid go away and use > bigint everywhere user facing. Or even a new bigxid type. More work than > I can really afford but can manage; shouldn't block #1 and #2 though as > they already use bigint. > I don't think that would be very straightforward to be honest. I guess for what you want to achieve the approach chosen is the best one then. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > Updated patch series attached. As before, 0-4 intended for commit, 5 just > because it'll be handy to have around for people doing wraparound related > testing. > > Again, thanks for taking a look. /me reviews a bit more deeply. In 0001, it seems to me that "in-progress" should be "in progress". I don't think it's normal to hyphenate that. We have admittedly sometimes done so, but: [rhaas pgsql]$ git grep 'in-progress' | wc -l 63 [rhaas pgsql]$ git grep 'in progress' | wc -l 346 It may make sense to speak of an "in-progress transaction" but I would say "the transaction is in progress" not "the transaction is in-progress", which seems to me to argue for a space as the proper separator here. Also: +CREATE TYPE txid_status AS ENUM ('committed', 'in-progress', 'aborted'); + +CREATE FUNCTION + txid_status(txid bigint) +RETURNS txid_status +LANGUAGE sql +VOLATILE PARALLEL SAFE +AS $$ +SELECT CASE + WHEN s IS NULL THEN NULL::txid_status + WHEN s = -1 THEN 'aborted'::txid_status + WHEN s = 0 THEN 'in-progress'::txid_status + WHEN s = 1 THEN 'committed'::txid_status +END +FROM pg_catalog.txid_status_internal($1) s; +$$; + +COMMENT ON FUNCTION txid_status(bigint) +IS 'get commit status of given recent xid or null if too old'; I'm not really that keen on this approach. I don't think we need to introduce a new data type for this, and I would rather not use SQL, either. It would be faster and simpler to just return the appropriate string from a C function defined directly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 23, 2016 at 10:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >> Updated patch series attached. As before, 0-4 intended for commit, 5 just >> because it'll be handy to have around for people doing wraparound related >> testing. >> >> Again, thanks for taking a look. > > /me reviews a bit more deeply. > > In 0001, ... 0002 looks good, so I committed it. You forgot a function prototype for the new SQL-callable function, though, so I added that. For me, it generates a compiler warning if that's missing; you might want to try to achieve a similar setup. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 23 August 2016 at 22:18, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 22, 2016 at 8:55 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Updated patch series attached. As before, 0-4 intended for commit, 5 just
> because it'll be handy to have around for people doing wraparound related
> testing.
>
> Again, thanks for taking a look.
/me reviews a bit more deeply.
In 0001, it seems to me that "in-progress" should be "in progress".
Fine by me. I was on the fence about it anyway.
+CREATE TYPE txid_status AS ENUM ('committed', 'in-progress', 'aborted');
I'm not really that keen on this approach. I don't think we need to
introduce a new data type for this, and I would rather not use SQL,
either. It would be faster and simpler to just return the appropriate
string from a C function defined directly.
Also fine by me. You're right, keep it simple. It means the potential set of values isn't discoverable the same way, but ... meh. Using it usefully means reading the docs anyway.
The remaining 2 patches of interest are attached - txid_status() and txid_convert_if_recent(). Thanks for committing txid_current_if_assigned().
Now I'd best stop pretending I'm in a sensible timezone.
Вложения
On Tue, Aug 23, 2016 at 12:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > Also fine by me. You're right, keep it simple. It means the potential set of > values isn't discoverable the same way, but ... meh. Using it usefully means > reading the docs anyway. > > The remaining 2 patches of interest are attached - txid_status() and > txid_convert_if_recent(). Thanks for committing txid_current_if_assigned(). > > Now I'd best stop pretending I'm in a sensible timezone. I reviewed this version some more and found some more problems. + uint32 xid_epoch = (uint32)(xid_with_epoch >>32); + TransactionId xid = (TransactionId)(xid_with_epoch); I think this is not project style. In particular, I think that the first one needs a space after the cast and another space before the 32; and I think the second one has an unnecessary set of parentheses and needs a space added. +/* + * Underlying implementation of txid_status, which is mapped to an enum in + * system_views.sql. + */ Not any more... + errmsg("transaction ID "UINT64_FORMAT" is an invalid xid", + xid_with_epoch))); Spacing. + if (TransactionIdIsCurrentTransactionId(xid) || TransactionIdIsInProgress(xid)) + status = gettext_noop("in progress"); + else if (TransactionIdDidCommit(xid)) + status = gettext_noop("committed"); + else if (TransactionIdDidAbort(xid)) + status = gettext_noop("aborted"); + else + /* shouldn't happen */ + ereport(ERROR, + (errmsg_internal("unable to determine commit status of xid "UINT64_FORMAT, xid))); Maybe I'm all wet here, but it seems like there might be a problem here if the XID is older than the CLOG truncation point but less than one epoch old. get_xid_in_recent_past only guarantees that the transaction is less than one epoch old, not that we still have CLOG data for it. And there's nothing to keep NextXID from advancing under us, so if somebody asks about a transaction that's just under 2^32 transactions old, then get_xid_in_recent_past could say it's valid, then NextXID advances and we look up the XID extracted from the txid and get the status of the new transaction instead of the old one! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 24 August 2016 at 03:10, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Aug 23, 2016 at 12:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > > Also fine by me. You're right, keep it simple. It means the potential set of > > values isn't discoverable the same way, but ... meh. Using it usefully means > > reading the docs anyway. > > > > The remaining 2 patches of interest are attached - txid_status() and > > txid_convert_if_recent(). Thanks for committing txid_current_if_assigned(). > > > > Now I'd best stop pretending I'm in a sensible timezone. > > I reviewed this version some more and found some more problems. Thanks. It took me a few days to prep a new patch as I found another issue in the process. Updated patch attached. The updated series starts (0001) with a change to slru.c to release the control lock when throwing an exception so that we don't deadlock with ourselves when re-entering slru.c; explanation below. Then there's the txid_status (0002) patch with fixes, then txid_convert_if_recent(0003). I omitted txid_incinerate() ; I have an updated version that sets xact status to aborted for burned xacts instead of leaving them at 0 (in-progress), but haven't had time to finish it so it doesn't try to blindly set the status of xacts on pages where it didn't hold XidGenLock when the page was added to the clog. > + uint32 xid_epoch = (uint32)(xid_with_epoch >>32); > + TransactionId xid = (TransactionId)(xid_with_epoch); > > I think this is not project style. In particular, I think that the > first one needs a space after the cast and another space before the > 32; and I think the second one has an unnecessary set of parentheses > and needs a space added. OK, no problems. I didn't realise spacing around casts was specified. > > +/* > + * Underlying implementation of txid_status, which is mapped to an enum in > + * system_views.sql. > + */ > > Not any more... That's why I shouldn't revise a patch at 1am ;) > > + if (TransactionIdIsCurrentTransactionId(xid) || > TransactionIdIsInProgress(xid)) > + status = gettext_noop("in progress"); > + else if (TransactionIdDidCommit(xid)) > + status = gettext_noop("committed"); > + else if (TransactionIdDidAbort(xid)) > + status = gettext_noop("aborted"); > + else > + /* shouldn't happen */ > + ereport(ERROR, > + (errmsg_internal("unable to determine commit status of > xid "UINT64_FORMAT, xid))); > > Maybe I'm all wet here, but it seems like there might be a problem > here if the XID is older than the CLOG truncation point but less than > one epoch old. get_xid_in_recent_past only guarantees that the > transaction is less than one epoch old, not that we still have CLOG > data for it. Good point. The call would then fail with something like ERROR: could not access status of transaction 778793573 DETAIL: could not open file "pg_clog/02E6": No such file or directory This probably didn't come up in my wraparound testing because I'm aggressively forcing wraparound by writing a lot of clog very quickly under XidGenLock, and because I'm mostly looking at xacts that are either recent or past the xid boundary. I've added better add coverage for xacts around 2^30 behind the nextXid to the wraparound tests; can't add them to txid.sql since the xid never gets that far in normal regression testing. What I'd really like is to be able to ask transam.c to handle the xid_in_recent_past logic, treating an attempt to read an xid from beyond the clog truncation threshold as a soft error indicating unknown xact state. But that involves delving into slru.c, and I really, really don't want to touch that for what should be a simple and pretty noninvasive utility function. A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient, except for two issues: * I see no accepted way to access the errcode etc from within the PG_CATCH block, though. PG_RE_THROW() can do it because pg_re_throw() is in elog.c. I couldn't find any existing code that seems to check details about an error thrown in a PG_TRY block, only SPI calls. I don't want to ignore all types of errors and potentially hide problems, so I just used geterrcode() - which is meant for errcontext callbacks - and changed the comment to say it can be used in PG_CATCH too. I don't see why it shouldn't be. We should probably have some sort of PG_CATCH_INFO(varname) that exposes the top ErrorData, but that's not needed for this patch so I left it alone. * TransactionIdGetStatus() releases the CLogControlLock taken by SimpleLruReadPage_ReadOnly() on normal exit but not after an exception thrown from SlruReportIOError(). It seems appropriate for SimpleLruReadPage() to release the LWLock before calling SlruReportIOError(), so I've done that as a separate patch (now 0001). I also removed the TransactionIdInProgress check in txid_status and just assume it's in progress if it isn't our xid, committed or aborted. TransactionIdInProgress looks like it's potentially more expensive, and most of the time we'll be looking at committed or aborted xacts anyway. I can't sanity-check TransactionIdInProgress after commited/aborted, because there's then a race where the xact can commit or abort after we decide it's not committed/aborted so it's not in progress when we go to check that. > And there's nothing to keep NextXID from advancing under > us, so if somebody asks about a transaction that's just under 2^32 > transactions old, then get_xid_in_recent_past could say it's valid, > then NextXID advances and we look up the XID extracted from the txid > and get the status of the new transaction instead of the old one! Hm, yeah. Though due to the clog truncation issue you noticed it probably won't happen. We could require that XidGenLock be held at least as LW_SHARED when entering get_xid_in_recent_past(), but I'd rather not since that'd be an otherwise-unnecessary lwlock for txid_convert_ifrecent(). Instead, I think I'll rename the wraparound flag to too_old and set it if the xact is more than say 2^30 from the epoch struct's last_xid, leaving a race window so ridiculously improbable that the nearly impossible chance of failing with a clog access error is not a worry. If the server's managing to have a proc stuck that long then it's already on fire. We're only interested in reasonably recent xacts since we can only work with xacts before wraparound / clog truncation. This just moves the threshold for "reasonably recent" a bit closer. All this certainly reinforces my view that users handling 'xid' directly or trying to extract it from a bigint epoch-extended xid is a bad idea that needs to go away soon. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Hi, On 2016-08-29 11:25:39 +0800, Craig Ringer wrote: > ERROR: could not access status of transaction 778793573 > DETAIL: could not open file "pg_clog/02E6": No such file or directory > > What I'd really like is to be able to ask transam.c to handle the > xid_in_recent_past logic, treating an attempt to read an xid from > beyond the clog truncation threshold as a soft error indicating > unknown xact state. But that involves delving into slru.c, and I > really, really don't want to touch that for what should be a simple > and pretty noninvasive utility function. Can't you "just" check this against ShmemVariableCache->oldestXid while holding appropriate locks? > A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient, > except for two issues: It seems like a bad idea to PG_CATCH and not re-throw an error. That generally is quite error prone. At the very least locking and such gets a lot more complicated (as you noticed below). > * TransactionIdGetStatus() releases the CLogControlLock taken by > SimpleLruReadPage_ReadOnly() on normal exit but not after an exception > thrown from SlruReportIOError(). It seems appropriate for > SimpleLruReadPage() to release the LWLock before calling > SlruReportIOError(), so I've done that as a separate patch (now 0001). We normally prefer to handle this via the "bulk" releases in the error handlers. It's otherwise hard to write code that handles these situations reliably. It's different for spinlocks, but those normally protect far smaller regions of code. Andres
On Sun, Aug 28, 2016 at 11:25 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > What I'd really like is to be able to ask transam.c to handle the > xid_in_recent_past logic, treating an attempt to read an xid from > beyond the clog truncation threshold as a soft error indicating > unknown xact state. But that involves delving into slru.c, and I > really, really don't want to touch that for what should be a simple > and pretty noninvasive utility function. I think you're going to have to bite the bullet and do that, though, because ... > A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient, ...I don't think this has any chance of being acceptable. You can't catch errors and not re-throw them. That's bad news. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 29 August 2016 at 11:45, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2016-08-29 11:25:39 +0800, Craig Ringer wrote: >> ERROR: could not access status of transaction 778793573 >> DETAIL: could not open file "pg_clog/02E6": No such file or directory >> >> What I'd really like is to be able to ask transam.c to handle the >> xid_in_recent_past logic, treating an attempt to read an xid from >> beyond the clog truncation threshold as a soft error indicating >> unknown xact state. But that involves delving into slru.c, and I >> really, really don't want to touch that for what should be a simple >> and pretty noninvasive utility function. > > Can't you "just" check this against ShmemVariableCache->oldestXid while > holding appropriate locks? Hm. Yeah, I should've thought of that. Thank you. >> A PG_TRY to trap ERRCODE_UNDEFINED_FILE seems like it'd be sufficient, >> except for two issues: > > It seems like a bad idea to PG_CATCH and not re-throw an error. That > generally is quite error prone. At the very least locking and such gets > a lot more complicated (as you noticed below). Yeah, and as I remember from the "fun" of trying to write apply errors to tables in BDR. It wasn't my first choice. >> * TransactionIdGetStatus() releases the CLogControlLock taken by >> SimpleLruReadPage_ReadOnly() on normal exit but not after an exception >> thrown from SlruReportIOError(). It seems appropriate for >> SimpleLruReadPage() to release the LWLock before calling >> SlruReportIOError(), so I've done that as a separate patch (now 0001). > > We normally prefer to handle this via the "bulk" releases in the error > handlers. It's otherwise hard to write code that handles these > situations reliably. It's different for spinlocks, but those normally > protect far smaller regions of code. Fair enough. It's not a complex path, but there are a _lot_ of callers, and while I can't really imagine any of them relying on the CLogControLock being held on error it's not something I was keen to change. I thought complicating the clog with a soft-error interface was worse and didn't come up with a better approach. Said better approach attached in revised series. Thanks. My only real complaint with doing this is that it's a bit more conservative. But in practice clog truncation probably won't follow that far behind oldestXmin so except in fairly contrived circumstances it won't hurt. Apps that need guarantees about how old an xid they can get status on can hold down xmin with a replication slot, a dummy prepared xact, or whatever. If we find that becomes a common need that should be made simpler then appropriate API to allow apps to hold down clog truncation w/o blocking vacuuming can be added down the track. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On 29 August 2016 at 15:53, Craig Ringer <craig@2ndquadrant.com> wrote: > Said better approach attached in revised series. Thanks. Here's another minor update to the txid_status() and txid_convert_if_recent() patches. The only change is moving get_xid_in_recent_past from src/backend/utils/adt/txid.c to src/backend/access/transam/xlog.c to permit its use by other code. Specifically, I think it'll be needed for logical decoding on standby. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On 1 September 2016 at 13:08, Craig Ringer <craig@2ndquadrant.com> wrote: > On 29 August 2016 at 15:53, Craig Ringer <craig@2ndquadrant.com> wrote: > >> Said better approach attached in revised series. Thanks. > > Here's another minor update to the txid_status() and > txid_convert_if_recent() patches. The only change is moving > get_xid_in_recent_past from src/backend/utils/adt/txid.c to > src/backend/access/transam/xlog.c to permit its use by other code. > Specifically, I think it'll be needed for logical decoding on standby. Ahem, wrong patches. Attached correctly now. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Here's another update to these patches. While working on support for logical decoding on standbys I noticed that I needed something like get_xid_in_recent_past(...) there too. So I've moved it to xlog.c as TransactionIdInRecentPast too and flipped its arguments to be more convenient. No other changes. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On 2 September 2016 at 13:16, Craig Ringer <craig@2ndquadrant.com> wrote: > So I've moved it to xlog.c... I'm pretty sure it shouldn't live in xlog.c, but there may be some good reason I can't see yet. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<p dir="ltr"><p dir="ltr">On 2 Sep. 2016 8:30 pm, "Simon Riggs" <<a href="mailto:simon@2ndquadrant.com">simon@2ndquadrant.com</a>>wrote:<br /> ><br /> > On 2 September 2016 at 13:16,Craig Ringer <<a href="mailto:craig@2ndquadrant.com">craig@2ndquadrant.com</a>> wrote:<br /> ><br /> >> So I've moved it to xlog.c...<br /> ><br /> > I'm pretty sure it shouldn't live in xlog.c, but there may besome<br /> > good reason I can't see yet.<p dir="ltr"> Ugh. Yes. transam.c would be rather saner.<p dir="ltr">Only forthe helper to determine if an xid is recent though; txid_ status stays in adt/xact.c where it belongs along with txid_current()etc.<p dir="ltr">> --<br /> > Simon Riggs <a href="http://www.2ndQuadrant.com/">http://www.2ndQuadrant.com/</a><br/> > PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services<br />
On 2 September 2016 at 20:38, Craig Ringer <craig@2ndquadrant.com> wrote: > On 2 Sep. 2016 8:30 pm, "Simon Riggs" <simon@2ndquadrant.com> wrote: >> >> On 2 September 2016 at 13:16, Craig Ringer <craig@2ndquadrant.com> wrote: >> >> > So I've moved it to xlog.c... >> >> I'm pretty sure it shouldn't live in xlog.c, but there may be some >> good reason I can't see yet. > > Ugh. Yes. transam.c would be rather saner. > > Only for the helper to determine if an xid is recent though; txid_ status > stays in adt/xact.c where it belongs along with txid_current() etc. Fixed, moved to transam.c. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On 2 September 2016 at 21:01, Craig Ringer <craig@2ndquadrant.com> wrote: > On 2 September 2016 at 20:38, Craig Ringer <craig@2ndquadrant.com> wrote: >> On 2 Sep. 2016 8:30 pm, "Simon Riggs" <simon@2ndquadrant.com> wrote: >>> >>> On 2 September 2016 at 13:16, Craig Ringer <craig@2ndquadrant.com> wrote: >>> >>> > So I've moved it to xlog.c... >>> >>> I'm pretty sure it shouldn't live in xlog.c, but there may be some >>> good reason I can't see yet. >> >> Ugh. Yes. transam.c would be rather saner. >> >> Only for the helper to determine if an xid is recent though; txid_ status >> stays in adt/xact.c where it belongs along with txid_current() etc. > > Fixed, moved to transam.c. Missed that this causes an undefined reference to GetNextXidAndEpoch() which is in xlog.c. I knew there was a reason I put it there. So most recent patch is wrong, use the prior one. GetNextXidAndEpoch() needs to be in xlog.c because it uses XLogCtl's shmem copy of checkPoint.nextXidEpoch. So either transam.c needs to #include xlog.h (which seems a bit backwards) or TransactionIdInRecentPast() should go in xlog.c. I don't like either really. Opinion? I'm sure we'll want this functionality in other places as part of dealing with the problems discussed upthread with 'xid' exposed to users. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 02/09/16 15:46, Craig Ringer wrote: > On 2 September 2016 at 21:01, Craig Ringer <craig@2ndquadrant.com> wrote: >> On 2 September 2016 at 20:38, Craig Ringer <craig@2ndquadrant.com> wrote: >>> On 2 Sep. 2016 8:30 pm, "Simon Riggs" <simon@2ndquadrant.com> wrote: >>>> >>>> On 2 September 2016 at 13:16, Craig Ringer <craig@2ndquadrant.com> wrote: >>>> >>>>> So I've moved it to xlog.c... >>>> >>>> I'm pretty sure it shouldn't live in xlog.c, but there may be some >>>> good reason I can't see yet. >>> >>> Ugh. Yes. transam.c would be rather saner. >>> >>> Only for the helper to determine if an xid is recent though; txid_ status >>> stays in adt/xact.c where it belongs along with txid_current() etc. >> >> Fixed, moved to transam.c. > > Missed that this causes an undefined reference to GetNextXidAndEpoch() > which is in xlog.c. I knew there was a reason I put it there. So most > recent patch is wrong, use the prior one. > > GetNextXidAndEpoch() needs to be in xlog.c because it uses XLogCtl's > shmem copy of checkPoint.nextXidEpoch. So either transam.c needs to > #include xlog.h (which seems a bit backwards) or > TransactionIdInRecentPast() should go in xlog.c. > > I don't like either really. Opinion? I'm sure we'll want this > functionality in other places as part of dealing with the problems > discussed upthread with 'xid' exposed to users. > You could put it to txid.c where all the other txid stuff is in? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2 September 2016 at 23:29, Petr Jelinek <petr@2ndquadrant.com> wrote: > You could put it to txid.c where all the other txid stuff is in? Yeah, even though it's in adt/ I think it'll do. I thought I'd need get_xid_in_recent_past() for catalog_xmin hot standby feedback, but upon closer examination the needed logic isn't the same anymore. txid_status() wants to ensure clog lookups are safe and limit by oldest xid, wheras the walsender doesn't actually care about that and is just avoiding wrapped xids. I'm just going back to how it was, all in adt/txid.c, and making it static again. We can move it and make it non-static if a need to do so comes up. Attached rebased patch updated and vs master. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Thu, Sep 15, 2016 at 8:52 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 2 September 2016 at 23:29, Petr Jelinek <petr@2ndquadrant.com> wrote: > >> You could put it to txid.c where all the other txid stuff is in? > > Yeah, even though it's in adt/ I think it'll do. > > I thought I'd need get_xid_in_recent_past() for catalog_xmin hot > standby feedback, but upon closer examination the needed logic isn't > the same anymore. txid_status() wants to ensure clog lookups are safe > and limit by oldest xid, wheras the walsender doesn't actually care > about that and is just avoiding wrapped xids. > > I'm just going back to how it was, all in adt/txid.c, and making it > static again. We can move it and make it non-static if a need to do so > comes up. > > Attached rebased patch updated and vs master. You appear to have attached the wrong patch set. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 16 September 2016 at 21:28, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Sep 15, 2016 at 8:52 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >> On 2 September 2016 at 23:29, Petr Jelinek <petr@2ndquadrant.com> wrote: >> >>> You could put it to txid.c where all the other txid stuff is in? >> >> Yeah, even though it's in adt/ I think it'll do. >> >> I thought I'd need get_xid_in_recent_past() for catalog_xmin hot >> standby feedback, but upon closer examination the needed logic isn't >> the same anymore. txid_status() wants to ensure clog lookups are safe >> and limit by oldest xid, wheras the walsender doesn't actually care >> about that and is just avoiding wrapped xids. >> >> I'm just going back to how it was, all in adt/txid.c, and making it >> static again. We can move it and make it non-static if a need to do so >> comes up. >> >> Attached rebased patch updated and vs master. > > You appear to have attached the wrong patch set. Whoops, multitasking fail. Sorry for the late response, hospitals are "fun". -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Mon, Sep 19, 2016 at 9:54 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >> You appear to have attached the wrong patch set. > > Whoops, multitasking fail. > > Sorry for the late response, hospitals are "fun". I did some cleanup of 0001 (see attached) and was all set to commit it when I realized what I think is a problem: holding XidGenLock doesn't seem to help with the race condition between this function and CLOG truncation, because vac_truncate_clog() updates the shared memory limits AFTER performing the truncation. If the order of those operations were reversed, we'd be fine, because it would get stuck trying to update the shared memory limits and wouldn't be able to truncate until it did - and conversely, if it updated the shared memory limits before we examined them, that would be OK, too, because we'd be sure not to consult the pages that are about to be truncated. As it is, though, I don't see that there's any real interlock here. (BTW, the stuff you moved from clog.c to clog.h doesn't actually need to be moved; one of the changes I made here was to undo that.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
On 20 September 2016 at 22:46, Robert Haas <robertmhaas@gmail.com> wrote: > I did some cleanup of 0001 (see attached) and was all set to commit it > when I realized what I think is a problem: holding XidGenLock doesn't > seem to help with the race condition between this function and CLOG > truncation, because vac_truncate_clog() updates the shared memory > limits AFTER performing the truncation. Thanks ... and drat. > If the order of those > operations were reversed, we'd be fine, because it would get stuck > trying to update the shared memory limits and wouldn't be able to > truncate until it did - and conversely, if it updated the shared > memory limits before we examined them, that would be OK, too, because > we'd be sure not to consult the pages that are about to be truncated. I'm hesitant to mess with something that fundamental for what I was hoping was a low-impact feature, albeit one that seems to be trying hard not to be at every turn. It looks pretty reasonable to update oldestXid before clog truncation but I don't want to be wrong and create some obscure race or crash recovery issue related to wraparound and clog truncation. It could well be a problem if we're very close to wraparound. So far nothing has had any reason to care about this, since there's no way to attempt to access an older xid in a normally functioning system. The commit timestamp lookup code doesn't care whether the xid is still in clog or not and nothing else does lookups based on xids supplied by the user. If anything else did or does in future it will have the same problem as txid_status. We've already ruled out releasing the slru LWLock in SlruReportIOError then PG_TRY / PG_CATCH()ing clog access errors in txid_status() per my original approach to this issue. So I see a few options now: * Do nothing. Permit this race to exist, document the error, and expect apps to cope. I'm pretty tempted to go for exactly this since it pushes the cost onto users of the feature and doesn't make a mess elsewhere. People who use this will typically be invoking it as a standalone toplevel function anyway, so it's mostly just a bit of noise in the error logs if you lose the race - and we have plenty of other sources of that already. * Rather than calling TransactionIdDidCommit / TransactionIdDidAbort, call clog.c's TransactionIdGetStatus with a new missing_ok flag. That sets a bool* missing param added to SimpleLruReadPage_ReadOnly(...) and in turn to SimpleLruReadPage(...) that's set instead of calling SlruReportIOError(...). This seems rather intrusive and will add little-used params and paths to fairly hot slru.c code so I'm not keen. * Take CLogControlLock LW_SHARED in txid_status() to prevent truncation before reading oldestXid. We'd need a way to pass an "already locked" state through TransactionIdGetStatus(...) to SimpleLruReadPage_ReadOnly(...), which isn't great since again it's pretty hot code. * Don't release the slru LWLock in SlruReportIOError; instead release CLogControlLock from txid_status on clog access failure. As before means PG_TRY / PG_CATCH without PG_RE_THROW, but it means the only place it affects is callers of txid_status(...). For added safety, restrict txid_status() to being called in a toplevel virtual xact so we know we'll finish up promptly. It's a horrible layering violation having txid_status(...) release the clog control lock though, and seems risky. The only non-horrible one of those is IMO to just let the caller see an error if they lose the race. It's a function that's intended for use when you're dealing with indeterminate transaction state after a server or application error anyway, so it's part of an error path already. So I say we just document the behaviour. Because slru.c doesn't release its LWLock on error we also need to ensure txid_status(...) is also only called from a toplevel xact so the user doesn't attempt to wrap it in plpgsql BEGIN ... EXCEPTION block and it causes the xact to abort. Will follow up with just that. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Sep 21, 2016 at 3:40 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > The only non-horrible one of those is IMO to just let the caller see > an error if they lose the race. It's a function that's intended for > use when you're dealing with indeterminate transaction state after a > server or application error anyway, so it's part of an error path > already. So I say we just document the behaviour. I am not keen on that idea. The errors we're likely to be exposing are going to look like low-level internal failures, which might scare some DBA. Even if they don't, I think it's playing with fire. The system is designed on the assumption that nobody will try to look up an XID that's too old, and if we start to violate that assumption I think we're undermining the design integrity of the system in a way we'll likely come to regret. To put that more plainly, when the code is written with the assumption that X will never happen, it's usually a bad idea to casually add code that does X. > Because slru.c > doesn't release its LWLock on error we also need to ensure > txid_status(...) is also only called from a toplevel xact so the user > doesn't attempt to wrap it in plpgsql BEGIN ... EXCEPTION block and it > causes the xact to abort. I think this is muddled, because an error aborts the transaction, and AbortTransaction() and AbortSubTransaction() start with LWLockReleaseAll(). It might not be too hard to add a second copy of oldestXid in shared memory that is updated before truncation rather than afterward... but yeah, like you, I'm not finding this nearly as straightforward as might have been hoped. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 21 September 2016 at 22:16, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Sep 21, 2016 at 3:40 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >> The only non-horrible one of those is IMO to just let the caller see >> an error if they lose the race. It's a function that's intended for >> use when you're dealing with indeterminate transaction state after a >> server or application error anyway, so it's part of an error path >> already. So I say we just document the behaviour. > > I am not keen on that idea. The errors we're likely to be exposing > are going to look like low-level internal failures, which might scare > some DBA. Even if they don't, I think it's playing with fire. The > system is designed on the assumption that nobody will try to look up > an XID that's too old, and if we start to violate that assumption I > think we're undermining the design integrity of the system in a way > we'll likely come to regret. To put that more plainly, when the code > is written with the assumption that X will never happen, it's usually > a bad idea to casually add code that does X. Fair point. > [snip] > > It might not be too hard to add a second copy of oldestXid in shared > memory that is updated before truncation rather than afterward... but > yeah, like you, I'm not finding this nearly as straightforward as > might have been hoped. Yeah. I suspect that'll be the way to go, to add another copy that's updated before clog truncation. It just seems ... unclean. Like it shouldn't be necessary, something else isn't right. But it's probably the lowest pain option. I'm going to take a step back on this and see if I can spot an alternative. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 27 September 2016 at 09:23, Craig Ringer <craig@2ndquadrant.com> wrote: > On 21 September 2016 at 22:16, Robert Haas <robertmhaas@gmail.com> wrote: >> >> It might not be too hard to add a second copy of oldestXid in shared >> memory that is updated before truncation rather than afterward... but >> yeah, like you, I'm not finding this nearly as straightforward as >> might have been hoped. > > Yeah. > > I suspect that'll be the way to go, to add another copy that's updated > before clog truncation. It just seems ... unclean. Like it shouldn't > be necessary, something else isn't right. But it's probably the lowest > pain option. OK, so summarizing the problem: slru.c and clog.c have no soft-failure entrypoints where we can look it up and fail gracefully if the xid isn't in the clog. vac_truncate_clog() calls TruncateCLOG to chop SLRU pages from the clog before it takes XidGenLock to advance oldestXid. So we cannot use oldestXid protected by XidGenLock as an interlock against concurrent clog truncation to prevent SLRU access errors looking up an xid, and there's no other candidate lock. This hasn't been an issue before because nobody looks up arbitrary user-supplied XIDs in clog, we only look up things that're protected by datfrozenxid etc. But the whole point of txid_status is to look up user-supplied xids. We can't just take ClogControlLock from txid_status() to block truncation because clog.c expects to own that lock, and takes it (via slru.c) in TransactionIdGetStatus, with no way to pass an already_locked state. We'd self-deadlock. Adding a second copy of oldestXid - say pendingOldestXid - won't actually help us unless we also take some suitable LWLock before updating it, otherwise truncation can continue after we look at pendingOldestXid but before we do the clog lookup. That means an extra LWLock for each clog truncation, but compared to the I/O done during clog truncation and the cost of the SlruScanDirectory() pre-check done by TruncateCLOG it's nothing. We could take XidGenLock twice, once to update this new pendingOldestXid field and once to update oldestXid, but that's a highly contested lock I'd rather not mess with even on a path that's not hit much. Instead, I've added a new LWLock, ClogTruncationLock, for that purpose. vac_truncate_clog() takes it if it decides to attempt clog truncation. This lock is held throughout the whole process of clog truncation and oldestXid advance, so there's no need for a new pendingOldestXid field in ShmemVariableCache. We just take the lock then look at oldestXid, knowing that it's guaranteed to correspond to an existing clog page that won't go away while we're looking. ClogTruncationLock is utterly uncontested so it's going to have no meaningful impact compared to all the work we do scanning the clog to decide whether we're even going to try truncating it, etc. (BTW, it seems like a pity that lwlocknames.txt doesn't have comments on each lock. We could have src/backend/storage/lmgr/generate-lwlocknames.pl transform the # comments into /* comments */ on the generated header. Thoughts?) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Wed, Dec 21, 2016 at 3:02 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > Instead, I've added a new LWLock, ClogTruncationLock, for that > purpose. vac_truncate_clog() takes it if it decides to attempt clog > truncation. This lock is held throughout the whole process of clog > truncation and oldestXid advance, so there's no need for a new > pendingOldestXid field in ShmemVariableCache. We just take the lock > then look at oldestXid, knowing that it's guaranteed to correspond to > an existing clog page that won't go away while we're looking. > ClogTruncationLock is utterly uncontested so it's going to have no > meaningful impact compared to all the work we do scanning the clog to > decide whether we're even going to try truncating it, etc. That makes everything that happens between when we acquire that lock and when we release it non-interruptible, which seems undesirable. I think that extra copy of oldestXid is a nicer approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 22 December 2016 at 00:30, Robert Haas <robertmhaas@gmail.com> wrote: > That makes everything that happens between when we acquire that lock > and when we release it non-interruptible, which seems undesirable. I > think that extra copy of oldestXid is a nicer approach. That's a side-effect I didn't realise. Given that, yes, I agree. Since we don't truncate clog much, do you think it's reasonable to just take XidGenLock again before we proceed? I'm reluctant to add another acquisition of a frequently contested lock for something 99.9% of the codebase won't care about, so I think it's probably better to add a new LWLock, and I'll resubmit on that basis, but figure it's worth asking. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 22 December 2016 at 07:49, Craig Ringer <craig@2ndquadrant.com> wrote: > On 22 December 2016 at 00:30, Robert Haas <robertmhaas@gmail.com> wrote: > >> That makes everything that happens between when we acquire that lock >> and when we release it non-interruptible, which seems undesirable. I >> think that extra copy of oldestXid is a nicer approach. > > That's a side-effect I didn't realise. Given that, yes, I agree. > > Since we don't truncate clog much, do you think it's reasonable to > just take XidGenLock again before we proceed? I'm reluctant to add > another acquisition of a frequently contested lock for something 99.9% > of the codebase won't care about, so I think it's probably better to > add a new LWLock, and I'll resubmit on that basis, but figure it's > worth asking. Updated. If you think it's better to just take XidGenLock again, let me know. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On 22 December 2016 at 09:55, Craig Ringer <craig@2ndquadrant.com> wrote: > Updated. > > If you think it's better to just take XidGenLock again, let me know. Here's a further update that merges in Robert's changes from the patch posted upthread. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Thu, Dec 22, 2016 at 12:12 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 22 December 2016 at 09:55, Craig Ringer <craig@2ndquadrant.com> wrote: > >> Updated. >> >> If you think it's better to just take XidGenLock again, let me know. > > Here's a further update that merges in Robert's changes from the patch > posted upthread. This patch contains unresolved merge conflicts. Maybe SetPendingTransactionIdLimit could happen in TruncateCLOG rather than the caller. Instead of using an LWLock, how about a spin lock? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 23 December 2016 at 01:26, Robert Haas <robertmhaas@gmail.com> wrote: > This patch contains unresolved merge conflicts. Ah, it conflicts with fe0a0b59, the PostmasterRandom changes. I thought I'd rebased more recently than that. > Maybe SetPendingTransactionIdLimit could happen in TruncateCLOG rather than > the caller. I've spent a while looking at how committs and multixact handle the race hazard of the gap between clog truncation and shmem state change - and the related race on standby, where the hazard is the gap between replay of CLOG_TRUNCATE records and replay of the next checkpoint containing updated limits. pg_xact_commit_timestamp() and the underlying TransactionIdGetCommitTsData are supposed to accept arbitrary xids, like txid_status(), so they should handle this. However, as far as I can tell, committs has the same issues presented by txid_status(). We only update ShmemVariableCache->oldestCommitTsXid _after_ we truncate the committs SLRU, no lock is held over the duration, and no other lock-protected var is set before truncation. We don't record the new limit values in COMMIT_TS_TRUNCATE records so they only becomes known to the standby at replay of the next checkpoint. multixact instead runs a single critical section to write xlog, set shmem state, and _then_ truncate the SLRUs. It's more complex than clog or commit_ts since it has two inter-related SLRUs, though. So: I think we should be setting ShmemVariableCache->oldestXid (under XidGenLock) from TruncateCLOG, after we write xlog but _before_ truncation. Then the rest of SetTransactionIdLimit(...) proceeds as before. I don't _think_ we need the critical section since we only have the one SLRU to worry about. We should also add oldestXid to CLOG_TRUNCATE xlog records so it is up to date on standbys. It's a little annoying to take XidGenLock twice - once for oldestXid, once for the other xid limits - but we can just delay advancing oldestXid entirely until we've got a clog page to truncate away, in which case there's a big enough cost that it doesn't matter. We have to wait to advance the other limits until after we truncate clog to prevent new (wrapped) xids trying to set values in clog for the before-wrap xids we're about to truncate away. On the upside, the need for pendingOldestXid, which always felt hacky, goes away. While we're at it, lets do the same thing for commit_ts. Advance its limit before truncation, not after, and record that limit in its xlog records for redo. There's no need for any special dancing around there. > Instead of using an LWLock, how about a spin lock? I wrote an explanation of how that wouldn't work, but then I found the problem with standby, so it no longer matters. I'll have to follow up with a patch soon, as it's Toddler o'Clock. (It's interesting that most of what I'm doing at the moment is wrestling with resource retention limits and how they're often quite implicit. All the catalog_xmin stuff for logical decoding on standby has parallels to this, for example, though not enough to create useful overlap of functionality.) -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 23 December 2016 at 18:00, Craig Ringer <craig@2ndquadrant.com> wrote: > I'll have to follow up with a patch soon, as it's Toddler o'Clock. Here we go. This patch advances oldestXid, under XidGenLock, _before_ truncating clog. txid_status() holds XidGenLock from when it tests oldestXid until it's done looking up clog, thus eliminating the race. CLOG_TRUNCATE records now contain the oldestXid, so they can advance oldestXid on a standby, or when we've truncated clog since the most recent checkpoint on the master during recovery. It's advanced under XidGenLock during redo to protect against this race on standby. As outlined in my prior mail I think this is the right approach. I don't like taking XidGenLock twice, but we don't advance datfrozenxid much so it's not a big concern. While a separate ClogTruncationLock could be added like in my earlier patch, oldestXid is currently under XidGenLock and I'd rather not change that. The biggest change here is that oldestXid is advanced separately to the vac limits in the rest of ShmemVariableCache. As far as I can tell we don't prevent two manual VACUUMs on different DBs from trying to concurrently run vac_truncate_clog, so this has to be safe against two invocations racing each other. Rather than try to lock out such concurrency, the patch ensures that oldestXid can never go backwards. It doesn't really matter if the vac limits go backwards, it's no worse than what can already happen in the current code. We cannot advance the vacuum limits before we truncate the clog away, in case someone tries to access a very new xid (if we're near wraparound) I'm pretty sure that commit timestamps suffer from the same flaw as Robert identified upthread with clog. This patch fixes the clog race, but not the similar one in commit timestamps. Unlike the clog race with txid_status(), the commit timestamps one is already potentially user-visible since we allow arbitrary xids to be looked up for commit timestamps. I'll address that separately. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On 28 December 2016 at 18:00, Craig Ringer <craig@2ndquadrant.com> wrote: > On 23 December 2016 at 18:00, Craig Ringer <craig@2ndquadrant.com> wrote: > >> I'll have to follow up with a patch soon, as it's Toddler o'Clock. > > Here we go. > > This patch advances oldestXid, under XidGenLock, _before_ truncating clog. > > txid_status() holds XidGenLock from when it tests oldestXid until it's > done looking up clog, thus eliminating the race. > > CLOG_TRUNCATE records now contain the oldestXid, so they can advance > oldestXid on a standby, or when we've truncated clog since the most > recent checkpoint on the master during recovery. It's advanced under > XidGenLock during redo to protect against this race on standby. > > As outlined in my prior mail I think this is the right approach. I > don't like taking XidGenLock twice, but we don't advance datfrozenxid > much so it's not a big concern. While a separate ClogTruncationLock > could be added like in my earlier patch, oldestXid is currently under > XidGenLock and I'd rather not change that. > > The biggest change here is that oldestXid is advanced separately to > the vac limits in the rest of ShmemVariableCache. As far as I can tell > we don't prevent two manual VACUUMs on different DBs from trying to > concurrently run vac_truncate_clog, so this has to be safe against two > invocations racing each other. Rather than try to lock out such > concurrency, the patch ensures that oldestXid can never go backwards. > It doesn't really matter if the vac limits go backwards, it's no worse > than what can already happen in the current code. > > We cannot advance the vacuum limits before we truncate the clog away, > in case someone tries to access a very new xid (if we're near > wraparound) > > I'm pretty sure that commit timestamps suffer from the same flaw as > Robert identified upthread with clog. This patch fixes the clog race, > but not the similar one in commit timestamps. Unlike the clog race > with txid_status(), the commit timestamps one is already potentially > user-visible since we allow arbitrary xids to be looked up for commit > timestamps. I'll address that separately. Rebased patch attached. I've split the clog changes out from txid_status() its self. There is relevant discussion on the commit timestamp truncation fix thread where the similar fix for commit_ts got committed. https://www.postgresql.org/message-id/flat/979ff13d-0b8e-4937-01e8-2925c0adc306%402ndquadrant.com#979ff13d-0b8e-4937-01e8-2925c0adc306@2ndquadrant.com -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Mon, Jan 23, 2017 at 1:32 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > Rebased patch attached. I've split the clog changes out from > txid_status() its self. I think it's fairly surprising that TruncateCLOG() has responsibility for writing the xlog record that protects advancing ShmemVariableCache->oldestXid, but not the responsibility for actually advancing that value. In other words, I think the AdvanceOldestXid() call in vac_truncate_clog() should be moved into TruncateClog(). (Similarly, one wonders why AdvanceOldestCommitTsXid() isn't the responsibility of TruncateCommitTs().) I think it is not correct to advance oldestXid but not oldestXidDB. Otherwise, GetNewTransactionId() might complain about the wrong database. The way that SetTransactionIdLimit() now works looks a bit dangerous. xidWrapLimit, xidStopLimit, and xidWarnLimit are computed based on the passed-in oldestXid value and written straight into shared memory. But the shared memory copy of oldestXid could have a different value. I'm not sure if that breaks anything, but it certainly weakens any confidence callers might have had that all those values are consistent with each other. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 24 January 2017 at 23:49, Robert Haas <robertmhaas@gmail.com> wrote: > I think it's fairly surprising that TruncateCLOG() has responsibility > for writing the xlog record that protects advancing > ShmemVariableCache->oldestXid, but not the responsibility for actually > advancing that value. In other words, I think the AdvanceOldestXid() > call in vac_truncate_clog() should be moved into TruncateClog(). > (Similarly, one wonders why AdvanceOldestCommitTsXid() isn't the > responsibility of TruncateCommitTs().) Agreed, and done in attached. I haven't done the same for commit_ts but will do so on the thread for the commit_ts race fix if we go ahead with this change here. > I think it is not correct to advance oldestXid but not oldestXidDB. > Otherwise, GetNewTransactionId() might complain about the wrong > database. That's a clear oversight. Fixed in attached. > The way that SetTransactionIdLimit() now works looks a bit dangerous. > xidWrapLimit, xidStopLimit, and xidWarnLimit are computed based on the > passed-in oldestXid value and written straight into shared memory. > But the shared memory copy of oldestXid could have a different value. > I'm not sure if that breaks anything, but it certainly weakens any > confidence callers might have had that all those values are consistent > with each other. This was my main hesitation with the whole thing too. It's necessary to advance oldestXmin before we xlog the advance and truncate clog, and necessary to advance the vacuum limits only afterwards. I thought it sensible to be conservative about the xid limit to minimise the change from current behaviour, i.e. advance only up to xid limits calculated from the oldestXid determined by the currently vac_truncate_clog. But the current one is actually wrong; in the (unlikely if it's possible at all) case where two vac_truncate_clog()s A and B run with ordering A(advanceOldestXmin) B(advanceOldestXmin) A(truncate) B(truncate) B(advanceLimits) A(advanceLimits), A's advance of the limits would clobber b's. Not too bad, but it'd delay advancing the limits until the next vacuum, and some xid could get allocated before the limits go backwards. It's safer to take XidGenLock for slightly longer in order to capture the oldestXid from shmem and calculate using it. That ensures we never have any risk of going backwards. The attached updated patch does so. BTW, I find it quite amusing that this was intended to be a small, unintrusive patch, and now it's messing with xid limits and clog truncation. I'm almost tempted to say we should commit it with the (tiny) race with clog truncation in place. We certainly haven't had any complaints about the same race from people using commit_ts. But the race window on standby is quite large and I'd rather not knowingly introduce new bugs. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On 25 January 2017 at 13:44, Craig Ringer <craig@2ndquadrant.com> wrote: > On 24 January 2017 at 23:49, Robert Haas <robertmhaas@gmail.com> wrote: > >> I think it's fairly surprising that TruncateCLOG() has responsibility >> for writing the xlog record that protects advancing >> ShmemVariableCache->oldestXid, but not the responsibility for actually >> advancing that value. In other words, I think the AdvanceOldestXid() >> call in vac_truncate_clog() should be moved into TruncateClog(). >> (Similarly, one wonders why AdvanceOldestCommitTsXid() isn't the >> responsibility of TruncateCommitTs().) > > Agreed, and done in attached. > > I haven't done the same for commit_ts but will do so on the thread for > the commit_ts race fix if we go ahead with this change here. > >> I think it is not correct to advance oldestXid but not oldestXidDB. >> Otherwise, GetNewTransactionId() might complain about the wrong >> database. > > That's a clear oversight. Fixed in attached. New attached also records it in xlog and applies it to the standby. If we want to save the 4 bytes per xmin advance (probably not worth caring) we can instead skip setting it on the standby, in which case it'll be potentially wrong until the next checkpoint. I'd rather make sure it stays correct. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Wed, Jan 25, 2017 at 4:02 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > If we want to save the 4 bytes per xmin advance (probably not worth > caring) we can instead skip setting it on the standby, in which case > it'll be potentially wrong until the next checkpoint. I'd rather make > sure it stays correct. Those patches still apply and no reviews have come in yet, so moved to CF 2017-03. -- Michael
On Wed, Jan 25, 2017 at 12:44 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >> The way that SetTransactionIdLimit() now works looks a bit dangerous. >> xidWrapLimit, xidStopLimit, and xidWarnLimit are computed based on the >> passed-in oldestXid value and written straight into shared memory. >> But the shared memory copy of oldestXid could have a different value. >> I'm not sure if that breaks anything, but it certainly weakens any >> confidence callers might have had that all those values are consistent >> with each other. > > This was my main hesitation with the whole thing too. > > It's necessary to advance oldestXmin before we xlog the advance and > truncate clog, and necessary to advance the vacuum limits only > afterwards. Well, that's why I tried to advocate that their should be two separate XID limits in shared memory: leave what's there now the way it is, and then add a new limit for "don't try to look up XIDs before this point: splat". I still think that'd be less risky. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 10 March 2017 at 02:55, Robert Haas <robertmhaas@gmail.com> wrote: > Well, that's why I tried to advocate that their should be two separate > XID limits in shared memory: leave what's there now the way it is, and > then add a new limit for "don't try to look up XIDs before this point: > splat". I still think that'd be less risky. I'm coming back to this "cold" after an extended break, so I hope I get the details right. TL;DR: doing it that way duplicates a bunch of stuff and is ugly without offering significant benefit over just fixing the original. I started out with the approach you suggested, but it turns out to be less helpful than you'd think. Simply advancing a new lower limit field before beginning truncation is no help; there's then a race where the lower-limit field can be advanced after we test it but before we actually do the SLRU read from clog. To guard against this it's necessary for SLRU truncation to take an exclusive lock during which it advances the lower bound. That way a concurrent reader can take the lock in shared mode before reading the lower bound and hold it until it finishes the clog read, knowing that vacuum cannot then truncate the data out from under it because it'll block trying to advance the lower limit. A spinlock isn't suitable for this. While we can take the lock only briefly to update the limit field in vac_truncate_clog, in txid_status() we have to hold it from when we test the boundary through to when we finish the SLRU clog lookup, and that lookup does I/O and might sleep. If we release it after testing the lower bound but before the SLRU lookup our race comes back since vacuum can jump in and truncate it out from under us. So now we need a new LWLock used only for vac_truncate_clog before advancing the clog truncation. I implemented just that - a new ClogTruncateLog in the lwlocks array and a new field in ShmemVariableCache for the lower xid bound, IIRC. Other than requiring an extra lwlock acquisition for vac_truncate_clog it works fine ... for the master. But it doesn't fix the much bigger race on the standby. We only emit WAL for xid limits after we truncate clog, and the clog truncation record doesn't record the new limit. So now we need a new, somewhat redundant, xlog record and redo function for this lower clog bound pointer. Which, really, is only actually tracking a slightly more up to date version of oldestXid. At that point I was just papering around a race that should just be fixed at its source instead. Advance oldestXid before truncating clog, and write a clog truncation record that includes the new oldestXid. So... I can go back to the old approach and just add the new xlog record and redo method, new lwlock, new shmemvariablecache field, etc, if you're really concerned this approach is risky. But I'd rather fix the original problem instead. It might be helpful if I separate out the patch that touches oldestXid from the rest for separate review, so I'll update here with a 2-patch series instead of a squashed single patch soon. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Mar 10, 2017 at 2:00 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 10 March 2017 at 02:55, Robert Haas <robertmhaas@gmail.com> wrote: >> Well, that's why I tried to advocate that their should be two separate >> XID limits in shared memory: leave what's there now the way it is, and >> then add a new limit for "don't try to look up XIDs before this point: >> splat". I still think that'd be less risky. > > I'm coming back to this "cold" after an extended break, so I hope I > get the details right. Yeah, sorry I've been away from this for a while. > TL;DR: doing it that way duplicates a bunch of stuff and is ugly > without offering significant benefit over just fixing the original. > > I started out with the approach you suggested, but it turns out to be > less helpful than you'd think. Simply advancing a new lower limit > field before beginning truncation is no help; there's then a race > where the lower-limit field can be advanced after we test it but > before we actually do the SLRU read from clog. To guard against this > it's necessary for SLRU truncation to take an exclusive lock during > which it advances the lower bound. That way a concurrent reader can > take the lock in shared mode before reading the lower bound and hold > it until it finishes the clog read, knowing that vacuum cannot then > truncate the data out from under it because it'll block trying to > advance the lower limit. That's a good point which I hadn't fully considered. On the other hand, there really are two separate notions of the "oldest" XID. There's the oldest XID that we can safely look up, and then there's the oldest XID that we can't reuse. These two are the same when no truncation is in progress, but when a truncation is in progress then they're different: the oldest XID that's safe to look up is the first one after whatever we're truncating away, but the oldest XID that we can't reuse is the newest one preceding the stuff that we're busy truncating. IOW, when truncation is happening, there's a portion of the XID space whose clog files are being removed - and the XIDs that are in that range aren't safe to look up any more, but are also not available for reuse to prevent wraparound. Right now, all of the relevant fields in VariableCacheData are based on the ready-for-reuse concept, and I don't think that switching some but not all of them to be based on the safe-to-look-up concept necessarily qualifies as an improvement. It's different, but I'm not sure it's better. What if we approached this problem from the other end? Suppose we use a heavyweight lock on, say, transaction ID 1 to represent the right to truncate CLOG. We grab this lock in exclusive mode before beginning to truncate, and in shared mode while looking up XIDs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11 March 2017 at 05:09, Robert Haas <robertmhaas@gmail.com> wrote: > On the other > hand, there really are two separate notions of the "oldest" XID. > There's the oldest XID that we can safely look up, and then there's > the oldest XID that we can't reuse. These two are the same when no > truncation is in progress, but when a truncation is in progress then > they're different: the oldest XID that's safe to look up is the first > one after whatever we're truncating away, but the oldest XID that we > can't reuse is the newest one preceding the stuff that we're busy > truncating. Right. My view here is that the oldest xid we cannot reuse is already guarded by xidWrapLimit, which we advance after clog truncation. Whether as this advances at the same time as or after we advance oldestXid and truncate clog doesn't actually matter, we must just ensure that it never advances _before_. So tracking a second copy of oldestXid whose only purpose is to recalculate xidWrapLimit serves no real purpose. It's redundant except during vac_truncate_clog, during which time local state is sufficient *if* we add oldestXid to the clog truncation xlog record, which we must do anyway because: Any number of locking hoop-jumping schemes fail to solve the problem of outdated oldestXid information on standbys. Right now we truncate clog and xlog the truncation before we write the new oldestXid limit to xlog. In fact, we don't write the new xid limit to xlog until the next checkpoint. So the standby has a huge window where its idea of oldestXid is completely wrong, and unless we at least add the new oldestXid to the clog truncation xlog record we can't fix that. We only get away with this now because there's no way to look up an arbitrary xid's status. No locking scheme on the master can solve this, because the locks on the master do not affect the standby or vice versa. Therefore, we _must_ advance oldestXid (or a copy of it used only for "oldest xid still in clog) before truncating clog. If we're going to do that we might as well just make sure the standby's xid limits are updated correctly when we truncate clog rather than doing it lazily at checkpoints. Advance oldestXid before truncating clog away, and record the new xid in the clog truncation xlog record. On redo after master crash, and on standbys, we're guaranteed to re-do the whole clog truncation operation - advance oldestXid, truncate clog, advance xidWrapLimit etc - and everything stays consistent. I'll extract this part of the patch so it can be looked at separately, it'll be clearer that way. I think of it as slightly contracting then slightly expanding the xid range window during clog truncation. Advance the oldest xid slightly before the xidWrapLimit, so temporarily the range of xids is narrower than 2^31. xlog it first so we ensure it's all redone on crash and on standby. Because no lock is held throughout all of vac_truncate_clog, make sure the ordering of the different phases between concurrent vac_truncate_xlog runs doesn't matter. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11 March 2017 at 14:32, Craig Ringer <craig@2ndquadrant.com> wrote: > I'll extract this part of the patch so it can be looked at separately, > it'll be clearer that way. Apparently I thought that last time too since I already posted it split up. Ahem. Working on too many different things at once. Last-posted patches apply fine, no need for an update. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Mar 11, 2017 at 1:32 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 11 March 2017 at 05:09, Robert Haas <robertmhaas@gmail.com> wrote: >> On the other >> hand, there really are two separate notions of the "oldest" XID. >> There's the oldest XID that we can safely look up, and then there's >> the oldest XID that we can't reuse. These two are the same when no >> truncation is in progress, but when a truncation is in progress then >> they're different: the oldest XID that's safe to look up is the first >> one after whatever we're truncating away, but the oldest XID that we >> can't reuse is the newest one preceding the stuff that we're busy >> truncating. > > Right. > > My view here is that the oldest xid we cannot reuse is already guarded > by xidWrapLimit, which we advance after clog truncation. Whether as > this advances at the same time as or after we advance oldestXid and > truncate clog doesn't actually matter, we must just ensure that it > never advances _before_. > > So tracking a second copy of oldestXid whose only purpose is to > recalculate xidWrapLimit serves no real purpose. Hmm, so what this patch is doing changed quite a bit between January 23rd and January 25th. In the January 23rd version, oldestXid and oldestXidDB are changed to track the oldest XID that we can safely look up, and the remaining related fields are still relative to the oldest XID that can be reused. That seems, as I said before, scary. But in the January 25th version, *all* of the related fields have been changed to track the oldest XID that we can safely look up, because SetTransactionIdLimit() now uses the values set by AdvanceOldestXid() to compute all of the other values, which seems flat-out incorrect. AdvanceOldestXid() is called to advance that limit before clog truncation happens, and if somebody then calls SetTransactionIdLimit() before clog truncation is complete, we'll advanced those derived limits prematurely. For example, suppose vacuum #1 comes along, advances the limits, truncates clog, and then gets descheduled. Now vacuum #2 comes along, advances the limits further, and then gets descheduled. Now vacuum #1 wakes up and calls SetTransactionIdLimit() and prematurely advances xidWrapLimit. Oops. The way you put it is that having a second copy of oldestXid whose only purpose is to recompute xidWrapLimit is pointless, but the way I'd say is that you're trying to make one variable do two jobs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14 March 2017 at 05:43, Robert Haas <robertmhaas@gmail.com> wrote: > For example, suppose vacuum #1 comes along, advances the limits, > truncates clog, and then gets descheduled. Now vacuum #2 comes along, > advances the limits further, and then gets descheduled. Now vacuum #1 > wakes up and calls SetTransactionIdLimit() and prematurely advances > xidWrapLimit. Oops. Mm, right. And without a lock held from when oldestXid advances through to completion of clog truncation, then taking the same lock in SetTransactionIdLimit, there's not really a way around it. I'm embarrassed not to have seen that. Doing things the other way around, per the earlier patch, can cause SetTransactionIdLimit to not to advance as far as it should. OK, I'm convinced, a new field is safer, even if it's redundant most of the time. I'll introduce a new LWLock, ClogTruncationLock, which will be held from when we advance the new clogOldestXid field through to when clog truncation completes. Most of the rest can stay the same. In particular, the expanded xlog record for clog truncation will still be required. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 13, 2017 at 10:22 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > I'll introduce a new LWLock, ClogTruncationLock, which will be held > from when we advance the new clogOldestXid field through to when clog > truncation completes. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14 March 2017 at 19:57, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Mar 13, 2017 at 10:22 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >> I'll introduce a new LWLock, ClogTruncationLock, which will be held >> from when we advance the new clogOldestXid field through to when clog >> truncation completes. > > +1. OK, here's the revised patch. Relevant comments added where needed. It still changes the xlog record for clog truncation to carry the new xid, but I've left advancing oldestXid until the checkpoint as is currently the case, and only advanced oldestClogXid when we replay clog truncation. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Mon, Mar 20, 2017 at 1:38 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 14 March 2017 at 19:57, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Mar 13, 2017 at 10:22 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >>> I'll introduce a new LWLock, ClogTruncationLock, which will be held >>> from when we advance the new clogOldestXid field through to when clog >>> truncation completes. >> >> +1. > > OK, here's the revised patch. Relevant comments added where needed. > > It still changes the xlog record for clog truncation to carry the new > xid, but I've left advancing oldestXid until the checkpoint as is > currently the case, and only advanced oldestClogXid when we replay > clog truncation. /me smacks forehead. Actually, it should be CLogTruncationLock, with a capital L, for consistency with CLogControlLock. The new lock needs to be added to the table in monitoring.sgml. I don't think the new header comments in TransactionIdDidCommit and TransactionIdDidAbort are super-clear. I'm not sure you're going to be able to explain it there in a reasonable number of words, but I think that speaking of "testing against oldestClogXid" will leave people wondering what exactly that means. Maybe just write "caller is responsible for ensuring that the clog records covering XID being looked up can't be truncated away while the lookup is in progress", and then leave the bit about CLogTruncationLock to be explained by the callers that do that. Or you could drop these comments entirely. Overall, though, I think that 0001 looks far better than any previous iteration. It's simple. It looks safe. It seems unlikely to break anything that works now. Woo hoo! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 22 March 2017 at 01:49, Robert Haas <robertmhaas@gmail.com> wrote: > /me smacks forehead. Actually, it should be CLogTruncationLock, with > a capital L, for consistency with CLogControlLock. Will do. > The new lock needs to be added to the table in monitoring.sgml. Same. > I don't think the new header comments in TransactionIdDidCommit and > TransactionIdDidAbort are super-clear. I'm not sure you're going to > be able to explain it there in a reasonable number of words, but I > think that speaking of "testing against oldestClogXid" will leave > people wondering what exactly that means. Maybe just write "caller is > responsible for ensuring that the clog records covering XID being > looked up can't be truncated away while the lookup is in progress", > and then leave the bit about CLogTruncationLock to be explained by the > callers that do that. Or you could drop these comments entirely. OK. I'll revisit and see if I can clean it up, otherwise remove it. > Overall, though, I think that 0001 looks far better than any previous > iteration. It's simple. It looks safe. It seems unlikely to break > anything that works now. Woo hoo! Funny that this started with "hey, here's a simple, non-invasive function for looking up the status of an arbitrary xid". Mature, complex systems eh? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 22 March 2017 at 09:49, Craig Ringer <craig@2ndquadrant.com> wrote: >> Overall, though, I think that 0001 looks far better than any previous >> iteration. It's simple. It looks safe. It seems unlikely to break >> anything that works now. Woo hoo! > > Funny that this started with "hey, here's a simple, non-invasive > function for looking up the status of an arbitrary xid". Changes made per discussion. Removed the comments on TransactionIdDidCommit and TransactionIdDidAbort . It's not going to be relevant for the immense majority of callers anyway, and callers that are looking up arbitrary user supplied XIDs will (hopefully) be looking at TransactionIdInRecentPast anyway. I'll be leaving the 'xid' vs 'bigint' issues elsewhere in Pg for next release, nowhere near time for that now. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On 22 March 2017 at 03:35, Craig Ringer <craig@2ndquadrant.com> wrote: > On 22 March 2017 at 09:49, Craig Ringer <craig@2ndquadrant.com> wrote: > >>> Overall, though, I think that 0001 looks far better than any previous >>> iteration. It's simple. It looks safe. It seems unlikely to break >>> anything that works now. Woo hoo! >> >> Funny that this started with "hey, here's a simple, non-invasive >> function for looking up the status of an arbitrary xid". > > Changes made per discussion. This looks good to me also. Does what we need it to do. I was a little worried by possible performance of new lock, but its not intended to be run frequently, so its OK. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 22, 2017 at 3:13 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Changes made per discussion. > > This looks good to me also. Does what we need it to do. > > I was a little worried by possible performance of new lock, but its > not intended to be run frequently, so its OK. Agreed. Reviewing 0002: + if (!TransactionIdIsValid(xid)) + { + LWLockRelease(XidGenLock); + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("transaction ID " UINT64_FORMAT " is an invalid xid", + xid_with_epoch))); + } It's unnecessary to release LWLockRelease() before throwing an error, and it's also wrong because we haven't acquired XidGenLock in this code path. But maybe it would be better to just remove this entirely and instead have TransactionIdInRecentPast return false for InvalidTransactionId. Then we'd avoid adding a translatable message for a case that is basically harmless to allow. + if (TransactionIdIsCurrentTransactionId(xid)) + status = gettext_noop("in progress"); + else if (TransactionIdDidCommit(xid)) + status = gettext_noop("committed"); + else if (TransactionIdDidAbort(xid)) + status = gettext_noop("aborted"); + else + + /* + * can't test TransactionIdIsInProgress here or we race with + * concurrent commit/abort. There's no point anyway, since it + * might then commit/abort just after we check. + */ + status = gettext_noop("in progress"); I am not sure this is going to do the right thing for transactions that are aborted by a crash without managing to write an abort record. It seems that the first check will say the transaction isn't in progress, and the second and third checks will say it isn't either committed or aborted since, if I am reading this correctly, they just read clog directly. Compare HeapTupleSatisfiesMVCC, which assumes that a not-in-progress, not-committed transaction must necessarily have aborted. I think your comment is pointing to a real issue, though. It seems like what might be needed is to add one more check. Before where you have the "else" clause, check if the XID is old, e.g. precedes our snapshot's xmin. If so, it must be committed or aborted and, since it didn't commit, it aborted. If not, it must've changed from in progress to not-in-progress just as we were in the midst checking, so labeling it as in progress is fine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 22 March 2017 at 17:41, Robert Haas <robertmhaas@gmail.com> wrote: > + if (TransactionIdIsCurrentTransactionId(xid)) > + status = gettext_noop("in progress"); > + else if (TransactionIdDidCommit(xid)) > + status = gettext_noop("committed"); > + else if (TransactionIdDidAbort(xid)) > + status = gettext_noop("aborted"); > + else > + > + /* > + * can't test TransactionIdIsInProgress here or we race with > + * concurrent commit/abort. There's no point anyway, since it > + * might then commit/abort just after we check. > + */ > + status = gettext_noop("in progress"); > > I am not sure this is going to do the right thing for transactions > that are aborted by a crash without managing to write an abort record. Yes, perhaps we should report that state as "aborted - incomplete". And of course, we might return "subcommitted" also, which could technically also be an abort in some cases, so we'd need to do a wait loop on that. Which makes me think it would be confusing to say "in progress" for when it is our current xid, since the user might wait until it is complete and then wait forever. Prefer it if it said "in progress - current transaction" -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 22, 2017 at 2:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 22 March 2017 at 17:41, Robert Haas <robertmhaas@gmail.com> wrote: >> + if (TransactionIdIsCurrentTransactionId(xid)) >> + status = gettext_noop("in progress"); >> + else if (TransactionIdDidCommit(xid)) >> + status = gettext_noop("committed"); >> + else if (TransactionIdDidAbort(xid)) >> + status = gettext_noop("aborted"); >> + else >> + >> + /* >> + * can't test TransactionIdIsInProgress here or we race with >> + * concurrent commit/abort. There's no point anyway, since it >> + * might then commit/abort just after we check. >> + */ >> + status = gettext_noop("in progress"); >> >> I am not sure this is going to do the right thing for transactions >> that are aborted by a crash without managing to write an abort record. > > Yes, perhaps we should report that state as "aborted - incomplete". > > And of course, we might return "subcommitted" also, which could > technically also be an abort in some cases, so we'd need to do a wait > loop on that. I actually don't think those are things we should expose to users. They're internal implementation details. The user had better not care whether an abort was the type of abort that wrote an abort record or the type that didn't. > Which makes me think it would be confusing to say "in progress" for > when it is our current xid, since the user might wait until it is > complete and then wait forever. Prefer it if it said "in progress - > current transaction" Hmm, or just "current transaction", maybe? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 23 March 2017 at 02:08, Simon Riggs <simon@2ndquadrant.com> wrote: > And of course, we might return "subcommitted" also, which could > technically also be an abort in some cases, so we'd need to do a wait > loop on that. Users generally don't see subxact IDs, so it wasn't something I was overly concerned by. Most notably, txid_current() doesn't report them. Users who want to know the commit status of an xact that had a commit in-flight when they lost access to it due to server crash, network loss, etc aren't going to care about subxacts. If you lose your connection after a RELEASE SAVEPOINT you know the outer xact will get aborted or the state of the individual subxacts. They're visible in heap tuples, but you can only see uncommitted heap tuples from your own top-level xid. For anything else, if you can see the xid in xmin you know it committed. There isn't really any reason you'd be looking up tuple xids with txid_status anyway, and if you did you'd have to pay attention to mess like multixacts in xmax ... but you can't tell from the xid alone if it's a multixact or not, so this doesn't make sense with xids that could be multis anyway. If we're going to handle subxacts specially, we should probably report them as "sub-committed" if we find that the current xid is a committed subxact member of an outer xact that is still in-progress. But IMO it's pretty pointless since you won't be dealing with subxact IDs in the application anyway. > Which makes me think it would be confusing to say "in progress" for > when it is our current xid, since the user might wait until it is > complete and then wait forever. Prefer it if it said "in progress - > current transaction" I'm fine with "current transaction". Though I think it's kind of a moot point as I don't see any reason for an application to ever be doing the equivalent of txid_status(txid_current()) in the first place. But it's harmless enough. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 23 March 2017 at 01:41, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Mar 22, 2017 at 3:13 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Changes made per discussion. >> >> This looks good to me also. Does what we need it to do. >> >> I was a little worried by possible performance of new lock, but its >> not intended to be run frequently, so its OK. > > Agreed. > > Reviewing 0002: > > + if (!TransactionIdIsValid(xid)) > + { > + LWLockRelease(XidGenLock); > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("transaction ID " UINT64_FORMAT " is an invalid xid", > + xid_with_epoch))); > + } > > It's unnecessary to release LWLockRelease() before throwing an error, > and it's also wrong because we haven't acquired XidGenLock in this > code path. But maybe it would be better to just remove this entirely > and instead have TransactionIdInRecentPast return false for > InvalidTransactionId. Then we'd avoid adding a translatable message > for a case that is basically harmless to allow. Agreed, that's better. > + if (TransactionIdIsCurrentTransactionId(xid)) > + status = gettext_noop("in progress"); > + else if (TransactionIdDidCommit(xid)) > + status = gettext_noop("committed"); > + else if (TransactionIdDidAbort(xid)) > + status = gettext_noop("aborted"); > + else > + > + /* > + * can't test TransactionIdIsInProgress here or we race with > + * concurrent commit/abort. There's no point anyway, since it > + * might then commit/abort just after we check. > + */ > + status = gettext_noop("in progress"); > > I am not sure this is going to do the right thing for transactions > that are aborted by a crash without managing to write an abort record. You're right. It'll report in-progress, since the clog entry will still be 0. We don't appear to explicitly update the clog during recovery. Funny, I got so focused on the clog access safety stuff in the end that I missed the bigger picture. Given that part of the point of this is xact status after crash, that's kind of important. I've added a TAP test for this, as part of a new test set, "011_crash_recovery.pl". The recovery tests don't really have much on crash behaviour at the moment so might as well start it and add more as we go. It doesn't make sense to add a whole new top level TAP test just for txid_status. (I *love* the TAP framework now, it took 10 mins to write a test that's trivial to repeat in 5 seconds per run for this). > It seems that the first check will say the transaction isn't in > progress, and the second and third checks will say it isn't either > committed or aborted since, if I am reading this correctly, they just > read clog directly. Right. They're not concerned with subxacts or multixacts. > Compare HeapTupleSatisfiesMVCC, which assumes > that a not-in-progress, not-committed transaction must necessarily > have aborted. Right. We don't have a HeapTuple here, of course, so we can't test flags. Most importantly this means we can't handle multixacts. If we ever did decide to expose multixacts as bigint epoch-qualified xids for general users, we'd probably reserve the high bit of the epoch the bigint xid representation for the equivalent of HEAP_XMAX_IS_MULTI, and maybe it's worth reserving that bit now and ERROR'ing if we find it set. But right now we never produce a bigint xid with a multixact. I wonder if a note in the docs warning not to cast xid to bigint is worth it. Probably not. You have to cast xid::text::bigint which is kind of a hint, plus it doesn't make sense to test txid_status() for tuples' xmin/xmax . I guess you might use it with xids from pg_stat_activity, but there's not much point when you can just look at pg_stat_activity again to see if the proc of interest is still there and has the same xid. Anyway... > I think your comment is pointing to a real issue, > though. It seems like what might be needed is to add one more check. > Before where you have the "else" clause, check if the XID is old, e.g. > precedes our snapshot's xmin. If so, it must be committed or aborted > and, since it didn't commit, it aborted. If not, it must've changed > from in progress to not-in-progress just as we were in the midst > checking, so labeling it as in progress is fine. That seems clear and sensible. XidInMVCCSnapshot simply tests TransactionIdPrecedes(xid, snapshot->xmin), as does HeapTupleSatisfiesHistoricMVCC . It seems reasonable enough to just examine the snapshot xmin directly in txid_status too; it's already done in replication/logical/snapbuild.c and lmgr/predicate.c. We're running in an SQL-called function so we'll have a good snapshot and GetSnapshotData will have run, so GetActiveSnapshot()->xmin should be sufficient. Amended patch attached, with added TAP test included. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On 23 March 2017 at 11:25, Craig Ringer <craig@2ndquadrant.com> wrote: > Amended patch attached, with added TAP test included. Managed to omit it, sigh. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > Changes made per discussion. Committed 0001. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 24 March 2017 at 02:29, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >> Changes made per discussion. > > Committed 0001. Much appreciated. Here's the 2nd patch rebased on top of master, with the TAP test included this time. Looks ready to go. I really appreciate the time you've taken to look at this. Point me at anything from your team you want some outside review on. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Fri, Mar 24, 2017 at 2:27 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 24 March 2017 at 02:29, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >>> Changes made per discussion. >> >> Committed 0001. > > Much appreciated. > > Here's the 2nd patch rebased on top of master, with the TAP test > included this time. Looks ready to go. Committed. > I really appreciate the time you've taken to look at this. Point me at > anything from your team you want some outside review on. Thanks, that is a valuable offer which I will be pleased to accept! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/24/17 02:27, Craig Ringer wrote: > On 24 March 2017 at 02:29, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >>> Changes made per discussion. >> >> Committed 0001. > > Much appreciated. > > Here's the 2nd patch rebased on top of master, with the TAP test > included this time. Looks ready to go. I'm experiencing hangs in the new t/011_crash_recovery.pl test. It seems to hang after the first call to SELECT txid_current();. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 25 March 2017 at 07:31, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 3/24/17 02:27, Craig Ringer wrote: >> On 24 March 2017 at 02:29, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >>>> Changes made per discussion. >>> >>> Committed 0001. >> >> Much appreciated. >> >> Here's the 2nd patch rebased on top of master, with the TAP test >> included this time. Looks ready to go. > > I'm experiencing hangs in the new t/011_crash_recovery.pl test. It > seems to hang after the first call to SELECT txid_current();. if you add note "txid is $xid"; after +chomp($xid); does it report the xid? Alternately, can you see a 'psql' process and a backend doing a SELECT when it stops progressing? I'm wondering if this is a perl version/platform issue around $tx->pump until $stdout =~ /[[:digit:]]+[\r\n]$/; where we're not recognising the required output from psql when we get it. What's in src/test/recovery/tmp_check/log/regress_log_011* ? I couldn't use PostgresNode::psql or PostgresNode::safe_psql here because the session must stay open. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
> I'm wondering if this is a perl version/platform issue around > > $tx->pump until $stdout =~ /[[:digit:]]+[\r\n]$/; > > where we're not recognising the required output from psql when we get it. > > What's in src/test/recovery/tmp_check/log/regress_log_011* ? > > I couldn't use PostgresNode::psql or PostgresNode::safe_psql here > because the session must stay open. The problem was that psql needs to be called with -X. Fix committed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/25/17 12:12 AM, Peter Eisentraut wrote: >> I'm wondering if this is a perl version/platform issue around >> >> $tx->pump until $stdout =~ /[[:digit:]]+[\r\n]$/; >> >> where we're not recognising the required output from psql when we get it. >> >> What's in src/test/recovery/tmp_check/log/regress_log_011* ? >> >> I couldn't use PostgresNode::psql or PostgresNode::safe_psql here >> because the session must stay open. > > The problem was that psql needs to be called with -X. Fix committed. It's not clear to me what remains to be done on this patch. I feel, at the least, that patches 3 and 4 need to be rebased and the status set back to "Needs Review". This thread has been idle for six days. Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback". -- -David david@pgmasters.net
On 31 Mar. 2017 22:31, "David Steele" <david@pgmasters.net> wrote:
On 3/25/17 12:12 AM, Peter Eisentraut wrote:It's not clear to me what remains to be done on this patch. I feel, at the least, that patches 3 and 4 need to be rebased and the status set back to "Needs Review".I'm wondering if this is a perl version/platform issue around
$tx->pump until $stdout =~ /[[:digit:]]+[\r\n]$/;
where we're not recognising the required output from psql when we get it.
What's in src/test/recovery/tmp_check/log/regress_log_011* ?
I couldn't use PostgresNode::psql or PostgresNode::safe_psql here
because the session must stay open.
The problem was that psql needs to be called with -X. Fix committed.
This thread has been idle for six days. Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".
I consider the feature complete and committed.
Patch 3 is optional and per discussion buried upthread we generally agreed that it'd be better to replace most user visible uses of the 'xid' data type with a new epoch extended 'xid64' or similar.
Patch 4, txid_incinerate, has never been intended for commit. It's a testing tool.
Patches 1 and 2 were the key parts and thanks to Robert's helpful review, advice and edits they're committed now.
Committed, done. Yay.
On 3/31/17 10:46 AM, Craig Ringer wrote: > > Patches 1 and 2 were the key parts and thanks to Robert's helpful > review, advice and edits they're committed now. > > Committed, done. Yay. Excellent. I have marked this a "Committed" by Robert. One down... -- -David david@pgmasters.net
On Fri, Mar 31, 2017 at 11:08 AM, David Steele <david@pgmasters.net> wrote: > On 3/31/17 10:46 AM, Craig Ringer wrote: >> Patches 1 and 2 were the key parts and thanks to Robert's helpful >> review, advice and edits they're committed now. >> >> Committed, done. Yay. > > Excellent. I have marked this a "Committed" by Robert. > > One down... ...71 to go? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company