Discussion: Temporary tables versus wraparound... again
We had an outage caused by transaction wraparound. And yes, one of the first things I did on this site was check that we didn't have any databases in danger of wraparound. However, since then we added a monitoring job that used a temporary table with ON COMMIT DELETE ROWS. Since it was a simple monitoring job it stayed connected to the database and used this small temporary table for a very long period of time. The temporary table never got vacuumed by autovacuum and never by the monitoring job (since it was being truncated on every transaction, why would it need to be vacuumed...).

We've been around this bush before. Tom added orphaned-table protection to autovacuum precisely because temporary tables can cause datfrozenxid to be held back indefinitely. Then Michael Paquier and Tsunakawa Takayuki both found it worth making this more aggressive. But none of that helped here, as the temporary schema was still in use, so these were not considered "orphaned" temp tables at all.

I think we need to add some warnings to autovacuum when it detects *non*-orphaned temporary tables that are older than the freeze threshold.

However, in the case of ON COMMIT DELETE ROWS we can do better. Why not just reset relfrozenxid and the other stats as if the table were freshly created when it's truncated? I put together this quick patch to check the idea and it seems to integrate fine in the code. I'm not sure about a few points but I don't think they're showstoppers:

1) Should we update relpages and reltuples? I think so, but an argument could be made that people might be depending on running ANALYZE once when the data is loaded and then not having to run it on every data load.

2) Adding the dependency on heapam.h to heap.c makes sense because of heap_inplace_update, but it may be a bit annoying because I suspect that's a useful sanity check that the tableam stuff hasn't been bypassed.

3) I added a check to the regression tests, but I'm not sure it's a good idea to actually commit it. It could fail if there's a parallel transaction going on, and even moving the test to the serial schedule might not guarantee that never happens, due to autovacuum running analyze?

I didn't actually add the warning to autovacuum yet.

-- greg
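[For concreteness, a minimal sketch of the failure pattern described above; the table and column names are hypothetical, but any long-lived session reusing an ON COMMIT DELETE ROWS table behaves this way:]

-- A long-lived monitoring session creates a small scratch table once...
CREATE TEMP TABLE monitor_scratch (metric text, val bigint)
  ON COMMIT DELETE ROWS;

-- ...and reuses it on every cycle. Each commit implicitly truncates
-- the rows, so the table never appears to need vacuuming.
BEGIN;
INSERT INTO monitor_scratch VALUES ('connections', 42);
COMMIT;

-- But the table's relfrozenxid is never advanced, so its age grows
-- with every transaction the cluster consumes, dragging datfrozenxid
-- along with it:
SELECT relname, age(relfrozenxid)
  FROM pg_class
 WHERE relname = 'monitor_scratch';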
Attachments
On Sun, Nov 08, 2020 at 06:19:57PM -0500, Greg Stark wrote:
> However, in the case of ON COMMIT DELETE ROWS we can do better. Why not
> just reset relfrozenxid and the other stats as if the table were
> freshly created when it's truncated?

That concept is sound.

> 1) Should we update relpages and reltuples? I think so, but an argument
> could be made that people might be depending on running ANALYZE once
> when the data is loaded and then not having to run it on every
> data load.

I'd wager no, we should not. An app that ever analyzes an ON COMMIT DELETE ROWS temp table probably analyzes it every time. If not, it's fair to guess that similar statistics recur in each xact.

> 2) Adding the dependency on heapam.h to heap.c makes sense because of
> heap_inplace_update, but it may be a bit annoying because I suspect
> that's a useful sanity check that the tableam stuff hasn't been
> bypassed.

That is not terrible. How plausible would it be to call vac_update_relstats() for this, instead of reimplementing part of it?

> 3) I added a check to the regression tests, but I'm not sure it's a
> good idea to actually commit it. It could fail if there's a parallel
> transaction going on, and even moving the test to the serial schedule
> might not guarantee that never happens, due to autovacuum running
> analyze?

Right.

> @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel)
>
>  	/* Truncate the underlying relation */
>  	table_relation_nontransactional_truncate(rel);
> +	ResetVacStats(rel);

I didn't test, but I expect this will cause a stats reset for the second TRUNCATE here:

CREATE TABLE t ();
...
BEGIN;
TRUNCATE t;
TRUNCATE t;  -- inplace relfrozenxid reset
ROLLBACK;  -- inplace reset survives

Does that indeed happen?
On Mon, 9 Nov 2020 at 00:17, Noah Misch <noah@leadboat.com> wrote:
>
> > 2) Adding the dependency on heapam.h to heap.c makes sense because of
> > heap_inplace_update, but it may be a bit annoying because I suspect
> > that's a useful sanity check that the tableam stuff hasn't been
> > bypassed.
>
> That is not terrible. How plausible would it be to call vac_update_relstats()
> for this, instead of reimplementing part of it?

It didn't seem worth it to change its API to add boolean flags to skip setting some of the variables (I was originally only doing relfrozenxid and relminmxid). Now that I'm doing most of the variables maybe it makes a bit more sense.

> > @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel)
> >
> >  	/* Truncate the underlying relation */
> >  	table_relation_nontransactional_truncate(rel);
> > +	ResetVacStats(rel);
>
> I didn't test, but I expect this will cause a stats reset for the second
> TRUNCATE here:
>
> CREATE TABLE t ();
> ...
> BEGIN;
> TRUNCATE t;
> TRUNCATE t;  -- inplace relfrozenxid reset
> ROLLBACK;  -- inplace reset survives
>
> Does that indeed happen?

Apparently no, see below. I have to say I was pretty puzzled by the actual behaviour, which is that the rollback actually does roll back the inplace update. But I *think* what is happening is that the first truncate does an MVCC update, so the inplace update happens only to the newly created tuple, which is never committed.

Thinking about this a bit does worry me, though. I wonder if inplace update is really safe outside of vacuum, where we know we're not in a transaction that can be rolled back. But IIRC doing a non-inplace update on pg_class for these columns breaks other things. I don't know if that's still true.

Also, in checking this question I realized I had missed 3d351d91. I should be initializing reltuples to -1, not 0.

postgres=# vacuum t;
VACUUM
postgres=# select relname,relpages,reltuples,relallvisible,relfrozenxid from pg_class where oid='t'::regclass;
 relname | relpages | reltuples | relallvisible | relfrozenxid
---------+----------+-----------+---------------+--------------
 t       |        9 |      2000 |             9 |        15557
(1 row)

postgres=# begin;
BEGIN
postgres=*# truncate t;
TRUNCATE TABLE
postgres=*# truncate t;
TRUNCATE TABLE
postgres=*# select relname,relpages,reltuples,relallvisible,relfrozenxid from pg_class where oid='t'::regclass;
 relname | relpages | reltuples | relallvisible | relfrozenxid
---------+----------+-----------+---------------+--------------
 t       |        0 |         0 |             0 |        15562
(1 row)

postgres=*# abort;
ROLLBACK
postgres=# select relname,relpages,reltuples,relallvisible,relfrozenxid from pg_class where oid='t'::regclass;
 relname | relpages | reltuples | relallvisible | relfrozenxid
---------+----------+-----------+---------------+--------------
 t       |        9 |      2000 |             9 |        15557
(1 row)

-- greg
On Mon, Nov 9, 2020 at 3:23 PM Greg Stark <stark@mit.edu> wrote:
>
> On Mon, 9 Nov 2020 at 00:17, Noah Misch <noah@leadboat.com> wrote:
> >
> > > @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel)
> > >
> > >  	/* Truncate the underlying relation */
> > >  	table_relation_nontransactional_truncate(rel);
> > > +	ResetVacStats(rel);
> >
> > I didn't test, but I expect this will cause a stats reset for the second
> > TRUNCATE here:
> >
> > CREATE TABLE t ();
> > ...
> > BEGIN;
> > TRUNCATE t;
> > TRUNCATE t;  -- inplace relfrozenxid reset
> > ROLLBACK;  -- inplace reset survives
> >
> > Does that indeed happen?
>
> Apparently no, see below. I have to say I was pretty puzzled by the
> actual behaviour, which is that the rollback actually does roll back
> the inplace update. But I *think* what is happening is that the first
> truncate does an MVCC update, so the inplace update happens only to the
> newly created tuple, which is never committed.

I think the in-place update that the patch introduces is not used there because TRUNCATE doesn't use heap_truncate_one_rel() to truncate a table in that scenario. It does an MVCC update of the pg_class tuple for a new relfilenode, with a new relfrozenxid and other stats; see RelationSetNewRelfilenode(). If we create and truncate a table within the same transaction it does use the in-place update that the patch introduces, but I think it's no problem in this case either.

> Thinking about this a bit does worry me, though. I wonder if
> inplace update is really safe outside of vacuum, where we know we're
> not in a transaction that can be rolled back. But IIRC doing a
> non-inplace update on pg_class for these columns breaks other things.
> I don't know if that's still true.

heap_truncate_one_rel() is not a transaction-safe operation. Doing in-place updates during that operation seems okay to me unless I'm missing something.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
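[A quick way to see the two code paths described above from SQL, as a sketch; the giveaway is whether TRUNCATE assigns a new relfilenode:]

-- Pre-existing table: TRUNCATE is transactional, assigning a new
-- relfilenode via RelationSetNewRelfilenode(), so the pg_class change
-- is an ordinary MVCC update that a ROLLBACK undoes.
CREATE TABLE t1 ();
SELECT relfilenode FROM pg_class WHERE relname = 't1';
TRUNCATE t1;
SELECT relfilenode FROM pg_class WHERE relname = 't1';  -- changed

-- Table created in the same transaction: TRUNCATE goes through
-- heap_truncate_one_rel() and keeps the same relfilenode; this is the
-- nontransactional path the patch touches.
BEGIN;
CREATE TABLE t2 ();
SELECT relfilenode FROM pg_class WHERE relname = 't2';
TRUNCATE t2;
SELECT relfilenode FROM pg_class WHERE relname = 't2';  -- unchanged
COMMIT;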
On Tue, Nov 10, 2020 at 04:10:57PM +0900, Masahiko Sawada wrote:
> On Mon, Nov 9, 2020 at 3:23 PM Greg Stark <stark@mit.edu> wrote:
> > On Mon, 9 Nov 2020 at 00:17, Noah Misch <noah@leadboat.com> wrote:
> > > > @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel)
> > > >
> > > >  	/* Truncate the underlying relation */
> > > >  	table_relation_nontransactional_truncate(rel);
> > > > +	ResetVacStats(rel);
> > >
> > > I didn't test, but I expect this will cause a stats reset for the second
> > > TRUNCATE here:
> > >
> > > CREATE TABLE t ();
> > > ...
> > > BEGIN;
> > > TRUNCATE t;
> > > TRUNCATE t;  -- inplace relfrozenxid reset
> > > ROLLBACK;  -- inplace reset survives
> > >
> > > Does that indeed happen?
> >
> > Apparently no, see below. I have to say I was pretty puzzled by the
> > actual behaviour, which is that the rollback actually does roll back
> > the inplace update. But I *think* what is happening is that the first
> > truncate does an MVCC update, so the inplace update happens only to the
> > newly created tuple, which is never committed.
>
> I think the in-place update that the patch introduces is not used there
> because TRUNCATE doesn't use heap_truncate_one_rel() to truncate a
> table in that scenario. It does an MVCC update of the pg_class tuple
> for a new relfilenode, with a new relfrozenxid and other stats; see
> RelationSetNewRelfilenode(). If we create and truncate a table within
> the same transaction it does use the in-place update that the patch
> introduces, but I think it's no problem in this case either.

Agreed. Rolling back a heap_truncate_one_rel() always implies rolling back to an earlier version of the entire pg_class tuple. (That may not be true of mapped relations, but truncating them is unreasonable.) Thanks for checking.

> > Thinking about this a bit does worry me, though. I wonder if
> > inplace update is really safe outside of vacuum, where we know we're
> > not in a transaction that can be rolled back. But IIRC doing a
> > non-inplace update on pg_class for these columns breaks other things.
> > I don't know if that's still true.
>
> heap_truncate_one_rel() is not a transaction-safe operation. Doing
> in-place updates during that operation seems okay to me unless I'm
> missing something.

Yep.
Here's an updated patch. I added some warning messages to autovacuum.

One thing I learned trying to debug this situation in production is that it's nigh impossible to find the pid of the session using a temporary schema. The number in the schema refers to the backendId in the sinval stuff, for which there's no external way to look up the corresponding pid. It would have been very helpful if autovacuum had just told me which backend pid to kill.

I still have the regression test in the patch, and as before I think it's probably not worth committing. I'm leaning toward reverting that section of the patch before committing.

Incidentally, there's still a hole here where a new session can attach to an existing temporary schema in which a table was never truncated due to a session dying abnormally. That new session could be a long-lived session that never uses the temporary schema, causing the old table to just sit there. Autovacuum has no way to tell it's not actually in use.

I tend to think the optimization to defer cleaning the temporary schema until it's used might not really be an optimization. It still needs to be cleaned someday, so it's just moving the work around. Just removing that optimization might be the easiest way to close this hole. The only alternative I see is adding a flag to PROC, or somewhere else autovacuum can see, recording whether the backend has actually initialized the temporary schema yet.
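[In the meantime, the situation is at least visible from SQL; a sketch of a check for in-use temp tables that are getting old — the age threshold here is arbitrary:]

-- Temporary tables (and their toast tables) whose relfrozenxid is
-- falling dangerously behind; relpersistence = 't' marks temp tables.
SELECT n.nspname, c.relname, age(c.relfrozenxid) AS xid_age
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relpersistence = 't'
   AND c.relkind IN ('r', 't')
   AND age(c.relfrozenxid) > 100000000
 ORDER BY xid_age DESC;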
Attachments
On 10/12/21, 3:07 PM, "Greg Stark" <stark@mit.edu> wrote:
> Here's an updated patch. I added some warning messages to autovacuum.

I think this is a useful optimization, and I intend to review the patch more closely soon. It looks reasonable to me after a quick glance.

> One thing I learned trying to debug this situation in production is
> that it's nigh impossible to find the pid of the session using a
> temporary schema. The number in the schema refers to the backendId in
> the sinval stuff, for which there's no external way to look up the
> corresponding pid. It would have been very helpful if autovacuum had
> just told me which backend pid to kill.

I certainly think it would be good to have autovacuum log the PID, but IIUC a query like this would help you map the backend ID to the PID:

    SELECT bid, pg_stat_get_backend_pid(bid) AS pid
      FROM pg_stat_get_backend_idset() bid;

Nathan
Hi,

On 2021-10-12 18:04:35 -0400, Greg Stark wrote:
> Here's an updated patch.

Unfortunately it doesn't apply anymore these days:
http://cfbot.cputube.org/patch_37_3358.log

Marked as waiting-on-author.

Greetings,

Andres Freund
No problem, I can update the patch and check on the fuzz. But the actual conflict is just in the test, and I'm not sure it's really worth having a test at all. It's testing a pretty low-level detail. So I'm leaning toward fixing the conflict by just ripping the test out.

Nathan also pointed out there was a simpler way to get the pid. I don't think the way I was doing it was wrong, but I'll double-check that.
Here's a rebased patch. I split the test into a separate patch that I would lean toward dropping. But at least it applies now.

I did look into pg_stat_get_backend_pid() and I guess it would work, but going through the stats mechanism seems like going the long way around, since we're already looking at the backendId info directly here; we just weren't grabbing the pid.

I did make one small change: I renamed the checkTempNamespaceStatus() function to checkTempNamespaceStatusAndPid(). It seems unlikely there are any external consumers of this function (the only internal consumer is autovacuum.c), but just in case, I renamed it to protect against any external modules failing from the added parameter.
Attachments
I had to rebase this again after Tom's cleanup of heap.c removed some includes. I had to re-add snapmgr.h to get access to RecentXmin.

It occurs to me to ask whether RecentXmin is actually guaranteed to be set. I haven't checked. I thought it was set when the first snapshot was taken, and presumably even if it's a non-transactional truncate we're still in a transaction?

The patch also added heapam.h to heap.c, which might seem like a layer violation. I think it's ok since it's just to be able to update the catalog (heap_inplace_update is in heapam.h).
Attachments
Hi,

On 2022-03-28 16:11:55 -0400, Greg Stark wrote:
> From 4515075b644d1e38920eb5bdaaa898e1698510a8 Mon Sep 17 00:00:00 2001
> From: Greg Stark <stark@mit.edu>
> Date: Tue, 22 Mar 2022 15:51:32 -0400
> Subject: [PATCH v4 1/2] Update relfrozenxmin when truncating temp tables
>
> Make ON COMMIT DELETE ROWS reset relfrozenxmin and other table stats
> like normal truncate. Otherwise even typical short-lived transactions
> using temporary tables can easily cause them to reach relfrozenxid.

Might be worth mentioning that ON COMMIT DELETE is implemented as truncating tables. If we actually implemented it as deleting rows, it'd not at all be correct to reset relfrozenxmin.

> Also add warnings when old temporary tables are found to still be in
> use during autovacuum. Long lived sessions using temporary tables are
> required to vacuum them themselves.

I'd do that in a separate patch.

> +/*
> + * Reset the relfrozenxid and other stats to the same values used when
> + * creating tables. This is used after non-transactional truncation.
> + *
> + * This reduces the need for long-running programs to vacuum their own
> + * temporary tables (since they're not covered by autovacuum) at least in the
> + * case where they're ON COMMIT DELETE ROWS.
> + *
> + * see also src/backend/commands/vacuum.c vac_update_relstats()
> + * also see AddNewRelationTuple() above
> + */
> +
> +static void
> +ResetVacStats(Relation rel)
> +{
> +	HeapTuple	ctup;
> +	Form_pg_class pgcform;
> +	Relation	classRel;
> +
> +	/* Fetch a copy of the tuple to scribble on */
> +	classRel = table_open(RelationRelationId, RowExclusiveLock);
> +	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(RelationGetRelid(rel)));
> +	if (!HeapTupleIsValid(ctup))
> +		elog(ERROR, "pg_class entry for relid %u vanished during truncation",
> +			 RelationGetRelid(rel));
> +	pgcform = (Form_pg_class) GETSTRUCT(ctup);
> +
> +	/*
> +	 * Update relfrozenxid
> +	 */
> +
> +	pgcform->relpages = 0;
> +	pgcform->reltuples = -1;
> +	pgcform->relallvisible = 0;
> +	pgcform->relfrozenxid = RecentXmin;

Hm. Is RecentXmin guaranteed to be valid at this point?

> +	pgcform->relminmxid = GetOldestMultiXactId();

Ugh. That's pretty expensive for something now done at a much higher rate than before.

> @@ -2113,20 +2126,31 @@ do_autovacuum(void)
>  		 * Remember it so we can try to delete it later.
>  		 */
>  		orphan_oids = lappend_oid(orphan_oids, relid);
> +	} else if (temp_status == TEMP_NAMESPACE_NOT_TEMP) {
> +		elog(LOG, "autovacuum: found temporary table \"%s.%s.%s\" in non-temporary namespace",
> +			 get_database_name(MyDatabaseId),
> +			 get_namespace_name(classForm->relnamespace),
> +			 NameStr(classForm->relname));
> +	} else if (temp_status == TEMP_NAMESPACE_IN_USE && wraparound) {

We put else if on a separate line from }. And { also is always on a separate line.

Greetings,

Andres Freund
On Mon, 28 Mar 2022 at 16:30, Andres Freund <andres@anarazel.de> wrote:
>
> > Make ON COMMIT DELETE ROWS reset relfrozenxmin and other table stats
> > like normal truncate. Otherwise even typical short-lived transactions
> > using temporary tables can easily cause them to reach relfrozenxid.
>
> Might be worth mentioning that ON COMMIT DELETE is implemented as truncating
> tables. If we actually implemented it as deleting rows, it'd not at all be
> correct to reset relfrozenxmin.

In the commit message, or are you saying this needs documentation or a comment?

> > Also add warnings when old temporary tables are found to still be in
> > use during autovacuum. Long lived sessions using temporary tables are
> > required to vacuum them themselves.
>
> I'd do that in a separate patch.

Hm, seems a bit small, but sure, no problem, I'll split it out.

> > +	pgcform->relfrozenxid = RecentXmin;
>
> Hm. Is RecentXmin guaranteed to be valid at this point?

I mentioned the same worry. But ok, I just looked into it and it's definitely not a problem. We only do these truncates either after a user-issued TRUNCATE on a table that was created in the same transaction, or at commit iff a flag is set indicating temporary tables have been used. Either way a snapshot has been taken. I've added some comments and an assertion, and I think if assertions are disabled and this impossible condition is hit we can just skip the stats reset.

> > +	pgcform->relminmxid = GetOldestMultiXactId();
>
> Ugh. That's pretty expensive for something now done at a much higher rate than
> before.

This I'm really not sure about. I really don't know much about multixacts. I've been reading a bit but I'm not sure what to do yet.

I'm wondering if there's something cheaper we can use. We don't need the oldest mxid that might be visible in a table somewhere, just the oldest that has a real live uncommitted transaction in it that could yet create new tuples in the truncated table.

In the case of temporary tables I think we could just set it to the next mxid since there are no live transactions capable of inserting into the temporary table. But in the case of a table created in this transaction then that wouldn't be good enough. I think? I'm not clear whether existing mxids get reused for new updates if they happen to have the same set of locks in them as some existing mxid.

> we put else if on a separate line from }. And { also is always on a separate
> line.

Sorry, old habits...

Incidentally, in doing the above I noticed an actual bug :( The toast reset had the wrong relid in it. I'll add the toast table to the test too.

-- greg
On Tue, Mar 29, 2022 at 4:52 PM Greg Stark <stark@mit.edu> wrote:
> On Mon, 28 Mar 2022 at 16:30, Andres Freund <andres@anarazel.de> wrote:
> >
> > > Make ON COMMIT DELETE ROWS reset relfrozenxmin and other table stats
> > > like normal truncate. Otherwise even typical short-lived transactions
> > > using temporary tables can easily cause them to reach relfrozenxid.
> >
> > Might be worth mentioning that ON COMMIT DELETE is implemented as truncating
> > tables. If we actually implemented it as deleting rows, it'd not at all be
> > correct to reset relfrozenxmin.
>
> In the commit message, or are you saying this needs documentation or a comment?
Just flying by here but...
The user-facing documentation already covers this:
"All rows in the temporary table will be deleted at the end of each transaction block. Essentially, an automatic TRUNCATE is done at each commit. When used on a partitioned table, this is not cascaded to its partitions."
I'm not sure why we felt the need to add "essentially" here - but maybe it's because we didn't "reset relfrozenxmin and other table stats like normal truncate"? Or maybe it's just natural word flow.
Either way, maybe word it like this to avoid the need for "essentially" altogether:
The temporary table will be automatically truncated at the end of each transaction block. However, unlike the TRUNCATE command, descendant tables will not be cascaded to. (I'm changing "partitions" to "descendant tables" to make a point here - the TRUNCATE command only references descendant tables, not mentioning partitioning by name at all. Is this desirable?)
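[For concreteness, a sketch of the non-cascading behavior being described; the names are made up, and this assumes, if I'm reading the docs right, that the partition keeps its rows because the ON COMMIT action doesn't cascade:]

BEGIN;
CREATE TEMP TABLE pt (i int) PARTITION BY RANGE (i) ON COMMIT DELETE ROWS;
CREATE TEMP TABLE pt1 PARTITION OF pt FOR VALUES FROM (0) TO (10);
INSERT INTO pt VALUES (1);
COMMIT;
-- The ON COMMIT action on pt does not cascade to pt1, so the row survives:
SELECT count(*) FROM pt;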
I don't have any substantive insight into the commit message or code comments; but it doesn't seem immediately wrong to assume the reader understands that ON COMMIT DELETE ROWS uses something more akin to TRUNCATE rather than DELETE since that is what the feature is documented to do. The commit message in particular seems like it doesn't need to teach that point; but can do so if it makes understanding the changes easier.
David J.
Hi,

On 2022-03-29 19:51:26 -0400, Greg Stark wrote:
> On Mon, 28 Mar 2022 at 16:30, Andres Freund <andres@anarazel.de> wrote:
> >
> > > Make ON COMMIT DELETE ROWS reset relfrozenxmin and other table stats
> > > like normal truncate. Otherwise even typical short-lived transactions
> > > using temporary tables can easily cause them to reach relfrozenxid.
> >
> > Might be worth mentioning that ON COMMIT DELETE is implemented as truncating
> > tables. If we actually implemented it as deleting rows, it'd not at all be
> > correct to reset relfrozenxmin.
>
> In the commit message, or are you saying this needs documentation or a comment?

In the commit message.

> > > +	pgcform->relminmxid = GetOldestMultiXactId();
> >
> > Ugh. That's pretty expensive for something now done at a much higher rate than
> > before.
>
> This I'm really not sure about. I really don't know much about
> multixacts. I've been reading a bit but I'm not sure what to do yet.
> I'm wondering if there's something cheaper we can use. We don't need
> the oldest mxid that might be visible in a table somewhere, just the
> oldest that has a real live uncommitted transaction in it that could
> yet create new tuples in the truncated table.
>
> In the case of temporary tables I think we could just set it to the
> next mxid since there are no live transactions capable of inserting
> into the temporary table. But in the case of a table created in this
> transaction then that wouldn't be good enough. I think? I'm not clear
> whether existing mxids get reused for new updates if they happen to
> have the same set of locks in them as some existing mxid.

Yes, that can happen. But of course the current xid is always part of the multixact, so it can't be a multixact from before the transaction started.

There's already a record of the oldest mxid a backend considers live, computed on the first use of multixacts in a transaction. See MultiXactIdSetOldestVisible(). Which I think might serve as a suitable relminmxid of a temporary table in an already running transaction?

Greetings,

Andres Freund
I've updated the patches.

Adding the assertion actually turned up a corner case where RecentXmin was *not* set. If you lock a temporary table and that's the only thing you do in a transaction, then the flag is set indicating you've used the temp schema, but you never take a snapshot :(

I also split out the warnings and added a test that relfrozenxid was advanced on the toast table as well.

I haven't wrapped my head around multixacts yet. It's complicated by this same code path being used for truncates of regular tables that were created in the same transaction.
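[A sketch of that corner case: the only action in the transaction is a lock on a temp table, so the "used temp schema" flag gets set but no snapshot is ever taken, leaving RecentXmin unset when the on-commit truncate runs.]

CREATE TEMP TABLE t (i int) ON COMMIT DELETE ROWS;
BEGIN;
LOCK TABLE t;
COMMIT;  -- on-commit truncate fires without RecentXmin having been set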
Attachments
On Thu, 31 Mar 2022 at 16:05, Greg Stark <stark@mit.edu> wrote:
>
> I haven't wrapped my head around multixacts yet. It's complicated by
> this same code path being used for truncates of regular tables that
> were created in the same transaction.

So my best idea so far is to actually special-case the temp table case in this code path. I think that's easy enough since I have the heap tuple I'm about to replace. In the temp table case I would just use the value Andres proposes. In the "truncating a table in the same transaction it was created" case I would go ahead and use the expensive GetOldestMultiXactId(), which should be ok for that case. At least I think the "much higher rate" comment was motivated by the idea that every transaction commit (when temp tables are being used) is more frequent than any specific user DDL.

It's not brilliant, since it seems to embed knowledge of the cases where this optimization applies in a lower-level function. If we think of some other case where it could apply, it wouldn't be obvious that it has a cost to it. But it doesn't seem too terrible to me.

An alternative would be to simply not adjust relminmxid for non-temp tables at all. I guess that's not too bad either, since these are non-temp tables that autovacuum will be able to do anti-wraparound vacuums on. And I get the impression mxids don't wrap around nearly as often as xids?

-- greg
So here's an updated patch.

I had to add a public method to multixact.c to expose the locally calculated OldestMultiXactId. It's possible we could use something even tighter (like the current next mxid, since we're about to commit) but I didn't see a point in going further, and it would have become more complex.

I also added a branch in heapam_handler.c in ..._set_new_filenode() for temporary tables. It feels like a free win and it's more consistent. I'm not 100% on the tableam abstraction -- it's possible all of this change should have happened in heapam_handler somewhere? I don't think so, but it does feel weird to be touching it and also doing the same thing elsewhere.

I think this has addressed all the questions now.
Attachments
On Sun, 8 Nov 2020 at 18:19, Greg Stark <stark@mit.edu> wrote:
>
> We had an outage caused by transaction wraparound. And yes, one of the
> first things I did on this site was check that we didn't have any
> databases in danger of wraparound.

Fwiw this patch has been in "Ready for Committer" state since April and has been moved forward three times, including missing the release.

It's a pretty short patch and fixes a problem that caused an outage for $previous_employer, and I've had private discussions from other people who have been struggling with the same issue. Personally I consider it pretty close to a bug fix and worth backpatching.

I think it's pretty annoying to have put out a release without this fix.

-- greg
Greg Stark <stark@mit.edu> writes:
> Simple Rebase

I took a little bit of a look through these.

* I find 0001 a bit scary, specifically that it's decided it's okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext, and especially relation_needs_vacanalyze to another session's temp table. How safe is that really?

* Don't see much point in renaming checkTempNamespaceStatus. That doesn't make it not an ABI break. If we want to back-patch this we'll have to do something different than what you did here.

* In 0002, I don't especially approve of what you've done with the relminmxid calculation --- that seems to move it out of "pure bug fix" territory and into "hmm, I wonder if this creates new bugs" territory. Also, skipping that update for non-temp tables immediately falsifies ResetVacStats' claimed charter of "resetting to the same values used when creating tables". Surely GetOldestMultiXactId isn't *that* expensive, especially compared to the costs of file truncation. I think you should just do GetOldestMultiXactId straight up, and maybe submit a separate performance-improvement patch to make it do the other thing (in both places) for temp tables.

* I wonder if this is the best place for ResetVacStats --- it doesn't seem to be very close to the code it needs to stay in sync with. If there's no better place, perhaps adding cross-reference comments in both directions would be advisable.

* 0003 says it's running temp.sql by itself to avoid interference from other sessions, but sadly that cannot avoid interference from background autovacuum/autoanalyze. I seriously doubt this patch would survive contact with the buildfarm. Do we actually need a new test case? It's not like the code won't get exercised without it --- we have plenty of temp table truncations, surely.

regards, tom lane
On Sat, 5 Nov 2022 at 11:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> * I find 0001 a bit scary, specifically that it's decided it's
> okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> and especially relation_needs_vacanalyze to another session's
> temp table. How safe is that really?

I can look a bit more closely, but none of them are doing anything with the table itself, just the catalog entries, which afaik have always been fair game for other sessions. So I'm not really clear what kind of unsafeness you're asking about.

> * Don't see much point in renaming checkTempNamespaceStatus.
> That doesn't make it not an ABI break. If we want to back-patch
> this we'll have to do something different than what you did here.

Well, it's an ABI break, but at least it's an ABI break that gives a build-time error or a shared-library loading error, rather than one that just crashes or writes to random memory at runtime.

> * In 0002, I don't especially approve of what you've done with
> the relminmxid calculation --- that seems to move it out of
> "pure bug fix" territory and into "hmm, I wonder if this
> creates new bugs" territory.

Hm. Ok, I can separate that into a separate patch. I admit I have a lot of trouble remembering how multixactids work.

> Also, skipping that update
> for non-temp tables immediately falsifies ResetVacStats'
> claimed charter of "resetting to the same values used when
> creating tables". Surely GetOldestMultiXactId isn't *that*
> expensive, especially compared to the costs of file truncation.
> I think you should just do GetOldestMultiXactId straight up,
> and maybe submit a separate performance-improvement patch
> to make it do the other thing (in both places) for temp tables.

Hm. The feedback I got earlier was that it was quite expensive. That said, I think the concern was about the temp tables, where the truncate happens on every transaction commit when they're used. For regular truncates I'm not sure how important optimizing it is.

> * I wonder if this is the best place for ResetVacStats --- it
> doesn't seem to be very close to the code it needs to stay in
> sync with. If there's no better place, perhaps adding cross-
> reference comments in both directions would be advisable.

I'll look at that. I think relfrozenxid and relminmxid are touched in a lot of places, so it may be tilting at windmills to try to centralize the code working with them at this point...

> * 0003 says it's running temp.sql by itself to avoid interference
> from other sessions, but sadly that cannot avoid interference
> from background autovacuum/autoanalyze. I seriously doubt this
> patch would survive contact with the buildfarm. Do we actually
> need a new test case? It's not like the code won't get exercised
> without it --- we have plenty of temp table truncations, surely.

No, I don't think we do. I kept it in a separate commit so it could be dropped when committing. But just having truncate working isn't really good enough either. An early version of the patch had a bug that meant it didn't run at all, so truncate worked fine but relfrozenxid never got reset.

In thinking about whether we could have a basic test that temp tables are getting reset at all, it occurs to me that there's still a gap here: you can have a session attached to a temp namespace that never actually uses the temp tables. That would prevent autovacuum from dropping them and still never reset their vacuum stats. :(

Offhand I think PreCommit_on_commit_actions() could occasionally truncate all ON COMMIT DELETE ROWS tables even if they haven't been touched in this transaction.

-- greg
Hi,

On 2022-12-01 11:13:01 -0500, Greg Stark wrote:
> On Sat, 5 Nov 2022 at 11:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > * I find 0001 a bit scary, specifically that it's decided it's
> > okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> > and especially relation_needs_vacanalyze to another session's
> > temp table. How safe is that really?
>
> I can look a bit more closely, but none of them are doing anything
> with the table itself, just the catalog entries, which afaik have
> always been fair game for other sessions. So I'm not really clear what
> kind of unsafeness you're asking about.

Is that actually true? Don't we skip some locking operations for temporary tables, which then also means catalog modifications cannot safely be done in other sessions?

> > Also, skipping that update
> > for non-temp tables immediately falsifies ResetVacStats'
> > claimed charter of "resetting to the same values used when
> > creating tables". Surely GetOldestMultiXactId isn't *that*
> > expensive, especially compared to the costs of file truncation.
> > I think you should just do GetOldestMultiXactId straight up,
> > and maybe submit a separate performance-improvement patch
> > to make it do the other thing (in both places) for temp tables.
>
> Hm. The feedback I got earlier was that it was quite expensive. That
> said, I think the concern was about the temp tables, where the truncate
> happens on every transaction commit when they're used. For
> regular truncates I'm not sure how important optimizing it is.

And it's not called just once, but once for each relation.

> > * I wonder if this is the best place for ResetVacStats --- it
> > doesn't seem to be very close to the code it needs to stay in
> > sync with. If there's no better place, perhaps adding cross-
> > reference comments in both directions would be advisable.
>
> I'll look at that. I think relfrozenxid and relminmxid are touched
> in a lot of places, so it may be tilting at windmills to try to
> centralize the code working with them at this point...

Not convinced. Yes, there are plenty of references to relfrozenxid, but most of them don't modify it.

I find it problematic that ResetVacStats() bypasses tableam. Normal vacuums etc. go through tableam, but you put a ResetVacStats() beside each call to table_relation_nontransactional_truncate(). Seems like this should just be in heapam_relation_nontransactional_truncate()?

Is it a good idea to use heap_inplace_update() in ResetVacStats()?

Greetings,

Andres Freund
On Thu, 1 Dec 2022 at 14:18, Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-12-01 11:13:01 -0500, Greg Stark wrote:
> > On Sat, 5 Nov 2022 at 11:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > * I find 0001 a bit scary, specifically that it's decided it's
> > > okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> > > and especially relation_needs_vacanalyze to another session's
> > > temp table. How safe is that really?
> >
> > I can look a bit more closely, but none of them are doing anything
> > with the table itself, just the catalog entries, which afaik have
> > always been fair game for other sessions. So I'm not really clear what
> > kind of unsafeness you're asking about.
>
> Is that actually true? Don't we skip some locking operations for temporary
> tables, which then also means catalog modifications cannot safely be done in
> other sessions?

This code isn't doing any catalog modifications from other sessions. The code Tom's referring to is just autovacuum looking at relfrozenxid and relminmxid and printing warnings if they're approaching the wraparound limits that would otherwise trigger an anti-wraparound vacuum.

> I find it problematic that ResetVacStats() bypasses tableam. Normal vacuums
> etc. go through tableam, but you put a ResetVacStats() beside each call to
> table_relation_nontransactional_truncate(). Seems like this should just be in
> heapam_relation_nontransactional_truncate()?

Ok. I think this patch actually predated the tableam (by a lot -- I've had others actually approach me about whether there's a good solution because it's been biting them too) and I didn't see that in the merge forward.

> Is it a good idea to use heap_inplace_update() in ResetVacStats()?

This is a good question. I had the impression it was actually the right thing to do, and there have actually been bugs in the past caused by *not* using heap_inplace_update(), so I think it's actually important to get this right.

I don't see any documentation explaining what the rules are for when inplace edits are safe or unsafe, or indeed when they're necessary for correctness. So I searched back through the archives and checked when it came up. It seems there are a few issues:

a) Nontransactional operations like vacuum have to use it because they don't have a transaction. Presumably this is why vacuum normally uses inplace_update for these stats.

b) In the past, SnapshotNow scans would behave incorrectly if we did normal MVCC updates on rows without exclusive locks protecting against concurrent scans. I'm not sure if this is still a factor these days with the types of scans that still exist.

c) There are some constraints having to do with logical replication that I didn't understand. I hope they don't relate to frozenxid, but I don't know.

d) There were also some issues having to do with SInval messages, but I think those were additional constraints that inplace updates needed to be concerned about.

These truncates are done at end of transaction but pre-commit, so the transaction is still alive, and there obviously should be no concurrent scans on temporary tables, so I think it should be safe to do a regular MVCC update.

Is it a good idea to bloat the catalog though? If you have many temporary tables and don't actually touch more than a few of them in your transaction, that could be a lot of new tuple inserts on every commit. Actually that's only sort of true -- if no persistent xid has been created then we would be creating one just for this. But that shouldn't happen, because we only truncate if the transaction ever "touched" a temporary table.

It occurs to me it could still be kind of a problem if you have a temporary table that you use once and then your session stays alive for a long time without using temporary tables. Then it won't be truncated and the frozenxid won't be advanced :(

It's kind of annoying that we have to put RecentXmin and Get{Our,}OldestMultiXactId() in the table when truncating and then keep advancing them even if there's no data in the table. Ideally wouldn't it be better to be able to have Invalid{Sub,}Xid there and only initialize it when a first insert is made?

-- greg
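[One rough way to watch for that kind of catalog churn, as a sketch using the cumulative statistics views: repeated per-commit MVCC updates of pg_class rows for temp tables would show up in the update and dead-tuple counters here.]

-- How much pg_class is being updated and how many dead tuples have
-- accumulated there:
SELECT n_tup_upd, n_dead_tup, last_autovacuum
  FROM pg_stat_sys_tables
 WHERE relname = 'pg_class';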
So.... I talked about this patch with Ronan Dunklau and he had a good question.... Why are we maintaining relfrozenxid and relminmxid in pg_class for temporary tables at all? Autovacuum can't use them and other sessions won't care about them. The only session that might care about them is the one attached to the temp schema.

So we could replace relfrozenxid and relminmxid for temp tables with a local hash table that can be updated on every truncate easily and efficiently. If a temp table actually wraps around, the only session that runs into problems is the one attached to that temp schema. It can throw local session errors and doesn't need to interfere with the rest of the cluster in any way. It could even start running vacuums, though I'm not convinced that's a great solution.

At least I think so. I'm pretty sure about relfrozenxid, but as always I don't really know how relminmxid works. I think we only need to worry about multixacts for subtransactions, all of which are in the same transaction -- does that even work that way?

But this is really attractive, since it means no catalog updates just for temp tables on every transaction and no cluster-wide wraparound problems even if you have on-commit-preserve-rows tables. It really shouldn't be possible for a regular user to cause the whole cluster to run into problems just by creating a temp table and keeping a connection around a while.
Hi,

On 2022-12-06 13:47:39 -0500, Greg Stark wrote:
> So.... I talked about this patch with Ronan Dunklau and he had a good
> question.... Why are we maintaining relfrozenxid and relminmxid in
> pg_class for temporary tables at all? Autovacuum can't use them and
> other sessions won't care about them. The only session that might care
> about them is the one attached to the temp schema.

Uh, without relfrozenxid for temp tables we can end up truncating clog "ranges" away that are required to access the temp tables. So this would basically mean that temp tables can't be used reliably anymore.

Greetings,

Andres Freund
On Tue, 6 Dec 2022 at 13:59, Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-12-06 13:47:39 -0500, Greg Stark wrote:
> > So.... I talked about this patch with Ronan Dunklau and he had a good
> > question.... Why are we maintaining relfrozenxid and relminmxid in
> > pg_class for temporary tables at all? Autovacuum can't use them and
> > other sessions won't care about them. The only session that might care
> > about them is the one attached to the temp schema.
>
> Uh, without relfrozenxid for temp tables we can end up truncating clog
> "ranges" away that are required to access the temp tables. So this would
> basically mean that temp tables can't be used reliably anymore.

True, we would have to have some other mechanism for exporting the frozenxid that the session needs. Presumably that would be something in PGPROC, like the xmin and other numbers. It could be updated by scanning our local hash table whenever a transaction starts.

This is also probably what would be needed for multixacts, I guess?

-- greg
Hi,

On 2022-12-06 14:50:34 -0500, Greg Stark wrote:
> On Tue, 6 Dec 2022 at 13:59, Andres Freund <andres@anarazel.de> wrote:
> > On 2022-12-06 13:47:39 -0500, Greg Stark wrote:
> > > So.... I talked about this patch with Ronan Dunklau and he had a good
> > > question.... Why are we maintaining relfrozenxid and relminmxid in
> > > pg_class for temporary tables at all? Autovacuum can't use them and
> > > other sessions won't care about them. The only session that might care
> > > about them is the one attached to the temp schema.
> >
> > Uh, without relfrozenxid for temp tables we can end up truncating clog
> > "ranges" away that are required to access the temp tables. So this would
> > basically mean that temp tables can't be used reliably anymore.
>
> True, we would have to have some other mechanism for exporting the
> frozenxid that the session needs. Presumably that would be something
> in PGPROC, like the xmin and other numbers. It could be updated by
> scanning our local hash table whenever a transaction starts.

That'd be a fair bit of new mechanism. Not at all impossible, but I'm doubtful the complexity is worth it. In your patch the relevant catalog change is an inplace change and thus doesn't cause bloat. And if we have to retain the clog, I don't see that much benefit in the proposed approach.

> This is also probably what would be needed for multixacts, I guess?

Yes.

Greetings,

Andres Freund
On Thu, 1 Dec 2022 at 14:18, Andres Freund <andres@anarazel.de> wrote:
>
> I find it problematic that ResetVacStats() bypasses tableam. Normal vacuums
> etc. go through tableam, but you put a ResetVacStats() beside each call to
> table_relation_nontransactional_truncate(). Seems like this should just be in
> heapam_relation_nontransactional_truncate()?

So this seems a bit weird. The only other part of the tableam that touches frozenxid and minmxid is table_relation_set_new_filelocator(), which returns them via references so that the callers (heap.c:heap_create() and relcache.c:RelationSetNewRelfilenumber()) can set them themselves. I can't really duplicate that layering without changing the API of heapam_relation_nontransactional_truncate(), which, if it changed, would be quite annoying for external pluggable tableams, I think.

But you're right that where I've put it will update relfrozenxid and minmxid even for relations whose tableam handler returns InvalidXid and doesn't need it. That does seem inappropriate.

I could put it directly in the heapam_handler, but is that a layering issue, doing a catalog update from within the tableam_handler? There are no other catalog updates in there. On the other hand, the existing callers of *_nontransactional_truncate() don't have any existing catalog updates they want to make, so perhaps returning the xid values by reference was just for convenience, to avoid doing an extra update, and isn't needed here.

-- greg
On Wed, 7 Dec 2022 at 22:02, Greg Stark <stark@mit.edu> wrote:
>
> > Seems like this should just be in
> > heapam_relation_nontransactional_truncate()?

So here I've done it that way. It is a bit of an unfortunate layering, since it means the heapam_handler is doing the catalog change, but it does seem inconvenient to pass relfrozenxid etc. back up and have the caller make the changes when there are no other changes to make.

Also, I'm not sure what changed, but maybe there were some other commits in vacuum.c in the meantime. I remember being frustrated previously trying to reuse that code, but now it works fine. So I was able to reduce the copy-pasted code significantly.

(The tests are probably not worth committing; they're just here for my own testing to be sure it's doing anything.)

-- greg
Attachments
On Sat, 5 Nov 2022 at 15:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Greg Stark <stark@mit.edu> writes:
> > Simple Rebase
>
> I took a little bit of a look through these.
>
> * I find 0001 a bit scary, specifically that it's decided it's
> okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> and especially relation_needs_vacanalyze to another session's
> temp table. How safe is that really?

So I don't see any evidence we skip any locking on pg_class when doing updates on rows for temporary tables. It's a bit hard to tell, because there are several ways of checking whether a table is temporary. Most places just check relpersistence, but there is also an accessing macro RelationIsPermanent(), as well as a relcache field rd_islocaltemp, which could be used. I'm still looking, but so far nearly all the checks I've found just throw errors for temporary tables, and none relate to any operations on pg_class entries. In any case we're already using the pg_class struct to look at relpersistence itself....

So... the danger to check for is something we would already be at risk of: namely that the pg_class row is updated without any locking and then vacuumed away while we hold this struct pointer, and we're looking at fields that have since been overwritten with other data from an unrelated row. But that would require all kinds of rules to be broken, and would pose a risk for anyone just running select * from pg_class. So I find it hard to believe we would be doing this.

extract_autovac_opts looks at a variable-sized field, so concurrent updates would be an issue, but obviously there are only MVCC updates to this field, so I don't see how it could be a problem.

For pgstat_fetch_stat_tabentry I don't even see what the possible risks would be. The words persistence and temporary don't appear in pgstat.c (except "temporary statistics" in some log messages).

And then there's relation_needs_vacanalyze(), which looks at relfrozenxid and relminmxid (and relname in some debug messages). Those fields could get updated by a concurrent vacuum or -- after this patch -- a truncate, in an inplace_update. That seems to be the only real risk here.

But this is not related to temporary tables at all. Any pg_class entry can get in_place_update'd by plain old vacuum to update relfrozenxid and relminmxid. The in_place_update would take an exclusive lock on the buffer, but I think that doesn't actually protect us, since autovacuum would only have a pin? Or does the SysCache protect us by copying out the whole row while it's locked?

This is worth answering, but it's not an issue related to this patch or temporary tables. Is autovacuum looking at relfrozenxid and relminmxid in a way that's safely protected against a concurrent manual vacuum issuing an in_place_update?

-- greg
On Wed, Dec 14, 2022 at 1:18 PM Greg Stark <stark@mit.edu> wrote:
> So I don't see any evidence we skip any locking on pg_class when doing
> updates on rows for temporary tables.

I don't know what this means. You don't have to lock pg_class to update rows in any table, whether temporary or otherwise. You do have to lock a table in order to update its pg_class row, though, whether the table is temporary or not. Otherwise, another session could drop it while you're doing something with it, after which bad things would happen.

--
Robert Haas
EDB: http://www.enterprisedb.com
> You do have to lock a table in order to update its pg_class row,
> though, whether the table is temporary or not. Otherwise, another
> session could drop it while you're doing something with it, after
> which bad things would happen.

I was responding to this from Andres:

> Is that actually true? Don't we skip some locking operations for temporary
> tables, which then also means catalog modifications cannot safely be done in
> other sessions?

I don't actually see this in the code, but in any case we're not doing any catalog modifications here. We're just inspecting the values of relfrozenxid and relminmxid in the struct returned from SearchSysCache. Which I think is no different for temp tables than for any other table.
On Wed, Dec 14, 2022 at 4:44 PM Greg Stark <stark@mit.edu> wrote:
> > You do have to lock a table in order to update its pg_class row,
> > though, whether the table is temporary or not. Otherwise, another
> > session could drop it while you're doing something with it, after
> > which bad things would happen.
>
> I was responding to this from Andres:
>
> > Is that actually true? Don't we skip some locking operations for temporary
> > tables, which then also means catalog modifications cannot safely be done in
> > other sessions?
>
> I don't actually see this in the code ...

Yes, I think Andres may be wrong in this case. (Dang, I don't get to say that very often.)

--
Robert Haas
EDB: http://www.enterprisedb.com
On 2022-Dec-13, Greg Stark wrote:

> So here I've done it that way. It is a bit of an unfortunate layering,
> since it means the heapam_handler is doing the catalog change, but it
> does seem inconvenient to pass relfrozenxid etc. back up and have the
> caller make the changes when there are no other changes to make.

Are you still at this? CFbot says the meson tests failed last time for some reason:
http://commitfest.cputube.org/greg-stark.html

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
I think that was spurious. It looked good when we looked at it yesterday. The rest that failed seemed unrelated and was also failing on my SSL patch too.
I talked to Andres about the possibility of torn reads in the pg_class stats, but those are all 4-byte columns, so probably safe. And in any case that's a pre-existing possibility, just made more likely (if it's possible at all) by frequent truncates.
On Thu, Feb 2, 2023, 15:47 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2022-Dec-13, Greg Stark wrote:
>
> > So here I've done it that way. It is a bit of an unfortunate layering,
> > since it means the heapam_handler is doing the catalog change, but it
> > does seem inconvenient to pass relfrozenxid etc. back up and have the
> > caller make the changes when there are no other changes to make.
>
> Are you still at this? CFbot says the meson tests failed last time for
> some reason:
> http://commitfest.cputube.org/greg-stark.html
>
> --
> Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Hi,

On 2023-02-04 17:12:36 +0100, Greg Stark wrote:
> I think that was spurious. It looked good when we looked at it yesterday.
> The rest that failed seemed unrelated and was also failing on my SSL patch
> too.

I don't think the SSL failures are related to the failure of this patch. That was in one of the new tests executed as part of the main regression tests:

https://api.cirrus-ci.com/v1/artifact/task/6418299974582272/testrun/build/testrun/regress/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/temp.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/temp.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/temp.out	2023-02-04 05:43:14.225905000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/temp.out	2023-02-04 05:46:57.468250000 +0000
@@ -108,7 +108,7 @@
     :old_relfrozenxid <> :new_relfrozenxid AS frozenxid_advanced;
  pages_analyzed | pages_reset | tuples_analyzed | tuples_reset | frozenxid_advanced
 ----------------+-------------+-----------------+--------------+--------------------
- t              | t           | t               | t            | t
+ t              | t           | t               | t            | f
 (1 row)

 -- The toast table can't be analyzed so relpages and reltuples can't

Whereas the SSL test once failed in subscription/031_column_list (a test with some known stability issues) and twice in postgres_fdw. Unfortunately the postgres_fdw failures are failing to upload:

[17:41:25.601] Failed to upload artifacts: Put "https://storage.googleapis.com/cirrus-ci-5309429912436736-3271c9/artifacts/postgresql-cfbot/postgresql/6061134453669888/testrun/build/testrun/runningcheck.log?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=cirrus-ci%40cirrus-ci-community.iam.gserviceaccount.com%2F20230128%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20230128T174012Z&X-Goog-Expires=600&X-Goog-SignedHeaders=host%3Bx-goog-content-length-range%3Bx-goog-meta-created_by_task&X-Goog-Signature=6f5606e3966d68060a14077deb93ed5bf680c4636e6409e5eba6ca8f1ff9b11302c1b5605089e2cd759fd90d1542a4e2c794fd4c1210f04b056d7e09db54d3e983c34539fb4c24787b659189c27e1b6d0ebc1d1807b38a066c10e62fa57a374c3a7fbc610edddf1dfe900b3c788c8d7d7ded3366449b4520992c5ed7a3136c7103b7a668b591542bba58a32f5a84cb21bbeeafea09dc525d1631a5f413a0f98df43cc90ebf6c4206e6df61606bc634c3a8116c53d7c6dd4bc5b26547cd7d1a1633839ace694b73426267a9f434317350905b905b9c88132be14a7762c2f204b8072a3bd7e4e1d30217d9e60102d525b08e28bcfaabae80fba734a1015d8eb0a7": http2: request body larger than specified content length

Hm, I suspect the problem is that we didn't shut down the server due to the error, so the log file was changing while cirrus was trying to upload.

Greetings,

Andres Freund
Hi,

On 2023-02-05 15:30:57 -0800, Andres Freund wrote:
> Hm, I suspect the problem is that we didn't shut down the server due to
> the error, so the log file was changing while cirrus was trying to
> upload.

Pushed a fix for that.

Greetings,

Andres Freund
On Thu, Feb 2, 2023, 15:47 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Are you still at this? CFbot says the meson tests failed last time for
> some reason:
> http://commitfest.cputube.org/greg-stark.html

On Sat, Feb 04, 2023 at 05:12:36PM +0100, Greg Stark wrote:
> I think that was spurious. It looked good when we looked at it yesterday.
> The rest that failed seemed unrelated and was also failing on my SSL
> patch too.

The patch still occasionally fails its tests under freebsd.
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/42/3358
On Wed, 29 Mar 2023 at 17:48, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> The patch still occasionally fails its tests under freebsd.
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/42/3358

So on the one hand, I don't think the plan is to actually commit the tests. They're very specific to one bit of internal implementation, and they're the kind of test that makes maintaining the test suite a pain and causes false positives for other patches. They're only in the patch I posted at all to demonstrate that the code was actually running and having the desired effect.

That said, I would be a lot more sanguine about this failure if I had any theory for *why* it would fail. And on FreeBSD specifically, which is even stranger. Afaict the relfrozenxid should always be our own transaction when the table is created, and then again our own new transaction when the table is truncated. Neither the INSERT nor the ANALYZE should be touching relfrozenxid, nor should it be possible that autovacuum is interfering given it's a temp table (and we're attached to the schema). And none of this should be platform dependent.

I wonder if some other test is behaving differently on FreeBSD and leaving behind a prepared transaction or a zombie session in some idle state or something like that? Is there anything (aside from autovacuum) connecting or running in the background in the test environment that could be creating a transaction id and holding back snapshot xmin?

--
greg
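For anyone chasing the same question, the usual suspects can be checked with ordinary system views; nothing here is specific to the patch:

-- lingering prepared transactions
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- sessions holding an xid or a snapshot xmin
SELECT pid, datname, state, backend_xid, backend_xmin,
       left(query, 60) AS query
  FROM pg_stat_activity
 WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL;

Any row in either result can hold back the xmin horizon cluster-wide and keep relfrozenxid from advancing.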
On Wed, 5 Apr 2023 at 01:41, Greg Stark <stark@mit.edu> wrote:
>
> On Wed, 29 Mar 2023 at 17:48, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > The patch still occasionally fails its tests under freebsd.
> > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/42/3358
>
> I wonder if some other test is behaving differently on FreeBSD and
> leaving behind a prepared transaction or a zombie session in some idle
> state or something like that? Is there anything (aside from
> autovacuum) connecting or running in the background in the test
> environment that could be creating a transaction id and holding back
> snapshot xmin?

Ok, I've reproduced this here by running the tests under meson. It doesn't look like it's platform dependent.

It seems that under meson the different test suites are run in parallel, or at least isolation/deadlock-parallel is still running when the regression checks run. If that's not expected then maybe something's not behaving as intended? I've attached pg_stat_activity from during the test run.

Regardless, it shows these tests are obviously not robust enough to include, as they would break for anyone running make installcheck on a non-idle cluster. That's fine; as I said, the tests were just there to give a reviewer more confidence, and I think it's fine to just not include them in the commit.

--
greg
Attachments
Hi,

On 2023-04-05 10:19:10 -0400, Greg Stark wrote:
> On Wed, 5 Apr 2023 at 01:41, Greg Stark <stark@mit.edu> wrote:
> >
> > On Wed, 29 Mar 2023 at 17:48, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > The patch still occasionally fails its tests under freebsd.
> > > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/42/3358
> >
> > I wonder if some other test is behaving differently on FreeBSD and
> > leaving behind a prepared transaction or a zombie session in some idle
> > state or something like that? Is there anything (aside from
> > autovacuum) connecting or running in the background in the test
> > environment that could be creating a transaction id and holding back
> > snapshot xmin?
>
> Ok, I've reproduced this here by running the tests under meson. It
> doesn't look like it's platform dependent.
>
> It seems under meson the different test suites are run in parallel or
> at least isolation/deadlock-parallel are still running stuff when the
> regression checks are running. If that's not expected then maybe
> something's not behaving as expected? I've attached pg_stat_activity
> from during the test run.

The freebsd test that failed is running tests in parallel, against an existing cluster. In that case it's expected that there's some concurrency.

Why does this cause your tests to fail? They're in separate databases, so the visibility effects of the concurrent tests should be somewhat limited.

Greetings,

Andres Freund
On Wed, 5 Apr 2023 at 11:15, Andres Freund <andres@anarazel.de> wrote:
>
> The freebsd test that failed is running tests in parallel, against an existing
> cluster. In that case it's expected that there's some concurrency.
>
> Why does this cause your tests to fail? They're in separate databases, so the
> visibility effects of the concurrent tests should be somewhat limited.

Because I'm checking that relfrozenxid was updated, but concurrent transactions, even in other databases, hold back the xmin.

Honestly, I'm glad I wrote the test, because it was hard to know whether my code was doing anything at all without it (and it wasn't in the first cut...). But I don't think there's much value in having it be in the regression suite. We don't generally write tests to ensure that a specific internal implementation behaves in the specific way it was written to.

--
greg
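To make the failure mode concrete, here is a minimal sketch of the relfrozenxid half of the check, reconstructed from the regression diff quoted earlier in the thread (the table name and row count are made up):

CREATE TEMP TABLE on_commit_tbl (a int) ON COMMIT DELETE ROWS;
SELECT relfrozenxid AS old_relfrozenxid
  FROM pg_class WHERE oid = 'on_commit_tbl'::regclass \gset
BEGIN;
INSERT INTO on_commit_tbl SELECT generate_series(1, 1000);
COMMIT;  -- the ON COMMIT truncate resets the stats under the patch
SELECT relfrozenxid AS new_relfrozenxid
  FROM pg_class WHERE oid = 'on_commit_tbl'::regclass \gset
SELECT :old_relfrozenxid <> :new_relfrozenxid AS frozenxid_advanced;

The last comparison is exactly what went wrong in the CI run above: if any concurrent transaction holds back the horizon, the reset value can equal the old relfrozenxid and the test reports f instead of t.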
Hi,

On 2023-04-05 13:26:53 -0400, Greg Stark wrote:
> On Wed, 5 Apr 2023 at 11:15, Andres Freund <andres@anarazel.de> wrote:
> >
> > The freebsd test that failed is running tests in parallel, against an existing
> > cluster. In that case it's expected that there's some concurrency.
> >
> > Why does this cause your tests to fail? They're in separate databases, so the
> > visibility effects of the concurrent tests should be somewhat limited.
>
> Because I'm checking that relfrozenxid was updated, but concurrent
> transactions, even in other databases, hold back the xmin.

Not if you determine a relation specific xmin, and the relation is not a shared relation.

ISTM that the problem here really is that you're relying on RecentXmin, rather than computing something more accurate. Why not use GetOldestNonRemovableTransactionId(rel) - It's a bit more expensive, but I don't think it'll matter compared to the cost of truncating the relation.

Somehow it doesn't feel right to use vac_update_relstats() in heapam_handler.c.

I also don't like that your patch references heapam_relation_nontransactional_truncate in AddNewRelationTuple() - we shouldn't add more comments piercing tableam than necessary.

> Honestly, I'm glad I wrote the test, because it was hard to know whether
> my code was doing anything at all without it (and it wasn't in the
> first cut...). But I don't think there's much value in having it be in
> the regression suite. We don't generally write tests to ensure that a
> specific internal implementation behaves in the specific way it was
> written to.

To me it seems important to test that your change actually does what it intends to. Possibly the test needs to be relaxed some, but I do think we want tests for the change.

Greetings,

Andres Freund
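In code terms, the suggestion amounts to a one-line change relative to the hypothetical ResetVacStats() sketch earlier in the thread (again an illustration, not the actual patch):

-	pgcform->relfrozenxid = RecentXmin;
+	pgcform->relfrozenxid = GetOldestNonRemovableTransactionId(rel);

With a per-relation horizon, transactions in other databases no longer hold the value back for a non-shared temp table, which is what lets the frozenxid_advanced check pass under a parallel test schedule.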
On Wed, 5 Apr 2023 at 13:42, Andres Freund <andres@anarazel.de> wrote:
>
> Not if you determine a relation specific xmin, and the relation is not a
> shared relation.
>
> ISTM that the problem here really is that you're relying on RecentXmin, rather
> than computing something more accurate. Why not use
> GetOldestNonRemovableTransactionId(rel) - It's a bit more expensive, but I
> don't think it'll matter compared to the cost of truncating the relation.

Thanks for the review!

Hm, I was just copying heapam_handler.c:593 so it would be consistent with what we do when we create a new table. I wasn't aware we had anything that did this extra work; I'll look at it.

But I'm not sure it's the best idea to decide how truncate/vacuum/create table work based on what happens to be easier to test. I mean, I'm all for testable code, but tying vacuum behaviour to what our test framework happens not to interfere with might be a bit fragile. Like, if we happen to want to change the testing framework, I think this demonstrates that it will be super easy for it to break the tests again. And if we discover we have to change the relfrozenxid behaviour, it might be hard to keep this test working.

> Somehow it doesn't feel right to use vac_update_relstats() in
> heapam_handler.c.
>
> I also don't like that your patch references
> heapam_relation_nontransactional_truncate in AddNewRelationTuple() - we
> shouldn't add more comments piercing tableam than necessary.

I'll take another look at this tomorrow. Probably I can extract the common part of that function, or I've misunderstood which bits of code are above or below the tableam. I think fundamentally the hardest bit is that the initial relfrozenxid bubbles up from heapam_handler.c via a return value from set_new_filelocator. So unless I want to add a new tableam method just for relfrozenxid, it's a bit awkward to get the right data to AddNewRelationTuple and vac_update_relstats without duplicating code and crosslinking in comments.

> To me it seems important to test that your change actually does what it
> intends to. Possibly the test needs to be relaxed some, but I do think we want
> tests for the change.

I missed the comment about relaxing the tests until just now. I'll think about whether there's an easy way out in that direction too.

If it's cutting it too fine to the end of the commitfest, we could always just commit the warnings from the 001 patch, which would already be a *huge* help for admins running into this issue.

Chag Sameach!

--
greg
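For context, the tableam callback Greg mentions hands the initial horizons back through out parameters rather than an ordinary return value, which is what makes the layering awkward. Its declaration, as in the PG 16 development cycle's tableam.h (quoted from memory, so treat the exact wording as approximate):

	void		(*relation_set_new_filelocator) (Relation rel,
												 const RelFileLocator *newrlocator,
												 char persistence,
												 TransactionId *freezeXid,
												 MultiXactId *minmulti);

The heap implementation fills in freezeXid and minmulti, and the caller records them in pg_class via AddNewRelationTuple(), so there is no obvious place for a second consumer of those values without adding a new tableam method.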
On Wed, 5 Apr 2023 at 13:42, Andres Freund <andres@anarazel.de> wrote:
>
> ISTM that the problem here really is that you're relying on RecentXmin, rather
> than computing something more accurate. Why not use
> GetOldestNonRemovableTransactionId(rel) - It's a bit more expensive, but I
> don't think it'll matter compared to the cost of truncating the relation.

I'm trying to wrap my head around GetOldestNonRemovableTransactionId() and whether it's the right thing here. This comment is not helping me:

/*
 * Return the oldest XID for which deleted tuples must be preserved in the
 * passed table.
 *
 * If rel is not NULL the horizon may be considerably more recent than
 * otherwise (i.e. fewer tuples will be removable). In the NULL case a horizon
 * that is correct (but not optimal) for all relations will be returned.
 *
 * This is used by VACUUM to decide which deleted tuples must be preserved in
 * the passed in table.
 */

Am I crazy or is the parenthetical comment there exactly backwards? If the horizon is *more recent* then fewer tuples are *non*-removable. I.e. *more* tuples are removable, no?

--
greg
On Wed, Apr 12, 2023 at 4:23 PM Greg Stark <stark@mit.edu> wrote:
> I'm trying to wrap my head around GetOldestNonRemovableTransactionId()
> and whether it's the right thing here. This comment is not helping me:
>
> /*
>  * Return the oldest XID for which deleted tuples must be preserved in the
>  * passed table.
>  *
>  * If rel is not NULL the horizon may be considerably more recent than
>  * otherwise (i.e. fewer tuples will be removable). In the NULL case a horizon
>  * that is correct (but not optimal) for all relations will be returned.
>  *
>  * This is used by VACUUM to decide which deleted tuples must be preserved in
>  * the passed in table.
>  */
>
> Am I crazy or is the parenthetical comment there exactly backwards? If
> the horizon is *more recent* then fewer tuples are *non*-removable.
> I.e. *more* tuples are removable, no?

Isn't it the non-parenthetical part that's wrong? I would expect that if we don't know which relation it is, the horizon might be considerably LESS recent, which would result in fewer tuples being removable.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Apr 13, 2023 at 9:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Apr 12, 2023 at 4:23 PM Greg Stark <stark@mit.edu> wrote:
> > Am I crazy or is the parenthetical comment there exactly backwards? If
> > the horizon is *more recent* then fewer tuples are *non*-removable.
> > I.e. *more* tuples are removable, no?
>
> Isn't it the non-parenthetical part that's wrong? I would expect that
> if we don't know which relation it is, the horizon might be
> considerably LESS recent, which would result in fewer tuples being
> removable.

You can make arguments for either way of restating it being clearer than the other.

Personally I think that the comment should explain what happens when you pass NULL as your relation, rather than explaining what doesn't happen (or does happen?) when you pass a non-NULL relation pointer. That way the just-pass-NULL case can be addressed as the possibly-aberrant case -- the possibly-sloppy approach. You're really supposed to pass a non-NULL relation pointer if at all possible.

--
Peter Geoghegan
On Thu, 13 Apr 2023 at 13:01, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Apr 13, 2023 at 9:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Wed, Apr 12, 2023 at 4:23 PM Greg Stark <stark@mit.edu> wrote:
> > > Am I crazy or is the parenthetical comment there exactly backwards? If
> > > the horizon is *more recent* then fewer tuples are *non*-removable.
> > > I.e. *more* tuples are removable, no?
> >
> > Isn't it the non-parenthetical part that's wrong? I would expect that
> > if we don't know which relation it is, the horizon might be
> > considerably LESS recent, which would result in fewer tuples being
> > removable.
>
> You can make arguments for either way of restating it being clearer
> than the other.

Yeah, I think Robert is being confused by the implicit double negative. If we don't know which relation it is, it's because relation is NULL, and the comment is talking about what happens if it's "not NULL". I think you're right that it would be less confusing if it just said "if you pass NULL we have to give a conservative result, which means an older xid and fewer removable tuples".

But I'm saying the parenthetical part is not just confusing, it's outright wrong. I guess that just means the first half was so confusing it confused not only the reader but the author too.

--
greg
On Fri, Apr 14, 2023 at 7:05 AM Greg Stark <stark@mit.edu> wrote:
> But I'm saying the parenthetical part is not just confusing, it's
> outright wrong. I guess that just means the first half was so
> confusing it confused not only the reader but the author too.

I knew that that was what you meant. I agree that it's outright wrong.

--
Peter Geoghegan
Hi,

On 2023-04-14 10:05:08 -0400, Greg Stark wrote:
> On Thu, 13 Apr 2023 at 13:01, Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Thu, Apr 13, 2023 at 9:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > On Wed, Apr 12, 2023 at 4:23 PM Greg Stark <stark@mit.edu> wrote:
> > > > Am I crazy or is the parenthetical comment there exactly backwards? If
> > > > the horizon is *more recent* then fewer tuples are *non*-removable.
> > > > I.e. *more* tuples are removable, no?
> > >
> > > Isn't it the non-parenthetical part that's wrong? I would expect that
> > > if we don't know which relation it is, the horizon might be
> > > considerably LESS recent, which would result in fewer tuples being
> > > removable.

If rel is *not NULL*, the horizon is more recent - that seems correct?

> > You can make arguments for either way of restating it being clearer
> > than the other.
>
> Yeah, I think Robert is being confused by the implicit double
> negative. If we don't know which relation it is, it's because relation
> is NULL, and the comment is talking about what happens if it's "not
> NULL". I think you're right that it would be less confusing if it just
> said "if you pass NULL we have to give a conservative result, which
> means an older xid and fewer removable tuples".
>
> But I'm saying the parenthetical part is not just confusing, it's
> outright wrong. I guess that just means the first half was so
> confusing it confused not only the reader but the author too.

I don't think it's outright wrong, but it is very confusing what it relates to. For some reason I tried to "attach" the parenthetical to the "otherwise", which doesn't make a whole lot of sense. How about:

 * If rel is not NULL the horizon may be considerably more recent (i.e.
 * allowing more tuples to be removed) than otherwise. In the NULL case a
 * horizon that is correct (but not optimal) for all relations will be
 * returned. Thus, if possible, a relation should be provided.

Greetings,

Andres Freund
On Fri, Apr 14, 2023 at 10:47 AM Andres Freund <andres@anarazel.de> wrote:
> I don't think it's outright wrong, but it is very confusing what it relates
> to. For some reason I tried to "attach" the parenthetical to the "otherwise",
> which doesn't make a whole lot of sense. How about:

I suppose that it doesn't matter whether it's outright wrong, or just unclear. Either way it should be improved.

> * If rel is not NULL the horizon may be considerably more recent (i.e.
> * allowing more tuples to be removed) than otherwise. In the NULL case a
> * horizon that is correct (but not optimal) for all relations will be
> * returned. Thus, if possible, a relation should be provided.

That seems much better to me. The most important part is the last sentence. The key idea is that you as a caller should provide a rel if at all possible (and if not you should feel a pang of guilt). That emphasis makes the potential consequences much more obvious.

--
Peter Geoghegan
On Wed, 5 Apr 2023 at 13:42, Andres Freund <andres@anarazel.de> wrote:
>
> Somehow it doesn't feel right to use vac_update_relstats() in
> heapam_handler.c.
>
> I also don't like that your patch references
> heapam_relation_nontransactional_truncate in AddNewRelationTuple() - we
> shouldn't add more comments piercing tableam than necessary.

I'm really puzzled, because this does look like it was in the last patch on the mailing list archive, but it's definitely not the code I have here. I guess I did some cleanup that I never posted, so sorry. I've attached patches using GetOldestNonRemovableTransactionId(), and that seems to have fixed the race condition here. At least I can't reproduce it any more.

> Not if you determine a relation specific xmin, and the relation is not a
> shared relation.
>
> ISTM that the problem here really is that you're relying on RecentXmin, rather
> than computing something more accurate. Why not use
> GetOldestNonRemovableTransactionId(rel) - It's a bit more expensive, but I
> don't think it'll matter compared to the cost of truncating the relation.

I am a bit nervous about the overhead here, because if your transaction touched *any* temporary tables then this gets called for *every* temporary table with ON COMMIT DELETE ROWS. That could be a lot, and it's not obvious to users that merely having such temporary tables imposes an overhead even if they're not actually using them.

So I went ahead and used GetOldestNonRemovableTransactionId() and tried to do some profiling. But this is on a cassert-enabled build with -O0, so it's not serious profiling. I can repeat it on a real build if it matters, but it's been a long time since I've read gprof output.

This is for -F PreCommit_on_commit_actions, so the percentages are relative to just the precommit cleanup:

index % time    self  children    called     name
                0.00    0.00   10102/10102       CommitTransaction (1051)
[1]    100.0    0.01   31.47   10102         PreCommit_on_commit_actions [1]
                0.01   31.43   10100/10100       heap_truncate [2]
                0.00    0.03 1005050/1005260     lappend_oid [325]
-----------------------------------------------
                0.01   31.43   10100/10100       PreCommit_on_commit_actions [1]
[2]     99.9    0.01   31.43   10100         heap_truncate [2]
                0.09   27.30 1005050/1005050     heap_truncate_one_rel [3]
                0.20    3.57 1005050/6087120     table_open <cycle 1> [465]
                0.01    0.22 1005050/6045137     table_close [48]
                0.00    0.03 1005050/1017744     lappend [322]
                0.01    0.00   10100/10100       heap_truncate_check_FKs [425]
-----------------------------------------------
                0.09   27.30 1005050/1005050     heap_truncate [2]
[3]     87.0    0.09   27.30 1005050         heap_truncate_one_rel [3]
                0.02   12.23 1005050/1005050     RelationTruncateIndexes [5]
                0.06   10.08 1005050/1005050     ResetVacStats [7]
                0.03    4.89 1005050/1005050     table_relation_nontransactional_truncate [12]

I think this is saying that more than half the time is being spent just checking for indexes, yet there were no indexes on these temporary tables. Does not having any indexes cause the relcache to treat it as a cache miss every time?

                0.06   10.08 1005050/1005050     heap_truncate_one_rel [3]
[7]     32.2    0.06   10.08 1005050         ResetVacStats [7]
                0.02    3.83 1005050/1005250     SearchSysCacheCopy [16]
                0.20    3.57 1005050/6087120     table_open <cycle 1> [465]
                0.01    2.02 1005050/1005050     heap_inplace_update [35]
                0.01    0.22 1005050/6045137     table_close [48]
                0.00    0.20 1005050/1005150     GetOldestNonRemovableTransactionId [143]
                0.00    0.01 1005050/1005150     GetOurOldestMultiXactId [421]
                0.00    0.00 1005050/1008750     ObjectIdGetDatum [816]

I guess this means GetOldestNonRemovableTransactionId() is not the main cost in ResetVacStats, though I don't understand why the syscache would be so slow. I think there's a facility for calculating the horizons and then reusing them for a while, but I don't see how to use that here. It would be appropriate, I think.

> > Honestly, I'm glad I wrote the test, because it was hard to know whether
> > my code was doing anything at all without it (and it wasn't in the
> > first cut...). But I don't think there's much value in having it be in
> > the regression suite. We don't generally write tests to ensure that a
> > specific internal implementation behaves in the specific way it was
> > written to.
>
> To me it seems important to test that your change actually does what it
> intends to. Possibly the test needs to be relaxed some, but I do think we want
> tests for the change.
>
> Greetings,
>
> Andres Freund

--
greg
Attachments
Hm, in an optimized build using kernel perf I see the profile below. But I don't know how to find the call sites for LWLockAcquire/LWLockRelease; if it's the locks on PGPROC, that would be kind of bad.

I wonder if I should be gathering the horizons once in PreCommit_on_commit_actions() and then just using those for every temp table somehow. Perhaps only actually doing an update if the relfrozenxid is at least vacuum_freeze_table_age old.

  3.98%  postgres  LWLockAcquire
  3.51%  postgres  LWLockRelease
  3.18%  postgres  hash_search_with_hash_value
  2.20%  postgres  DropRelationLocalBuffers
  1.80%  [kernel]  check_preemption_disabled
  1.52%  postgres  hash_bytes
  1.27%  postgres  LockAcquireExtended
  0.97%  postgres  _bt_compare
  0.95%  [kernel]  kmem_cache_alloc

I still think we should be applying the vacuum warning messages to stable, and probably backpatching them. I've actually heard from other users who have faced the same surprise wraparound shutdown.
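A rough sketch of what the gather-once idea could look like, entirely hypothetical and simplified from the real heap_truncate(), which also performs FK checks; ResetVacStatsTo() is a made-up variant of ResetVacStats() that takes a precomputed horizon:

void
heap_truncate(List *relids)
{
	TransactionId horizon = InvalidTransactionId;
	ListCell   *cell;

	foreach(cell, relids)
	{
		Relation	rel = table_open(lfirst_oid(cell), AccessExclusiveLock);

		table_relation_nontransactional_truncate(rel);

		/*
		 * A horizon computed with rel == NULL is conservative (older) but
		 * valid for every non-shared relation, so a single call can serve
		 * all the ON COMMIT DELETE ROWS tables in this transaction.
		 */
		if (!TransactionIdIsValid(horizon))
			horizon = GetOldestNonRemovableTransactionId(NULL);

		ResetVacStatsTo(rel, horizon);	/* hypothetical helper */

		/* keep the lock until commit, as the real code does */
		table_close(rel, NoLock);
	}
}

The trade-off is that the NULL-relation horizon is older than a per-relation one, so relfrozenxid advances less aggressively; that is exactly the tension with the per-relation accuracy Andres argued for upthread.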
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review", but there has been no activity on this thread for 9+ months. Since there seems to be little interest, I have changed the status to "Returned with Feedback" [1].

Feel free to propose a stronger use case for the patch and add an entry for the same.

======
[1] https://commitfest.postgresql.org/46/3358/

Kind Regards,
Peter Smith.