Обсуждение: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c
Hi all, Kristian Lejao (colleague, in CC) has found the following assertion failure in postgres_fdw.c when rechecking the result tuple via EvalPlanQual(): TRAP: failed Assert("outerPlan != NULL"), File: "postgres_fdw.c", Line: 2366, PID: 2043518 Here is the reproducible steps that I've simplified from the one Kristian originally created: 1. setup local node: create extension postgres_fdw; create server srv foreign data wrapper postgres_fdw options (host 'localhost', port '5433', dbname 'postgres'); create user mapping for public server srv; create table a (i int primary key); create foreign table b (i int) server srv; create foreign table c (i int) server srv; insert into a values (1); 2. setup remote node: create table b (i int); create table c (i int); insert into b values (1); insert into c values (1); 3. attach to the backend process started on the local node (say conn1) using gdb and set breakpoint at table_tuple_lock(). 4. run the following query on conn1 (which stops before locking the result tuple): select a.i, (select 1 from b, c where a.i = b.i and b.i = c.i) from a for update; 5. on another session, update the tuple concurrently: update a set i = i + 1; -- update 1 tuple 6. continue the query on conn1, the server crashes due to the assertion failure. The plan of the FOR UPDATE query lead to this issue is: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ LockRows (cost=0.00..615886.00 rows=2550 width=14) Output: a.i, ((SubPlan 1)), a.ctid -> Seq Scan on public.a (cost=0.00..615860.50 rows=2550 width=14) Output: a.i, (SubPlan 1), a.ctid SubPlan 1 -> Foreign Scan (cost=100.00..241.50 rows=225 width=4) Output: 1 Relations: (public.b) INNER JOIN (public.c) Remote SQL: SELECT NULL FROM (public.b r1 INNER JOIN public.c r2 ON (((r2.i = $1::integer)) AND ((r1.i = $1::integer)))) (9 rows) The point is that in the subquery in the target list we pushed the inner join to the foreign server. In postgresGetForeignJoinPaths(), we prepare the join path for EvalPlanQual() check (and used in postgresRecheckForeignScan()) if the query is DELETE, UPDATE, or FOR UPDATE/SHARE (as shown below) but we skip it since the subquery itself is parsed as a normal SELECT query without rowMarks, leaving fdw_outerpath of the ForeignScan node NULL: /* * If there is a possibility that EvalPlanQual will be executed, we need * to be able to reconstruct the row using scans of the base relations. * GetExistingLocalJoinPath will find a suitable path for this purpose in * the path list of the joinrel, if one exists. We must be careful to * call it before adding any ForeignPath, since the ForeignPath might * dominate the only suitable local path available. We also do it before * calling foreign_join_ok(), since that function updates fpinfo and marks * it as pushable if the join is found to be pushable. */ if (root->parse->commandType == CMD_DELETE || root->parse->commandType == CMD_UPDATE || root->rowMarks) { epq_path = GetExistingLocalJoinPath(joinrel); Therefore, if the tuple is concurrently updated before taking a lock, we recheck the traversed tuple via EvalPlanQual() but we end up with the assertion failure since we didn't prepare the join plan for that. The attached patch includes the draft fix and regression tests (using injection points). I don't have enough experience with the planner and FDW code area to evaluate whether the patch fixes the issue in the right approach. Feedback is very welcome. I've confirmed this assertion could happen with the same scenario on all supported branches. In addition to that, I realized that none of the regression tests execute postgresRecheckForeignScan()[1]. I think we need to add regression tests to cover that function. Regards, [1] https://coverage.postgresql.org/contrib/postgres_fdw/postgres_fdw.c.gcov.html#2354() -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Вложения
Sawada-san, On Wed, Aug 6, 2025 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Kristian Lejao (colleague, in CC) has found the following assertion > failure in postgres_fdw.c when rechecking the result tuple via > EvalPlanQual(): > > TRAP: failed Assert("outerPlan != NULL"), File: "postgres_fdw.c", > Line: 2366, PID: 2043518 > > Here is the reproducible steps that I've simplified from the one > Kristian originally created: [...] > The point is that in the subquery in the target list we pushed the > inner join to the foreign server. In postgresGetForeignJoinPaths(), we > prepare the join path for EvalPlanQual() check (and used in > postgresRecheckForeignScan()) if the query is DELETE, UPDATE, or FOR > UPDATE/SHARE (as shown below) but we skip it since the subquery itself > is parsed as a normal SELECT query without rowMarks, leaving > fdw_outerpath of the ForeignScan node NULL: > > /* > * If there is a possibility that EvalPlanQual will be executed, we need > * to be able to reconstruct the row using scans of the base relations. > * GetExistingLocalJoinPath will find a suitable path for this purpose in > * the path list of the joinrel, if one exists. We must be careful to > * call it before adding any ForeignPath, since the ForeignPath might > * dominate the only suitable local path available. We also do it before > * calling foreign_join_ok(), since that function updates fpinfo and marks > * it as pushable if the join is found to be pushable. > */ > if (root->parse->commandType == CMD_DELETE || > root->parse->commandType == CMD_UPDATE || > root->rowMarks) > { > epq_path = GetExistingLocalJoinPath(joinrel); > > Therefore, if the tuple is concurrently updated before taking a lock, > we recheck the traversed tuple via EvalPlanQual() but we end up with > the assertion failure since we didn't prepare the join plan for that. > > The attached patch includes the draft fix and regression tests (using > injection points). > > I don't have enough experience with the planner and FDW code area to > evaluate whether the patch fixes the issue in the right approach. > Feedback is very welcome. I've confirmed this assertion could happen > with the same scenario on all supported branches. Will review. Thank you for the report and patch! > In addition to that, I realized that none of the regression tests > execute postgresRecheckForeignScan()[1]. I think we need to add > regression tests to cover that function. Yeah, I think so too. Best regards, Etsuro Fujita
On Tue, Aug 05, 2025 at 11:00:54AM -0700, Masahiko Sawada wrote: > The attached patch includes the draft fix and regression tests (using > injection points). +$psql_session->query_safe(qq[ + select injection_points_set_local(); + select injection_points_attach('heapam_lock_tuple-before-lock', 'wait'); +]); It seems to me that an isolation test would be a better fit here. TAP tests are usually a good fit for injection points if you need to do direct node manipulations, like restarts or stops. The whole test posted only does SQL-ish things, and you could rely on a loopback server as we do in the SQL tests of postgres_fdw? -- Michael
Вложения
On Wed, Aug 6, 2025 at 8:39 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Aug 05, 2025 at 11:00:54AM -0700, Masahiko Sawada wrote: > > The attached patch includes the draft fix and regression tests (using > > injection points). > > +$psql_session->query_safe(qq[ > + select injection_points_set_local(); > + select injection_points_attach('heapam_lock_tuple-before-lock', 'wait'); > +]); > > It seems to me that an isolation test would be a better fit here. TAP > tests are usually a good fit for injection points if you need to do > direct node manipulations, like restarts or stops. The whole test > posted only does SQL-ish things, and you could rely on a loopback > server as we do in the SQL tests of postgres_fdw? Yes, it's definitely possible to create the test using isolation. I wasn't sure how to invoke isolation tests only when injection points are enabled. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Thu, Aug 07, 2025 at 10:29:41AM -0700, Masahiko Sawada wrote: > Yes, it's definitely possible to create the test using isolation. I > wasn't sure how to invoke isolation tests only when injection points > are enabled. For meson, I think that I would tweak the isolation test list based on get_option('injection_points'), and do a similar thing for Makefile. -- Michael
Вложения
On Thu, Aug 7, 2025 at 4:39 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Aug 07, 2025 at 10:29:41AM -0700, Masahiko Sawada wrote: > > Yes, it's definitely possible to create the test using isolation. I > > wasn't sure how to invoke isolation tests only when injection points > > are enabled. > > For meson, I think that I would tweak the isolation test list based on > get_option('injection_points'), and do a similar thing for Makefile. Thank you for the advice! I've changed the regression tests to use isolation tests. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Вложения
On Fri, Aug 08, 2025 at 12:15:19AM -0700, Masahiko Sawada wrote: > On Thu, Aug 7, 2025 at 4:39 PM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Thu, Aug 07, 2025 at 10:29:41AM -0700, Masahiko Sawada wrote: > > > Yes, it's definitely possible to create the test using isolation. I > > > wasn't sure how to invoke isolation tests only when injection points > > > are enabled. > > > > For meson, I think that I would tweak the isolation test list based on > > get_option('injection_points'), and do a similar thing for Makefile. > > Thank you for the advice! I've changed the regression tests to use > isolation tests. That's enough to reproduce the MULL pointer dereference, thanks! I would suggest to add a description at the top of foreign_recheck, documenting the purpose of the test. +# "s0_lock" execute a FOR UPDATE query but it stops before locking the result s/execute/executes/, with an 's'. Fujita-san, are you planning to look at the proposal? -- Michael
Вложения
On Mon, Aug 18, 2025 at 03:58:45PM +0900, Michael Paquier wrote: > That's enough to reproduce the MULL pointer dereference, thanks! > > I would suggest to add a description at the top of foreign_recheck, > documenting the purpose of the test. > > +# "s0_lock" execute a FOR UPDATE query but it stops before locking the result > s/execute/executes/, with an 's'. Another thing that I've noticed has been forgotten: postgres_fdw's .gitignore needs entries for output_iso/ and tmp_check_iso/. Based on git-prompt, GIT_PS1_SHOWUNTRACKEDFILES reports these paths after a test run. -- Michael
Вложения
Hi, On Mon, Aug 18, 2025 at 3:58 PM Michael Paquier <michael@paquier.xyz> wrote: > Fujita-san, are you planning to look at the proposal? I have just started reviewing the patch. I will provide my first comments by this Friday at the very latest. Best regards, Etsuro Fujita
On Mon, Aug 18, 2025 at 05:42:58PM +0900, Etsuro Fujita wrote: > I have just started reviewing the patch. I will provide my first > comments by this Friday at the very latest. Great, thanks a lot! -- Michael
Вложения
On Mon, Aug 18, 2025 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Aug 18, 2025 at 03:58:45PM +0900, Michael Paquier wrote: > > That's enough to reproduce the MULL pointer dereference, thanks! > > > > I would suggest to add a description at the top of foreign_recheck, > > documenting the purpose of the test. > > > > +# "s0_lock" execute a FOR UPDATE query but it stops before locking the result > > s/execute/executes/, with an 's'. > > Another thing that I've noticed has been forgotten: postgres_fdw's > .gitignore needs entries for output_iso/ and tmp_check_iso/. Based on > git-prompt, GIT_PS1_SHOWUNTRACKEDFILES reports these paths after a > test run. Thank you for reviewing the patch! I'll incorporate these comments in the next version patch. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Aug 22, 2025 at 3:59 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Aug 6, 2025 at 8:30 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Aug 6, 2025 at 3:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > The attached patch includes the draft fix and regression tests (using > > > injection points). > > > > > > I don't have enough experience with the planner and FDW code area to > > > evaluate whether the patch fixes the issue in the right approach. > > > Feedback is very welcome. I've confirmed this assertion could happen > > > with the same scenario on all supported branches. > > > > Will review. Thank you for the report and patch! > > First, my apologies for the delay. > > I reviewed the postgres_fdw.c part of the fix: > > @@ -6313,9 +6314,18 @@ postgresGetForeignJoinPaths(PlannerInfo *root, > * calling foreign_join_ok(), since that function updates fpinfo and marks > * it as pushable if the join is found to be pushable. > */ > - if (root->parse->commandType == CMD_DELETE || > - root->parse->commandType == CMD_UPDATE || > - root->rowMarks) > + for (PlannerInfo *proot = root; proot != NULL; proot = proot->parent_root) > + { > + if (proot->parse->commandType == CMD_DELETE || > + proot->parse->commandType == CMD_UPDATE || > + proot->rowMarks) > + { > + need_epq = true; > + break; > + } > + } > + > + if (need_epq) > { > epq_path = GetExistingLocalJoinPath(joinrel); > if (!epq_path) > > I think this successfully avoids the assertion failure and produces > the correct result, but sorry, I don't think it's the right way to go. > I think the root cause of this issue is in the EPQ handling of > foreign/custom joins in ExecScanFetch() in execScan.h: it runs the > recheck method function for a given foreign/custom join whenever it is > called for EPQ rechecking, but that is not 100% correct. I think the > correct handling is: run the recheck method function for the join if > it is called for EPQ rechecking and the join is a *descendant* join in > the EPQ plan tree; otherwise run the access method function for the > join even if it is called for EPQ rechecking, like the attached (where > I used the epqParam of the given EPQState to determine whether the > join is a descendant join or not, which localizes the fix pretty > well). For the SELECT FOR UPDATE query shown upthread, when doing an > EPQ recheck, the fix evaluates the sub-select expression in the target > list by doing a remote join, not a local join, so it would work more > efficiently than the fix you proposed. Thank you for reviewing the patch! I've confirmed that your patch fixes the issue too. If I understand your proposed fix correctly, the reported problem is fixed by not rechecking the test tuple by ForeignRecheck() (performing a local join), but instead we simply call ForeignNext() and get the next tuple. I think while this fix would have to have a regression test like I've proposed, it's probably a good time to add some regression tests to cover postgresRecheckForeignScan() too possibly as a separate patch. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Sat, Aug 23, 2025 at 5:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Aug 22, 2025 at 3:59 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > I reviewed the postgres_fdw.c part of the fix: > > > > @@ -6313,9 +6314,18 @@ postgresGetForeignJoinPaths(PlannerInfo *root, > > * calling foreign_join_ok(), since that function updates fpinfo and marks > > * it as pushable if the join is found to be pushable. > > */ > > - if (root->parse->commandType == CMD_DELETE || > > - root->parse->commandType == CMD_UPDATE || > > - root->rowMarks) > > + for (PlannerInfo *proot = root; proot != NULL; proot = proot->parent_root) > > + { > > + if (proot->parse->commandType == CMD_DELETE || > > + proot->parse->commandType == CMD_UPDATE || > > + proot->rowMarks) > > + { > > + need_epq = true; > > + break; > > + } > > + } > > + > > + if (need_epq) > > { > > epq_path = GetExistingLocalJoinPath(joinrel); > > if (!epq_path) > > > > I think this successfully avoids the assertion failure and produces > > the correct result, but sorry, I don't think it's the right way to go. > > I think the root cause of this issue is in the EPQ handling of > > foreign/custom joins in ExecScanFetch() in execScan.h: it runs the > > recheck method function for a given foreign/custom join whenever it is > > called for EPQ rechecking, but that is not 100% correct. I think the > > correct handling is: run the recheck method function for the join if > > it is called for EPQ rechecking and the join is a *descendant* join in > > the EPQ plan tree; otherwise run the access method function for the > > join even if it is called for EPQ rechecking, like the attached (where > > I used the epqParam of the given EPQState to determine whether the > > join is a descendant join or not, which localizes the fix pretty > > well). For the SELECT FOR UPDATE query shown upthread, when doing an > > EPQ recheck, the fix evaluates the sub-select expression in the target > > list by doing a remote join, not a local join, so it would work more > > efficiently than the fix you proposed. > > Thank you for reviewing the patch! I've confirmed that your patch > fixes the issue too. Thanks for testing! > If I understand your proposed fix correctly, the reported problem is > fixed by not rechecking the test tuple by ForeignRecheck() (performing > a local join), but instead we simply call ForeignNext() and get the > next tuple. That's right. > I think while this fix would have to have a regression > test like I've proposed, it's probably a good time to add some > regression tests to cover postgresRecheckForeignScan() too possibly as > a separate patch. Agreed. Actually, we have fixed many EvalPlanQual issues with postgres_fdw so far, but have always left adding test cases for future work, so let's do that now! The test case you showed upthread and added to the patch is useful, so I think we should add it as well, but my question about it is: is it really a good idea to use injection points? Why don't you just use BEGIN/COMMIT statements like this: -- session 1 begin isolation level read committed; update a set i = i + 1; -- session 2 begin isolation level read committed; select a.i, (select 1 from b, c where a.i = b.i and b.i = c.i) from a for update; -- waits for the transaction in session 1 to complete -- session 1 commit; -- session 2 select a.i, (select 1 from b, c where a.i = b.i and b.i = c.i) from a for update; -- produces results after doing an EvalPlanQual recheck i | ?column? ---+---------- 2 | (1 row) Again, my apologies for the late response. Best regards, Etsuro Fujita
On Wed, Sep 17, 2025 at 07:42:36PM +0900, Etsuro Fujita wrote: > The test case you showed upthread and added to the patch is useful, so > I think we should add it as well, but my question about it is: is it > really a good idea to use injection points? > Why don't you just use BEGIN/COMMIT statements like this: > > -- session 1 > begin isolation level read committed; > update a set i = i + 1; > > -- session 2 > begin isolation level read committed; > select a.i, > (select 1 from b, c where a.i = b.i and b.i = c.i) > from a > for update; -- waits for the transaction in session 1 to complete > > -- session 1 > commit; > > -- session 2 > select a.i, > (select 1 from b, c where a.i = b.i and b.i = c.i) > from a > for update; -- produces results after doing an EvalPlanQual recheck > i | ?column? > ---+---------- > 2 | > (1 row) > > Again, my apologies for the late response. As far as I can see, this causes the SELECT FOR UPDATE of session 2 that's waiting for the commit of session 1 to crash, if we don't have the fix, of course. Removing the dependency with injection points is nice if we don't require it, so we can just tweak the isolation test proposed upthread to use the same schema, but the queries you are suggesting. As a whole, +1 for your suggestion. -- Michael
Вложения
On Wed, Sep 17, 2025 at 5:25 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Sep 17, 2025 at 07:42:36PM +0900, Etsuro Fujita wrote: > > The test case you showed upthread and added to the patch is useful, so > > I think we should add it as well, but my question about it is: is it > > really a good idea to use injection points? > > > Why don't you just use BEGIN/COMMIT statements like this: > > > > -- session 1 > > begin isolation level read committed; > > update a set i = i + 1; > > > > -- session 2 > > begin isolation level read committed; > > select a.i, > > (select 1 from b, c where a.i = b.i and b.i = c.i) > > from a > > for update; -- waits for the transaction in session 1 to complete > > > > -- session 1 > > commit; > > > > -- session 2 > > select a.i, > > (select 1 from b, c where a.i = b.i and b.i = c.i) > > from a > > for update; -- produces results after doing an EvalPlanQual recheck > > i | ?column? > > ---+---------- > > 2 | > > (1 row) > > > > Again, my apologies for the late response. > > As far as I can see, this causes the SELECT FOR UPDATE of session 2 > that's waiting for the commit of session 1 to crash, if we don't have > the fix, of course. Removing the dependency with injection points is > nice if we don't require it, so we can just tweak the isolation test > proposed upthread to use the same schema, but the queries you are > suggesting. +1 I changed the regression tests and used the fix proposed by Fujita-san. Please review the attached new version patch. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Вложения
On Tue, Sep 23, 2025 at 02:41:45PM -0700, Masahiko Sawada wrote: > I changed the regression tests and used the fix proposed by > Fujita-san. Please review the attached new version patch. Thanks for the updated patch. Confirmed that I can still blow up the backend as expected when not using the fix, only the test. -- Michael
Вложения
On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I changed the regression tests and used the fix proposed by > Fujita-san. Please review the attached new version patch. Thanks for updating the patch! I took a quick glance at the patch. My initial comment is: it only includes the test case discussed here, but I think it's a good idea to add more cases in it (that exercise code paths in postgresRecheckForeignScan), as I mentioned upthread. What do you think about that? If no objections, I'd like to update it to include such cases. Anyway I'll review it in more detail. Best regards, Etsuro Fujita
On Wed, Sep 24, 2025 at 3:28 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I changed the regression tests and used the fix proposed by > > Fujita-san. Please review the attached new version patch. > > Thanks for updating the patch! > > I took a quick glance at the patch. My initial comment is: it only > includes the test case discussed here, but I think it's a good idea to > add more cases in it (that exercise code paths in > postgresRecheckForeignScan), as I mentioned upthread. What do you > think about that? If no objections, I'd like to update it to include > such cases. +1 to add more test cases for the code paths in postgresRecheckForeignScan. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Sep 24, 2025 at 09:45:26AM -0700, Masahiko Sawada wrote: > On Wed, Sep 24, 2025 at 3:28 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: >> I took a quick glance at the patch. My initial comment is: it only >> includes the test case discussed here, but I think it's a good idea to >> add more cases in it (that exercise code paths in >> postgresRecheckForeignScan), as I mentioned upthread. What do you >> think about that? If no objections, I'd like to update it to include >> such cases. > > +1 to add more test cases for the code paths in postgresRecheckForeignScan. +1. Would these be included in the same isolation test? That may be cleaner, especially as it seems like the EPQ re-evaluations should be able to happen with the same schema as the one set up by the test. -- Michael
Вложения
On Thu, Sep 25, 2025 at 8:15 AM Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Sep 24, 2025 at 09:45:26AM -0700, Masahiko Sawada wrote: > > On Wed, Sep 24, 2025 at 3:28 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > >> I took a quick glance at the patch. My initial comment is: it only > >> includes the test case discussed here, but I think it's a good idea to > >> add more cases in it (that exercise code paths in > >> postgresRecheckForeignScan), as I mentioned upthread. What do you > >> think about that? If no objections, I'd like to update it to include > >> such cases. > > > > +1 to add more test cases for the code paths in postgresRecheckForeignScan. > > +1. Would these be included in the same isolation test? That may be > cleaner, especially as it seems like the EPQ re-evaluations should be > able to happen with the same schema as the one set up by the test. Yes, I will add those cases to the same isolation test, as you mentioned. Best regards, Etsuro Fujita
On 2025-Sep-24, Etsuro Fujita wrote: > I took a quick glance at the patch. My initial comment is: it only > includes the test case discussed here, but I think it's a good idea to > add more cases in it (that exercise code paths in > postgresRecheckForeignScan), as I mentioned upthread. What do you > think about that? If no objections, I'd like to update it to include > such cases. I would like to suggest a different approach: because this patch is known to fix a server crash in a known case, it would be better to get the fix (and its corresponding test) pushed as is without further delay. Additional tests are an excellent idea, but I think we shouldn't make this patch wait for them. If further fixes are deemed necessary for those other (thus far hypothetical) tests, we can still make them afterwards. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On Fri, Sep 26, 2025 at 10:34 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > On 2025-Sep-24, Etsuro Fujita wrote: > > > I took a quick glance at the patch. My initial comment is: it only > > includes the test case discussed here, but I think it's a good idea to > > add more cases in it (that exercise code paths in > > postgresRecheckForeignScan), as I mentioned upthread. What do you > > think about that? If no objections, I'd like to update it to include > > such cases. > > I would like to suggest a different approach: because this patch is > known to fix a server crash in a known case, it would be better to get > the fix (and its corresponding test) pushed as is without further delay. > Additional tests are an excellent idea, but I think we shouldn't make > this patch wait for them. If further fixes are deemed necessary for > those other (thus far hypothetical) tests, we can still make them > afterwards. The 2-step approach sounds reasonable, so I'll review the proposed isolation test and report the results first. Thanks! Best regards, Etsuro Fujita
On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I changed the regression tests and used the fix proposed by > Fujita-san. Please review the attached new version patch. I reviewed the test part. Here are my comments about it. You renamed the spec file to concurrent_update.spec. It's a broader name to cover the test case discussed here, but seems a bit vague to me. How about eval_plan_qual.spec like src/test/isolation/specs/eval-plan-qual.spec, instead? Which I think is more specific. In the setup block in the test, you created schemas: + CREATE SCHEMA sch1; + CREATE TABLE sch1.a (i int); + CREATE FOREIGN TABLE sch1.b (i int) SERVER loopback OPTIONS (schema_name 'sch2', table_name 'b'); + CREATE FOREIGN TABLE sch1.c (i int) SERVER loopback OPTIONS (schema_name 'sch2', table_name 'c'); + + CREATE SCHEMA sch2; + CREATE TABLE sch2.b (i int); + CREATE TABLE sch2.c (i int); But actually, we don't need these schemas. As I'm planning to add more permutations using the same setup, and setup is executed once per permutation, I'd like to propose to save cycles by creating all these tables in the current schema like this: CREATE TABLE a (i int); CREATE TABLE b (i int); CREATE TABLE c (i int); CREATE FOREIGN TABLE fb (i int) SERVER loopback OPTIONS (table_name 'b'); CREATE FOREIGN TABLE fc (i int) SERVER loopback OPTIONS (table_name 'c'); +step "s1_lock" { + SELECT a.i, + (SELECT 1 FROM sch1.b AS b, sch1.c AS c WHERE a.i = b.i AND b.i = c.i) + FROM sch1.a as a + FOR UPDATE; +} I think the important point in the test case discussed here is that the sub-select has a foreign-join plan (without an alternative local join plan). So I'd like to propose to add an EXPLAIN command to this step, to confirm that it has such a plan. Also, how about s/s1_lock/s1_tuplock/? This is just my taste, though. +# "s1_lock" waits for concurrent update on the tuple on sch1.a table by +# "s0_update", and once s0 transaction is committed it resumes and does EPQ +# recheck for the locked tuple, which should not use postgresRecheckForeignScan +# as the remote join is not a descendant join in the EPQ plan tree. +permutation "s0_begin" "s0_update" "s1_begin" "s1_lock" "s0_commit" Let's add to the permutation s1_commit as well, which I think is a good practice. Considering the permutation is a typical sequence to exercise an EvalPlanQual recheck, I don't feel the need for the first part of the comments, sorry. Also, IMO I think the second part is a bit too detailed. How about simplifying the comments to something like this (I used a comment for a similar test in src/test/isolation/specs/eval-plan-qual.spec.): "This test exercises EvalPlanQual with a SubLink sub-select (which should be unaffected by any EPQ recheck behavior in the outer query)" Best regards, Etsuro Fujita
On Wed, Oct 1, 2025 at 7:53 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I changed the regression tests and used the fix proposed by > > Fujita-san. Please review the attached new version patch. > > I reviewed the test part. Here are my comments about it. > > You renamed the spec file to concurrent_update.spec. It's a broader > name to cover the test case discussed here, but seems a bit vague to > me. How about eval_plan_qual.spec like > src/test/isolation/specs/eval-plan-qual.spec, instead? Which I think > is more specific. > > In the setup block in the test, you created schemas: > > + CREATE SCHEMA sch1; > + CREATE TABLE sch1.a (i int); > + CREATE FOREIGN TABLE sch1.b (i int) SERVER loopback OPTIONS > (schema_name 'sch2', table_name 'b'); > + CREATE FOREIGN TABLE sch1.c (i int) SERVER loopback OPTIONS > (schema_name 'sch2', table_name 'c'); > + > + CREATE SCHEMA sch2; > + CREATE TABLE sch2.b (i int); > + CREATE TABLE sch2.c (i int); > > But actually, we don't need these schemas. As I'm planning to add > more permutations using the same setup, and setup is executed once per > permutation, I'd like to propose to save cycles by creating all these > tables in the current schema like this: > > CREATE TABLE a (i int); > CREATE TABLE b (i int); > CREATE TABLE c (i int); > CREATE FOREIGN TABLE fb (i int) SERVER loopback OPTIONS (table_name 'b'); > CREATE FOREIGN TABLE fc (i int) SERVER loopback OPTIONS (table_name 'c'); > > +step "s1_lock" { > + SELECT a.i, > + (SELECT 1 FROM sch1.b AS b, sch1.c AS c WHERE a.i = b.i AND b.i = c.i) > + FROM sch1.a as a > + FOR UPDATE; > +} > > I think the important point in the test case discussed here is that > the sub-select has a foreign-join plan (without an alternative local > join plan). So I'd like to propose to add an EXPLAIN command to this > step, to confirm that it has such a plan. > > Also, how about s/s1_lock/s1_tuplock/? This is just my taste, though. > > +# "s1_lock" waits for concurrent update on the tuple on sch1.a table by > +# "s0_update", and once s0 transaction is committed it resumes and does EPQ > +# recheck for the locked tuple, which should not use postgresRecheckForeignScan > +# as the remote join is not a descendant join in the EPQ plan tree. > +permutation "s0_begin" "s0_update" "s1_begin" "s1_lock" "s0_commit" > > Let's add to the permutation s1_commit as well, which I think is a > good practice. > > Considering the permutation is a typical sequence to exercise an > EvalPlanQual recheck, I don't feel the need for the first part of the > comments, sorry. Also, IMO I think the second part is a bit too > detailed. How about simplifying the comments to something like this > (I used a comment for a similar test in > src/test/isolation/specs/eval-plan-qual.spec.): > > "This test exercises EvalPlanQual with a SubLink sub-select (which > should be unaffected by any EPQ recheck behavior in the outer query)" Some more comments: +teardown +{ +} I think it's good to add cleanup statements here; otherwise let's remove this for now, to suppress a "teardown failed:" error. --- a/contrib/postgres_fdw/Makefile +++ b/contrib/postgres_fdw/Makefile @@ -16,6 +16,8 @@ SHLIB_LINK_INTERNAL = $(libpq) EXTENSION = postgres_fdw DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql postgres_fdw--1.1--1.2.sql +ISOLATION = concurrent_update + I'd like to propose to add "ISOLATION_OPTS = --load-extension=postgres_fdw", by which we don't need to do CREATE EXTENSION in the setup block. That's it. Best regards, Etsuro Fujita
On Thu, Oct 2, 2025 at 4:01 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Oct 1, 2025 at 7:53 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Sep 24, 2025 at 6:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > I changed the regression tests and used the fix proposed by > > > Fujita-san. Please review the attached new version patch. > > > > I reviewed the test part. Here are my comments about it. > > > > You renamed the spec file to concurrent_update.spec. It's a broader > > name to cover the test case discussed here, but seems a bit vague to > > me. How about eval_plan_qual.spec like > > src/test/isolation/specs/eval-plan-qual.spec, instead? Which I think > > is more specific. > > > > In the setup block in the test, you created schemas: > > > > + CREATE SCHEMA sch1; > > + CREATE TABLE sch1.a (i int); > > + CREATE FOREIGN TABLE sch1.b (i int) SERVER loopback OPTIONS > > (schema_name 'sch2', table_name 'b'); > > + CREATE FOREIGN TABLE sch1.c (i int) SERVER loopback OPTIONS > > (schema_name 'sch2', table_name 'c'); > > + > > + CREATE SCHEMA sch2; > > + CREATE TABLE sch2.b (i int); > > + CREATE TABLE sch2.c (i int); > > > > But actually, we don't need these schemas. As I'm planning to add > > more permutations using the same setup, and setup is executed once per > > permutation, I'd like to propose to save cycles by creating all these > > tables in the current schema like this: > > > > CREATE TABLE a (i int); > > CREATE TABLE b (i int); > > CREATE TABLE c (i int); > > CREATE FOREIGN TABLE fb (i int) SERVER loopback OPTIONS (table_name 'b'); > > CREATE FOREIGN TABLE fc (i int) SERVER loopback OPTIONS (table_name 'c'); > > > > +step "s1_lock" { > > + SELECT a.i, > > + (SELECT 1 FROM sch1.b AS b, sch1.c AS c WHERE a.i = b.i AND b.i = c.i) > > + FROM sch1.a as a > > + FOR UPDATE; > > +} > > > > I think the important point in the test case discussed here is that > > the sub-select has a foreign-join plan (without an alternative local > > join plan). So I'd like to propose to add an EXPLAIN command to this > > step, to confirm that it has such a plan. > > > > Also, how about s/s1_lock/s1_tuplock/? This is just my taste, though. > > > > +# "s1_lock" waits for concurrent update on the tuple on sch1.a table by > > +# "s0_update", and once s0 transaction is committed it resumes and does EPQ > > +# recheck for the locked tuple, which should not use postgresRecheckForeignScan > > +# as the remote join is not a descendant join in the EPQ plan tree. > > +permutation "s0_begin" "s0_update" "s1_begin" "s1_lock" "s0_commit" > > > > Let's add to the permutation s1_commit as well, which I think is a > > good practice. > > > > Considering the permutation is a typical sequence to exercise an > > EvalPlanQual recheck, I don't feel the need for the first part of the > > comments, sorry. Also, IMO I think the second part is a bit too > > detailed. How about simplifying the comments to something like this > > (I used a comment for a similar test in > > src/test/isolation/specs/eval-plan-qual.spec.): > > > > "This test exercises EvalPlanQual with a SubLink sub-select (which > > should be unaffected by any EPQ recheck behavior in the outer query)" > > Some more comments: > > +teardown > +{ > +} > > I think it's good to add cleanup statements here; otherwise let's > remove this for now, to suppress a "teardown failed:" error. > > --- a/contrib/postgres_fdw/Makefile > +++ b/contrib/postgres_fdw/Makefile > @@ -16,6 +16,8 @@ SHLIB_LINK_INTERNAL = $(libpq) > EXTENSION = postgres_fdw > DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql > postgres_fdw--1.1--1.2.sql > > +ISOLATION = concurrent_update > + > > I'd like to propose to add "ISOLATION_OPTS = > --load-extension=postgres_fdw", by which we don't need to do CREATE > EXTENSION in the setup block. > > That's it. > Thank you for reviewing the patch! I've updated the patch based on your comments. Please find the attached patch. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Вложения
On Thu, Oct 02, 2025 at 03:20:32PM -0700, Masahiko Sawada wrote: > I've updated the patch based on your comments. Please find the attached patch. Looks sensible here in shape, a few style nits noticed while reading. x +# This test exerices EvalPlanQual with a SubLink sub-select (which should +# be unaffected by any EPQ recheck behavior in the outer query) The setup phase has a mix of tabs (used in the DO block) and spaces used, that may render weirdly depending on one's setup when showing the diffs. It looks to me that your intention is to use spaces. s/exerices/exercises/ And I would add a short explanation about the reason why this script exists at the top of eval_plan_qual.spec. If you don't feel strongly about that, feel free to ignore me. One other bit I have noticed on the way, due to the extra whitespaces at the bottom of the file: $ git diff master --check contrib/postgres_fdw/specs/eval_plan_qual.spec:52: new blank line at EOF. -- Michael
Вложения
On Fri, Oct 3, 2025 at 7:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I've updated the patch based on your comments. Please find the attached patch. Thanks for updating the patch! +step "s1_tuplock" { + EXPLAIN (COSTS OFF, ANALYZE ON, TIMING OFF, SUMMARY OFF, BUFFERS OFF) + SELECT a.i, + (SELECT 1 FROM fb, fc WHERE a.i = fb.i AND fb.i = fc.i) + FROM a as a + FOR UPDATE; +} Maybe my comment about this step was not enough, but I'm wondering we should run EXPLAIN and then SELECT here like below, rather than running EXPLAIN ANALYZE, as that seems more usual to me: step "s1_tuplock" { EXPLAIN (VERBOSE, COSTS OFF) SELECT a.i, (SELECT 1 FROM fb, fc WHERE a.i = fb.i AND fb.i = fc.i) FROM a FOR UPDATE; SELECT a.i, (SELECT 1 FROM fb, fc WHERE a.i = fb.i AND fb.i = fc.i) FROM a FOR UPDATE; } I added the VERBOSE option to show the remote query, and removed the alias for a. BTW: you added quotation marks around a name for each session or step like "s1_tuplock". Do we really need them? This is nitpicking, though. Best regards, Etsuro Fujita