Обсуждение: Re: Issue with markers in isolation tester? Or not?
On Mon, Jan 13, 2025 at 02:02:00AM +0100, Michail Nikolaev wrote: > While stabilizing tests for [0] I have noticed unclear (and confusing in my > opinion) behavior of markers in the isolation tester. > > I have attached a test with reproducer. > > There are two permutations in the test: > > permutation > after(before) > before > detach1 > wakeup1 > detach2 > wakeup2 > > In that case, I see expected results: > step before: <... completed> > step after: <... completed> > > But changing the order of steps slightly: > > permutation > after(before) > wakeup1 <------- wakeup moved here > before > detach1 > detach2 > wakeup2 > > makes "after" to be completed before "before". Does that make sense? Yes. I don't see a good reason for isolationtester to disregard that "(before)" marker, so I think you've found a bug in isolationtester. How might we fix it?
On Tue, Jan 14, 2025 at 12:16:22PM +0100, Michail Nikolaev wrote: > --- a/src/test/isolation/README > +++ b/src/test/isolation/README > @@ -183,9 +183,9 @@ A marker consisting solely of a step name indicates that this step may > not be reported as completing until that other step has completed. > This allows stabilizing cases where two queries might be seen to complete > in either order. Note that this step can be *launched* before the other > -step has completed. (If the other step is used more than once in the > -current permutation, this step cannot complete while any of those > -instances is active.) > +step has completed or launched. (If the other step is used more than once > +in the current permutation, this step cannot complete while any of those > +instances are not yet complete.) I misunderstood, and I was mistaken to see this as a bug fix. The isolationtester is acting per its definition, and this would be a definition change. Do others have opinions on the merits of today's definition vs. the proposed definition? I'd need to review the motivating test to form my own opinion on whether the new definition makes it easier to write tests. I'm optimistic that it does make things easier, because I think one can get the old behavior by defining two steps with the same commands. Thanks for the patch. You could bundle this in the thread that wants this to stabilize that thread's CI results. As a side note on the broader topic of the difficulty of stabilizing isolation test race conditions, I've thought it would be helpful to have a "strict blocking condition mode" that complains at the end of the test run if the spec had completion order ambiguities. I think these are the rules sufficient to have a fully-specified completion order: 1. Every step that waits needs a blocking condition. 2. No two steps A & B name the same step C as a blocking condition. Instead, make B block on A. 3. If a step appears in another step's blocking condition and is not the last step listed in the permutation, the step listed after the unblocker needs a blocking condition on a formerly-blocked step. > +session s1 > +step after { SELECT injection_points_run('injection-points-wait-1'); } FYI, I prefer the convention of ending each step name with the session number, i.e. s/after/after1/ here. The first isolation specs used that style. I mildly prefer it over the newer alternative styles, and I strongly prefer styles that convey the session identity over styles that don't. No need to send a new patch version, though.
Hello, Noah!
> I misunderstood, and I was mistaken to see this as a bug fix. The
> isolationtester is acting per its definition, and this would be a definition
> change.
> isolationtester is acting per its definition, and this would be a definition
> change.
Yes, you are right, but I think it is better to clarify it somehow because from my point of view that definition feels like logic in patched version.
But maybe my non-native English prevents me from understanding it correctly.
In my opinion, it is better to add some clarification like "this marker affects only other steps which were launched before this step or during its execution".
> I'd need to review the motivating test to form my own opinion on whether the
> new definition makes it easier to write tests.> You could bundle this in the thread that wants this to stabilize that thread's CI results.
I'll think about it, but currently it is stabilized using "notices" and, you know, more changes - harder to merge :)
Also, I'll try one more time to stabilize it without "notices" - maybe I'll get some insight.
Anyway - I'll ping you in that thread [0] (you're already in recipients) or bring it here.
Best regards,
Mikhail.
Michael Paquier <michael@paquier.xyz> writes: > On Tue, Jan 14, 2025 at 11:48:28AM -0800, Noah Misch wrote: >> I misunderstood, and I was mistaken to see this as a bug fix. The >> isolationtester is acting per its definition, and this would be a definition >> change. Do others have opinions on the merits of today's definition vs. the >> proposed definition? > I agree that Michail's case with the handling of the markers is kind > of strange in the case he has reported. Anyway, do you think that it > would be a good idea to change that knowing for how long we've relied > on isolationtester to do things the way they are? We've changed it before, so I'm not totally averse to considering more change. But I'm not happy about the lack of any analysis in this thread of the impact on existing test cases. In particular, it seems to me that this proposal amounts to reducing the number of states that can be reached during an isolation test. So it seems to me that means strictly less coverage than we had before. I'd prefer to start by asking why the proposed new test can't be made stable without such a restriction. (I don't mean to minimize the question of whether we need better documentation about this behavior.) regards, tom lane
On Wed, Jan 15, 2025 at 02:41:31PM +0900, Michael Paquier wrote: > On Tue, Jan 14, 2025 at 11:48:28AM -0800, Noah Misch wrote: > > I misunderstood, and I was mistaken to see this as a bug fix. The > > isolationtester is acting per its definition, and this would be a definition > > change. Do others have opinions on the merits of today's definition vs. the > > proposed definition? > > I agree that Michail's case with the handling of the markers is kind > of strange in the case he has reported. Anyway, do you think that it > would be a good idea to change that knowing for how long we've relied > on isolationtester to do things the way they are? I think that would depend on what proposed tests become easier. Since the original poster used the "notices" marker type to solve the motivating use case, let's shelve this until there's new demand. If we ever unshelve this and find much in today's specs that would malfunction under the new definition, we might offer both definitions. A new marker syntax would reach the new definition, and old specs would work as before. > If we do it only on > HEAD, it would make the back-patching of newly-added tests for bug > fixes potentially more complex to deal with. If applying the new > definition across all the stable branches, this could impact something > outside code :/ When I'm exposed to non-core tests, those tests usually target many major versions simultaneously. Having the spec meaning vary across major versions would hurt them more than it would help. Hence, back-patch if we do adopt the change. As above, the impact on existing specs could inform whether we add syntax vs. redefine existing syntax.
Hello, everyone!
I was able to stabilize (I hope so) tests on [0] without any changes to isolationtester and without "notices N".
So, I agree, it is better to put it on a shelf now. But a few words in documentation may be a good idea.
Best regards,
Mikhail.
Hello, Michael!
> Could you summarize here what you have done to achieve test
> stabilization in your new patch set posted at [1] without using the
> proposal of this thread?
> stabilization in your new patch set posted at [1] without using the
> proposal of this thread?
Mostly idea is next:
Let's imagine we have two steps - step_before and step_after which may end in either order.
Then instead of such step/markers structure:
step_before(step_after)
stepN
stepN+1
step_after
use the next:
step_before
stepN
stepN+1
step_after(step_before)
In the first case, there are two possible results:
1) step_before is finished before step_after - reported as step_before, step_after
2) step_after is launched before step_before ends - reported as step_after, step_before
But in the case second variant:
1) step_before is finished before step_after - reported as step_before, step_after
2) step_after is launched before step_before ends - reported as step_before, step_after
So, the second option provides the same result regardless of order of finishing of step_before and step_after, which is the thing I want to achieve here.
Best regards,
Mikhail.
On Sun, Jan 19, 2025 at 07:26:58PM +0100, Michail Nikolaev wrote: > Mostly idea is next: > > Let's imagine we have two steps - step_before and step_after which may end > in either order. > Then instead of such step/markers structure: > > step_before(step_after) > stepN > stepN+1 > step_after > > use the next: > > step_before > stepN > stepN+1 > step_after(step_before) > > In the first case, there are two possible results: > 1) step_before is finished before step_after - reported as step_before, > step_after > 2) step_after is launched before step_before ends - reported as step_after, > step_before > > But in the case second variant: > 1) step_before is finished before step_after - reported as step_before, > step_after > 2) step_after is launched before step_before ends - reported as > step_before, step_after > > So, the second option provides the same result regardless of order of > finishing of step_before and step_after, which is the thing I want to > achieve here. Stepping back, any variation in the total ordering of step-start and step-complete events necessarily changes the expected output. Hence, we want to freeze that ordering. We could do that directly, by having isolationtester read a format that lists start and complete events. We could also have the isolationtester write the same format, so you could draft a permutation without blocking conditions and just save the order that isolationtester discovers by experiment. I'd find that less mentally taxing than following rules to reach a frozen ordering via blocking conditions. As an example of one concrete way to achieve that, have isolationtester write a results/SPECNAME.wait file that contains syntax like "permutation-event waiting:S1 S2 complete:S1 S3". You'd copy that file to expected/SPECNAME.wait, which isolationtester would read at startup. This wouldn't replace "notices" blocking conditions for certain complex tests, and tests with multiple expected outputs might not want it. It would cover the rest. Any unprefixed entry (e.g. the plain "S2") would get the simplification of skipping the check for whether the step has entered a wait. How well would that meet your test's needs?
Hello, Noah!
> Stepping back, any variation in the total ordering of step-start and
> step-complete events necessarily changes the expected output. Hence, we want
> to freeze that ordering. We could do that directly, by having isolationtester
> read a format that lists start and complete events. We could also have the
> isolationtester write the same format, so you could draft a permutation
> without blocking conditions and just save the order that isolationtester
> discovers by experiment. I'd find that less mentally taxing than following
> rules to reach a frozen ordering via blocking conditions.
> How well would that meet your test's needs?
It feels a bit strange to me to force isolationtest to output something in a specific order, especially since we don't actually care about the order in the test.
But why do we need that output at all? I think a simpler solution would be to mark a step in a way that prevents isolationtester from outputting detailed information about the completion of that step—just an overview of the steps launched and completed at the end of the permutation.
In my tests, I evaluate how [CREATE|RE] INDEX CONCURRENTLY interferes with UPSERTs. I’m not testing the exact ordering; I simply set the backends into a specific state (with different index information) and let them proceed one by one. I don’t care which one finishes first or last—I’m only interested in identifying errors in the output.
Best regards,
Mikhail.
> Stepping back, any variation in the total ordering of step-start and
> step-complete events necessarily changes the expected output. Hence, we want
> to freeze that ordering. We could do that directly, by having isolationtester
> read a format that lists start and complete events. We could also have the
> isolationtester write the same format, so you could draft a permutation
> without blocking conditions and just save the order that isolationtester
> discovers by experiment. I'd find that less mentally taxing than following
> rules to reach a frozen ordering via blocking conditions.
> How well would that meet your test's needs?
It feels a bit strange to me to force isolationtest to output something in a specific order, especially since we don't actually care about the order in the test.
But why do we need that output at all? I think a simpler solution would be to mark a step in a way that prevents isolationtester from outputting detailed information about the completion of that step—just an overview of the steps launched and completed at the end of the permutation.
In my tests, I evaluate how [CREATE|RE] INDEX CONCURRENTLY interferes with UPSERTs. I’m not testing the exact ordering; I simply set the backends into a specific state (with different index information) and let them proceed one by one. I don’t care which one finishes first or last—I’m only interested in identifying errors in the output.
Best regards,
Mikhail.
On Sun, Jan 19, 2025 at 09:55:32PM +0100, Michail Nikolaev wrote: > > Stepping back, any variation in the total ordering of step-start and > > step-complete events necessarily changes the expected output. Hence, we > want > > to freeze that ordering. We could do that directly, by having > isolationtester > > read a format that lists start and complete events. We could also have > the > > isolationtester write the same format, so you could draft a permutation > > without blocking conditions and just save the order that isolationtester > > discovers by experiment. I'd find that less mentally taxing than > following > > rules to reach a frozen ordering via blocking conditions. > > > How well would that meet your test's needs? > > It feels a bit strange to me to force isolationtest to output something in > a specific order, especially since we don't actually care about the order > in the test. > > But why do we need that output at all? It's good to think of ways we can output differently for more stability. However, ordered output helps the human studying what happened. Today's output is well-optimized for that. Today's output resembles what a person would see if running a collection of psql sessions. For example, if I'm reviewing a patch, reviewing the expected spec output feels natural. > I think a simpler solution would be > to mark a step in a way that prevents isolationtester from outputting > detailed information about the completion of that step—just an overview of > the steps launched and completed at the end of the permutation. My main concern about that is the difficulty of marking steps: - Best: no markings - Medium: easy rules for when you need a marking - Worst: run on FreeBSD to determine marking requirements by experiment I've not been able to identify an easy rule that avoids "run on FreeBSD" with today's framework. (I have a non-easy list of four rules, but I'm not confident even that list is complete.) I can't think of an easy rule for "just an overview" markings, either. Even if there is one, not needing markings is better, all else being equal. What other advantages may compensate for this concern? There's another, minor concern. The more "just an overview" markings in a spec, the more reading its expected output wouldn't feel like reading psql transcripts. I'd likely end up logging both today's ordered format and the new format. We'd diff the new format only, and we'd write the old format for human debugging. > In my tests, I evaluate how [CREATE|RE] INDEX CONCURRENTLY interferes with > UPSERTs. I’m not testing the exact ordering; I simply set the backends into > a specific state (with different index information) and let them proceed > one by one. I don’t care which one finishes first or last—I’m only > interested in identifying errors in the output. That makes sense. Thanks, nm