Discussion: SQL workflow for crash testing correctness
Good evening PGSQL admin email distribution list,
I have built an HA cluster setup. I would like to instrument a workflow to test for lost or duplicated writes.
Does anyone know of prior art that does this?
Does anyone have thoughts on how to model this? My initial thoughts were to find the serialization tests in the Postgres project core.
Thanks in advance for any assistance anyone can provide,
Joseph Hammerman
On Wed, Sep 18, 2019 at 3:27 AM Joseph Hammerman <jhammerman@squarespace.com> wrote:
> Does anyone have thoughts on how to model this? My initial thoughts were to find the serialization tests in the Postgres project core.
It's quite hard to make suggestions without knowing what you are trying to achieve. I would, however, look for inspiration in the test suite for PostgreSQL-XC, if available.
Hope it helps.
Luca
Good afternoon Luca,
Thanks for the response.
I would like to have a crash test suite that exercises partial and full network partitions, in addition to process and machine crashes.
I'll have a look at that project's code, thank you! Please let me know if you have any other thoughts, links, or anything of that nature.
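To make the fault list above concrete, here is a minimal sketch of a reproducible fault schedule. Everything in it is an assumption for illustration: the `Fault` names, the `fault_schedule` helper, the node names, and the reading of "partial partition" as one cut link between two nodes. It only plans which fault hits which node; actually injecting the fault is left to whatever tooling the cluster uses.

```python
import random
from enum import Enum


class Fault(Enum):
    # The four fault classes named in the message above.
    PROCESS_CRASH = "process crash"
    MACHINE_CRASH = "machine crash"
    PARTIAL_PARTITION = "partial network partition"
    FULL_PARTITION = "full network partition"


def fault_schedule(nodes, rounds, seed):
    """Return a reproducible list of (round, fault, target nodes).

    Hypothetical helper: the same seed always yields the same plan,
    so a failing run can be replayed.
    """
    rng = random.Random(seed)
    plan = []
    for round_no in range(rounds):
        fault = rng.choice(list(Fault))
        if fault is Fault.FULL_PARTITION:
            # Assumption: every node is isolated from every other node.
            targets = tuple(nodes)
        elif fault is Fault.PARTIAL_PARTITION:
            # Assumption: one link between two nodes is cut.
            targets = tuple(sorted(rng.sample(nodes, 2)))
        else:
            # Crashes hit a single node.
            targets = (rng.choice(nodes),)
        plan.append((round_no, fault, targets))
    return plan


nodes = ["pg1", "pg2", "pg3"]  # hypothetical node names
plan = fault_schedule(nodes, rounds=5, seed=42)
for round_no, fault, targets in plan:
    print(round_no, fault.value, targets)
```

Seeding the generator is the main design choice here: a run that exposes a lost write can be repeated with the same sequence of faults.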
Regards,
Joe Hammerman
On Wed, Sep 18, 2019 at 11:22 AM Luca Ferrari <fluca1978@gmail.com> wrote:
> On Wed, Sep 18, 2019 at 3:27 AM Joseph Hammerman <jhammerman@squarespace.com> wrote:
> > Does anyone have thoughts on how to model this? My initial thoughts were to find the serialization tests in the Postgres project core.
> It's quite hard to make suggestions without knowing what you are trying to achieve.
> I would, however, look for inspiration in the test suite for PostgreSQL-XC, if available.
> Hope it helps.
> Luca
On Tue, Sep 17, 2019 at 9:27 PM Joseph Hammerman <jhammerman@squarespace.com> wrote:
> Good evening PGSQL admin email distribution list,
> I have built an HA cluster setup. I would like to instrument a workflow to test for lost or duplicated writes.
> Does anyone know of prior art that does this?
I have a testing framework that injects faults under high load and then checks that automatic recovery happens correctly. I have used it to find several bugs, but it hasn't turned up any in the last couple of releases (likely because improved regression tests now catch them before I get a chance to). I've always just tested this as crash recovery within a single instance, but I see no reason the technique couldn't be used for multiple instances as well. You can search for my name and "count.pl" on the hackers list to find multiple examples of the testing harness. The nature of the fault injected (torn page writes) is just a function of what I was working on at the time I wrote it; most of the bugs uncovered had nothing to do with the exact thing that caused the crash.
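As an illustration of the kind of check such a harness could run after recovery, here is a minimal sketch in Python. It is not count.pl and does not claim to reproduce how that script works; `WriteLedger` and `check_writes` are hypothetical names. The assumption is that each test write carries a unique id, that the client records whether the commit was acknowledged or left in doubt (the connection died before a reply arrived), and that the ids are read back from the database once it is up again.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WriteLedger:
    """Client-side record of what the workload attempted (hypothetical helper)."""
    acked: set = field(default_factory=set)     # commit was acknowledged
    in_doubt: set = field(default_factory=set)  # no reply before the fault hit

    def record(self, write_id, outcome):
        # Writes that failed cleanly and rolled back are simply not recorded.
        if outcome == "acked":
            self.acked.add(write_id)
        elif outcome == "in_doubt":
            self.in_doubt.add(write_id)
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")


def check_writes(ledger, rows_after_recovery):
    """Compare the ledger with the write ids read back after recovery."""
    seen = Counter(rows_after_recovery)
    return {
        # Acknowledged but missing: a lost write.
        "lost": sorted(ledger.acked - seen.keys()),
        # Present more than once: a duplicated write.
        "duplicated": sorted(w for w, n in seen.items() if n > 1),
        # Present but never attempted, or attempted and rolled back.
        "unexpected": sorted(seen.keys() - ledger.acked - ledger.in_doubt),
    }


ledger = WriteLedger()
ledger.record(1, "acked")
ledger.record(2, "acked")
ledger.record(3, "in_doubt")

# Ids as they might be read back after failover (made-up data for the example).
report = check_writes(ledger, [1, 3, 3, 9])
print(report)
```

In-doubt writes are allowed to be either present or absent, since the client cannot know whether the commit landed; only acknowledged writes that vanish count as lost.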
> Does anyone have thoughts on how to model this? My initial thoughts were to find the serialization tests in the Postgres project core.
Looking at the core regression tests may also be a good idea. Of course, you would then have to ask: if you test the same way they do, will you find any bugs that they don't? So I would view them more as inspiration than as instructions.
Cheers,
Jeff
Thanks Jeff!
On Wed, Sep 18, 2019 at 2:39 PM Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Sep 17, 2019 at 9:27 PM Joseph Hammerman <jhammerman@squarespace.com> wrote:
> > Good evening PGSQL admin email distribution list,
> > I have built an HA cluster setup. I would like to instrument a workflow to test for lost or duplicated writes.
> > Does anyone know of prior art that does this?
> I have a testing framework that injects faults under high load and then checks that automatic recovery happens correctly. I have used it to find several bugs, but it hasn't turned up any in the last couple of releases (likely because improved regression tests now catch them before I get a chance to). I've always just tested this as crash recovery within a single instance, but I see no reason the technique couldn't be used for multiple instances as well. You can search for my name and "count.pl" on the hackers list to find multiple examples of the testing harness. The nature of the fault injected (torn page writes) is just a function of what I was working on at the time I wrote it; most of the bugs uncovered had nothing to do with the exact thing that caused the crash.
> > Does anyone have thoughts on how to model this? My initial thoughts were to find the serialization tests in the Postgres project core.
> Looking at the core regression tests may also be a good idea. Of course, you would then have to ask: if you test the same way they do, will you find any bugs that they don't? So I would view them more as inspiration than as instructions.
> Cheers,
> Jeff