Re: Perl modules for testing/viewing/corrupting/repairing your heapfiles

From: Mark Dilger
Subject: Re: Perl modules for testing/viewing/corrupting/repairing your heapfiles
Date:
Msg-id: 913D6F73-8337-4FDA-B11E-EFFCA20E1A44@enterprisedb.com
In reply to: Re: Perl modules for testing/viewing/corrupting/repairing your heap files  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Perl modules for testing/viewing/corrupting/repairing your heap files  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers

> On Apr 14, 2020, at 6:17 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Apr 8, 2020 at 3:51 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>> Recently, as part of testing something else, I had need of a tool to create
>> surgically precise corruption within heap pages.  I wanted to make the
>> corruption from within TAP tests, so I wrote the tool as a set of perl modules.
>
> There is also pg_hexedit:
>
> https://github.com/petergeoghegan/pg_hexedit

I steered away from software released under the GPL, such as pg_hexedit, owing to difficulties in getting anything I
develop accepted.  (That's a hard enough problem without licensing issues.)  I'm not taking a political stand for or
against the GPL here, just a pragmatic position that I wouldn't be able to integrate pg_hexedit into a postgres
submission.

(Thanks for writing pg_hexedit, BTW.  I'm not criticizing it.)

The purpose of these perl modules is not the viewing of files, but the intentional and targeted corruption of files
from within TAP tests.  There are limited examples of tests in the postgres source tree that intentionally corrupt
files, and as I read them, they employ a blunt force trauma approach:

In src/bin/pg_basebackup/t/010_pg_basebackup.pl:

> # induce corruption
> system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
> open $file, '+<', "$pgdata/$file_corrupt1";
> seek($file, $pageheader_size, 0);
> syswrite($file, "\0\0\0\0\0\0\0\0\0");
> close $file;
> system_or_bail 'pg_ctl', '-D', $pgdata, 'start';

In src/bin/pg_checksums/t/002_actions.pl:
>     # Time to create some corruption
>     open my $file, '+<', "$pgdata/$file_corrupted";
>     seek($file, $pageheader_size, 0);
>     syswrite($file, "\0\0\0\0\0\0\0\0\0");
>     close $file;

These blunt force trauma tests are fine, as far as they go.  But I wanted to be able to do things like

        # Corrupt the tuple to look like it has lots of attributes, some of
        # them null.  This falsely creates the impression that the t_bits
        # array is longer than just one byte, but t_hoff still says otherwise.
        $tup->{HEAP_HASNULL} = 1;
        $tup->{HEAP_NATTS_MASK} = 0x3FF;
        $tup->{t_bits} = 0xAA;

or

        # Same as above, but this time t_hoff plays along
        $tup->{HEAP_HASNULL} = 1;
        $tup->{HEAP_NATTS_MASK} = 0x3FF;
        $tup->{t_bits} = 0xAA;
        $tup->{t_hoff} = 32;

That's hard to do from a TAP test without modules like this, as you have to calculate by hand the offsets where you're
going to write the corruption, and the bit pattern you are going to write to that location.  Even if you do all that,
nobody else is likely going to be able to read and maintain your tests.
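
For illustration, here is roughly what the first example above looks like when done by hand.  This is only a sketch
under assumptions that are not part of the submitted modules: 8 kB blocks, a little-endian build, and the struct
layouts from bufpage.h, itemid.h and htup_details.h (24-byte page header, 4-byte line pointers with lp_off in the low
15 bits, t_infomask2 at byte 18 of the tuple header).  The helper names tuple_offset, corrupt_tuple and make_demo_page
are hypothetical, and the page is a synthetic in-memory buffer rather than a real relation file.

```perl
use strict;
use warnings;

# Assumed layout: 8 kB blocks, little-endian build, struct offsets as in
# bufpage.h, itemid.h and htup_details.h.  None of this is from the modules.
use constant {
    BLCKSZ           => 8192,
    PAGE_HEADER_SIZE => 24,        # SizeOfPageHeaderData
    ITEMID_SIZE      => 4,         # sizeof(ItemIdData)
    OFF_T_INFOMASK2  => 18,        # offsets within HeapTupleHeaderData
    OFF_T_INFOMASK   => 20,
    OFF_T_HOFF       => 22,
    OFF_T_BITS       => 23,
    HEAP_HASNULL     => 0x0001,    # bit in t_infomask
    HEAP_NATTS_MASK  => 0x07FF,    # bits in t_infomask2
};

# Byte offset, within the page, of the tuple behind 1-based line pointer $n.
sub tuple_offset
{
    my ($page, $n) = @_;
    my $pos = PAGE_HEADER_SIZE + ($n - 1) * ITEMID_SIZE;
    my $lp  = unpack('V', substr($page, $pos, ITEMID_SIZE));
    return $lp & 0x7FFF;           # lp_off is assumed to be the low 15 bits
}

# Hand-rolled version of the first example: set HEAP_HASNULL, claim 0x3FF
# attributes, put 0xAA in the first t_bits byte, and leave t_hoff alone.
sub corrupt_tuple
{
    my ($page_ref, $n) = @_;
    my $off = tuple_offset($$page_ref, $n);

    my ($infomask2, $infomask) =
      unpack('v v', substr($$page_ref, $off + OFF_T_INFOMASK2, 4));
    $infomask |= HEAP_HASNULL;
    $infomask2 = ($infomask2 & 0xFFFF & ~HEAP_NATTS_MASK) | 0x3FF;

    substr($$page_ref, $off + OFF_T_INFOMASK2, 4,
        pack('v v', $infomask2, $infomask));
    substr($$page_ref, $off + OFF_T_BITS, 1, pack('C', 0xAA));
    return;
}

# Synthetic page for demonstration only: one normal line pointer to a
# 32-byte tuple at offset 8160 with 3 attributes and t_hoff = 24.
sub make_demo_page
{
    my $page = "\0" x BLCKSZ;
    my $lp   = 8160 | (1 << 15) | (32 << 17);    # lp_off | lp_flags | lp_len
    substr($page, PAGE_HEADER_SIZE, ITEMID_SIZE, pack('V', $lp));
    substr($page, 8160 + OFF_T_INFOMASK2, 5, pack('v v C', 3, 0x0800, 24));
    return $page;
}

my $page = make_demo_page();
corrupt_tuple(\$page, 1);
printf "t_infomask2=0x%04X t_infomask=0x%04X\n",
  unpack('v v', substr($page, 8160 + OFF_T_INFOMASK2, 4));
```

In a real test the buffer would be read from and written back to the relation file at block number times BLCKSZ, and
the sketch ignores pd_checksum entirely.  Every magic number in it is something a later reader has to check against
the headers.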

I'd like an easy way from within TAP tests to selectively corrupt files, to test whether various parts of the system
fail gracefully in the presence of corruption.  What happens when a child partition is corrupted?  Does that impact
queries that only access other partitions?  What kinds of corruption cause pg_upgrade to fail? ...to expand the scope of
the corruption?  What happens to logical replication when there is corruption on the primary? ...on the standby?  What
kinds of corruption cause a query to return data from neighboring tuples that the querying role does not have
permission to view?  What happens when a NAS is only intermittently corrupt?

The modules I've submitted thus far are incomplete for this purpose.  They don't yet handle toast tables, btree, hash,
gist, gin, fsm, or vm, and I might be forgetting a few other things in the list.  Before I go and implement all of that,
I thought perhaps others would express preferences about how this should all work, even stuff like, "Don't bother
implementing that in perl, as I'm reimplementing the entire testing structure in COBOL", or similarly unexpected
feedback.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





