Discussion: pgsql: Add pg_relation_check_pages() to check on-disk pages of a relati

pgsql: Add pg_relation_check_pages() to check on-disk pages of a relati

From
Michael Paquier
Date:
Add pg_relation_check_pages() to check on-disk pages of a relation

This makes use of CheckBuffer() introduced in c780a7a, adding an SQL
wrapper that can check all the pages of a relation.  By default, all
the forks of a relation are checked; optionally, only a single given
fork can be checked.  Note that if the input relation has no physical
storage or is temporary, no error is raised, which allows full-database
checks by combining the function with a simple scan of pg_class, for
example.  This is not limited to clusters with data checksums enabled:
on clusters without data checksums, pages can still be checked through
their page headers, including the case of a page full of zeros.
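To make the checksum-less case concrete, here is a minimal C sketch of the kind of header sanity test involved. It is modeled on our understanding of PostgreSQL's PageIsVerified() from that period (a page counts as initialized when pd_upper is non-zero; an uninitialized page passes only if every byte is zero), but all names here (SketchPage, sketch_page_looks_valid() and so on), the flag mask and the simplified header layout are hypothetical and do not reproduce the real PageHeaderData.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 8192            /* assumed default 8 kB block size */
#define SKETCH_MAXALIGN 8             /* assumed 8-byte MAXALIGN */
#define SKETCH_VALID_FLAG_BITS 0x0007 /* hypothetical mask of known flag bits */

/* Hypothetical, simplified header with only the fields checked below;
 * this is NOT the real PageHeaderData layout. */
typedef struct SketchPageHeader
{
    uint16_t pd_flags;
    uint16_t pd_lower;   /* offset to start of free space */
    uint16_t pd_upper;   /* offset to end of free space; 0 means "new" page */
    uint16_t pd_special; /* offset to start of special space */
} SketchPageHeader;

typedef struct SketchPage
{
    SketchPageHeader hdr;
    unsigned char body[SKETCH_BLCKSZ - sizeof(SketchPageHeader)];
} SketchPage;

/* Header sanity: only known flags, lower <= upper <= special <= BLCKSZ,
 * and the special-space offset is aligned. */
static bool
sketch_header_is_sane(const SketchPageHeader *h)
{
    return (h->pd_flags & ~SKETCH_VALID_FLAG_BITS) == 0 &&
           h->pd_lower <= h->pd_upper &&
           h->pd_upper <= h->pd_special &&
           h->pd_special <= SKETCH_BLCKSZ &&
           h->pd_special % SKETCH_MAXALIGN == 0;
}

/* True if every byte of the page, header included, is zero. */
static bool
sketch_page_is_all_zeros(const SketchPage *page)
{
    const unsigned char *bytes = (const unsigned char *) page;

    for (size_t i = 0; i < sizeof(*page); i++)
        if (bytes[i] != 0)
            return false;
    return true;
}

/* Checksum-less verdict: an initialized page must have a sane header;
 * otherwise the page is accepted only if it is entirely zeros. */
static bool
sketch_page_looks_valid(const SketchPage *page)
{
    if (page->hdr.pd_upper != 0 && sketch_header_is_sane(&page->hdr))
        return true;
    return sketch_page_is_all_zeros(page);
}
```

With checksums enabled, the real check would additionally compare a computed checksum against the one stored in the header; the sketch leaves that part out.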

This function returns a set of tuples consisting of:
- The physical file where a broken page has been detected, without the
segment number, as segmenting can be AM-dependent (for heap, the
segment can be deduced from the block number).  The path is relative to
the data directory (PGDATA).
- The block number of the broken page.
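For heap, the segment file holding a reported block can be worked out from the block number. A minimal C sketch, assuming a default build (8 kB blocks and 1 GB segment files, hence 131072 blocks per segment) and the usual convention that segments after the first carry a ".N" suffix; the helper names and the example path base/16384/16385 are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed default build settings: 8 kB blocks, 1 GB segment files. */
#define SKETCH_BLCKSZ 8192u
#define SKETCH_RELSEG_SIZE ((1024u * 1024u * 1024u) / SKETCH_BLCKSZ) /* 131072 */

/* Segment number that holds a given heap block. */
static uint32_t
sketch_segment_for_block(uint32_t blkno)
{
    return blkno / SKETCH_RELSEG_SIZE;
}

/* Byte offset of the block inside its segment file. */
static uint64_t
sketch_offset_in_segment(uint32_t blkno)
{
    return (uint64_t) (blkno % SKETCH_RELSEG_SIZE) * SKETCH_BLCKSZ;
}

/* Append the segment suffix to the reported path: segment 0 is the file
 * itself, later segments are "<path>.N".  Returns snprintf()'s result. */
static int
sketch_segment_path(char *out, size_t outlen, const char *path, uint32_t blkno)
{
    uint32_t segno = sketch_segment_for_block(blkno);

    if (segno == 0)
        return snprintf(out, outlen, "%s", path);
    return snprintf(out, outlen, "%s.%" PRIu32, path, segno);
}
```

Under these assumptions, block 131072 of base/16384/16385 would sit at offset 0 of base/16384/16385.1.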

By default, only superusers have access to this function, but
execution rights can be granted to other users.

The feature introduced here is still minimal, and further improvements
are possible, such as:
- Addition of a start and end block number to run checks on a range
of blocks, which would apply only if one fork type is checked.
- Addition of some progress reporting.
- Throttling, with configuration parameters in function input or
potentially some cost-based GUCs.

Regression tests are added for positive cases in the main regression
test suite, and TAP tests are added for cases involving the emulation of
page corruptions.

Bump catalog version.

Author: Julien Rouhaud, Michael Paquier
Reviewed-by: Masahiko Sawada, Justin Pryzby
Discussion: https://postgr.es/m/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/f2b883969557f4572cdfa87e1a40083d2b1272e7

Modified Files
--------------
doc/src/sgml/func.sgml                  |  50 +++++++
src/backend/catalog/system_views.sql    |   9 ++
src/backend/utils/adt/Makefile          |   1 +
src/backend/utils/adt/pagefuncs.c       | 229 +++++++++++++++++++++++++++++++
src/include/catalog/catversion.h        |   2 +-
src/include/catalog/pg_proc.dat         |   7 +
src/test/recovery/t/022_page_check.pl   | 231 ++++++++++++++++++++++++++++++++
src/test/regress/expected/pagefuncs.out |  72 ++++++++++
src/test/regress/parallel_schedule      |   2 +-
src/test/regress/serial_schedule        |   1 +
src/test/regress/sql/pagefuncs.sql      |  41 ++++++
src/tools/pgindent/typedefs.list        |   1 +
12 files changed, 644 insertions(+), 2 deletions(-)


Re: pgsql: Add pg_relation_check_pages() to check on-disk pages of a relati

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Add pg_relation_check_pages() to check on-disk pages of a relation

Seems to have some issues according to florican:

2020-10-28 00:04:40.336 EDT [27040:3] 022_page_check.pl LOG:  statement: SELECT relname, failed_block_num FROM (SELECT relname, (pg_catalog.pg_relation_check_pages(oid)).*   FROM pg_class    WHERE relkind in ('r','i', 'm') AND oid >= 16384) AS s
2020-10-28 00:04:53.191 EDT [27031:4] LOG:  server process (PID 27040) was terminated by signal 11: Segmentation fault
2020-10-28 00:04:53.191 EDT [27031:5] DETAIL:  Failed process was running: SELECT relname, failed_block_num FROM (SELECT relname, (pg_catalog.pg_relation_check_pages(oid)).*   FROM pg_class    WHERE relkind in ('r','i', 'm') AND oid >= 16384) AS s

            regards, tom lane



Re: pgsql: Add pg_relation_check_pages() to check on-disk pages of a relati

From
Michael Paquier
Date:
On Wed, Oct 28, 2020 at 12:26:29AM -0400, Tom Lane wrote:
> Seems to have some issues according to florican:
>
> 2020-10-28 00:04:53.191 EDT [27031:5] DETAIL:  Failed process was
> running: SELECT relname, failed_block_num FROM (SELECT relname,
> (pg_catalog.pg_relation_check_pages(oid)).*   FROM pg_class    WHERE
> relkind in ('r','i', 'm') AND oid >= 16384) AS s

Yes, thanks.  I was already investigating it.  No need for a
back-trace, I have been able to reproduce it here by doing some
-m32'ing with gcc:
#1  0x566e0572 in fill_val (att=0x57d205c0, bit=0x0, bitmask=0xff9c51e8, dataP=0xff9c521c, infomask=0x57d207e8, datum=0, isnull=false) at heaptuple.c:287
#2  0x566e066e in heap_fill_tuple (tupleDesc=0x57d2053c, values=0xff9c52b0, isnull=0xff9c52ae, data=0x57d207fd "", data_size=28, infomask=0x57d207e8, bit=0x0) at heaptuple.c:336
#3  0x566e2660 in heap_form_minimal_tuple (tupleDescriptor=0x57d2053c, values=0xff9c52b0, isnull=0xff9c52ae) at heaptuple.c:1412
#4  0x56d18e84 in tuplestore_putvalues (state=0x57d20648, tdesc=0x57d2053c, values=0xff9c52b0, isnull=0xff9c52ae) at tuplestore.c:756
#5  0x56c2b94b in check_relation_fork (tupdesc=0x57d2053c, tupstore=0x57d20648, relation=0xeebf346c, forknum=MAIN_FORKNUM) at pagefuncs.c:222
#6  0x56c2b779 in check_one_relation (tupdesc=0x57d2053c, tupstore=0x57d20648, relid=16384, single_forknum=InvalidForkNumber) at pagefuncs.c:148
#7  0x56c2b645 in pg_relation_check_pages (fcinfo=0x57d1c888) at pagefuncs.c:104
--
Michael


Re: pgsql: Add pg_relation_check_pages() to check on-disk pages of a relati

From
Michael Paquier
Date:
On Wed, Oct 28, 2020 at 01:41:01PM +0900, Michael Paquier wrote:
> Yes, thanks.  I was already investigating it.  No need for a
> back-trace, I have been able to reproduce it here by doing some
> -m32'ing with gcc:

And this was just a thinko with one of the GetDatum() calls.  Now
fixed with ce7f772 after checking that 32-bit builds work correctly.
--
Michael
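The exact diff of ce7f772 is not reproduced here, but the backtrace (datum=0 in fill_val()) is consistent with a classic 32-bit pitfall: at the time, bigint (int8) values were passed by reference on 32-bit builds, so the Datum for a bigint output column, presumably failed_block_num, must be a pointer such as the one Int64GetDatum() produces, and a block number stored directly in the Datum (block 0 becomes 0) gets dereferenced as a NULL pointer. The toy model below, with hypothetical names, illustrates that failure mode only; it is not PostgreSQL's Datum code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy Datum: one pointer-sized word, as on a 32-bit build. */
typedef uintptr_t ToyDatum;

/* Correct encoding for a by-reference 8-byte type: copy the value into
 * allocated memory and store the pointer in the Datum.  The caller owns
 * the allocation (PostgreSQL would palloc() it in a memory context). */
static ToyDatum
toy_int64_get_datum(int64_t value)
{
    int64_t *copy = malloc(sizeof(int64_t));

    if (copy == NULL)
        abort();
    *copy = value;
    return (ToyDatum) copy;
}

/* Buggy encoding for such a column: stores the number itself, as if the
 * type were pass-by-value. */
static ToyDatum
toy_uint32_get_datum(uint32_t value)
{
    return (ToyDatum) value;
}

/* What tuple formation does for a by-reference 8-byte column: treat the
 * Datum as a pointer and copy 8 bytes from it.  The NULL guard exists only
 * to keep this sketch safe; the real code has no such guard and crashes.
 * A non-zero bogus "pointer" would not be caught here either. */
static bool
toy_read_byref_int64(ToyDatum d, int64_t *out)
{
    const int64_t *p = (const int64_t *) d;

    if (p == NULL)
        return false;
    *out = *p;
    return true;
}
```

In the sketch, reading back the correctly encoded block 0 works, while the buggy encoding of block 0 yields a NULL "pointer", matching datum=0 in frame #1 of the backtrace.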


Re: pgsql: Add pg_relation_check_pages() to check on-disk pages of a relati

From
Julien Rouhaud
Date:
On Wed, Oct 28, 2020 at 13:11, Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Oct 28, 2020 at 01:41:01PM +0900, Michael Paquier wrote:
> > Yes, thanks.  I was already investigating it.  No need for a
> > back-trace, I have been able to reproduce it here by doing some
> > -m32'ing with gcc:
>
> And this was just a thinko with one of the GetDatum() calls.  Now
> fixed with ce7f772 after checking that 32-bit builds work correctly.

Thanks a lot!