Thread: pg_checksums (or checksums in general) vs tableam

pg_checksums (or checksums in general) vs tableam

From: Magnus Hagander
Date: Wed, Jul 10, 2019 at 11:42:34AM +0200

How is this intended to work?

pg_checksums enumerates the files. What if there are files there from a different tableam? Isn't pg_checksums just going to badly fail then, since it assumes everything is heap?

Also, do we allow AMs that don't support checksumming data? Do we have any checks for tables created with such AMs in a system that has checksums enabled?

Re: pg_checksums (or checksums in general) vs tableam

From: Michael Paquier
Date: Wed, Jul 10, 2019 at 3:05 PM

On Wed, Jul 10, 2019 at 11:42:34AM +0200, Magnus Hagander wrote:
> pg_checksums enumerates the files. What if there are files there from a
> different tableam? Isn't pg_checksums just going to badly fail then, since
> it assumes everything is heap?
>
> Also, do we allow AMs that don't support checksumming data? Do we have any
> checks for tables created with such AMs in a system that has checksums
> enabled?

Table AMs going through shared buffers and smgr.c, like zedstore,
share the same page header, meaning that the on-disk file is the same
as heap, and that checksums are computed similarly to heap.
pg_checksums is not going to complain on those ones and would work
just fine.
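
To illustrate what "share the same page header" means in practice, here is a minimal sketch that decodes the fixed 24-byte `PageHeaderData` prefix of a block and reads `pd_checksum` from it. It assumes the default 8 kB block size and a file written on a machine with the same byte order as the reader; the function names `read_page_header` and `is_new_page` are made up for this example. It does not compute the checksum itself.

```python
import struct

# Assumption: default block size; a build configured differently needs another value.
BLCKSZ = 8192

# Fixed part of PageHeaderData (24 bytes), in native byte order:
#   pd_lsn (two uint32), pd_checksum, pd_flags, pd_lower, pd_upper,
#   pd_special, pd_pagesize_version (uint16 each), pd_prune_xid (uint32).
PAGE_HEADER = struct.Struct("=IIHHHHHHI")


def read_page_header(page: bytes) -> dict:
    """Decode the common page header found at the start of every block."""
    if len(page) != BLCKSZ:
        raise ValueError(f"expected {BLCKSZ} bytes, got {len(page)}")
    (xlogid, xrecoff, checksum, flags, lower, upper,
     special, pagesize_version, prune_xid) = PAGE_HEADER.unpack_from(page, 0)
    return {
        "pd_lsn": (xlogid << 32) | xrecoff,
        "pd_checksum": checksum,
        "pd_flags": flags,
        "pd_lower": lower,
        "pd_upper": upper,
        "pd_special": special,
        "page_size": pagesize_version & 0xFF00,
        "layout_version": pagesize_version & 0x00FF,
        "pd_prune_xid": prune_xid,
    }


def is_new_page(header: dict) -> bool:
    """Mirror of the PageIsNew() test: a page with pd_upper == 0 is uninitialized."""
    return header["pd_upper"] == 0
```

Any table AM whose blocks start with this header stores its checksum in the same place as heap, which is why a tool that only looks at the header does not need to know which AM wrote the block.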

Table AMs using their own storage layer (which would most likely use
their own checksum method normally?) would be ignored by pg_checksums
if the file names don't match what smgr uses, but it could result in
failures if they use on-disk file names which match.
--
Michael
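
The file-name point can be sketched as follows. This is a simplified approximation of the relation file naming used by the default storage manager (`<relfilenode>`, an optional `_fsm`, `_vm` or `_init` fork suffix, an optional `.<segment>` suffix), not the actual logic in pg_checksums. `parse_relation_filename` and `RelFile` are hypothetical names, and the non-matching file names in the usage note are invented.

```python
import re
from typing import NamedTuple, Optional

# Approximation of smgr-style relation file names: digits, optional fork
# suffix, optional segment number.
RELFILE_RE = re.compile(
    r"(?P<relfilenode>\d+)(?:_(?P<fork>fsm|vm|init))?(?:\.(?P<segment>\d+))?"
)


class RelFile(NamedTuple):
    relfilenode: int
    fork: str
    segment: int


def parse_relation_filename(name: str) -> Optional[RelFile]:
    """Return the parsed parts if the name looks like an smgr file, else None."""
    m = RELFILE_RE.fullmatch(name)
    if m is None:
        return None  # a scanner built on this pattern would ignore the file
    return RelFile(
        relfilenode=int(m["relfilenode"]),
        fork=m["fork"] or "main",
        segment=int(m["segment"] or 0),
    )
```

With this pattern, `16384` and `16384_fsm.2` are treated as block-structured relation files, while a name such as `zedstore_col_1.dat` is skipped. A table AM that names its files `16384` but does not use the standard page header is the failure case described above.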

Re: pg_checksums (or checksums in general) vs tableam

From: Magnus Hagander
Date: July 10, 2019 9:12:18 AM PDT

On Wed, Jul 10, 2019 at 3:05 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Jul 10, 2019 at 11:42:34AM +0200, Magnus Hagander wrote:
>> pg_checksums enumerates the files. What if there are files there from a
>> different tableam? Isn't pg_checksums just going to badly fail then, since
>> it assumes everything is heap?
>>
>> Also, do we allow AMs that don't support checksumming data? Do we have any
>> checks for tables created with such AMs in a system that has checksums
>> enabled?
>
> Table AMs going through shared buffers and smgr.c, like zedstore,
> share the same page header, meaning that the on-disk file is the same
> as heap, and that checksums are computed similarly to heap.
> pg_checksums is not going to complain on those ones and would work
> just fine.
>
> Table AMs using their own storage layer (which would most likely use
> their own checksum method normally?) would be ignored by pg_checksums
> if the file names don't match what smgr uses, but it could result in
> failures if they use on-disk file names which match.

That would be fine, if we actually knew. Should we (or have we already?) defined a rule that they are not allowed to use the same naming standard unless they have the same type of header?

Re: pg_checksums (or checksums in general) vs tableam

From: Andres Freund
Date: Wed, Jul 10, 2019 at 09:19:03AM -0700

Hi,

On July 10, 2019 9:12:18 AM PDT, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Jul 10, 2019 at 3:05 PM Michael Paquier <michael@paquier.xyz> wrote:
>> On Wed, Jul 10, 2019 at 11:42:34AM +0200, Magnus Hagander wrote:
>>> pg_checksums enumerates the files. What if there are files there from a
>>> different tableam? Isn't pg_checksums just going to badly fail then, since
>>> it assumes everything is heap?
>>>
>>> Also, do we allow AMs that don't support checksumming data? Do we have any
>>> checks for tables created with such AMs in a system that has checksums
>>> enabled?
>>
>> Table AMs going through shared buffers and smgr.c, like zedstore,
>> share the same page header, meaning that the on-disk file is the same
>> as heap, and that checksums are computed similarly to heap.
>> pg_checksums is not going to complain on those ones and would work
>> just fine.
>>
>> Table AMs using their own storage layer (which would most likely use
>> their own checksum method normally?) would be ignored by pg_checksums
>> if the file names don't match what smgr uses, but it could result in
>> failures if they use on-disk file names which match.
>
> That would be fine, if we actually knew. Should we (or have we already?)
> defined a rule that they are not allowed to use the same naming standard
> unless they have the same type of header?

No, don't think we have already. There's the related problem of what to include in base backups, too.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: pg_checksums (or checksums in general) vs tableam

From: Michael Paquier
Date: Thu, Jul 11, 2019 at 2:30 AM

On Wed, Jul 10, 2019 at 09:19:03AM -0700, Andres Freund wrote:
> On July 10, 2019 9:12:18 AM PDT, Magnus Hagander <magnus@hagander.net> wrote:
>> That would be fine, if we actually knew. Should we (or have we already?)
>> defined a rule that they are not allowed to use the same naming standard
>> unless they have the same type of header?
>
> No, don't think we have already.  There's the related problem of
> what to include in base backups, too.

Yes.  This one needs a careful design and I am not sure exactly what
that would be.  At least one new callback would be needed, called from
basebackup.c to decide if a given file should be backed up or not
based on a path.  But then how do you make sure that a path applies to
one table AM or another, by using a regex given by all table AMs to
see if there is a match?  How do we handle conflicts?  I am not sure
either that it is a good design to restrict table AMs to have a given
format for paths as that actually limits the possibilities when it
comes to splitting data across multiple files for attributes and/or
tablespaces.  (I am a pessimistic guy by nature.)
--
Michael
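
The "regex given by all table AMs" idea and its conflict problem can be sketched like this. Nothing here is an existing PostgreSQL interface: the registry, the AM names `columnar_am` and `greedy_am`, their patterns, and `resolve_owner` are all hypothetical and serve only to show where an ambiguity would surface.

```python
import re
from typing import Dict, Optional


class PathConflict(Exception):
    """Raised when more than one table AM claims the same path."""


def resolve_owner(path: str, am_patterns: Dict[str, re.Pattern]) -> Optional[str]:
    """Return the single AM whose pattern matches path, or None if nobody claims it."""
    owners = sorted(
        am for am, pattern in am_patterns.items() if pattern.fullmatch(path)
    )
    if len(owners) > 1:
        raise PathConflict(f"{path!r} is claimed by: {', '.join(owners)}")
    return owners[0] if owners else None


# Hypothetical registry: each AM publishes one pattern for the files it owns.
AM_PATTERNS = {
    "heap": re.compile(r"base/\d+/\d+(_(fsm|vm|init))?(\.\d+)?"),
    "columnar_am": re.compile(r"base/\d+/\d+\.col\d+"),
}
```

With these two patterns every path has at most one owner. Registering a third AM with a broad pattern such as `base/\d+/.*` makes `resolve_owner("base/1234/16384", ...)` raise `PathConflict`, which is the unresolved case the question above is about: detecting the conflict is easy, deciding who wins is not.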

Re: pg_checksums (or checksums in general) vs tableam

From: Magnus Hagander
Date:


On Thu, Jul 11, 2019 at 2:30 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Jul 10, 2019 at 09:19:03AM -0700, Andres Freund wrote:
>> On July 10, 2019 9:12:18 AM PDT, Magnus Hagander <magnus@hagander.net> wrote:
>>> That would be fine, if we actually knew. Should we (or have we already?)
>>> defined a rule that they are not allowed to use the same naming standard
>>> unless they have the same type of header?
>>
>> No, don't think we have already.  There's the related problem of
>> what to include in base backups, too.
>
> Yes.  This one needs a careful design and I am not sure exactly what
> that would be.  At least one new callback would be needed, called from
> basebackup.c to decide if a given file should be backed up or not
> based on a path.

That wouldn't be at all enough, of course. We have to think of everybody who uses the pg_start_backup/pg_stop_backup functions (including the deprecated versions we don't want to get rid of :P). So whatever it is it has to be externally reachable. And just calling something before you start your backup won't be enough, as there can be files showing up during the backup etc.

Having a strict naming standard would help a lot with that, then you'd just need the metadata. For example, one could say that each non-default storage engine has to put all their files in a subdirectory, and inside that subdirectory they can name them whatever they want. If we do that, then all a backup tool would need to know about is all the possible subdirectories in the current installation (and *that* doesn't change frequently).

 
> But then how do you make sure that a path applies to
> one table AM or another, by using a regex given by all table AMs to
> see if there is a match?  How do we handle conflicts?  I am not sure
> either that it is a good design to restrict table AMs to have a given
> format for paths as that actually limits the possibilities when it
> comes to splitting data across multiple files for attributes and/or
> tablespaces.  (I am a pessimistic guy by nature.)

As long as the restriction contains enough wildcards, it should hopefully be enough :) E.g. data/base/1234/zheap/whatever.they.like. 
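
A minimal sketch of how a backup tool could use the subdirectory rule proposed above, assuming paths are given relative to the data directory and that the tool already knows the set of storage-engine subdirectory names for the installation. `storage_engine_for` is a hypothetical helper; no such rule or API exists today.

```python
from pathlib import PurePosixPath
from typing import Optional, Set


def storage_engine_for(path: str, am_subdirs: Set[str]) -> Optional[str]:
    """Classify a path of the form base/<dboid>/<engine>/<anything>.

    Returns the engine's subdirectory name, or None when the file belongs
    to the default storage layout.
    """
    parts = PurePosixPath(path).parts
    if (
        len(parts) >= 4              # something must live inside the subdirectory
        and parts[0] == "base"
        and parts[1].isdigit()       # database OID directory
        and parts[2] in am_subdirs   # known non-default engine
    ):
        return parts[2]
    return None
```

For example, `storage_engine_for("base/1234/zheap/whatever.they.like", {"zheap"})` returns `"zheap"`, while `base/1234/16384` is classified as default storage. The tool never has to interpret the file names inside the subdirectory, so new files appearing during a backup do not change the classification.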
