Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

From: Bruce Momjian
Subject: Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Msg-id: 20210826173750.GK22637@momjian.us
In reply to: Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce) (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers

On Thu, Aug 26, 2021 at 01:24:46PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Thu, Aug 26, 2021 at 01:03:54PM -0400, Stephen Frost wrote:
> > > Yes, we're talking about either incremental (or perhaps differential)
> > > backup where only the files which are actually different would be backed
> > > up.  Just like with PG, I can't provide any complete guarantees that
> > > we'd be able to actually make this possible after a major version with
> > > pgBackRest with this change, but it definitely isn't possible *without*
> > > this change.  I can't see any reason why we wouldn't be able to do a
> > > checksum-based incremental backup though (which would be *much* faster
> > > than a regular backup) once this change is made and have that be a
> > > reliable and trustworthy backup.  I'd want to think about it more and
> > > discuss it with David in some detail before saying if we could maybe
> > > perform a timestamp-based incremental backup (without checksum'ing the
> > > files, as we do in normal situations), but that would really just be a
> > > bonus.
> > 
> > Well, it would be nice to know exactly how it would help pgBackRest if
> > that is one of the reasons we are adding this feature.
> 
> pgBackRest keeps a manifest for every file in the PG data directory that
> is backed up and we identify that file by the filename.  Further, we
> calculate a checksum for every file.  If the filenames didn't change
> then we'd be able to compare the file in the new cluster against the
> file and checksum in the manifest in order to be able to perform the
> incremental/differential backup.  We don't store the inodes in the
> manifest though, and we don't have any concept of looking at multiple
> data directories at the same time or anything like that (which would
> also mean that the old data directory would have to be kept around for
> that to even work, which seems like a good bit of additional
> complication and risk that someone might start up the old cluster by
> accident..).
> 
> That's how it'd be very helpful to pgBackRest for the filenames to be
> preserved across pg_upgrade's.

OK, that is clear.
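
As a rough illustration of the manifest comparison described above, the following Python sketch assumes a manifest that maps each relative filename to a checksum and reports which files an incremental backup would need to copy. The names `file_checksum` and `files_to_back_up`, the choice of SHA-1, and the manifest layout are assumptions made for illustration; this is not pgBackRest's actual code or manifest format.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_back_up(data_dir, manifest):
    """List relative paths under `data_dir` that are new or whose checksum
    differs from the prior backup's `manifest` (hypothetical layout:
    {relative filename: hex checksum})."""
    changed = []
    for root, _dirs, names in os.walk(data_dir):
        for name in names:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, data_dir)
            # A file is copied when its name is unknown to the manifest
            # or its content no longer matches the recorded checksum.
            if manifest.get(rel) != file_checksum(full):
                changed.append(rel)
    return sorted(changed)
```

Because the lookup is keyed on the filename, a file whose content is unchanged but whose name differs after pg_upgrade misses the manifest and is copied again in full. With preserved filenames, only files whose content actually changed would be reported.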

> > > > > > As far as TDE, I haven't seen any concrete plan for that, so why add
> > > > > > this code for that reason?
> > > > > 
> > > > > That this would help with TDE (of which there seems little doubt...) is
> > > > > an additional benefit to this.  Specifically, taking the existing work
> > > > > that's already been done to allow block-by-block encryption and
> > > > > adjusting it for AES-XTS and then using the db-dir+relfileno+block
> > > > > number as the IV, just like many disk encryption systems do, avoids the
> > > > > concerns that were brought up about using LSN for the IV with CTR and
> > > > > it's certainly not difficult to do, but it does depend on this change.
> > > > > This was all discussed previously and it sure looks like a sensible
> > > > > approach to use that mirrors what many other systems already do
> > > > > successfully.
> > > > 
> > > > Well, I would think we would not add this for TDE until we were sure
> > > > someone was working on adding TDE.
> > > 
> > > That this would help with TDE is what I'd consider an added bonus.
> > 
> > > Not if we have no plans to implement TDE, which was my point.  Why not
> > > wait to see if we are actually going to implement TDE rather than adding
> > > it now?  It is just so obvious, why do I have to state this?
> 
> There's been multiple years of effort put into implementing TDE and I'm
> sure hopeful that it continues as I'm trying to put effort into moving
> it forward myself.  I'm a bit baffled by the idea that we're just

Well, this is the first time I am hearing this publicly.

> suddenly going to stop putting effort into TDE as it is brought up time
> and time again by clients that I've talked to as one of the few reasons
> they haven't moved to PG yet- I can't believe that hasn't been
> experienced by folks at other organizations too, I mean, there's people
> maintaining forks of PG specifically for TDE ...

Agreed.
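
To make the dependency mentioned in the TDE discussion above concrete: if a page's AES-XTS tweak (IV) is derived from the database identifier, relfilenode, and block number, then the same page gets a different tweak whenever its relfilenode changes. The Python sketch below only packs the three identifiers into a 16-byte value. The field widths, byte order, the use of a database OID for "db-dir", and the name `page_tweak` are assumptions for illustration, not a proposed on-disk format, and no encryption is performed.

```python
import struct


def page_tweak(db_oid, relfilenode, block_number):
    """Pack the identifiers into a 16-byte value usable as an XTS tweak.

    Hypothetical layout: two 32-bit unsigned integers followed by one
    64-bit unsigned integer, little-endian.
    """
    return struct.pack("<IIQ", db_oid, relfilenode, block_number)


# Same database and block, but a different relfilenode (example values):
# the tweak differs, so a page encrypted under the old tweak would not
# decrypt correctly under the new one without being re-encrypted.
before = page_tweak(16384, 16385, 0)
after = page_tweak(16384, 24576, 0)
assert len(before) == 16
assert before != after
```

Under this assumed scheme, keeping the database OID and relfilenode stable across pg_upgrade keeps each page's tweak stable as well.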

> > > I've certainly done it and I'd be kind of surprised if others haven't,
> > > but I've also played a lot with pg_dump in various modes, so perhaps
> > > that's not a great representation.  I've definitely had to explain to
> > > clients why there's a whole different set of filenames after a
> > > pg_upgrade and why that is the case for an 'in place' upgrade before
> > > too.
> > 
> > Uh, so I guess I am right that few people have mentioned this in the
> > past.  Why were users caring about the file names?
> 
> This is a bit baffling to me.  Users and admins certainly care about
> what files their data is stored in and knowing how to find them.
> Covering the data directory structure is a commonly asked for part of
> the training that I regularly do for clients.

I just never thought people cared about the file names, since I have
never heard a complaint about how pg_upgrade works all these years.

> > > I have a very hard time seeing what changes might happen in the server
> > > in this space that wouldn't have an impact on pg_upgrade, with or
> > > without this.
> > 
> > > I don't know, but I have to ask since I can't know the future, so any
> > > "preservation" has to be studied.
> 
> We can gain, perhaps, some insight looking into the past and that seems
> to indicate that this is certainly a very stable part of the server code
> in the first place, which would imply that it's unlikely that there'll
> be much need to adjust this code in the future in the first place.

Good, I have to ask.

> > > > I am not saying this change is wrong, but I think the reasons need to be
> > > > stated in this thread, rather than just moving forward.
> > > 
> > > Ok, they've been stated and it seems to at least Robert and myself that
> > > this is worthwhile to at least continue through to a concluded patch,
> > > after which we can contemplate that patch's complexity against these
> > > reasons.
> > 
> > > OK, that works for me.  What bothers me is that the desirability of this
> > > change has not been clearly stated in this thread.
> 
> I hope that this email and the many many prior ones have gotten across
> the desirability of the change.

Yes, I think we are in a better position now to evaluate this.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.



