Re: hot backups: am I doing it wrong, or do we have a problem with pg_clog?
From | Merlin Moncure
---|---
Subject | Re: hot backups: am I doing it wrong, or do we have a problem with pg_clog?
Date | 
Msg-id | BANLkTikKjTNwx+0uGHMcjDFVPMdKHxGgPA@mail.gmail.com
In reply to | hot backups: am I doing it wrong, or do we have a problem with pg_clog? (Daniel Farina <daniel@heroku.com>)
List | pgsql-hackers
On Thu, Apr 21, 2011 at 6:15 AM, Daniel Farina <daniel@heroku.com> wrote:
> To start at the end of this story: "DETAIL: Could not read from file
> "pg_clog/007D" at offset 65536: Success."
>
> This is a message we received on a standby that we were bringing
> online as part of a test. The clog file was present, but apparently
> too small for Postgres (or at least I think this is what the message
> meant), so one could stub in another clog file and then continue
> recovery successfully (modulo the voodoo of stubbing in clog files in
> general). I am unsure if this is due to an interesting race condition
> in Postgres or a result of my somewhat-interesting hot-backup
> protocol, which is slightly more involved than the norm. I will
> describe what it does here:
>
> 1) Call pg_start_backup
> 2) Crawl the entire Postgres cluster directory structure, except
> pg_xlog, taking note of the size of every file present
> 3) Begin writing tar files, but *only up to the size noted during the
> original crawl of the cluster directory*, so if a file grows between
> the original snapshot and the subsequent read() calls, those extra
> bytes are not added to the tar
> 3a) If a file has been partially truncated, add "\0" bytes to pad the
> tar member up to the size sampled in step 2, since I am streaming the
> tar file and cannot go back in the stream to adjust the member size
> 4) Call pg_stop_backup
>
> The reason I go to this trouble is that I use many completely
> disjoint tar files to do parallel compression, decompression,
> uploading, and downloading of the base backup of the database, and I
> want to be able to control the size of these files up front. The
> requirement to stub in \0 comes from a limitation of the tar format
> when dealing with streaming archives, and the requirement to truncate
> files to the size snapshotted in step 2 is to enable splitting the
> files between volumes even in the presence of possible concurrent
> growth while I am performing the hot backup (for example, a handful
> of nearly empty heap files can grow rapidly due to a concurrent bulk
> load if I get unlucky, which I do not intend to allow myself to be).
>
> Any ideas? Or does it sound like I'm making some bookkeeping errors
> and should review my code again? It does work most of the time. I
> have not yet gotten a sense of how often this reproduces.

Everyone here is going to assume the problem is in your (too?) fancy
tar/diff delta archiving approach, because we can't see that code and
it just sounds suspicious. A busted clog file is of course very
noteworthy, but to rule out your own code you should try reproducing
the problem using a more standard method of grabbing the base backup.
Have you considered using rsync instead?

merlin
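For readers following the thread, here is a minimal Python sketch of the size-capping and NUL-padding behavior described in steps 3 and 3a above. It is not Daniel's actual code; the helper names (`PaddedReader`, `pack_member`, `sampled_sizes`) are illustrative assumptions, and only the standard-library `tarfile` module is used.

```python
# Sketch of steps 3/3a: stream a tar member that is exactly the size
# sampled in step 2, ignoring concurrent growth and padding with NUL
# bytes if the file shrank between sampling and reading.
import tarfile


class PaddedReader:
    """Wraps a file object; reads past EOF return NUL padding."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def read(self, n):
        data = self.fileobj.read(n)
        if len(data) < n:                    # file was truncated under us
            data += b"\0" * (n - len(data))  # pad up to the requested size
        return data


def pack_member(tar, path, sampled_size):
    """Add `path` to an open streaming tar, capped at sampled_size bytes."""
    info = tar.gettarinfo(path)
    info.size = sampled_size                 # any concurrent growth is ignored
    with open(path, "rb") as f:
        # tarfile reads exactly info.size bytes from the wrapped file object.
        tar.addfile(info, PaddedReader(f))


# Usage (between pg_start_backup() and pg_stop_backup()):
# with tarfile.open(fileobj=out_stream, mode="w|") as tar:
#     for path, size in sampled_sizes.items():   # sizes recorded in step 2
#         pack_member(tar, path, size)
```

Because the member size is fixed before any bytes are written, the archive can be split into disjoint, size-bounded tar files up front, which is the property the original protocol is after.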