Обсуждение: [ADMIN] Fixing OID directory names after a fsck
Hi all, Our virtual server (running 9.4) suffered a disk corruption outage due to a flaky SAN. As a result, all the contents of /var/lib/postgresql ended up (along with lots of other things) in lost+found with names like #1835289. By looking for directories containing PG_VERSION (and checking the version number), I found 9 directories named like: ./#1835987/PG_VERSION. These look similar to other top-level dirs within main/base, and "strings" shows contents that does indeed look like I've identified the right thing. So, I made a clean install of Postgres 9.4 on a new machine (Ubuntu 17.04), and copied the directories into /var/lib/postgresql/9.4/main/base. I now have: root@treacle:/var/lib/postgresql/9.4/main/base# ls 1 12172 12177 16384 1828647 1835009 1835289 1835317 1835987 1843977 1844229 1844901 1958920 where the first 3 directories are the system defaults, 16384 is a test that I created, and 1828647 - 1958920 are the recovered top-level database directories (but with the wrong names). However, Postgresql doesn't recognise them. From what I've been able to understand, if I could find out what the database OIDs used to be, I could just rename the 1828647 etc to the correct names, and it would work. But I'm not sure how to do this... all the documents are about how to find the file if you know the OID, rather than how to set the OID to match the file. I tried the "obvious" step of creating a new database called "rescue", getting 16384, stopping postgresql, renaming one of my backups to 16384, and restarting - but that doesn't help. Can anyone here give me a pointer? Thanks very much for your help, Richard P.S. I've just joined, and I think I'm posting this to the right place but if not, please let me know. -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin
Richard Neill <postgresql@richardneill.org> writes: > Our virtual server (running 9.4) suffered a disk corruption outage due > to a flaky SAN. As a result, all the contents of /var/lib/postgresql > ended up (along with lots of other things) in lost+found with names like > #1835289. > By looking for directories containing PG_VERSION (and checking the > version number), I found 9 directories named like: > ./#1835987/PG_VERSION. These look similar to other top-level dirs within > main/base, and "strings" shows contents that does indeed look like I've > identified the right thing. > So, I made a clean install of Postgres 9.4 on a new machine (Ubuntu > 17.04), and copied the directories into > /var/lib/postgresql/9.4/main/base. I now have: > root@treacle:/var/lib/postgresql/9.4/main/base# ls > 1 12172 12177 16384 1828647 1835009 1835289 1835317 1835987 > 1843977 1844229 1844901 1958920 > where the first 3 directories are the system defaults, 16384 is a test > that I created, and 1828647 - 1958920 are the recovered top-level > database directories (but with the wrong names). However, Postgresql > doesn't recognise them. This is unlikely to work unless you can also recover the pg_clog contents, which unfortunately might be pretty difficult to identify. (In any case, putting the cluster back into production is way too scary. If you can start it, with autovacuum off, and dump the data, I'd recommend doing that and reloading.) regards, tom lane -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin
Hi Tom, Thanks for your help - and speedy reply. >> So, I made a clean install of Postgres 9.4 on a new machine (Ubuntu >> 17.04), and copied the directories into >> /var/lib/postgresql/9.4/main/base. I now have: > >> root@treacle:/var/lib/postgresql/9.4/main/base# ls >> 1 12172 12177 16384 1828647 1835009 1835289 1835317 1835987 >> 1843977 1844229 1844901 1958920 > >> where the first 3 directories are the system defaults, 16384 is a test >> that I created, and 1828647 - 1958920 are the recovered top-level >> database directories (but with the wrong names). However, Postgresql >> doesn't recognise them. > > This is unlikely to work unless you can also recover the pg_clog > contents, which unfortunately might be pretty difficult to identify. I can't find it. However, the failure was in the middle of the night, and I think there's a pretty decent chance that postgres was idle at the time - there was "probably" no transaction in progress, and the WAL should have committed. The DBs concerned were all for our wikis, so the structure will also be comparatively simple (and some risk of data-errors would be tolerable in this context). There is ~ 400 MB of data in total, almost all of it in one of the directories. So, I think it's worth a try - but I'm not sure how to proceed next. > (In any case, putting the cluster back into production is way too > scary. If you can start it, with autovacuum off, and dump the data, > I'd recommend doing that and reloading.) Definitely! I was going to do that anyway. The point of the new machine was just to ensure that Postgres itself is known to be in a sane state. But as soon as I can get anything to work, I'll do a dump and restore. Thanks very much, Richard > > regards, tom lane > -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin
Richard Neill <postgresql@richardneill.org> writes: >> This is unlikely to work unless you can also recover the pg_clog >> contents, which unfortunately might be pretty difficult to identify. > I can't find it. However, the failure was in the middle of the night, > and I think there's a pretty decent chance that postgres was idle at the > time - there was "probably" no transaction in progress, and the WAL > should have committed. That's all fine, but it only means that the right transactions would be marked committed in pg_clog. There's no very good reason to suppose that that marking has propagated back to the heap tuples in all cases. Actually though, I suspect a big chunk of your problem is that even tuples that are hinted committed will seem to be "in the future". You'd need to also recover pg_control, or at least reset it to something approximating the current XID counter, before stuff is likely to look sane. If you don't have pg_control, you might be able to dig around in the most recently modified tables and guess at the current XID by looking for the largest XIDs you can find. But, TBH, this is getting way past what you're likely to successfully pull off if you've never done it before. There are folks who do PG data recovery professionally [not me], and if your data is worth money to you, you'd be best off hiring someone who's done this before. If it's not, chalk this up to a learning experience, and set up a better backup system. regards, tom lane -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin