DROP DATABASE vs patch to not remove files right away
From | Tom Lane |
---|---|
Subject | DROP DATABASE vs patch to not remove files right away |
Date | |
Msg-id | 18026.1208300591@sss.pgh.pa.us |
Responses |
Re: DROP DATABASE vs patch to not remove files right away
(Alvaro Herrera <alvherre@commandprompt.com>)
Re: DROP DATABASE vs patch to not remove files right away (Heikki Linnakangas <heikki@enterprisedb.com>) Re: DROP DATABASE vs patch to not remove files right away (Heikki Linnakangas <heikki@enterprisedb.com>) |
List | pgsql-hackers |
Over the last couple days I twice saw complaints like this during DROP DATABASE:

    WARNING:  could not remove file or directory "base/80750/80825": No such file or directory
    WARNING:  could not remove database directory "base/80750"

I poked at it for a while and was eventually able to extract a repeatable test case:

    while true
    do
        psql -c "create database foo;" postgres || exit 1
        psql -c "create table foo(f1 int primary key);" foo || exit 1
        psql -c "drop table foo;" foo || exit 1
        psql -c "checkpoint" postgres &
        psql -c "drop database foo;" postgres || exit 1
    done

On my machine this fairly consistently draws warnings in both 8.3 and HEAD.

I believe what is happening is that the bgwriter has a PendingUnlinkEntry for table foo, and completion of the checkpoint prompts it to exercise that. Meanwhile in the DROP DATABASE, rmtree is working through a list of files to drop, and when it hits the already-deleted one it complains --- and not only does it complain, it stops trying to delete any more. (The second WARNING is quite misleading, because what it really means is "I stopped trying".)

Without the CHECKPOINT, what we get instead is that each cycle builds up some more PendingUnlinkEntrys, which will all fail when the checkpoint comes. The bgwriter is coded to not report ENOENT, so you don't see any evidence of that, but it's clearly a possible case and the comment saying it shouldn't happen is misleading.

Actually ... what if the same DB OID and relfilenode get re-made before the checkpoint? Then we'd be unlinking live data. This is improbable but hardly less so than the scenario the PendingUnlinkEntry code was put in to prevent.

ISTM that we must fix the bgwriter so that ForgetDatabaseFsyncRequests causes PendingUnlinkEntrys for the doomed DB to be thrown away too. This should prevent the unlink-live-data scenario, I think.
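The proposed bgwriter fix can be modeled in a few lines. This is only a Python sketch of the idea, not PostgreSQL's C code: `PendingUnlink` and `forget_database` are hypothetical names, and the second entry (16384/16390) is made-up data added for illustration. It shows pending unlinks for the dropped database being discarded so that a later checkpoint cannot act on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingUnlink:
    """Hypothetical stand-in for one queued unlink request."""
    db_oid: int
    relfilenode: int


def forget_database(pending, db_oid):
    """Return the queue with every entry for the doomed database removed."""
    return [entry for entry in pending if entry.db_oid != db_oid]


# One entry for the database being dropped (OIDs taken from the warning
# above) and one for an unrelated database (made-up OIDs).
pending = [PendingUnlink(80750, 80825), PendingUnlink(16384, 16390)]

# DROP DATABASE for OID 80750 discards its queued unlinks, so a later
# checkpoint cannot unlink a re-created base/80750/80825.
pending = forget_database(pending, 80750)
```

After the call only the entry for the unrelated database remains in the queue.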
Even then, concurrent deletion attempts are probably possible (since ForgetDatabaseFsyncRequests is asynchronous) and rmtree() is being far too fragile about dealing with them. I think that it should be coded to ignore ENOENT the same as the bgwriter does, and that it should press on and keep trying to delete things even if it gets a failure.

Thoughts?

			regards, tom lane
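For illustration, here is a rough Python model of the more tolerant removal loop suggested above. It is not rmtree() itself: `remove_listed` is a hypothetical helper, and the file names 80826 and 80827 are invented. The sketch treats ENOENT as success, records any other failure, and keeps going instead of stopping at the first error.

```python
import os
import tempfile


def remove_listed(dirpath, names):
    """Unlink each name from an earlier directory listing, then the directory.

    A missing file (ENOENT) counts as success, since someone else already
    removed it. Any other error is recorded and the loop continues.
    Returns a list of (path, errno) pairs for the failures.
    """
    failures = []
    for name in names:
        path = os.path.join(dirpath, name)
        try:
            os.unlink(path)
        except FileNotFoundError:
            continue  # already gone: nothing to complain about
        except OSError as exc:
            failures.append((path, exc.errno))  # note it, but press on
    try:
        os.rmdir(dirpath)
    except FileNotFoundError:
        pass
    except OSError as exc:
        failures.append((dirpath, exc.errno))
    return failures


# Simulate the race: list the directory, let a "concurrent" unlink remove
# one file, then work through the stale listing.
base = tempfile.mkdtemp()
dbdir = os.path.join(base, "80750")
os.mkdir(dbdir)
for name in ("80825", "80826", "80827"):
    open(os.path.join(dbdir, name), "w").close()

listing = sorted(os.listdir(dbdir))
os.unlink(os.path.join(dbdir, "80825"))  # stands in for the bgwriter's unlink

failures = remove_listed(dbdir, listing)
os.rmdir(base)  # clean up the temporary parent directory
```

Returning the failures rather than stopping lets the caller report exactly which paths could not be removed, which avoids a misleading "could not remove database directory" message when the only problem was a file that had already vanished.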