Re: Why copy_relation_data only use wal when WALarchiving is enabled
От | Heikki Linnakangas |
---|---|
Тема | Re: Why copy_relation_data only use wal when WALarchiving is enabled |
Дата | |
Msg-id | 47161732.7090601@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Why copy_relation_data only use wal when WALarchiving is enabled ("Florian G. Pflug" <fgp@phlo.org>) |
Список | pgsql-hackers |
Florian G. Pflug wrote: > Heikki Linnakangas wrote: >> I wrote: >>> Unfortunately I don't see any easy way to fix it. One approach would be >>> to avoid reusing the relfilenodes until next checkpoint, but I don't see >>> any nice place to keep track of OIDs that have been dropped since last >>> checkpoint. >> >> Ok, here's one idea: >> >> Instead of deleting the file immediately on commit of DROP TABLE, the >> file is truncated to release the space, but not unlink()ed, to avoid >> reusing that relfilenode. The truncated file can be deleted after next >> checkpoint. >> >> Now, how does checkpoint know what to delete? We can use the fsync >> request mechanism for that. When a file is truncated, a new kind of >> fsync request, a "deletion request", is sent to the bgwriter, which >> collects all such requests to a list. Before checkpoint calculates new >> RedoRecPtr, the list is swapped with an empty one, and after writing the >> new checkpoint record, all the files that were in the list are deleted. >> >> We would leak empty files on crashes, but we leak files on crashes >> anyway, so that shouldn't be an issue. This scheme wouldn't require >> catalog changes, so it would be suitable for backpatching. >> >> Any better ideas? > Couldn't we fix this by forcing a checkpoint before we commit the > transaction that created the new pg_class entry for the clustered table? > Or rather, more generally, before committing a transaction that created > a new non-temporary relfilenode but didn't WAL-log any subsequent inserts. Yes, that would work. As a small optimization, you could set a flag in shared mem whenever you delete a rel file, and skip the checkpoint when that flag isn't set. > Thats of course a rather sledgehammer-like approach to this problem - > but at least for the backbranched the fix would be less intrusive... Too much of a sledgehammer IMHO. BTW, CREATE INDEX is also vulnerable. And in 8.3, COPY to a table created/truncated in the same transaction. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: