Re: CREATE DATABASE with filesystem cloning
From | Andrew Dunstan |
---|---|
Subject | Re: CREATE DATABASE with filesystem cloning |
Date | |
Msg-id | eb02dd00-3fba-9611-d2eb-b99b7c1723cf@dunslane.net |
In reply to | CREATE DATABASE with filesystem cloning (Thomas Munro <thomas.munro@gmail.com>) |
Replies | Re: CREATE DATABASE with filesystem cloning |
List | pgsql-hackers |
On 2023-10-07 Sa 01:51, Thomas Munro wrote:
> Hello hackers,
>
> Here is an experimental POC of fast/cheap database cloning. For
> clones from little template databases, no one cares much, but it might
> be useful to be able to create a snapshot or fork of a very large
> database for testing/experimentation like this:
>
> create database foodb_snapshot20231007 template=foodb strategy=file_clone
>
> It should be a lot faster, and use less physical disk, than the two
> existing strategies on recent-ish XFS, BTRFS, very recent OpenZFS,
> APFS (= macOS), and it could in theory be extended to other systems
> that invented different system calls for this with more work (Solaris,
> Windows). Then extra physical disk space will be consumed only as the
> two clones diverge.
>
> It's just like the old strategy=file_copy, except it asks the OS to do
> its best copying trick. If you try it on a system that doesn't
> support copy-on-write, then copy_file_range() should fall back to
> plain old copy, but it might still be better than we could do, as it
> can push copy commands to network storage or physical storage.
>
> Therefore, the usual caveats from strategy=file_copy also apply here.
> Namely that it has to perform checkpoints which could be very
> expensive, and there are some quirks/brokenness about concurrent
> backups and PITR. Which makes me wonder if it's worth pursuing this
> idea. Thoughts?
>
> I tested on bleeding edge FreeBSD/ZFS, where you need to set sysctl
> vfs.zfs.bclone_enabled=1 to enable the optimisation, as it's still a
> very new feature that is still being rolled out. The system call
> succeeds either way, but that controls whether the new database
> initially shares blocks on disk, or gets new copies. I also tested on
> a Mac. In both cases I could clone large databases in a fraction of a
> second.

I've had to disable COW on my BTRFS-resident buildfarm animals (see previous discussion re Direct I/O).
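The quoted message describes strategy=file_clone as file_copy that asks the OS to do its best copying trick through copy_file_range(), with a plain copy as the fallback. A minimal per-file sketch of that idea is below. It assumes Linux with glibc 2.27 or later, or FreeBSD 13 or later, where copy_file_range() is declared in <unistd.h>; it is not the patch's code, and clone_file() and copy_by_hand() are hypothetical names chosen for illustration.

```c
#define _GNU_SOURCE             /* glibc only declares copy_file_range() with this */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read()/write() loop. It continues from the current file offsets of
 * both descriptors, so it can finish a copy that copy_file_range() began. */
static int copy_by_hand(int in, int out)
{
    char buf[65536];

    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return 0;                   /* end of input: done */
        if (n < 0)
            return -1;
        for (ssize_t done = 0; done < n;) {
            ssize_t w = write(out, buf + done, (size_t) (n - done));
            if (w < 0)
                return -1;
            done += w;
        }
    }
}

/* Hypothetical helper: copy src to a new file dst, letting the kernel pick
 * the cheapest method. On a copy-on-write filesystem that may mean sharing
 * blocks; elsewhere the kernel can still do the copy itself, without
 * bouncing the data through user space. */
int clone_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);  /* never overwrite */
    if (out < 0) {
        close(in);
        return -1;
    }

    int rc = 0;
    for (;;) {
        /* NULL offsets: use and advance each descriptor's own file offset. */
        ssize_t n = copy_file_range(in, NULL, out, NULL, (size_t) 1 << 30, 0);
        if (n == 0)
            break;                      /* end of input: done */
        if (n < 0) {
            /* e.g. EXDEV, EINVAL or ENOSYS where the kernel can't do it;
             * for simplicity every failure falls back, and a genuine
             * I/O error will resurface in the plain loop. */
            rc = copy_by_hand(in, out);
            break;
        }
    }

    if (close(out) != 0)
        rc = -1;
    close(in);
    return rc;
}
```

Passing NULL offsets is a deliberate choice here: if copy_file_range() fails partway through, both file offsets already sit past the bytes it copied, so the fallback loop simply picks up where it stopped. macOS has no copy_file_range(); a port there would need a different call such as clonefile(2).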
cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com