Re: pg_combinebackup --copy-file-range
From:        Tomas Vondra
Subject:     Re: pg_combinebackup --copy-file-range
Date:
Msg-id:      c82137eb-460e-41ca-be78-1bb32829bf1f@enterprisedb.com
In reply to: Re: pg_combinebackup --copy-file-range (Thomas Munro <thomas.munro@gmail.com>)
Responses:   Re: pg_combinebackup --copy-file-range
List:        pgsql-hackers
On 3/31/24 06:46, Thomas Munro wrote:
> On Sun, Mar 31, 2024 at 5:33 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> I'm on 2.2.2 (on Linux). But there's something wrong, because the
>> pg_combinebackup that took ~150s on xfs/btrfs takes ~900s on ZFS.
>>
>> I'm not sure it's a ZFS config issue, though, because it's not CPU or
>> I/O bound, and I see this on both machines. And some simple dd tests
>> show the zpool can do 10x the throughput. Could this be due to the
>> file header / pool alignment?
>
> Could ZFS recordsize > 8kB be making it worse, repeatedly dealing with
> the same 128kB record as you copy_file_range 16 x 8kB blocks?
> (Guessing you might be using the default recordsize?)

No, I reduced the record size to 8kB. And the pgbench init takes about
the same as on other filesystems on this hardware, I think - ~10 minutes
for scale 5000.

>> I admit I'm not very familiar with the format, but you're probably
>> right there's a header, and header_length does not seem to consider
>> alignment. make_incremental_rfile simply does this:
>>
>>     /* Remember length of header. */
>>     rf->header_length = sizeof(magic) + sizeof(rf->num_blocks) +
>>         sizeof(rf->truncation_block_length) +
>>         sizeof(BlockNumber) * rf->num_blocks;
>>
>> and sendFile() does the same thing when creating an incremental
>> basebackup. I guess it wouldn't be too difficult to make sure to
>> align this to BLCKSZ or something like this. I wonder if the file
>> format is documented somewhere ... It'd certainly be nicer to tweak
>> before v18, if necessary.
>>
>> Anyway, is that really a problem? I mean, in my tests the CoW stuff
>> seemed to work quite fine - at least on XFS/BTRFS. Although, maybe
>> that's why it took longer on XFS ...
>
> Yeah, I'm not sure, I assume it did more allocating and copying
> because of that. It doesn't matter and it would be fine if a first
> version weren't as good as possible, and fine if we tune the format
> later once we know more, i.e. leaving improvements on the table. I
> just wanted to share the observation. I wouldn't be surprised if the
> block-at-a-time coding makes it slower and maybe makes the on-disk
> data structures worse, but I dunno, I'm just guessing.
>
> It's also interesting but not required to figure out how to tune ZFS
> well for this purpose right now...

No idea. Any idea if there's some good ZFS statistics to check?

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
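
PS: If we did decide to pad the header, a minimal sketch of the idea
might look like this - just rounding the length computed in the snippet
quoted above up to the next BLCKSZ boundary, with the gap zero-filled
on write (padded_header_length is a hypothetical helper, nothing like
it is in the tree):

    #include <stddef.h>

    #define BLCKSZ 8192     /* assuming the default block size */

    /*
     * Round a raw header length up to the next multiple of BLCKSZ, so
     * the per-block data that follows starts on a BLCKSZ boundary.
     */
    static size_t
    padded_header_length(size_t raw_len)
    {
        return ((raw_len + BLCKSZ - 1) / BLCKSZ) * BLCKSZ;
    }

Assuming 4-byte magic / num_blocks / truncation_block_length fields and
a 4-byte BlockNumber per block, a file with 1000 changed blocks has a
4012-byte header, which this would pad to 8192 - so every block in the
body would stay BLCKSZ-aligned in the incremental file.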
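
PPS: And to illustrate the block-at-a-time observation, a sketch of
what batching could look like - finding runs of blocks that are
consecutive in the source file and copying each run with one
copy_file_range() call instead of 16 separate 8kB calls per 128kB
record. This is a hypothetical copy_block_run helper, not what
pg_combinebackup does today; error handling and the incremental-file
bookkeeping are omitted:

    #define _GNU_SOURCE
    #include <unistd.h>

    #define BLCKSZ 8192

    /*
     * Copy nblocks consecutive BLCKSZ-sized blocks in one (looped)
     * copy_file_range() call, letting the filesystem share or clone
     * whole extents rather than 8kB pieces.
     */
    static int
    copy_block_run(int src_fd, int dst_fd,
                   off_t src_block, off_t dst_block, size_t nblocks)
    {
        off_t   src_off = src_block * BLCKSZ;
        off_t   dst_off = dst_block * BLCKSZ;
        size_t  remaining = nblocks * BLCKSZ;

        while (remaining > 0)
        {
            ssize_t nwritten = copy_file_range(src_fd, &src_off,
                                               dst_fd, &dst_off,
                                               remaining, 0);

            if (nwritten <= 0)
                return -1;  /* caller falls back to plain read/write */

            remaining -= (size_t) nwritten;
        }

        return 0;
    }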