Re: pgcon unconference / impact of block size on performance
From | Tomas Vondra
Subject | Re: pgcon unconference / impact of block size on performance
Date |
Msg-id | 58f299fd-2812-61f8-d089-9836e8bea333@enterprisedb.com
In response to | RE: pgcon unconference / impact of block size on performance (Jakub Wartak <Jakub.Wartak@tomtom.com>)
List | pgsql-hackers
I did a couple of tests to evaluate the impact of filesystem overhead and block size, so here are some preliminary results. I'm running a more extensive set of tests, but some of this seems interesting.

I did two sets of tests:

1) fio tests on raw devices

2) fio tests on ext4/xfs with different fs block sizes

Both sets of tests were executed with varying iodepth (1, 2, 4, ...) and number of processes (1, 8). The results are attached - a CSV file with the results, and a PDF with pivot tables showing them in a more readable format.

1) raw device tests

The results for raw devices have regular patterns, with smaller blocks giving better performance - particularly for read workloads. For write workloads it's similar, except that 4K blocks perform better than 1-2K ones (this applies especially to the NVMe device).

2) fs tests

This shows how the tests perform on ext4/xfs filesystems with different block sizes (1K-4K). Overall the patterns are fairly similar to raw devices. There are a couple of strange things, though.

For example, ext4 often behaves like this on the "write" (i.e. sequential write) benchmark:

  fs block      1K      2K      4K      8K     16K     32K
  ----------------------------------------------------------
  1024       33374   28290   27286   26453   22341   19568
  2048       33420   38595   75741   63790   48474   33474
  4096       33959   38913   73949   63940   49217   33017

It's somewhat expected that 1-2K blocks perform worse than 4K (the raw device behaves the same way), but notice how the behavior differs depending on the fs block. For 2K and 4K fs blocks the throughput improves, but for 1K blocks it just goes down. For higher iodepth values this is even more visible:

  fs block      1K      2K      4K      8K     16K     32K
  ----------------------------------------------------------
  1024       34879   25708   24744   23937   22527   19357
  2048       31648   50348  282696  236118  121750   60646
  4096       34273   39890  273395  214817  135072   66943

The interesting thing is that xfs does not have this issue.

Furthermore, it seems interesting to compare iops on a filesystem to the raw device, which might be seen as the "best case" without the fs overhead. The "comparison" attachments do exactly that. There are two interesting observations here:

1) ext4 seems to have some issue with 1-2K random writes (randrw and randwrite tests) with larger 2-4K filesystem blocks. Consider for example this:

  fs block      1K      2K      4K      8K     16K     32K
  ----------------------------------------------------------
  1024      214765  143564  108075   83098   58238   38569
  2048       66010  216287  260116  214541  113848   57045
  4096       66656   64155  268141  215860  109175   54877

Again, xfs does not behave like this.

2) Interestingly enough, some cases can actually perform better on a filesystem than directly on the raw device - I'm not sure what the explanation is, but it only happens on the SSD RAID (not on the NVMe), and with higher iodepth values.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
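(For illustration, a minimal sketch of the kind of fio run described above. The device path, mount point, and all job parameters here are placeholders, not the actual settings behind the numbers in this message.)

  # raw device, sequential write, direct I/O
  # (device path, bs, iodepth and runtime are illustrative values)
  fio --name=seqwrite --ioengine=libaio --direct=1 \
      --rw=write --bs=4k --iodepth=16 --numjobs=1 \
      --runtime=60 --time_based --group_reporting \
      --filename=/dev/nvme0n1

  # filesystem variant: mkfs with a chosen fs block size, mount,
  # then run the same job against a file instead of the raw device
  mkfs.ext4 -b 4096 /dev/nvme0n1   # or: mkfs.xfs -f -b size=4096 /dev/nvme0n1
  mount /dev/nvme0n1 /mnt/test
  fio --name=seqwrite --ioengine=libaio --direct=1 \
      --rw=write --bs=4k --iodepth=16 --numjobs=1 \
      --size=8G --group_reporting \
      --filename=/mnt/test/fio.data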
Attachments