possible new option for wal_sync_method
From | Dan Scales |
---|---|
Subject | possible new option for wal_sync_method |
Date | |
Msg-id | 1258397887.1997975.1329412703998.JavaMail.root@zimbra-prod-mbox-4.vmware.com |
Responses |
Re: possible new option for wal_sync_method
Re: possible new option for wal_sync_method Re: possible new option for wal_sync_method |
List | pgsql-hackers |
When running Postgres on a single ext3 filesystem on Linux, we find that the attached simple patch gives a significant performance benefit (7-8% in the numbers below). The patch adds a new option for wal_sync_method, "open_direct". With this option, the WAL is always opened with O_DIRECT (but not O_SYNC or O_DSYNC). For Linux, the use of only O_DIRECT should be correct. All WAL log files are fully allocated before being used, and the WAL buffers are 8K-aligned, so all direct writes are guaranteed to complete before returning. (See http://lwn.net/Articles/348739/)

The advantage of using O_DIRECT is that no fsync/fdatasync() is used. All of the other wal_sync_methods use fsync/fdatasync(), either explicitly or implicitly (via the O_SYNC and O_DSYNC options). fsync/fdatasync can be very slow on ext3, because it seems to always have to wait for the current filesystem metadata transaction to complete, even if that metadata operation is completely unrelated to the file being fsync'ed. There can be many metadata operations happening on the data files, so the WAL log fsync can end up waiting for metadata operations on the data files. Since O_DIRECT does not do any fsync/fdatasync operation, it avoids this bottleneck and can finish more quickly on average.

The open_sync and open_dsync options do not have this benefit, because they do the equivalent of an fsync/fdatasync after every WAL write. For the open_sync and open_dsync options, O_DIRECT is used for writes only if the xlog will not need to be consumed by the archiver or hot-standby. I am not keying the open_direct behavior on whether XLogIsNeeded() is true, because we see a performance gain even when archiving is enabled (using a simple script that copies and compresses the log segments).
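To make the write path described above concrete, here is a minimal standalone sketch in C. It is not the attached patch and does not use any Postgres internals: the names demo_is_aligned, demo_preallocate and demo_direct_write, the 4-block file size, and the return-code convention are all hypothetical and chosen only for illustration. It zero-fills a file up front, then opens it with O_DIRECT (without O_SYNC or O_DSYNC) and writes one 8K-aligned block.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* assumption: degrade to buffered I/O if unavailable */
#endif

#define DEMO_BLOCK_SIZE 8192    /* 8K block, matching the alignment in the message */
#define DEMO_SEG_BLOCKS 4       /* tiny stand-in for a WAL segment (hypothetical) */

/* Direct I/O wants buffer address, file offset and length all block-aligned. */
int
demo_is_aligned(const void *buf, off_t offset, size_t len)
{
    return ((uintptr_t) buf % DEMO_BLOCK_SIZE) == 0 &&
           (offset % DEMO_BLOCK_SIZE) == 0 &&
           (len % DEMO_BLOCK_SIZE) == 0;
}

/* Zero-fill the whole file first so later writes never allocate new blocks. */
int
demo_preallocate(const char *path)
{
    char zeros[DEMO_BLOCK_SIZE];
    int  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    int  i;

    if (fd < 0)
        return -1;
    memset(zeros, 0, sizeof(zeros));
    for (i = 0; i < DEMO_SEG_BLOCKS; i++)
    {
        if (write(fd, zeros, sizeof(zeros)) != (ssize_t) sizeof(zeros))
        {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) != 0)         /* flush the allocation once, outside the hot path */
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Write one block with O_DIRECT and no fsync/fdatasync afterwards.
 * Returns 0 on success, 1 if O_DIRECT is rejected with EINVAL
 * (some filesystems do not support it), -1 on any other error.
 */
int
demo_direct_write(const char *path, int blockno)
{
    void   *buf = NULL;
    off_t   offset = (off_t) blockno * DEMO_BLOCK_SIZE;
    ssize_t n;
    int     fd;
    int     rc = 0;

    if (demo_preallocate(path) != 0)
        return -1;
    if (posix_memalign(&buf, DEMO_BLOCK_SIZE, DEMO_BLOCK_SIZE) != 0)
        return -1;
    memset(buf, 'x', DEMO_BLOCK_SIZE);
    assert(demo_is_aligned(buf, offset, DEMO_BLOCK_SIZE));

    fd = open(path, O_WRONLY | O_DIRECT);   /* note: no O_SYNC / O_DSYNC */
    if (fd < 0)
        rc = (errno == EINVAL) ? 1 : -1;
    else
    {
        n = pwrite(fd, buf, DEMO_BLOCK_SIZE, offset);
        if (n != DEMO_BLOCK_SIZE)
            rc = (n < 0 && errno == EINVAL) ? 1 : -1;
        close(fd);
    }
    free(buf);
    unlink(path);               /* demo file only; remove it again */
    return rc;
}
```

Calling demo_direct_write("/tmp/demo_wal.seg", 1) exercises the whole sequence. The O_DIRECT fallback to 0 only keeps the sketch compiling on platforms without the flag; in that case the write is ordinary buffered I/O and none of the reasoning above applies.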
For a 2-processor, 50-warehouse DBT2 run on SLES 11, I get the following NOTPM results:

wal_sync_method | fdatasync | open_direct | open_sync |
---|---|---|---|
archiving off | 17076 | 18481 | 17094 |
archiving on | 15704 | 16923 | 15898 |

Do folks have any interest in this change, or comments on its usefulness/correctness? It would be just an extra option for wal_sync_method that users can try out and has benefits for certain configurations.

Dan
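As a quick check of the 7-8% figure, the relative gain of open_direct over fdatasync can be recomputed from the NOTPM numbers above. The helper names pct_gain and print_gains are hypothetical; only the four NOTPM values come from the message.

```c
#include <stdio.h>

/* Relative change of `candidate` over `baseline`, in percent. */
double
pct_gain(double baseline, double candidate)
{
    return (candidate - baseline) / baseline * 100.0;
}

/* open_direct versus fdatasync, for both archiving settings. */
void
print_gains(void)
{
    printf("archiving off: %.1f%%\n", pct_gain(17076, 18481));
    printf("archiving on:  %.1f%%\n", pct_gain(15704, 16923));
}
```

This works out to roughly 8.2% with archiving off and 7.8% with archiving on, which is in line with the 7-8% quoted at the top.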