Re: checkpointer continuous flushing
From | Andres Freund |
---|---|
Subject | Re: checkpointer continuous flushing |
Date | |
Msg-id | 20150817114138.GG3522@awork2.anarazel.de |
In response to | Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>) |
List | pgsql-hackers |
On 2015-08-11 17:15:22 +0200, Fabien COELHO wrote:
> +void
> +PerformFileFlush(FileFlushContext * context)
> +{
> +    if (context->ncalls != 0)
> +    {
> +        int rc;
> +
> +#if defined(HAVE_SYNC_FILE_RANGE)
> +
> +        /* Linux: tell the memory manager to move these blocks to io so
> +         * that they are considered for being actually written to disk.
> +         */
> +        rc = sync_file_range(context->fd, context->offset, context->nbytes,
> +                             SYNC_FILE_RANGE_WRITE);
> +
> +#elif defined(HAVE_POSIX_FADVISE)
> +
> +        /* Others: say that data should not be kept in memory...
> +         * This is not exactly what we want to say, because we want to write
> +         * the data for durability but we may need it later nevertheless.
> +         * It seems that Linux would free the memory *if* the data has
> +         * already been written do disk, else the "dontneed" call is ignored.
> +         * For FreeBSD this may have the desired effect of moving the
> +         * data to the io layer, although the system does not seem to
> +         * take into account the provided offset & size, so it is rather
> +         * rough...
> +         */
> +        rc = posix_fadvise(context->fd, context->offset, context->nbytes,
> +                           POSIX_FADV_DONTNEED);
> +
> +#endif
> +
> +        if (rc < 0)
> +            ereport(ERROR,
> +                    (errcode_for_file_access(),
> +                     errmsg("could not flush block " INT64_FORMAT
> +                            " on " INT64_FORMAT " blocks in file \"%s\": %m",
> +                            context->offset / BLCKSZ,
> +                            context->nbytes / BLCKSZ,
> +                            context->filename)));
> +    }

I'm a bit wary that this might cause significant regressions on platforms
that don't support sync_file_range() but do support posix_fadvise(), for
workloads that are bigger than shared_buffers. Consider what happens if the
workload does *not* fit into shared_buffers but *does* fit into the OS's
buffer cache. Suddenly reads will go to disk again, no?

Greetings,

Andres Freund
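
One way to probe the concern above is to check how many pages of a file stay in the OS cache after the flush hint is issued. What follows is only a minimal sketch, not code from the patch: `flush_hint()` and `resident_pages()` are hypothetical helper names, `__linux__` stands in for the patch's `HAVE_SYNC_FILE_RANGE` configure check, and the fallback branch assumes the platform provides `posix_fadvise()`. Note also that `posix_fadvise()` reports failure by returning a positive error number rather than -1, so the sketch normalizes its result.

```c
#define _GNU_SOURCE             /* exposes sync_file_range() and mincore() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): hint the kernel to start writing
 * back [offset, offset + nbytes) of fd. Returns 0 on success, -1 on failure.
 */
static int
flush_hint(int fd, off_t offset, off_t nbytes)
{
#if defined(__linux__)
    /* Only initiates writeback of dirty pages; nothing is evicted. */
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
    /*
     * Assumption: posix_fadvise() exists here. DONTNEED may also drop clean
     * pages from the OS cache, which is the regression being discussed.
     * It returns an error number directly, so map that to -1.
     */
    return posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED) == 0 ? 0 : -1;
#endif
}

/*
 * Hypothetical helper: count how many pages of the first nbytes of fd are
 * resident in the OS page cache, via mmap() + mincore(). Returns -1 on error.
 */
static long
resident_pages(int fd, size_t nbytes)
{
    long            pagesz = sysconf(_SC_PAGESIZE);
    size_t          npages;
    unsigned char  *vec;
    void           *map;
    long            count = 0;

    assert(pagesz > 0);
    npages = (nbytes + (size_t) pagesz - 1) / (size_t) pagesz;

    map = mmap(NULL, nbytes, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    vec = malloc(npages);
    if (vec == NULL || mincore(map, nbytes, vec) != 0)
    {
        free(vec);
        munmap(map, nbytes);
        return -1;
    }

    /* The low bit of each vector entry says whether that page is resident. */
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;

    free(vec);
    munmap(map, nbytes);
    return count;
}
```

Calling `resident_pages()` before and after `flush_hint()` on a file whose data has already reached disk would show whether the hint evicts cached pages on a given platform; the result depends on the kernel and filesystem, so no particular numbers are claimed here.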