Re: Improvement of checkpoint IO scheduler for stable transaction responses

From: Greg Smith
Subject: Re: Improvement of checkpoint IO scheduler for stable transaction responses
Date:
Msg-id: 51E309D7.5030604@2ndQuadrant.com
In response to: Re: Improvement of checkpoint IO scheduler for stable transaction responses  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On 7/3/13 9:39 AM, Andres Freund wrote:
> I wonder how much of this could be gained by doing a
> sync_file_range(SYNC_FILE_RANGE_WRITE) (or similar) either while doing
> the original checkpoint-pass through the buffers or when fsyncing the
> files.

The fsync calls decompose into the queued set of block writes.  If 
they all need to go out eventually to finish a checkpoint, the most 
efficient way from a throughput perspective is to dump them all at once.
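For illustration, here's a minimal sketch of what the suggested call 
pattern might look like, assuming Linux; the function name is 
hypothetical, not actual PostgreSQL code:

/* Ask the kernel to start writeback for one just-written 8K block,
 * without waiting for it to reach disk.  Linux-only. */
#define _GNU_SOURCE
#include <fcntl.h>

#define BLCKSZ 8192

static int
hint_writeback(int fd, unsigned int blocknum)
{
    /* SYNC_FILE_RANGE_WRITE initiates writeback and returns
     * immediately; unlike fsync it doesn't wait for completion. */
    return sync_file_range(fd,
                           (off_t) blocknum * BLCKSZ,
                           BLCKSZ,
                           SYNC_FILE_RANGE_WRITE);
}

Whether initiating writeback early like that beats letting the OS 
schedule everything itself is the open question here.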

I'm not sure sync_file_range targeting checkpoint writes will turn out 
any differently than block sorting.  Let's say the database tries to get 
involved in forcing a particular write order that way.  Right now it's 
going to be making that ordering decision without the benefit of also 
knowing what blocks are being read.  That makes it hard to do better 
than the OS, which knows a different--and potentially more useful in a 
read-heavy environment--set of information about all the pending I/O. 
And it would be very expensive to make all the backends start sharing 
information about what they read to ever pull that logic into the 
database.  It's really easy to wander down the path where you assume you 
must know more than the OS does, which leads to things like direct I/O. 
I am skeptical of that path in general.  I really don't want Postgres 
to be competing with the innovation rate in Linux kernel I/O if we can 
ride it instead.

One idea I was thinking about that overlaps with a sync_file_range 
refactoring is simply tracking how many blocks have been written to each 
relation.  If there were a rule like "fsync any relation that's gotten 
more than 100 8K writes", we'd never build up the sort of backlog that 
causes the worst latency issues.  You really need to start tracking the 
file range there, just to fairly account for multiple writes to the same 
block.  One of the reasons I don't mind all the work I'm planning to put 
into block write statistics is that I think that will make it easier to 
build this sort of facility too.  The original page write and the fsync 
call that eventually flushes it out are very disconnected right now, and 
file range data seems the right missing piece to connect them well.
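As a sketch of the accounting rule, with hypothetical names and a 
purely illustrative threshold (a real version would track distinct 
block ranges rather than a raw write count, for exactly the 
multiple-writes-to-one-block reason above):

#include <unistd.h>

#define FSYNC_WRITE_THRESHOLD 100   /* illustrative, not tuned */

typedef struct RelWriteState
{
    int fd;             /* open descriptor for the relation segment */
    int writes_pending; /* 8K block writes since the last fsync */
} RelWriteState;

/* Call after each block write; fsync once the backlog crosses the
 * threshold so a checkpoint never inherits a giant pile of dirty data. */
static int
note_block_write(RelWriteState *rel)
{
    if (++rel->writes_pending >= FSYNC_WRITE_THRESHOLD)
    {
        rel->writes_pending = 0;
        return fsync(rel->fd);
    }
    return 0;
}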

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
