Thread: WALWriter active during recovery

WALWriter active during recovery

From
Simon Riggs
Date:
Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

At present this is a WIP patch, for code comments only. Don't bother
with anything other than code questions at this stage.

Implementation questions are

* How should we wake WALReceiver, since it waits on a poll(). Should
we use SIGUSR1, which is already used for latch waits, or another
signal?

* Should we introduce some pacing delays if the WALreceiver gets too
far ahead of apply?

* Other questions you may have?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: WALWriter active during recovery

From
Andres Freund
Date:
Hi,

On 2014-12-15 18:51:44 +0000, Simon Riggs wrote:
> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
> while we are waiting for an fsync we aren't doing any other useful
> work.

Well, it can still buffer data on the network level, but there's
definitely limits to that. So I can see this as being useful.

> Following patch starts WALWriter during recovery and makes it
> responsible for fsyncing data, allowing WALReceiver to progress other
> useful actions.
> 
> At present this is a WIP patch, for code comments only. Don't bother
> with anything other than code questions at this stage.
> 
> Implementation questions are
> 
> * How should we wake WALReceiver, since it waits on a poll(). Should
> we use SIGUSR1, which is already used for latch waits, or another
> signal?

It's not entirely trivial, but also not hard, to make it use the latch
code for waiting. It'd probably end up requiring less code because then
we could just scratch libpqwalreceiver.c:libpq_select().
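
A minimal sketch of the kind of wake-up being discussed, assuming the classic self-pipe technique that latch-style waiting on Unix is commonly built on. This is not code from the patch or from PostgreSQL; `wake_init`, `wait_for_data_or_wakeup`, and the `WAIT_*` flags are hypothetical names. The signal handler only writes one byte to a pipe, and the waiter polls the data socket and the pipe together, so a signal from another process ends the wait just as incoming data would.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

#define WAIT_DATA  1            /* data_fd became readable */
#define WAIT_WOKEN 2            /* another process signalled us */

static int wake_pipe[2] = {-1, -1};

/* Async-signal-safe handler: queue one byte so poll() sees the wake-up. */
static void
wake_handler(int signo)
{
    int     saved_errno = errno;
    ssize_t rc = write(wake_pipe[1], "x", 1);   /* a full pipe is harmless */

    (void) rc;
    (void) signo;
    errno = saved_errno;
}

/* Create the non-blocking self-pipe and install the handler; 0 on success. */
int
wake_init(int signo)
{
    struct sigaction sa;
    int     i;

    if (pipe(wake_pipe) != 0)
        return -1;
    for (i = 0; i < 2; i++)
    {
        int flags = fcntl(wake_pipe[i], F_GETFL);

        if (flags < 0 || fcntl(wake_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/*
 * Wait until data_fd is readable, a wake-up signal arrived, or the timeout
 * expires. Returns a bitmask of WAIT_* flags, 0 on timeout, -1 on error.
 * A negative data_fd is ignored by poll().
 */
int
wait_for_data_or_wakeup(int data_fd, int timeout_ms)
{
    struct pollfd fds[2];
    char    drain[64];
    int     rc;
    int     result = 0;

    fds[0].fd = data_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = wake_pipe[0];
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    /* If a signal interrupts poll(), the retry returns at once via the pipe. */
    do
    {
        rc = poll(fds, 2, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;

    if (fds[0].revents & (POLLIN | POLLHUP | POLLERR))
        result |= WAIT_DATA;
    if (fds[1].revents & POLLIN)
    {
        /* Drain queued wake-up bytes so the next wait blocks again. */
        while (read(wake_pipe[0], drain, sizeof(drain)) > 0)
            ;
        result |= WAIT_WOKEN;
    }
    return result;
}
```

A receiver loop would call `wake_init(SIGUSR1)` once and then `wait_for_data_or_wakeup(sock, timeout)` wherever it currently blocks in poll(). The choice of SIGUSR1 here simply mirrors the question in the quoted text.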

> * Should we introduce some pacing delays if the WALreceiver gets too
> far ahead of apply?

Hm. Why don't we simply start fsyncing in the receiver itself at regular
intervals? If already synced that's cheap, if not, it'll pace us.
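
As a rough illustration of this suggestion, the sketch below writes incoming data and calls fsync() once a configured interval has passed since the last sync. It is an assumption-laden toy, not walreceiver code: `PacedWriter`, `paced_writer_init`, and `paced_write` are hypothetical names, and the interval is a made-up parameter. When the disk keeps up, the periodic fsync returns quickly; when it does not, the fsync blocks and naturally slows the writer down.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

typedef struct PacedWriter
{
    int             fd;             /* file being written */
    long            interval_ms;    /* minimum time between fsyncs */
    size_t          unsynced;       /* bytes written since the last fsync */
    unsigned long   fsync_calls;    /* number of fsyncs issued so far */
    struct timespec last_sync;      /* time of the last fsync (or init) */
} PacedWriter;

/* Milliseconds elapsed on the monotonic clock since *since. */
static long
elapsed_ms(const struct timespec *since)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long) (now.tv_sec - since->tv_sec) * 1000L +
        (now.tv_nsec - since->tv_nsec) / 1000000L;
}

void
paced_writer_init(PacedWriter *w, int fd, long interval_ms)
{
    w->fd = fd;
    w->interval_ms = interval_ms;
    w->unsynced = 0;
    w->fsync_calls = 0;
    clock_gettime(CLOCK_MONOTONIC, &w->last_sync);
}

/*
 * Write len bytes, then fsync if the interval has elapsed and there is
 * unsynced data. Returns 0 on success, -1 on error (errno is set).
 */
int
paced_write(PacedWriter *w, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0)
    {
        ssize_t n = write(w->fd, p, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
        w->unsynced += (size_t) n;
    }

    if (w->unsynced > 0 && elapsed_ms(&w->last_sync) >= w->interval_ms)
    {
        /* This call is the pacing point: it blocks while the disk catches up. */
        if (fsync(w->fd) != 0)
            return -1;
        w->unsynced = 0;
        w->fsync_calls++;
        clock_gettime(CLOCK_MONOTONIC, &w->last_sync);
    }
    return 0;
}
```

An interval of 0 degenerates to syncing after every write, while a large interval approaches the write-only behaviour; a real implementation might pace on bytes received rather than on time.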

Greetings,

Andres Freund



Re: WALWriter active during recovery

From
Heikki Linnakangas
Date:
On 12/15/2014 08:51 PM, Simon Riggs wrote:
> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
> while we are waiting for an fsync we aren't doing any other useful
> work.
>
> Following patch starts WALWriter during recovery and makes it
> responsible for fsyncing data, allowing WALReceiver to progress other
> useful actions.

What other useful actions can WAL receiver do while it's waiting? It 
doesn't do much else than receive WAL, and fsync it to disk.

- Heikki



Re: WALWriter active during recovery

From
Andres Freund
Date:
On 2014-12-16 16:12:40 +0200, Heikki Linnakangas wrote:
> On 12/15/2014 08:51 PM, Simon Riggs wrote:
> >Currently, WALReceiver writes and fsyncs data it receives. Clearly,
> >while we are waiting for an fsync we aren't doing any other useful
> >work.
> >
> >Following patch starts WALWriter during recovery and makes it
> >responsible for fsyncing data, allowing WALReceiver to progress other
> >useful actions.
> 
> What other useful actions can WAL receiver do while it's waiting? It doesn't
> do much else than receive WAL, and fsync it to disk.

It can actually receive further data from the network and write it to
disk? On a relatively low latency network the buffers aren't that
large. Right now we generate quite a bursty IO pattern with the disks
alternating between idle and fully busy.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: WALWriter active during recovery

From
Simon Riggs
Date:
On 16 December 2014 at 14:12, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 12/15/2014 08:51 PM, Simon Riggs wrote:
>>
>> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
>> while we are waiting for an fsync we aren't doing any other useful
>> work.
>>
>> Following patch starts WALWriter during recovery and makes it
>> responsible for fsyncing data, allowing WALReceiver to progress other
>> useful actions.
>
>
> What other useful actions can WAL receiver do while it's waiting? It doesn't
> do much else than receive WAL, and fsync it to disk.

So now it will only need to do one of those two things.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: WALWriter active during recovery

From
didier
Date:
Hi,

On Tue, Dec 16, 2014 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 16 December 2014 at 14:12, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 12/15/2014 08:51 PM, Simon Riggs wrote:
>>>
>>> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
>>> while we are waiting for an fsync we aren't doing any other useful
>>> work.
>>>
>>> Following patch starts WALWriter during recovery and makes it
>>> responsible for fsyncing data, allowing WALReceiver to progress other
>>> useful actions.

On many Linux systems it may not do that much (2.6.32 and 3.2 are bad,
3.13 is better but still it slows the fsync).

If there's a fsync in progress WALReceiver will:
1- slow the fsync because its writes to the same file are grabbed by the fsync
2- stall until the end of fsync.

From 'stracing' a test program simulating this pattern: two processes,
one writes to a file and the second fsyncs it.

20279 11:51:24.037108 fsync(5 <unfinished ...>
20278 11:51:24.053524 <... nanosleep resumed> NULL) = 0 <0.020281>
20278 11:51:24.053691 lseek(3, 1383612416, SEEK_SET) = 1383612416 <0.000119>
20278 11:51:24.053965 write(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8192) = 8192 <0.000111>
20278 11:51:24.054190 nanosleep({0, 20000000}, NULL) = 0 <0.020243>
....
20278 11:51:24.404386 lseek(3, 194772992, SEEK_SET <unfinished ...>
20279 11:51:24.754123 <... fsync resumed> ) = 0 <0.716971>
20279 11:51:24.754202 close(5 <unfinished ...>
20278 11:51:24.754232 <... lseek resumed> ) = 194772992 <0.349825>

Yes that's a 300ms lseek...
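
The test program itself is not reproduced in this thread, so the following is only a guess at its general shape, not didier's code: one process writes 8 kB blocks to a file with a short pause between writes, while a second process calls fsync() on the same file and times it. All names (`time_fsync_during_writes`, `writer_loop`) and parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define BLOCK_SIZE 8192

static double
seconds_between(const struct timespec *a, const struct timespec *b)
{
    return (double) (b->tv_sec - a->tv_sec) +
        (double) (b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Writer process: write n_blocks blocks, sleeping pause_ms after each one. */
static int
writer_loop(const char *path, int n_blocks, long pause_ms)
{
    char            block[BLOCK_SIZE];
    struct timespec pause;
    int             fd = open(path, O_WRONLY | O_CREAT, 0600);
    int             i;

    if (fd < 0)
        return 1;
    pause.tv_sec = pause_ms / 1000;
    pause.tv_nsec = (pause_ms % 1000) * 1000000L;
    memset(block, 'a', sizeof(block));
    for (i = 0; i < n_blocks; i++)
    {
        if (pwrite(fd, block, sizeof(block), (off_t) i * BLOCK_SIZE) !=
            (ssize_t) sizeof(block))
        {
            close(fd);
            return 1;
        }
        nanosleep(&pause, NULL);
    }
    close(fd);
    return 0;
}

/*
 * Fork a writer, let it dirty roughly half of its blocks, then time a
 * single fsync() issued from this process on the same file.
 * Returns the fsync duration in seconds, or -1.0 on error.
 */
double
time_fsync_during_writes(const char *path, int n_blocks, long pause_ms)
{
    struct timespec head_start, start, end;
    long    head_ms = (long) n_blocks * pause_ms / 2;
    int     status;
    int     fd;
    pid_t   pid;

    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1.0;
    pid = fork();
    if (pid < 0)
    {
        close(fd);
        return -1.0;
    }
    if (pid == 0)
    {
        close(fd);              /* the writer opens its own descriptor */
        _exit(writer_loop(path, n_blocks, pause_ms));
    }

    head_start.tv_sec = head_ms / 1000;
    head_start.tv_nsec = (head_ms % 1000) * 1000000L;
    nanosleep(&head_start, NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    status = fsync(fd);
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    if (status != 0)
    {
        waitpid(pid, NULL, 0);
        return -1.0;
    }
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1.0;
    return seconds_between(&start, &end);
}
```

Reproducing stalls like the one in the trace above would presumably need a much larger file and a busier disk than a quick run provides, and the per-write latencies in the writer would have to be observed separately, for example with strace as was done here.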

>>
>>
>> What other useful actions can WAL receiver do while it's waiting? It doesn't
>> do much else than receive WAL, and fsync it to disk.
>
> So now it will only need to do one of those two things.
>

Regards
Didier



Re: WALWriter active during recovery

From
Simon Riggs
Date:
On 17 December 2014 at 11:27, didier <did447@gmail.com> wrote:

> If there's a fsync in progress WALReceiver will:
> 1- slow the fsync because its writes to the same file are grabbed by the fsync
> 2- stall until the end of fsync.

PostgreSQL already fsyncs files while they are being written to. Are
you saying we should stop doing that?

It would be possible to synchronize processes so that we don't write
to a file while it is being fsynced.

fsyncs are also made once the whole 16MB has been written, so in those
cases there is no simultaneous action.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: WALWriter active during recovery

From
Alvaro Herrera
Date:
didier wrote:

> On many Linux systems it may not do that much (2.6.32 and 3.2 are bad,
> 3.13 is better but still it slows the fsync).
> 
> If there's a fsync in progress WALReceiver will:
> 1- slow the fsync because its writes to the same file are grabbed by the fsync
> 2- stall until the end of fsync.

Is this behavior filesystem-dependent?


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WALWriter active during recovery

From
didier
Date:
Hi

On Wed, Dec 17, 2014 at 2:39 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> didier wrote:
>
>> On many Linux systems it may not do that much (2.6.32 and 3.2 are bad,
>> 3.13 is better but still it slows the fsync).
>>
>> If there's a fsync in progress WALReceiver will:
>> 1- slow the fsync because its writes to the same file are grabbed by the fsync
>> 2- stall until the end of fsync.
>
> Is this behavior filesystem-dependent?

I don't know. I only tested ext4.

Attached is the trivial code I used; there's a lot of junk in it.

Didier

Attachments

Re: WALWriter active during recovery

From
Fujii Masao
Date:
On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
> while we are waiting for an fsync we aren't doing any other useful
> work.
>
> Following patch starts WALWriter during recovery and makes it
> responsible for fsyncing data, allowing WALReceiver to progress other
> useful actions.

+1

> At present this is a WIP patch, for code comments only. Don't bother
> with anything other than code questions at this stage.
>
> Implementation questions are
>
> * How should we wake WALReceiver, since it waits on a poll(). Should
> we use SIGUSR1, which is already used for latch waits, or another
> signal?

Probably we need to change libpqwalreceiver so that it uses the latch.
This is useful even for the startup process to report the replay location to
the walreceiver in real time.

> * Should we introduce some pacing delays if the WALreceiver gets too
> far ahead of apply?

I don't think so for now. Instead, we can support synchronous_commit = replay,
and the users can use that new mode if they are worried about the delay of
WAL replay.

> * Other questions you may have?

Who should wake the startup process so that it reads and replays the WAL data?
Current walreceiver. But if walwriter is responsible for fsyncing WAL data,
probably walwriter should do that. Because the startup process should not replay
the WAL data which has not been fsync'd yet.
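
The ordering constraint described here can be sketched as three positions that must satisfy replayed <= flushed <= written, with the wake-up for the startup process tied to whoever advances the flushed position. This is a toy model under that assumption, not PostgreSQL's shared-memory layout; `WalProgress` and the three functions are hypothetical names.

```c
#include <stdint.h>

typedef struct WalProgress
{
    uint64_t written;   /* received and written, possibly not yet on disk */
    uint64_t flushed;   /* known to be fsync'd */
    uint64_t replayed;  /* applied by the startup process */
} WalProgress;

/* Receiver side: record that data up to 'upto' has been written. */
void
receiver_wrote(WalProgress *p, uint64_t upto)
{
    if (upto > p->written)
        p->written = upto;
}

/*
 * Flusher side: record that data up to 'upto' has been fsync'd. The value
 * is clamped to what was written. Returns 1 if the flushed position
 * advanced, which is the moment the startup process should be woken.
 */
int
writer_flushed(WalProgress *p, uint64_t upto)
{
    if (upto > p->written)
        upto = p->written;
    if (upto <= p->flushed)
        return 0;
    p->flushed = upto;
    return 1;
}

/*
 * Startup side: replay towards 'want' but never past the flushed position.
 * Returns the position replay has reached.
 */
uint64_t
replay_upto(WalProgress *p, uint64_t want)
{
    uint64_t limit = (want < p->flushed) ? want : p->flushed;

    if (limit > p->replayed)
        p->replayed = limit;
    return p->replayed;
}
```

In this model a write alone never unblocks replay; only `writer_flushed` returning 1 does, which matches the argument that the process doing the fsync is the natural one to issue the wake-up.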

Regards,

-- 
Fujii Masao



Re: WALWriter active during recovery

From
Michael Paquier
Date:
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
>> while we are waiting for an fsync we aren't doing any other useful
>> work.
>>
>> Following patch starts WALWriter during recovery and makes it
>> responsible for fsyncing data, allowing WALReceiver to progress other
>> useful actions.
>
> +1
>
>> At present this is a WIP patch, for code comments only. Don't bother
>> with anything other than code questions at this stage.
>>
>> Implementation questions are
>>
>> * How should we wake WALReceiver, since it waits on a poll(). Should
>> we use SIGUSR1, which is already used for latch waits, or another
>> signal?
>
> Probably we need to change libpqwalreceiver so that it uses the latch.
> This is useful even for the startup process to report the replay location to
> the walreceiver in real time.
>
>> * Should we introduce some pacing delays if the WALreceiver gets too
>> far ahead of apply?
>
> I don't think so for now. Instead, we can support synchronous_commit = replay,
> and the users can use that new mode if they are worried about the delay of
> WAL replay.
>
>> * Other questions you may have?
>
> Who should wake the startup process so that it reads and replays the WAL data?
> Current walreceiver. But if walwriter is responsible for fsyncing WAL data,
> probably walwriter should do that. Because the startup process should not replay
> the WAL data which has not been fsync'd yet.

Moved this patch to CF 2015-02 to not lose track of it and because it did not
get any reviews.
--
Michael

Re: WALWriter active during recovery

From
Fujii Masao
Date:
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
>> while we are waiting for an fsync we aren't doing any other useful
>> work.
>>
>> Following patch starts WALWriter during recovery and makes it
>> responsible for fsyncing data, allowing WALReceiver to progress other
>> useful actions.

With the patch, replication didn't work properly on my machine. I started
the standby server after removing all the WAL files from the standby.
ISTM that the patch doesn't handle that case. That is, in that case,
the standby tries to start up walreceiver and replication to retrieve
the REDO-starting checkpoint record *before* starting up walwriter
(IOW, before reaching the consistent point). Then, since walreceiver works
without walwriter, no received WAL data can be fsync'd in the standby,
so replication cannot advance any further. I think that walwriter needs
to start before walreceiver starts.

I just marked this patch as Waiting on Author.

Regards,

-- 
Fujii Masao



Re: WALWriter active during recovery

From
Fujii Masao
Date:
On Thu, Mar 5, 2015 at 5:22 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
>>> while we are waiting for an fsync we aren't doing any other useful
>>> work.
>>>
>>> Following patch starts WALWriter during recovery and makes it
>>> responsible for fsyncing data, allowing WALReceiver to progress other
>>> useful actions.
>
> With the patch, replication didn't work properly on my machine. I started
> the standby server after removing all the WAL files from the standby.
> ISTM that the patch doesn't handle that case. That is, in that case,
> the standby tries to start up walreceiver and replication to retrieve
> the REDO-starting checkpoint record *before* starting up walwriter
> (IOW, before reaching the consistent point). Then, since walreceiver works
> without walwriter, no received WAL data can be fsync'd in the standby,
> so replication cannot advance any further. I think that walwriter needs
> to start before walreceiver starts.
>
> I just marked this patch as Waiting on Author.

This patch was moved to current CF with the status "Needs review".
But there are already some review comments which have not been addressed yet,
so I marked the patch as "Waiting on Author" again.

Regards,

-- 
Fujii Masao



Re: WALWriter active during recovery

From
Simon Riggs
Date:
On 2 July 2015 at 14:31, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Mar 5, 2015 at 5:22 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
>>>> while we are waiting for an fsync we aren't doing any other useful
>>>> work.
>>>>
>>>> Following patch starts WALWriter during recovery and makes it
>>>> responsible for fsyncing data, allowing WALReceiver to progress other
>>>> useful actions.
>>
>> With the patch, replication didn't work properly on my machine. I started
>> the standby server after removing all the WAL files from the standby.
>> ISTM that the patch doesn't handle that case. That is, in that case,
>> the standby tries to start up walreceiver and replication to retrieve
>> the REDO-starting checkpoint record *before* starting up walwriter
>> (IOW, before reaching the consistent point). Then, since walreceiver works
>> without walwriter, no received WAL data can be fsync'd in the standby,
>> so replication cannot advance any further. I think that walwriter needs
>> to start before walreceiver starts.
>>
>> I just marked this patch as Waiting on Author.
>
> This patch was moved to current CF with the status "Needs review".
> But there are already some review comments which have not been addressed yet,
> so I marked the patch as "Waiting on Author" again.

This was pushed back from last CF and I haven't worked on it at all, nor will I.

Pushing back again.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WALWriter active during recovery

From
Andres Freund
Date:
On 2015-07-02 14:34:48 +0100, Simon Riggs wrote:
> This was pushed back from last CF and I haven't worked on it at all, nor
> will I.
> 
> Pushing back again.

Let's "return with feedback", not "move", it then. Moving entries along
that aren't expected to receive updates anytime soon isn't a good idea;
there are more than enough entries in each CF.



Re: WALWriter active during recovery

From
Simon Riggs
Date:
On 2 July 2015 at 14:38, Andres Freund <andres@anarazel.de> wrote:
> On 2015-07-02 14:34:48 +0100, Simon Riggs wrote:
>> This was pushed back from last CF and I haven't worked on it at all, nor
>> will I.
>>
>> Pushing back again.
>
> Let's "return with feedback", not "move", it then. Moving entries along
> that aren't expected to receive updates anytime soon isn't a good idea;
> there are more than enough entries in each CF.

Although I agree, the interface won't let me do that, so will leave as-is.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services