Обсуждение: Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

Поиск
Список
Период
Сортировка

Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Magnus Hagander
Дата:
On Mon, Jun 11, 2012 at 5:37 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jun 11, 2012 at 3:24 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Sun, Jun 10, 2012 at 6:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Sun, Jun 10, 2012 at 11:45 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>>>>>>> Fujii Masao <masao.fujii@gmail.com> writes:
>>>>>>>>>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>>>>>>>>>> becoming synchronous standby. Thought?
>>>>>>>>>>>
>>>>>>>>>>> Absolutely.  If we have replication clients that are not actually
>>>>>>>>>>> capable of being standbys, there *must* be a way for the master
>>>>>>>>>>> to know that.
>>>>>>>>>>
>>>>>>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>>>>>>>>>> location? And that this only applied in 9.2?
>>>>>>>>>>
>>>>>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>>>>>>>>> streaming) as synchronous standby?
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> If so then yes, that is
>>>>>>>>>> *definitely* a bug that should be fixed. We should never select a
>>>>>>>>>> connection that's not even streaming log as standby!
>>>>>>>>>
>>>>>>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync
>>>>>>>>> standby. Also this patch fixes another problem: currently only walsender
>>>>>>>>> which reaches STREAMING state can become sync walsender. OTOH,
>>>>>>>>> sync walsender thinks that walsender with higher priority will be sync one
>>>>>>>>> whether its state is STREAMING, and switches to potential sync walsender.
>>>>>>>>> So when the standby with higher priority connects to the master, we
>>>>>>>>> might have no sync standby until it reaches the STREAMING state.
>>>>>>>>> To fix this problem, the patch switches walsender's state from sync to
>>>>>>>>> potential *after* walsender with higher priority has reached the
>>>>>>>>> STREAMING state.
>>>>>>>>>
>>>>>>>>> We also should not select (1) background stream process forked from
>>>>>>>>> pg_basebackup and (2) pg_receivexlog as sync standby because they
>>>>>>>>> don't send back replication progress. To address this, I'm thinking to
>>>>>>>>> introduce new option "NOSYNC" in "START_REPLICATION" command
>>>>>>>>> as follows, and to change (1) and (2) so that they specify NOSYNC.
>>>>>>>>>
>>>>>>>>>    START_REPLICATION XXX/XXX [NOSYNC]
>>>>>>>>>
>>>>>>>>> If the standby specifies NOSYNC option, it's never assigned as sync
>>>>>>>>> standby even if its name is in synchronous_standby_names. Thought?
>>>>>>>>
>>>>>>>> The standby which always sends InvalidXLogRecPtr back should not
>>>>>>>> become sync one. So instead of NOSYNC option, by checking whether
>>>>>>>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
>>>>>>>
>>>>>>> We should not do this because Magnus is proposing the patch
>>>>>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
>>>>>>> which breaks the above assumption at all. So we should introduce
>>>>>>> something like NOSYNC option.
>>>>>>
>>>>>> Wouldn't the better choice there in that case be to give a switch to
>>>>>> pg_receivexlog if you *want* it to be able to become a sync replica,
>>>>>> and by default disallow it? And then keep the backend just treating
>>>>>> InvalidXlogRecPtr as don't-become-sync-replica.
>>>>>
>>>>> I don't object to making pg_receivexlog as sync standby at all. So at least
>>>>> for me, that switch is not necessary. What I'm worried about is the
>>>>> background stream process forked from pg_basebackup. I think that
>>>>> it should not run as sync standby but sending back its replication progress
>>>>> seems helpful because a user can see the progress from pg_stat_replication.
>>>>> So I'm thinking that something like NOSYNC option is required.
>>>>
>>>> On principle, no. By default, yes.
>>>>
>>>> How about:
>>>> pg_basebackup background: *never* sends flush location, and therefor
>>>> won't become sync replica
>>>> pg_receivexlog *optionally* sends flush location. by defualt own't
>>>> become sync replica, but can be made so with a switch
>>>
>>> Wouldn't a user who sees NULL in flush_location from pg_stat_replication
>>> misunderstand that pg_receivexlog (in default mode) and pg_basebackup
>>> background don't flush WAL files at all?
>>
>> That sounds like a "documentable issue".
>>
>> But maybe you're right, and we need the "never become sync" as a flag.
>
> You agreed to add something like NOSYNC option into START_REPLICATION command?

I'm on the fence. I was hoping somebody else would chime in with an
opinion as well.

I just realized this thread is on -admin. Moving it to -hackers so
more of the right people will spot it.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Fujii Masao
Дата:
On Tue, Jun 12, 2012 at 12:47 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, Jun 11, 2012 at 5:37 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Mon, Jun 11, 2012 at 3:24 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> On Sun, Jun 10, 2012 at 6:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> On Sun, Jun 10, 2012 at 11:45 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>> On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>>>>>>>> Fujii Masao <masao.fujii@gmail.com> writes:
>>>>>>>>>>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>>>>>>>>>>> becoming synchronous standby. Thought?
>>>>>>>>>>>>
>>>>>>>>>>>> Absolutely.  If we have replication clients that are not actually
>>>>>>>>>>>> capable of being standbys, there *must* be a way for the master
>>>>>>>>>>>> to know that.
>>>>>>>>>>>
>>>>>>>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>>>>>>>>>>> location? And that this only applied in 9.2?
>>>>>>>>>>>
>>>>>>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>>>>>>>>>> streaming) as synchronous standby?
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> If so then yes, that is
>>>>>>>>>>> *definitely* a bug that should be fixed. We should never select a
>>>>>>>>>>> connection that's not even streaming log as standby!
>>>>>>>>>>
>>>>>>>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync
>>>>>>>>>> standby. Also this patch fixes another problem: currently only walsender
>>>>>>>>>> which reaches STREAMING state can become sync walsender. OTOH,
>>>>>>>>>> sync walsender thinks that walsender with higher priority will be sync one
>>>>>>>>>> whether its state is STREAMING, and switches to potential sync walsender.
>>>>>>>>>> So when the standby with higher priority connects to the master, we
>>>>>>>>>> might have no sync standby until it reaches the STREAMING state.
>>>>>>>>>> To fix this problem, the patch switches walsender's state from sync to
>>>>>>>>>> potential *after* walsender with higher priority has reached the
>>>>>>>>>> STREAMING state.
>>>>>>>>>>
>>>>>>>>>> We also should not select (1) background stream process forked from
>>>>>>>>>> pg_basebackup and (2) pg_receivexlog as sync standby because they
>>>>>>>>>> don't send back replication progress. To address this, I'm thinking to
>>>>>>>>>> introduce new option "NOSYNC" in "START_REPLICATION" command
>>>>>>>>>> as follows, and to change (1) and (2) so that they specify NOSYNC.
>>>>>>>>>>
>>>>>>>>>>    START_REPLICATION XXX/XXX [NOSYNC]
>>>>>>>>>>
>>>>>>>>>> If the standby specifies NOSYNC option, it's never assigned as sync
>>>>>>>>>> standby even if its name is in synchronous_standby_names. Thought?
>>>>>>>>>
>>>>>>>>> The standby which always sends InvalidXLogRecPtr back should not
>>>>>>>>> become sync one. So instead of NOSYNC option, by checking whether
>>>>>>>>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
>>>>>>>>
>>>>>>>> We should not do this because Magnus is proposing the patch
>>>>>>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
>>>>>>>> which breaks the above assumption at all. So we should introduce
>>>>>>>> something like NOSYNC option.
>>>>>>>
>>>>>>> Wouldn't the better choice there in that case be to give a switch to
>>>>>>> pg_receivexlog if you *want* it to be able to become a sync replica,
>>>>>>> and by default disallow it? And then keep the backend just treating
>>>>>>> InvalidXlogRecPtr as don't-become-sync-replica.
>>>>>>
>>>>>> I don't object to making pg_receivexlog as sync standby at all. So at least
>>>>>> for me, that switch is not necessary. What I'm worried about is the
>>>>>> background stream process forked from pg_basebackup. I think that
>>>>>> it should not run as sync standby but sending back its replication progress
>>>>>> seems helpful because a user can see the progress from pg_stat_replication.
>>>>>> So I'm thinking that something like NOSYNC option is required.
>>>>>
>>>>> On principle, no. By default, yes.
>>>>>
>>>>> How about:
>>>>> pg_basebackup background: *never* sends flush location, and therefor
>>>>> won't become sync replica
>>>>> pg_receivexlog *optionally* sends flush location. by defualt own't
>>>>> become sync replica, but can be made so with a switch
>>>>
>>>> Wouldn't a user who sees NULL in flush_location from pg_stat_replication
>>>> misunderstand that pg_receivexlog (in default mode) and pg_basebackup
>>>> background don't flush WAL files at all?
>>>
>>> That sounds like a "documentable issue".
>>>
>>> But maybe you're right, and we need the "never become sync" as a flag.
>>
>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>
> I'm on the fence. I was hoping somebody else would chime in with an
> opinion as well.

+1

Regards,

--
Fujii Masao


Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Magnus Hagander
Дата:
On Mon, Jun 11, 2012 at 6:06 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Jun 12, 2012 at 12:47 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Mon, Jun 11, 2012 at 5:37 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Mon, Jun 11, 2012 at 3:24 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> On Sun, Jun 10, 2012 at 6:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>> On Sun, Jun 10, 2012 at 11:45 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>> On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>>>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>>>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>>>>>>>>> Fujii Masao <masao.fujii@gmail.com> writes:
>>>>>>>>>>>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>>>>>>>>>>>> becoming synchronous standby. Thought?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Absolutely.  If we have replication clients that are not actually
>>>>>>>>>>>>> capable of being standbys, there *must* be a way for the master
>>>>>>>>>>>>> to know that.
>>>>>>>>>>>>
>>>>>>>>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>>>>>>>>>>>> location? And that this only applied in 9.2?
>>>>>>>>>>>>
>>>>>>>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>>>>>>>>>>> streaming) as synchronous standby?
>>>>>>>>>>>
>>>>>>>>>>> Yes.
>>>>>>>>>>>
>>>>>>>>>>>> If so then yes, that is
>>>>>>>>>>>> *definitely* a bug that should be fixed. We should never select a
>>>>>>>>>>>> connection that's not even streaming log as standby!
>>>>>>>>>>>
>>>>>>>>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync
>>>>>>>>>>> standby. Also this patch fixes another problem: currently only walsender
>>>>>>>>>>> which reaches STREAMING state can become sync walsender. OTOH,
>>>>>>>>>>> sync walsender thinks that walsender with higher priority will be sync one
>>>>>>>>>>> whether its state is STREAMING, and switches to potential sync walsender.
>>>>>>>>>>> So when the standby with higher priority connects to the master, we
>>>>>>>>>>> might have no sync standby until it reaches the STREAMING state.
>>>>>>>>>>> To fix this problem, the patch switches walsender's state from sync to
>>>>>>>>>>> potential *after* walsender with higher priority has reached the
>>>>>>>>>>> STREAMING state.
>>>>>>>>>>>
>>>>>>>>>>> We also should not select (1) background stream process forked from
>>>>>>>>>>> pg_basebackup and (2) pg_receivexlog as sync standby because they
>>>>>>>>>>> don't send back replication progress. To address this, I'm thinking to
>>>>>>>>>>> introduce new option "NOSYNC" in "START_REPLICATION" command
>>>>>>>>>>> as follows, and to change (1) and (2) so that they specify NOSYNC.
>>>>>>>>>>>
>>>>>>>>>>>    START_REPLICATION XXX/XXX [NOSYNC]
>>>>>>>>>>>
>>>>>>>>>>> If the standby specifies NOSYNC option, it's never assigned as sync
>>>>>>>>>>> standby even if its name is in synchronous_standby_names. Thought?
>>>>>>>>>>
>>>>>>>>>> The standby which always sends InvalidXLogRecPtr back should not
>>>>>>>>>> become sync one. So instead of NOSYNC option, by checking whether
>>>>>>>>>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
>>>>>>>>>
>>>>>>>>> We should not do this because Magnus is proposing the patch
>>>>>>>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
>>>>>>>>> which breaks the above assumption at all. So we should introduce
>>>>>>>>> something like NOSYNC option.
>>>>>>>>
>>>>>>>> Wouldn't the better choice there in that case be to give a switch to
>>>>>>>> pg_receivexlog if you *want* it to be able to become a sync replica,
>>>>>>>> and by default disallow it? And then keep the backend just treating
>>>>>>>> InvalidXlogRecPtr as don't-become-sync-replica.
>>>>>>>
>>>>>>> I don't object to making pg_receivexlog as sync standby at all. So at least
>>>>>>> for me, that switch is not necessary. What I'm worried about is the
>>>>>>> background stream process forked from pg_basebackup. I think that
>>>>>>> it should not run as sync standby but sending back its replication progress
>>>>>>> seems helpful because a user can see the progress from pg_stat_replication.
>>>>>>> So I'm thinking that something like NOSYNC option is required.
>>>>>>
>>>>>> On principle, no. By default, yes.
>>>>>>
>>>>>> How about:
>>>>>> pg_basebackup background: *never* sends flush location, and therefor
>>>>>> won't become sync replica
>>>>>> pg_receivexlog *optionally* sends flush location. by defualt own't
>>>>>> become sync replica, but can be made so with a switch
>>>>>
>>>>> Wouldn't a user who sees NULL in flush_location from pg_stat_replication
>>>>> misunderstand that pg_receivexlog (in default mode) and pg_basebackup
>>>>> background don't flush WAL files at all?
>>>>
>>>> That sounds like a "documentable issue".
>>>>
>>>> But maybe you're right, and we need the "never become sync" as a flag.
>>>
>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>
>> I'm on the fence. I was hoping somebody else would chime in with an
>> opinion as well.
>
> +1

Nobody else with any opinion on this? :(

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Simon Riggs
Дата:
On 11 June 2012 23:47, Magnus Hagander <magnus@hagander.net> wrote

>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>
> I'm on the fence. I was hoping somebody else would chime in with an
> opinion as well.

Why would you add it to synchronous_standby_names and then explicitly ignore it?

I don't see why you'd want this.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Robert Haas
Дата:
On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>
>>> I'm on the fence. I was hoping somebody else would chime in with an
>>> opinion as well.
>>
>> +1
>
> Nobody else with any opinion on this? :(

I don't think we really need a NOSYNC flag at this point.  Just not
setting the flush location in clients that make a point of flushing in
a timely fashion seems fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Fujii Masao
Дата:
On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>
>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>> opinion as well.
>>>
>>> +1
>>
>> Nobody else with any opinion on this? :(
>
> I don't think we really need a NOSYNC flag at this point.  Just not
> setting the flush location in clients that make a point of flushing in
> a timely fashion seems fine.

Okay, I'm in the minority, so I'm writing the patch that way. WIP
patch attached.

In the patch, pg_basebackup background process and pg_receivexlog always
return invalid location as flush one, and will never become sync standby even
if their name is in synchronous_standby_names. The timing of their sending
the reply depends on the standby_message_timeout specified in -s option. So
the write position may lag behind the true position.

pg_receivexlog accepts new option -S (better option character?). If this option
is specified, pg_receivexlog returns true flush position, and can become sync
standby. It sends back the reply to the master each time the write position
changes or the timeout passes. If synchronous_commit is set to remote_write,
synchronous replication to pg_receivexlog would work well.

The patch needs more documentation. But I think that it's worth reviewing the
code in advance, so I attached the WIP patch. Comments? Objections?

The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied,
we need to write the backport version of the patch for 9.2.

Regards,

--
Fujii Masao

Вложения

Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Simon Riggs
Дата:
On 27 June 2012 18:24, Fujii Masao <masao.fujii@gmail.com> wrote:

> will never become sync standby even
> if their name is in synchronous_standby_names.

I don't understand why you'd want that.

What is wrong with removing the name from synchronous_standby_names if
you don't like it?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Magnus Hagander
Дата:
On Wed, Jun 27, 2012 at 7:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 27 June 2012 18:24, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> will never become sync standby even
>> if their name is in synchronous_standby_names.
>
> I don't understand why you'd want that.
>
> What is wrong with removing the name from synchronous_standby_names if
> you don't like it?

I believe that's just fallout -the main point is that we don't want to
become a sync standby when synchronous_standby_names is set to '*'.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Magnus Hagander
Дата:
On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>
>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>> opinion as well.
>>>>
>>>> +1
>>>
>>> Nobody else with any opinion on this? :(
>>
>> I don't think we really need a NOSYNC flag at this point.  Just not
>> setting the flush location in clients that make a point of flushing in
>> a timely fashion seems fine.
>
> Okay, I'm in the minority, so I'm writing the patch that way. WIP
> patch attached.
>
> In the patch, pg_basebackup background process and pg_receivexlog always
> return invalid location as flush one, and will never become sync standby even
> if their name is in synchronous_standby_names. The timing of their sending

That doesn't match with the patch, afaics. The patch always sets the
correct write location, which means it can become a remote_write
synchronous standby, no? It will only send it back when timeout
expires, but it will be sent back.

I wonder if that might actually be a more reasonable mode of operation
in general:

* always send back the write position, at the write interval
* always send back the flush position, when we're flushing (meaning
when we switch xlog)

have an option that makes it possible to:
* always send back the write position as soon as it changes (making
for a reasonable remote_write sync standby)
* actually flush the log after each write instead of end of file
(making for a reasonable full sync standby)

meaning you'd have something like "pg_receivexlog --sync=write" and
"pg_receivexlog --sync=flush" controlling it instead.

And deal with the "user put * in synchronous_standby_names and
accidentally got pg_receivexlog as the sync standby" by more clearly
warning people not to use * for that parameter... Since it's simply
dangerous :)


> the reply depends on the standby_message_timeout specified in -s option. So
> the write position may lag behind the true position.
>
> pg_receivexlog accepts new option -S (better option character?). If this option
> is specified, pg_receivexlog returns true flush position, and can become sync
> standby. It sends back the reply to the master each time the write position
> changes or the timeout passes. If synchronous_commit is set to remote_write,
> synchronous replication to pg_receivexlog would work well.

Yeah, I hadn't considered the remote_write mode, but I guess that's
why you have to track the current write position across loads, which
first confused me.

Looking at some other usecases for this, I wonder if we should also
force a status message whenever we switch xlog files, even if we
aren't running in sync mode, even if the timeout hasn't expired. I
think that would be a reasonable thing to do, since you often want to
track things based on files.

> The patch needs more documentation. But I think that it's worth reviewing the
> code in advance, so I attached the WIP patch. Comments? Objections?

Looking at the code, what exactly prompts the changes to the backend
side? That seems unrelated? Are we actually considering picking a
standby with InvalidXlogRecPtr as a sync standby today?

Isn't it enough to just send the proper write and flush locations from
the frontend?

> The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied,
> we need to write the backport version of the patch for 9.2.

Oh, conflicts with Heikkis xlog patches, right? Ugh. But yeah.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Fujii Masao
Дата:
Thanks for the review!

On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>
>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>> opinion as well.
>>>>>
>>>>> +1
>>>>
>>>> Nobody else with any opinion on this? :(
>>>
>>> I don't think we really need a NOSYNC flag at this point.  Just not
>>> setting the flush location in clients that make a point of flushing in
>>> a timely fashion seems fine.
>>
>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>> patch attached.
>>
>> In the patch, pg_basebackup background process and pg_receivexlog always
>> return invalid location as flush one, and will never become sync standby even
>> if their name is in synchronous_standby_names. The timing of their sending
>
> That doesn't match with the patch, afaics. The patch always sets the
> correct write location, which means it can become a remote_write
> synchronous standby, no? It will only send it back when timeout
> expires, but it will be sent back.

No. Though correct write location is sent back, they don't become sync standby
because flush location is always invalid. While flush location is
invalid, the master
will never regard the remote server as sync one even if synchronous_commit is
set to remote_write.

>
> I wonder if that might actually be a more reasonable mode of operation
> in general:
>
> * always send back the write position, at the write interval
> * always send back the flush position, when we're flushing (meaning
> when we switch xlog)
>
> have an option that makes it possible to:
> * always send back the write position as soon as it changes (making
> for a reasonable remote_write sync standby)
> * actually flush the log after each write instead of end of file
> (making for a reasonable full sync standby)
>
> meaning you'd have something like "pg_receivexlog --sync=write" and
> "pg_receivexlog --sync=flush" controlling it instead.

Yeah, in this way, pg_receivexlog can become sync even if
synchronous_commit is on, which seems more useful. But
I'm thinking that the synchronous pg_receivexlog stuff should
be postponed to 9.3 because its patch seems to become too
big to apply at this beta stage. So, in 9.2, to fix the problem,
what about just applying the simple patch which prevents
pg_basebackup background process and pg_receivexlog from
becoming sync standby whatever synchronous_standby_names
and synchronous_commit are set to?

> And deal with the "user put * in synchronous_standby_names and
> accidentally got pg_receivexlog as the sync standby" by more clearly
> warning people not to use * for that parameter... Since it's simply
> dangerous :)

Yep.

>> the reply depends on the standby_message_timeout specified in -s option. So
>> the write position may lag behind the true position.
>>
>> pg_receivexlog accepts new option -S (better option character?). If this option
>> is specified, pg_receivexlog returns true flush position, and can become sync
>> standby. It sends back the reply to the master each time the write position
>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>> synchronous replication to pg_receivexlog would work well.
>
> Yeah, I hadn't considered the remote_write mode, but I guess that's
> why you have to track the current write position across loads, which
> first confused me.

The patch has to track the current write location to decide whether to send
back the reply to the master, IOW to know whether the write location
has changed, IOW to know whether we've already sent the reply about
the latest write location.

> Looking at some other usecases for this, I wonder if we should also
> force a status message whenever we switch xlog files, even if we
> aren't running in sync mode, even if the timeout hasn't expired. I
> think that would be a reasonable thing to do, since you often want to
> track things based on files.

You mean that the pg_receivexlog should send back the correct flush
location whenever it switches xlog files?

>> The patch needs more documentation. But I think that it's worth reviewing the
>> code in advance, so I attached the WIP patch. Comments? Objections?
>
> Looking at the code, what exactly prompts the changes to the backend
> side? That seems unrelated? Are we actually considering picking a
> standby with InvalidXlogRecPtr as a sync standby today?
>
> Isn't it enough to just send the proper write and flush locations from
> the frontend?

No, unless I'm missing something.

The problem that we should address first is that the master can pick up
pg_basebackup background process and pg_receivexlog as a sync
standby even if they always return an invalid write and flush positions.
Since they don't send any correct write and flush positions, if they are
accidentally regarded as sync standby, all transactions can get blocked
infinitely. So the patch needed to change the walsender code so that it
doesn't pick up the remote server as sync one while its flush position
is invalid.

>> The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied,
>> we need to write the backport version of the patch for 9.2.
>
> Oh, conflicts with Heikkis xlog patches, right? Ugh. But yeah.

Yep.

Regards,

-- 
Fujii Masao


Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Magnus Hagander
Дата:
On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>>
>>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>>> opinion as well.
>>>>>>
>>>>>> +1
>>>>>
>>>>> Nobody else with any opinion on this? :(
>>>>
>>>> I don't think we really need a NOSYNC flag at this point.  Just not
>>>> setting the flush location in clients that make a point of flushing in
>>>> a timely fashion seems fine.
>>>
>>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>>> patch attached.
>>>
>>> In the patch, pg_basebackup background process and pg_receivexlog always
>>> return invalid location as flush one, and will never become sync standby even
>>> if their name is in synchronous_standby_names. The timing of their sending
>>
>> That doesn't match with the patch, afaics. The patch always sets the
>> correct write location, which means it can become a remote_write
>> synchronous standby, no? It will only send it back when timeout
>> expires, but it will be sent back.
>
> No. Though correct write location is sent back, they don't become sync standby
> because flush location is always invalid. While flush location is
> invalid, the master
> will never regard the remote server as sync one even if synchronous_commit is
> set to remote_write.

Oh. I wasn't aware of that part.


>> I wonder if that might actually be a more reasonable mode of operation
>> in general:
>>
>> * always send back the write position, at the write interval
>> * always send back the flush position, when we're flushing (meaning
>> when we switch xlog)
>>
>> have an option that makes it possible to:
>> * always send back the write position as soon as it changes (making
>> for a reasonable remote_write sync standby)
>> * actually flush the log after each write instead of end of file
>> (making for a reasonable full sync standby)
>>
>> meaning you'd have something like "pg_receivexlog --sync=write" and
>> "pg_receivexlog --sync=flush" controlling it instead.
>
> Yeah, in this way, pg_receivexlog can become sync even if
> synchronous_commit is on, which seems more useful. But
> I'm thinking that the synchronous pg_receivexlog stuff should
> be postponed to 9.3 because its patch seems to become too
> big to apply at this beta stage. So, in 9.2, to fix the problem,
> what about just applying the simple patch which prevents
> pg_basebackup background process and pg_receivexlog from
> becoming sync standby whatever synchronous_standby_names
> and synchronous_commit are set to?

Agreed.

With the addition that we should set the write location, because
that's very useful and per what you said above should be perfectly
safe.


>> And deal with the "user put * in synchronous_standby_names and
>> accidentally got pg_receivexlog as the sync standby" by more clearly
>> warning people not to use * for that parameter... Since it's simply
>> dangerous :)
>
> Yep.

What would be  good wording? Something along the line of "Using the *
entry is not recommended since it can lead to unexpected results when
new standbys are added" or something like that?


>>> the reply depends on the standby_message_timeout specified in -s option. So
>>> the write position may lag behind the true position.
>>>
>>> pg_receivexlog accepts new option -S (better option character?). If this option
>>> is specified, pg_receivexlog returns true flush position, and can become sync
>>> standby. It sends back the reply to the master each time the write position
>>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>>> synchronous replication to pg_receivexlog would work well.
>>
>> Yeah, I hadn't considered the remote_write mode, but I guess that's
>> why you have to track the current write position across loads, which
>> first confused me.
>
> The patch has to track the current write location to decide whether to send
> back the reply to the master, IOW to know whether the write location
> has changed, IOW to know whether we've already sent the reply about
> the latest write location.

Yeha, makes perfect sense.


>> Looking at some other usecases for this, I wonder if we should also
>> force a status message whenever we switch xlog files, even if we
>> aren't running in sync mode, even if the timeout hasn't expired. I
>> think that would be a reasonable thing to do, since you often want to
>> track things based on files.
>
> You mean that the pg_receivexlog should send back the correct flush
> location whenever it switches xlog files?

No, I mean just send back a status message. Meaning that without
specifiying the sync modes per above, it would send back the *write*
location. This would be useful for tracking xlog filenames between
master and pg_receivexlog, without extra delay.

>>> The patch needs more documentation. But I think that it's worth reviewing the
>>> code in advance, so I attached the WIP patch. Comments? Objections?
>>
>> Looking at the code, what exactly prompts the changes to the backend
>> side? That seems unrelated? Are we actually considering picking a
>> standby with InvalidXlogRecPtr as a sync standby today?
>>
>> Isn't it enough to just send the proper write and flush locations from
>> the frontend?
>
> No, unless I'm missing something.
>
> The problem that we should address first is that the master can pick up
> pg_basebackup background process and pg_receivexlog as a sync
> standby even if they always return an invalid write and flush positions.
> Since they don't send any correct write and flush positions, if they are
> accidentally regarded as sync standby, all transactions can get blocked
> infinitely. So the patch needed to change the walsender code so that it
> doesn't pick up the remote server as sync one while its flush position
> is invalid.

Yeah, that is clearly wrong. I think I missed this behaviour, and got
confused by the fact that the patch was trying to fix two different
things - only one of which I was aware of.

So yes, per above, let's isolate out this part as one patch and get
that into 9.2, along with the "set the proper write location", but
leave everything else for 9.3.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Fujii Masao
Дата:
On Mon, Jul 2, 2012 at 4:01 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>> On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>>>
>>>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>>>> opinion as well.
>>>>>>>
>>>>>>> +1
>>>>>>
>>>>>> Nobody else with any opinion on this? :(
>>>>>
>>>>> I don't think we really need a NOSYNC flag at this point.  Just not
>>>>> setting the flush location in clients that make a point of flushing in
>>>>> a timely fashion seems fine.
>>>>
>>>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>>>> patch attached.
>>>>
>>>> In the patch, pg_basebackup background process and pg_receivexlog always
>>>> return invalid location as flush one, and will never become sync standby even
>>>> if their name is in synchronous_standby_names. The timing of their sending
>>>
>>> That doesn't match with the patch, afaics. The patch always sets the
>>> correct write location, which means it can become a remote_write
>>> synchronous standby, no? It will only send it back when timeout
>>> expires, but it will be sent back.
>>
>> No. Though correct write location is sent back, they don't become sync standby
>> because flush location is always invalid. While flush location is
>> invalid, the master
>> will never regard the remote server as sync one even if synchronous_commit is
>> set to remote_write.
>
> Oh. I wasn't aware of that part.
>
>
>>> I wonder if that might actually be a more reasonable mode of operation
>>> in general:
>>>
>>> * always send back the write position, at the write interval
>>> * always send back the flush position, when we're flushing (meaning
>>> when we switch xlog)
>>>
>>> have an option that makes it possible to:
>>> * always send back the write position as soon as it changes (making
>>> for a reasonable remote_write sync standby)
>>> * actually flush the log after each write instead of end of file
>>> (making for a reasonable full sync standby)
>>>
>>> meaning you'd have something like "pg_receivexlog --sync=write" and
>>> "pg_receivexlog --sync=flush" controlling it instead.
>>
>> Yeah, in this way, pg_receivexlog can become sync even if
>> synchronous_commit is on, which seems more useful. But
>> I'm thinking that the synchronous pg_receivexlog stuff should
>> be postponed to 9.3 because its patch seems to become too
>> big to apply at this beta stage. So, in 9.2, to fix the problem,
>> what about just applying the simple patch which prevents
>> pg_basebackup background process and pg_receivexlog from
>> becoming sync standby whatever synchronous_standby_names
>> and synchronous_commit are set to?
>
> Agreed.
>
> With the addition that we should set the write location, because
> that's very useful and per what you said above should be perfectly
> safe.
>
>
>>> And deal with the "user put * in synchronous_standby_names and
>>> accidentally got pg_receivexlog as the sync standby" by more clearly
>>> warning people not to use * for that parameter... Since it's simply
>>> dangerous :)
>>
>> Yep.
>
> What would be  good wording? Something along the line of "Using the *
> entry is not recommended since it can lead to unexpected results when
> new standbys are added" or something like that?
>
>
>>>> the reply depends on the standby_message_timeout specified in -s option. So
>>>> the write position may lag behind the true position.
>>>>
>>>> pg_receivexlog accepts new option -S (better option character?). If this option
>>>> is specified, pg_receivexlog returns true flush position, and can become sync
>>>> standby. It sends back the reply to the master each time the write position
>>>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>>>> synchronous replication to pg_receivexlog would work well.
>>>
>>> Yeah, I hadn't considered the remote_write mode, but I guess that's
>>> why you have to track the current write position across loads, which
>>> first confused me.
>>
>> The patch has to track the current write location to decide whether to send
>> back the reply to the master, IOW to know whether the write location
>> has changed, IOW to know whether we've already sent the reply about
>> the latest write location.
>
> Yeha, makes perfect sense.
>
>
>>> Looking at some other usecases for this, I wonder if we should also
>>> force a status message whenever we switch xlog files, even if we
>>> aren't running in sync mode, even if the timeout hasn't expired. I
>>> think that would be a reasonable thing to do, since you often want to
>>> track things based on files.
>>
>> You mean that the pg_receivexlog should send back the correct flush
>> location whenever it switches xlog files?
>
> No, I mean just send back a status message. Meaning that without
> specifiying the sync modes per above, it would send back the *write*
> location. This would be useful for tracking xlog filenames between
> master and pg_receivexlog, without extra delay.
>
>>>> The patch needs more documentation. But I think that it's worth reviewing the
>>>> code in advance, so I attached the WIP patch. Comments? Objections?
>>>
>>> Looking at the code, what exactly prompts the changes to the backend
>>> side? That seems unrelated? Are we actually considering picking a
>>> standby with InvalidXlogRecPtr as a sync standby today?
>>>
>>> Isn't it enough to just send the proper write and flush locations from
>>> the frontend?
>>
>> No, unless I'm missing something.
>>
>> The problem that we should address first is that the master can pick up
>> pg_basebackup background process and pg_receivexlog as a sync
>> standby even if they always return an invalid write and flush positions.
>> Since they don't send any correct write and flush positions, if they are
>> accidentally regarded as sync standby, all transactions can get blocked
>> infinitely. So the patch needed to change the walsender code so that it
>> doesn't pick up the remote server as sync one while its flush position
>> is invalid.
>
> Yeah, that is clearly wrong. I think I missed this behaviour, and got
> confused by the fact that the patch was trying to fix two different
> things - only one of which I was aware of.
>
> So yes, per above, let's isolate out this part as one patch and get
> that into 9.2, along with the "set the proper write location", but
> leave everything else for 9.3.

Agreed. The attached patch always sets the correct write location and
prevents the remote server sending back invalid flush location from
becoming sync standby.

Regards,

--
Fujii Masao

Вложения

Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

От
Magnus Hagander
Дата:
On Mon, Jul 2, 2012 at 8:17 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jul 2, 2012 at 4:01 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>
>>> On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>>>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>>>>
>>>>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>>>>> opinion as well.
>>>>>>>>
>>>>>>>> +1
>>>>>>>
>>>>>>> Nobody else with any opinion on this? :(
>>>>>>
>>>>>> I don't think we really need a NOSYNC flag at this point.  Just not
>>>>>> setting the flush location in clients that make a point of flushing in
>>>>>> a timely fashion seems fine.
>>>>>
>>>>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>>>>> patch attached.
>>>>>
>>>>> In the patch, pg_basebackup background process and pg_receivexlog always
>>>>> return invalid location as flush one, and will never become sync standby even
>>>>> if their name is in synchronous_standby_names. The timing of their sending
>>>>
>>>> That doesn't match with the patch, afaics. The patch always sets the
>>>> correct write location, which means it can become a remote_write
>>>> synchronous standby, no? It will only send it back when timeout
>>>> expires, but it will be sent back.
>>>
>>> No. Though correct write location is sent back, they don't become sync standby
>>> because flush location is always invalid. While flush location is
>>> invalid, the master
>>> will never regard the remote server as sync one even if synchronous_commit is
>>> set to remote_write.
>>
>> Oh. I wasn't aware of that part.
>>
>>
>>>> I wonder if that might actually be a more reasonable mode of operation
>>>> in general:
>>>>
>>>> * always send back the write position, at the write interval
>>>> * always send back the flush position, when we're flushing (meaning
>>>> when we switch xlog)
>>>>
>>>> have an option that makes it possible to:
>>>> * always send back the write position as soon as it changes (making
>>>> for a reasonable remote_write sync standby)
>>>> * actually flush the log after each write instead of end of file
>>>> (making for a reasonable full sync standby)
>>>>
>>>> meaning you'd have something like "pg_receivexlog --sync=write" and
>>>> "pg_receivexlog --sync=flush" controlling it instead.
>>>
>>> Yeah, in this way, pg_receivexlog can become sync even if
>>> synchronous_commit is on, which seems more useful. But
>>> I'm thinking that the synchronous pg_receivexlog stuff should
>>> be postponed to 9.3 because its patch seems to become too
>>> big to apply at this beta stage. So, in 9.2, to fix the problem,
>>> what about just applying the simple patch which prevents
>>> pg_basebackup background process and pg_receivexlog from
>>> becoming sync standby whatever synchronous_standby_names
>>> and synchronous_commit are set to?
>>
>> Agreed.
>>
>> With the addition that we should set the write location, because
>> that's very useful and per what you said above should be perfectly
>> safe.
>>
>>
>>>> And deal with the "user put * in synchronous_standby_names and
>>>> accidentally got pg_receivexlog as the sync standby" by more clearly
>>>> warning people not to use * for that parameter... Since it's simply
>>>> dangerous :)
>>>
>>> Yep.
>>
>> What would be  good wording? Something along the line of "Using the *
>> entry is not recommended since it can lead to unexpected results when
>> new standbys are added" or something like that?
>>
>>
>>>>> the reply depends on the standby_message_timeout specified in -s option. So
>>>>> the write position may lag behind the true position.
>>>>>
>>>>> pg_receivexlog accepts new option -S (better option character?). If this option
>>>>> is specified, pg_receivexlog returns true flush position, and can become sync
>>>>> standby. It sends back the reply to the master each time the write position
>>>>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>>>>> synchronous replication to pg_receivexlog would work well.
>>>>
>>>> Yeah, I hadn't considered the remote_write mode, but I guess that's
>>>> why you have to track the current write position across loads, which
>>>> first confused me.
>>>
>>> The patch has to track the current write location to decide whether to send
>>> back the reply to the master, IOW to know whether the write location
>>> has changed, IOW to know whether we've already sent the reply about
>>> the latest write location.
>>
>> Yeha, makes perfect sense.
>>
>>
>>>> Looking at some other usecases for this, I wonder if we should also
>>>> force a status message whenever we switch xlog files, even if we
>>>> aren't running in sync mode, even if the timeout hasn't expired. I
>>>> think that would be a reasonable thing to do, since you often want to
>>>> track things based on files.
>>>
>>> You mean that the pg_receivexlog should send back the correct flush
>>> location whenever it switches xlog files?
>>
>> No, I mean just send back a status message. Meaning that without
>> specifiying the sync modes per above, it would send back the *write*
>> location. This would be useful for tracking xlog filenames between
>> master and pg_receivexlog, without extra delay.
>>
>>>>> The patch needs more documentation. But I think that it's worth reviewing the
>>>>> code in advance, so I attached the WIP patch. Comments? Objections?
>>>>
>>>> Looking at the code, what exactly prompts the changes to the backend
>>>> side? That seems unrelated? Are we actually considering picking a
>>>> standby with InvalidXlogRecPtr as a sync standby today?
>>>>
>>>> Isn't it enough to just send the proper write and flush locations from
>>>> the frontend?
>>>
>>> No, unless I'm missing something.
>>>
>>> The problem that we should address first is that the master can pick up
>>> pg_basebackup background process and pg_receivexlog as a sync
>>> standby even if they always return an invalid write and flush positions.
>>> Since they don't send any correct write and flush positions, if they are
>>> accidentally regarded as sync standby, all transactions can get blocked
>>> infinitely. So the patch needed to change the walsender code so that it
>>> doesn't pick up the remote server as sync one while its flush position
>>> is invalid.
>>
>> Yeah, that is clearly wrong. I think I missed this behaviour, and got
>> confused by the fact that the patch was trying to fix two different
>> things - only one of which I was aware of.
>>
>> So yes, per above, let's isolate out this part as one patch and get
>> that into 9.2, along with the "set the proper write location", but
>> leave everything else for 9.3.
>
> Agreed. The attached patch always sets the correct write location and
> prevents the remote server sending back invalid flush location from
> becoming sync standby.

Thanks, applied.

I also put the prevent invalid flush location from becoming a sync
standby part on 9.1 (required only a minor change -
GetXLogReplayRecPtr() didn't take an argument back then).

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/