Обсуждение: pgsql: Use a latch to make startup process wake up and replay

Поиск
Список
Период
Сортировка

pgsql: Use a latch to make startup process wake up and replay

От
heikki@postgresql.org (Heikki Linnakangas)
Дата:
Log Message:
-----------
Use a latch to make startup process wake up and replay immediately when
new WAL arrives via streaming replication. This reduces the latency, and
also allows us to use a longer polling interval, which is good for energy
efficiency.

We still need to poll to check for the appearance of a trigger file, but
the interval is now 5 seconds (instead of 100ms), like when waiting for
a new WAL segment to appear in WAL archive.

Modified Files:
--------------
    pgsql/src/backend/access/transam:
        xlog.c (r1.434 -> r1.435)
        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.434&r2=1.435)
    pgsql/src/backend/replication:
        walreceiver.c (r1.16 -> r1.17)
        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/replication/walreceiver.c?r1=1.16&r2=1.17)
    pgsql/src/include/access:
        xlog.h (r1.116 -> r1.117)
        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/access/xlog.h?r1=1.116&r2=1.117)

Re: pgsql: Use a latch to make startup process wake up and replay

От
Fujii Masao
Дата:
On Wed, Sep 15, 2010 at 7:35 PM, Heikki Linnakangas
<heikki@postgresql.org> wrote:
> Log Message:
> -----------
> Use a latch to make startup process wake up and replay immediately when
> new WAL arrives via streaming replication. This reduces the latency, and
> also allows us to use a longer polling interval, which is good for energy
> efficiency.
>
> We still need to poll to check for the appearance of a trigger file, but
> the interval is now 5 seconds (instead of 100ms), like when waiting for
> a new WAL segment to appear in WAL archive.

Good work!

+     /*
+      * Take ownership of the wakup latch if we're going to sleep during
+      * recovery.
+      */

I found a typo: s/wakup/wakeup/

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: pgsql: Use a latch to make startup process wake up and replay

От
Thom Brown
Дата:
On 15 September 2010 11:35, Heikki Linnakangas <heikki@postgresql.org> wrote:
> Log Message:
> -----------
> Use a latch to make startup process wake up and replay immediately when
> new WAL arrives via streaming replication. This reduces the latency, and
> also allows us to use a longer polling interval, which is good for energy
> efficiency.
>
> We still need to poll to check for the appearance of a trigger file, but
> the interval is now 5 seconds (instead of 100ms), like when waiting for
> a new WAL segment to appear in WAL archive.
>
> Modified Files:
> --------------
>    pgsql/src/backend/access/transam:
>        xlog.c (r1.434 -> r1.435)
>        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.434&r2=1.435)
>    pgsql/src/backend/replication:
>        walreceiver.c (r1.16 -> r1.17)
>        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/replication/walreceiver.c?r1=1.16&r2=1.17)
>    pgsql/src/include/access:
>        xlog.h (r1.116 -> r1.117)
>        (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/access/xlog.h?r1=1.116&r2=1.117)
>
> --

+      * We don't need the latch anymore. It's not strictly necessary to disown
+      * it, but let's do it for the sake of tidyness.
+      */

s/tidyness/tidiness/

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Re: pgsql: Use a latch to make startup process wake up and replay

От
Heikki Linnakangas
Дата:
On 15/09/10 14:14, Fujii Masao wrote:
 > +     /*
 > +      * Take ownership of the wakup latch if we're going to sleep during
 > +      * recovery.
 > +      */
 >
 > I found a typo: s/wakup/wakeup/
 >

On 15/09/10 14:48, Thom Brown wrote:
> +      * We don't need the latch anymore. It's not strictly necessary to disown
> +      * it, but let's do it for the sake of tidyness.
> +      */
>
> s/tidyness/tidiness/

Fixed both of these. Thank you.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: pgsql: Use a latch to make startup process wake up and replay

От
Simon Riggs
Дата:
On Wed, 2010-09-15 at 20:14 +0900, Fujii Masao wrote:
> On Wed, Sep 15, 2010 at 7:35 PM, Heikki Linnakangas
> <heikki@postgresql.org> wrote:
> > Log Message:
> > -----------
> > Use a latch to make startup process wake up and replay immediately when
> > new WAL arrives via streaming replication. This reduces the latency, and
> > also allows us to use a longer polling interval, which is good for energy
> > efficiency.
> >
> > We still need to poll to check for the appearance of a trigger file, but
> > the interval is now 5 seconds (instead of 100ms), like when waiting for
> > a new WAL segment to appear in WAL archive.
>
> Good work!

No, not good work.

You both know very well that I'm working on this area also and these
commits are not agreed... yet. They might not be contended but they are
very likely to break my patch, again.

Please desist while we resolve which are the good ideas and which are
not. We won't know that if you keep breaking other people's patches in a
stream of commits that prevent anybody completing other options.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay

От
David Fetter
Дата:
On Wed, Sep 15, 2010 at 03:35:30PM +0100, Simon Riggs wrote:
> On Wed, 2010-09-15 at 20:14 +0900, Fujii Masao wrote:
> > On Wed, Sep 15, 2010 at 7:35 PM, Heikki Linnakangas
> > <heikki@postgresql.org> wrote:
> > > Log Message:
> > > -----------
> > > Use a latch to make startup process wake up and replay immediately when
> > > new WAL arrives via streaming replication. This reduces the latency, and
> > > also allows us to use a longer polling interval, which is good for energy
> > > efficiency.
> > >
> > > We still need to poll to check for the appearance of a trigger file, but
> > > the interval is now 5 seconds (instead of 100ms), like when waiting for
> > > a new WAL segment to appear in WAL archive.
> >
> > Good work!
>
> No, not good work.
>
> You both know very well that I'm working on this area also and these
> commits are not agreed... yet. They might not be contended but they are
> very likely to break my patch, again.
>
> Please desist while we resolve which are the good ideas and which are
> not. We won't know that if you keep breaking other people's patches in a
> stream of commits that prevent anybody completing other options.

Simon,

No matter how many times you try, you are not going to get a license
to stop all work on anything you might chance to think about.  It is
quite simply never going to happen, so you need to back off.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay

От
Simon Riggs
Дата:
On Wed, 2010-09-15 at 07:59 -0700, David Fetter wrote:
> On Wed, Sep 15, 2010 at 03:35:30PM +0100, Simon Riggs wrote:

> > Please desist while we resolve which are the good ideas and which are
> > not. We won't know that if you keep breaking other people's patches in a
> > stream of commits that prevent anybody completing other options.

> No matter how many times you try, you are not going to get a license
> to stop all work on anything you might chance to think about.  It is
> quite simply never going to happen, so you need to back off.

I agree that asking people to stop work is not OK. However, I haven't
asked for development work to stop, only that commits into that area
stop until proper debate has taken place. Those might be minor commits,
but they might not. Had I made those commits, they would have been
called premature by others also.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


Re: pgsql: Use a latch to make startup process wake up and replay

От
Tom Lane
Дата:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Wed, 2010-09-15 at 20:14 +0900, Fujii Masao wrote:
>> Good work!

> No, not good work.

> You both know very well that I'm working on this area also and these
> commits are not agreed... yet. They might not be contended but they are
> very likely to break my patch, again.

Simon, you can't just command the tide to stop coming in while you do
something that no one else can see.  This patch is perfectly reasonable,
not at all invasive, and a clear improvement over the way the code was
before.  If you can improve on it further, fine, but we're not going to
declare the code off-limits while waiting for an unspecified patch with
no firm delivery date.

            regards, tom lane

Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay

От
Robert Haas
Дата:
On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I agree that asking people to stop work is not OK. However, I haven't
> asked for development work to stop, only that commits into that area
> stop until proper debate has taken place. Those might be minor commits,
> but they might not. Had I made those commits, they would have been
> called premature by others also.

I do not believe that Heikki has done anything inappropriate.  We've
spent weeks discussing the latch facility and its various
applications.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay

От
Simon Riggs
Дата:
On Wed, 2010-09-15 at 12:45 -0400, Robert Haas wrote:
> On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > I agree that asking people to stop work is not OK. However, I haven't
> > asked for development work to stop, only that commits into that area
> > stop until proper debate has taken place. Those might be minor commits,
> > but they might not. Had I made those commits, they would have been
> > called premature by others also.
>
> I do not believe that Heikki has done anything inappropriate.  We've
> spent weeks discussing the latch facility and its various
> applications.

Sounds reasonable, but my comments were about this commit, not the one
that happened on Saturday. This patch was posted about 32 hours ago, and
the commit need not have taken place yet. If I had posted such a patch
and committed it knowing other work is happening in that area we both
know that you would have objected.

It's not actually a major issue, but at some point I have to ask for no
more commits, so Fujii and I can finish our patches, compare and
contrast, so the best ideas can get into Postgres.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay

От
Robert Haas
Дата:
On Wed, Sep 15, 2010 at 1:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2010-09-15 at 12:45 -0400, Robert Haas wrote:
>> On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > I agree that asking people to stop work is not OK. However, I haven't
>> > asked for development work to stop, only that commits into that area
>> > stop until proper debate has taken place. Those might be minor commits,
>> > but they might not. Had I made those commits, they would have been
>> > called premature by others also.
>>
>> I do not believe that Heikki has done anything inappropriate.  We've
>> spent weeks discussing the latch facility and its various
>> applications.
>
> Sounds reasonable, but my comments were about this commit, not the one
> that happened on Saturday. This patch was posted about 32 hours ago, and
> the commit need not have taken place yet. If I had posted such a patch
> and committed it knowing other work is happening in that area we both
> know that you would have objected.

I've often felt that we ought to have a bit more delay between when
committers post patches and when they commit them.  I was told 24
hours and I've seen cases where people haven't even waited that long.
On the other hand, if we get to strict about it, it can easily get to
the point where it just gets in the way of progress, and certainly
some patches are far more controversial than others.  So I don't know
what the best thing to do is.  Still, I have to admit that I feel
fairly positive about the direction we're going with this particular
patch.  Clearing away these peripheral issues should make it easier
for us to have a rational discussion about the core issues around how
this is going to be configured and actually work at the protocol
level.

> It's not actually a major issue, but at some point I have to ask for no
> more commits, so Fujii and I can finish our patches, compare and
> contrast, so the best ideas can get into Postgres.

I don't think anyone is prepared to agree to that.  I think that
everyone is prepared to accept a limited amount of further delay in
pressing forward with the main part of sync rep, but I expect that no
one will be willing to freeze out incremental improvements in the
meantime, even if it does induce a certain amount of rebasing.  It's
also worth noting that Fujii Masao's patch has been around for months,
and yours isn't finished yet.  That's not to say that we don't want to
consider your ideas, because we do: and you've had more than your
share of good ones.  At the same time, it would be unfair and
unreasonable to expect work on a patch that is done, and has been done
for some time, to wait on one that isn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay

От
Heikki Linnakangas
Дата:
On 15/09/10 20:58, Robert Haas wrote:
> On Wed, Sep 15, 2010 at 1:30 PM, Simon Riggs<simon@2ndquadrant.com>  wrote:
>> On Wed, 2010-09-15 at 12:45 -0400, Robert Haas wrote:
>>> On Wed, Sep 15, 2010 at 11:24 AM, Simon Riggs<simon@2ndquadrant.com>  wrote:
>>>> I agree that asking people to stop work is not OK. However, I haven't
>>>> asked for development work to stop, only that commits into that area
>>>> stop until proper debate has taken place. Those might be minor commits,
>>>> but they might not. Had I made those commits, they would have been
>>>> called premature by others also.
>>>
>>> I do not believe that Heikki has done anything inappropriate.  We've
>>> spent weeks discussing the latch facility and its various
>>> applications.
>>
>> Sounds reasonable, but my comments were about this commit, not the one
>> that happened on Saturday. This patch was posted about 32 hours ago, and
>> the commit need not have taken place yet. If I had posted such a patch
>> and committed it knowing other work is happening in that area we both
>> know that you would have objected.
>
> I've often felt that we ought to have a bit more delay between when
> committers post patches and when they commit them.  I was told 24
> hours and I've seen cases where people haven't even waited that long.
> On the other hand, if we get to strict about it, it can easily get to
> the point where it just gets in the way of progress, and certainly
> some patches are far more controversial than others.  So I don't know
> what the best thing to do is.

With anything non-trivial, I try to "sleep on it" before committing.
More with complicated patches, but it's really up to your own comfort
level with the patch, and whether you think anyone might have different
opinions on it. I don't mind quick commits if it's something that has
been discussed in the past and the committer thinks it's
non-controversial. There's always the option of complaining afterwards.
If it comes to that, though, it wasn't really ripe for committing yet.
(That doesn't apply to gripes about typos or something like that,
because that happens to me way too often ;-) )

>  Still, I have to admit that I feel
> fairly positive about the direction we're going with this particular
> patch.  Clearing away these peripheral issues should make it easier
> for us to have a rational discussion about the core issues around how
> this is going to be configured and actually work at the protocol
> level.

Yeah, I don't think anyone has any qualms about the substance of these
patches.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay

От
Simon Riggs
Дата:
On Wed, 2010-09-15 at 13:58 -0400, Robert Haas wrote:
> > It's not actually a major issue, but at some point I have to ask for
> no
> > more commits, so Fujii and I can finish our patches, compare and
> > contrast, so the best ideas can get into Postgres.
>
> I don't think anyone is prepared to agree to that.  I think that
> everyone is prepared to accept a limited amount of further delay in
> pressing forward with the main part of sync rep, but I expect that no
> one will be willing to freeze out incremental improvements in the
> meantime, even if it does induce a certain amount of rebasing.

> It's
> also worth noting that Fujii Masao's patch has been around for months,
> and yours isn't finished yet.  That's not to say that we don't want to
> consider your ideas, because we do: and you've had more than your
> share of good ones.  At the same time, it would be unfair and
> unreasonable to expect work on a patch that is done, and has been done
> for some time, to wait on one that isn't.

I understand your viewpoint there. I'm sure we all agree sync rep is a
very important feature that must get into the next release.

The only reason my patch exists is because debate around my ideas was
ruled out on various grounds. One of those was it would take so long to
develop we shouldn't risk not getting sync rep in this release. I am
amenable to such arguments (and I make the same one on MERGE, btw, where
I am getting seriously worried) but the reality is that there is
actually very little code here and we can definitely do this, whatever
ideas we pick. I've shown this by providing an almost working version in
about 4 days work. Will finishing it help?

We definitely have the time, so the question is, what are the best
ideas? We must discuss the ideas properly, not just plough forwards
claiming time pressure when it isn't actually an issue at all. We *need*
to put the tools down and talk in detail about the best way forwards.

Before, I had no patch. Now mine "isn't finished". At what point will my
ideas be reviewed without instant dismissal? If we accept your seniority
argument, then "never" because even if I finish it you'll say "Fujii was
there first".

If who mentioned it first was important, then I'd say I've been
discussing this for literally years (late 2006) and have regularly
explained the benefits of the master-side approach I've outlined on list
every time this has come up (every few months). I have also explained
the implementation details many times as well an I'm happy to say that
latches are pretty much exactly what I described earlier. (I called them
LSN queues, similar to lwlocks, IIRC). But thats not the whole deal.

If we simply wanted a patch that was "done" we would have gone with
Zoltan's wouldn't we, based on the seniority argument you use above?
Zoltan's patch didn't perform well at all. Fujii's performs much better.
However, my proposed approach offers even better performance, so
whatever argument you use to include Fujii's also applies to mine
doesn't it? But that's silly and divisive, its not about who's patch
"wins" is it?

Do we have to benchmark multiple patches to prove which is best? If
that's the criteria I'll finish my patch and demonstrate that.

But it doesn't make sense to start committing pieces of Fujii's patch,
so that I can't ever keep up and as a result "Simon never finished his
patch, but it sounded good".

Next steps should be: tools down, discuss what to do. Then go forwards.

We have time, so lets discuss all of the ideas on the table not just
some of them.

For me this is not about the number or names of parameters, its about
master-side control of sync rep and having very good performance.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay

От
Robert Haas
Дата:
On Wed, Sep 15, 2010 at 3:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Will finishing it help?

Yes, I expect that to help a lot.

> Before, I had no patch. Now mine "isn't finished". At what point will my
> ideas be reviewed without instant dismissal? If we accept your seniority
> argument, then "never" because even if I finish it you'll say "Fujii was
> there first".

I said very clearly in my previous email that "I think that everyone
is prepared to accept a limited amount of further delay in pressing
forward with the main part of sync rep".  In other words, I think
everyone is willing to consider your ideas provided that they are
submitted in a form which everyone can understand and think through
sometime soon.  I am not, nor do I think anyone is, saying that we
don't wish to consider your ideas.  I'm actually really pleased that
you are only a day or two from having a working patch.  It can be much
easier to conceptualize a patch than to find the time to finish it
(unfortunately, this problem has overtaken me rather badly in the last
few weeks, which is why I have no new patches in this CommitFest) and
if you can finish it up and get it out in front of everyone I expect
that to be a good thing for this feature and our community.

> Do we have to benchmark multiple patches to prove which is best? If
> that's the criteria I'll finish my patch and demonstrate that.

I was thinking about that earlier today.  I think it's definitely
possible that we'll need to do some benchmarking, although I expect
that people will want to read the code first.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] Re: pgsql: Use a latch to make startup process wake up and replay

От
Fujii Masao
Дата:
On Thu, Sep 16, 2010 at 4:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> We definitely have the time, so the question is, what are the best
> ideas?

Before advancing the review of each patch, we must determine what
should be committed in 9.1, and what's in this CF.

"Synchronization level on per-transaction" feature is included in Simon's
patch, but not in mine. This is most important difference, which would
have wide-reaching impact on the implementation, e.g., protocol between
walsender and walreceiver. So, at first we should determine whether we'll
commit the feature in 9.1. Then we need to determine how far we should
implement in this CF. Thought?

Each patch provides "synchronization level on per-standby" feature. In
Simon's patch, that level is specified in the standbys's recovery.conf.
In mine, it's in the master's standbys.conf. I think that the former is simpler.
But if we support the capability to register the standbys, the latter would
be required. Which is the best?

Simon's patch seems to include simple quorum commit feature (correct
me if I'm wrong). That is, when there are multiple synchronous standbys,
the master waits until ACK has arrived from at least one standby. OTOH,
in my patch, the master waits until ACK has arrived from all the synchronous
standbys. Which should we choose? I think that we should commit my
straightforward approach first, and enable the quorum commit on that.
Thought?

Simon proposes to invoke walwriter in the standby. This is not included
in my patch, but looks good idea. ISTM that this is not essential feature
for synchronous replication, so how about detachmenting of the walwriter
part from the patch and reviewing it independently?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center