Discussion: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
[HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
From:
Jonathon Nelson
Date:
Attachments
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
From:
Andres Freund
Date:
Hi,

On 2017-01-05 12:55:44 -0600, Jonathon Nelson wrote:
> Attached please find a patch for PostgreSQL 9.4 which changes the maximum
> amount of data that the wal sender will send at any point in time from the
> hard-coded value of 128KiB to a user-controllable value up to 16MiB. It has
> been primarily tested under 9.4 but there has been some testing with 9.5.
>
> In our lab environment and with a 16MiB setting, we saw substantially
> better network utilization (almost 2x!), primarily over high bandwidth
> delay product links.

That's a bit odd - shouldn't the OS network stack take care of this in
both cases? I mean either is too big for TCP packets (including jumbo
frames). What type of OS and network is involved here?

Greetings,

Andres Freund
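For readers skimming the thread, the proposal boils down to turning the 128KiB constant into a configurable setting. A hypothetical postgresql.conf entry is sketched below; the name comes from the subject line, while the kilobyte unit is inferred from the bounds discussed later in the thread, so treat this purely as an illustration, not the syntax of any released PostgreSQL version:

```
# Hypothetical setting from the proposed patch (never merged).
# Upper bound on how much WAL the walsender hands off per send, in kB.
# The patch allows up to one WAL segment (16MB by default); the
# previously hard-coded value was 128kB.
max_wal_send = 16384
```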
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
From:
Jonathon Nelson
Date:
On Thu, Jan 5, 2017 at 1:01 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-01-05 12:55:44 -0600, Jonathon Nelson wrote:
> Attached please find a patch for PostgreSQL 9.4 which changes the maximum
> amount of data that the wal sender will send at any point in time from the
> hard-coded value of 128KiB to a user-controllable value up to 16MiB. It has
> been primarily tested under 9.4 but there has been some testing with 9.5.
>
> In our lab environment and with a 16MiB setting, we saw substantially
> better network utilization (almost 2x!), primarily over high bandwidth
> delay product links.
That's a bit odd - shouldn't the OS network stack take care of this in
both cases? I mean either is too big for TCP packets (including jumbo
frames). What type of OS and network is involved here?
In our test lab, we make use of multiple flavors of Linux. No jumbo frames. We simulated anything from 0 to 160ms RTT (with varying degrees of jitter, packet loss, etc.) using tc. Even with everything fairly clean, at 80ms RTT there was a 2x improvement in performance.
--
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
From:
Kevin Grittner
Date:
On Thu, Jan 5, 2017 at 7:32 PM, Jonathon Nelson <jdnelson@dyn.com> wrote:
> On Thu, Jan 5, 2017 at 1:01 PM, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-01-05 12:55:44 -0600, Jonathon Nelson wrote:
>>> In our lab environment and with a 16MiB setting, we saw substantially
>>> better network utilization (almost 2x!), primarily over high bandwidth
>>> delay product links.
>>
>> That's a bit odd - shouldn't the OS network stack take care of this in
>> both cases? I mean either is too big for TCP packets (including jumbo
>> frames). What type of OS and network is involved here?
>
> In our test lab, we make use of multiple flavors of Linux. No jumbo frames.
> We simulated anything from 0 to 160ms RTT (with varying degrees of jitter,
> packet loss, etc.) using tc. Even with everything fairly clean, at 80ms RTT
> there was a 2x improvement in performance.

Is there compression and/or encryption being performed by the network layers? My experience with both is that they run faster on bigger chunks of data, and that might happen before the data is broken into packets.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
From:
Jonathon Nelson
Date:
On Fri, Jan 6, 2017 at 8:52 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
On Thu, Jan 5, 2017 at 7:32 PM, Jonathon Nelson <jdnelson@dyn.com> wrote:
> On Thu, Jan 5, 2017 at 1:01 PM, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-01-05 12:55:44 -0600, Jonathon Nelson wrote:
>>> In our lab environment and with a 16MiB setting, we saw substantially
>>> better network utilization (almost 2x!), primarily over high bandwidth
>>> delay product links.
>>
>> That's a bit odd - shouldn't the OS network stack take care of this in
>> both cases? I mean either is too big for TCP packets (including jumbo
>> frames). What type of OS and network is involved here?
>
> In our test lab, we make use of multiple flavors of Linux. No jumbo frames.
> We simulated anything from 0 to 160ms RTT (with varying degrees of jitter,
> packet loss, etc.) using tc. Even with everything fairly clean, at 80ms RTT
> there was a 2x improvement in performance.
Is there compression and/or encryption being performed by the
network layers? My experience with both is that they run faster on
bigger chunks of data, and that might happen before the data is
broken into packets.
There is no compression or encryption. The testing was with and without various forms of hardware offload, etc. but otherwise there is no magic up these sleeves.
--
On 1/5/17 12:55 PM, Jonathon Nelson wrote:
> Attached please find a patch for PostgreSQL 9.4 which changes the
> maximum amount of data that the wal sender will send at any point in
> time from the hard-coded value of 128KiB to a user-controllable value up
> to 16MiB. It has been primarily tested under 9.4 but there has been some
> testing with 9.5.

To make sure this doesn't get lost, please add it to https://commitfest.postgresql.org. Please verify the patch will apply against current HEAD and pass make check-world.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On 5 January 2017 at 19:01, Andres Freund <andres@anarazel.de> wrote:
> That's a bit odd - shouldn't the OS network stack take care of this in
> both cases? I mean either is too big for TCP packets (including jumbo
> frames). What type of OS and network is involved here?

2x may be plausible. The first 128kB goes out, then the rest queues up until the first ack comes back. Then the next 128kB goes out again without waiting... I think this is what Nagle is supposed to actually address, but either it may be off by default these days or our usage pattern may be defeating it in some way.

--
greg
On 8 January 2017 at 17:26, Greg Stark <stark@mit.edu> wrote:
> On 5 January 2017 at 19:01, Andres Freund <andres@anarazel.de> wrote:
>> That's a bit odd - shouldn't the OS network stack take care of this in
>> both cases? I mean either is too big for TCP packets (including jumbo
>> frames). What type of OS and network is involved here?
>
> 2x may be plausible. The first 128k goes out, then the rest queues up
> until the first ack comes back. Then the next 128kB goes out again
> without waiting... I think this is what Nagle is supposed to actually
> address but either it may be off by default these days or our usage
> pattern may be defeating it in some way.

Hm. That wasn't very clear. And the more I think about it, it's not right.

The first block of data -- one byte in the worst case, 128kB in our case -- gets put in the output buffers and since there's nothing stopping it, it immediately gets sent out. Then all the subsequent data gets put in output buffers but buffers up due to Nagle. Until there's a full packet of data buffered, the ack arrives, or the timeout expires, at which point the buffered data drains efficiently in full packets. Eventually it all drains away and the next 128kB arrives and is sent out immediately.

So most packets are full size with the occasional 128kB packet thrown in whenever the buffer empties. And I think even when the 128kB packet is pending, Nagle only stops small packets, not full packets, and the window should allow more than one packet of data to be pending.

So, uh, forget what I said. Nagle should be our friend here. I think you should get network dumps and use xplot to understand what's really happening. c.f. https://fasterdata.es.net/assets/Uploads/20131016-TCPDumpTracePlot.pdf

--
greg
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
From:
Jonathon Nelson
Date:
On Sun, Jan 8, 2017 at 11:36 AM, Greg Stark <stark@mit.edu> wrote:
On 8 January 2017 at 17:26, Greg Stark <stark@mit.edu> wrote:
>> On 5 January 2017 at 19:01, Andres Freund <andres@anarazel.de> wrote:
>>> That's a bit odd - shouldn't the OS network stack take care of this in
>>> both cases? I mean either is too big for TCP packets (including jumbo
>>> frames). What type of OS and network is involved here?
>>
>> 2x may be plausible. The first 128k goes out, then the rest queues up
>> until the first ack comes back. Then the next 128kB goes out again
>> without waiting... I think this is what Nagle is supposed to actually
>> address but either it may be off by default these days or our usage
>> pattern may be defeating it in some way.
>
> Hm. That wasn't very clear. And the more I think about it, it's not right.
>
> The first block of data -- one byte in the worst case, 128kB in our
> case -- gets put in the output buffers and since there's nothing
> stopping it it immediately gets sent out. Then all the subsequent data
> gets put in output buffers but buffers up due to Nagle. Until there's
> a full packet of data buffered, the ack arrives, or the timeout
> expires, at which point the buffered data drains efficiently in full
> packets. Eventually it all drains away and the next 128kB arrives and
> is sent out immediately.
>
> So most packets are full size with the occasional 128kB packet thrown
> in whenever the buffer empties. And I think even when the 128kB packet
> is pending Nagle only stops small packets, not full packets, and the
> window should allow more than one packet of data to be pending.
>
> So, uh, forget what I said. Nagle should be our friend here.
[I have not done a rigid analysis, here, but...]
I *think* libpq is the culprit here.
walsender says "Hey, libpq - please send (up to) 128KB of data!" and doesn't "return" until it's "sent". Then it sends more. Regardless of the underlying cause (nagle, tcp congestion control algorithms, umpteen different combos of hardware and settings, etc..) in almost every test I saw improvement (usually quite a bit). This was most easily observable with high bandwidth-delay product links, but my time in the lab is somewhat limited.
I calculated "performance" the most simple measurement possible: how long did it take for Y volume of data to get transferred, performed over a long-enough interval (typically 1800 seconds) for TCP windows to open up, etc...
On Sat, Jan 7, 2017 at 7:48 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 1/5/17 12:55 PM, Jonathon Nelson wrote:
> Attached please find a patch for PostgreSQL 9.4 which changes the
> maximum amount of data that the wal sender will send at any point in
> time from the hard-coded value of 128KiB to a user-controllable value up
> to 16MiB. It has been primarily tested under 9.4 but there has been some
> testing with 9.5.

To make sure this doesn't get lost, please add it to https://commitfest.postgresql.org. Please verify the patch will apply against current HEAD and pass make check-world.
Attached please find a revision of the patch, changed in the following ways:
1. removed a call to debug2.
2. applies cleanly against master (as of 8c5722948e831c1862a39da2bb5d793a6f2aabab)
3. one small indentation fix, one small verbiage fix.
4. switched to calculating the upper bound using XLOG_SEG_SIZE rather than hard-coding 16384.
5. the git author is - obviously - different.
make check-world passes.
I have added it to the commitfest.
I have verified with strace that up to 16MB sends are being used.
I have verified that the GUC properly grumps about values greater than XLOG_SEG_SIZE / 1024 or smaller than 4.
--
Jon
Attachments
[HACKERS] Re: [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
From:
David Steele
Date:
On 1/9/17 11:33 PM, Jon Nelson wrote:
>
> On Sat, Jan 7, 2017 at 7:48 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>
>> On 1/5/17 12:55 PM, Jonathon Nelson wrote:
>>
>>> Attached please find a patch for PostgreSQL 9.4 which changes the
>>> maximum amount of data that the wal sender will send at any point in
>>> time from the hard-coded value of 128KiB to a user-controllable value up
>>> to 16MiB. It has been primarily tested under 9.4 but there has been some
>>> testing with 9.5.
>>
>> To make sure this doesn't get lost, please add it to
>> https://commitfest.postgresql.org. Please verify the patch will
>> apply against current HEAD and pass make check-world.
>
> Attached please find a revision of the patch, changed in the following ways:
>
> 1. removed a call to debug2.
> 2. applies cleanly against master (as of 8c5722948e831c1862a39da2bb5d793a6f2aabab)
> 3. one small indentation fix, one small verbiage fix.
> 4. switched to calculating the upper bound using XLOG_SEG_SIZE rather
> than hard-coding 16384.
> 5. the git author is - obviously - different.
>
> make check-world passes.
> I have added it to the commitfest.
> I have verified with strace that up to 16MB sends are being used.
> I have verified that the GUC properly grumps about values greater than
> XLOG_SEG_SIZE / 1024 or smaller than 4.

This patch applies cleanly on cccbdde and compiles. However, documentation in config.sgml is needed.

The concept is simple enough though there seems to be some argument about whether or not the patch is necessary. In my experience 128K should be more than large enough for a chunk size, but I'll buy the argument that libpq is acting as a barrier in this case.

I'm marking this patch "Waiting on Author" for required documentation.

--
-David
david@pgmasters.net
On Thu, Mar 16, 2017 at 9:59 AM, David Steele <david@pgmasters.net> wrote:
> On 1/9/17 11:33 PM, Jon Nelson wrote:
>>
>> On Sat, Jan 7, 2017 at 7:48 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>
>>> On 1/5/17 12:55 PM, Jonathon Nelson wrote:
>>>
>>>> Attached please find a patch for PostgreSQL 9.4 which changes the
>>>> maximum amount of data that the wal sender will send at any point in
>>>> time from the hard-coded value of 128KiB to a user-controllable value up
>>>> to 16MiB. It has been primarily tested under 9.4 but there has been some
>>>> testing with 9.5.
>>>
>>> To make sure this doesn't get lost, please add it to
>>> https://commitfest.postgresql.org. Please verify the patch will
>>> apply against current HEAD and pass make check-world.
>>
>> Attached please find a revision of the patch, changed in the following ways:
>>
>> 1. removed a call to debug2.
>> 2. applies cleanly against master (as of 8c5722948e831c1862a39da2bb5d793a6f2aabab)
>> 3. one small indentation fix, one small verbiage fix.
>> 4. switched to calculating the upper bound using XLOG_SEG_SIZE rather
>> than hard-coding 16384.
>> 5. the git author is - obviously - different.
>>
>> make check-world passes.
>> I have added it to the commitfest.
>> I have verified with strace that up to 16MB sends are being used.
>> I have verified that the GUC properly grumps about values greater than
>> XLOG_SEG_SIZE / 1024 or smaller than 4.
>
> This patch applies cleanly on cccbdde and compiles. However,
> documentation in config.sgml is needed.
>
> The concept is simple enough though there seems to be some argument
> about whether or not the patch is necessary. In my experience 128K
> should be more than large enough for a chunk size, but I'll buy the
> argument that libpq is acting as a barrier in this case.
>
> I'm marking this patch "Waiting on Author" for required documentation.
Thank you for testing and the comments. I have some updates:
- I set up a network at home and - in some very quick testing - was unable to observe any obvious performance difference regardless of chunk size
- Before I could get any real testing done, one of the machines I was using for testing died and won't even POST, which has put a damper on said testing (as you might imagine).
- There is a small issue with the patch: a lower-bound of 4 is not appropriate; it should be XLOG_BLCKSZ / 1024 (I can submit an updated patch if that is appropriate)
- I am, at this time, unable to replicate the earlier results; however, I can't rule them out, either.
--
Jon
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
From:
Robert Haas
Date:
On Mon, Jan 9, 2017 at 4:27 PM, Jonathon Nelson <jdnelson@dyn.com> wrote:
> [I have not done a rigid analysis, here, but...]
>
> I *think* libpq is the culprit here.
>
> walsender says "Hey, libpq - please send (up to) 128KB of data!" and doesn't
> "return" until it's "sent". Then it sends more. Regardless of the
> underlying cause (nagle, tcp congestion control algorithms, umpteen
> different combos of hardware and settings, etc..) in almost every test I saw
> improvement (usually quite a bit). This was most easily observable with high
> bandwidth-delay product links, but my time in the lab is somewhat limited.

This seems plausible to me. If it takes X amount of time for the upper layers to put Y amount of data into libpq's buffers, that imposes some limit on overall throughput.

I mean, is it not sufficient to know that the performance improvement is happening? If it's happening, there's an explanation for why it's happening. It would be good if somebody else could try to reproduce these results, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Re: [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
From:
David Steele
Date:
On 3/16/17 11:53 AM, Jon Nelson wrote:
>
> On Thu, Mar 16, 2017 at 9:59 AM, David Steele <david@pgmasters.net> wrote:
>
>> On 1/9/17 11:33 PM, Jon Nelson wrote:
>>>
>>> On Sat, Jan 7, 2017 at 7:48 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>>
>>>> On 1/5/17 12:55 PM, Jonathon Nelson wrote:
>>>>
>>>>> Attached please find a patch for PostgreSQL 9.4 which changes the
>>>>> maximum amount of data that the wal sender will send at any point in
>>>>> time from the hard-coded value of 128KiB to a user-controllable value up
>>>>> to 16MiB. It has been primarily tested under 9.4 but there has been some
>>>>> testing with 9.5.
>>>>
>>>> To make sure this doesn't get lost, please add it to
>>>> https://commitfest.postgresql.org. Please verify the patch will
>>>> apply against current HEAD and pass make check-world.
>>>
>>> Attached please find a revision of the patch, changed in the following ways:
>>>
>>> 1. removed a call to debug2.
>>> 2. applies cleanly against master (as of 8c5722948e831c1862a39da2bb5d793a6f2aabab)
>>> 3. one small indentation fix, one small verbiage fix.
>>> 4. switched to calculating the upper bound using XLOG_SEG_SIZE rather
>>> than hard-coding 16384.
>>> 5. the git author is - obviously - different.
>>>
>>> make check-world passes.
>>> I have added it to the commitfest.
>>> I have verified with strace that up to 16MB sends are being used.
>>> I have verified that the GUC properly grumps about values greater than
>>> XLOG_SEG_SIZE / 1024 or smaller than 4.
>>
>> This patch applies cleanly on cccbdde and compiles. However,
>> documentation in config.sgml is needed.
>>
>> The concept is simple enough though there seems to be some argument
>> about whether or not the patch is necessary. In my experience 128K
>> should be more than large enough for a chunk size, but I'll buy the
>> argument that libpq is acting as a barrier in this case.
>>
>> I'm marking this patch "Waiting on Author" for required documentation.
>
> Thank you for testing and the comments. I have some updates:
>
> - I set up a network at home and - in some very quick testing - was
> unable to observe any obvious performance difference regardless of chunk
> size
> - Before I could get any real testing done, one of the machines I was
> using for testing died and won't even POST, which has put a damper on
> said testing (as you might imagine).
> - There is a small issue with the patch: a lower-bound of 4 is not
> appropriate; it should be XLOG_BLCKSZ / 1024 (I can submit an updated
> patch if that is appropriate)
> - I am, at this time, unable to replicate the earlier results however I
> can't rule them out, either.

My recommendation is that we mark this patch "Returned with Feedback" to allow you time to test and refine the patch. You can resubmit once it is ready.

Thanks,

--
-David
david@pgmasters.net
[HACKERS] Re: [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
From:
David Steele
Date:
On 3/16/17 11:56 AM, David Steele wrote:
>
> My recommendation is that we mark this patch "Returned with Feedback" to
> allow you time to test and refine the patch. You can resubmit once it
> is ready.

This submission has been marked "Returned with Feedback". Please feel free to resubmit to a future commitfest.

Thanks,

--
-David
david@pgmasters.net