On Mon, Jul 15, 2013 at 9:29 AM, Paul Gear <span dir="ltr"><<a href="mailto:ausnog@libertysys.com.au" target="_blank">ausnog@libertysys.com.au</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br><div text="#000000" bgcolor="#FFFFFF"><blockquote type="cite"><div class="gmail_quote"><div>

          Most 1GE switches have anemic buffers which results in

          less-than-stellar performance if you drive them hard, have

          bursts or incast traffic.<br>

          Alas, this doesn't even figure in most people's

          knowledge/requests when it comes to networking.<br>

          <br>

          A good example of the issue you've described is at <<a href="http://dev.datasift.com/blog/big-data-bigger-networking" target="_blank">http://dev.datasift.com/blog/big-data-bigger-networking</a>>

          and <<a href="http://dev.datasift.com/blog" target="_blank">http://dev.datasift.com/blog</a>><br>

        </div>

      </div>

    </blockquote>

    <br>

    Just curious: when/where does one typically draw the line between

    big buffers being required, and big buffers causing latency issues

    due to buffer bloat?  The information i've read suggests that buffer

    bloat is not only caused by large buffers on edge routers, but at

    many points in the network.<br></div></blockquote><div><br>"Buffer bloat" is most often seen at PE <-> CE, where the speed is relatively slow, such as the uplink of an ADSL tail.<br>The best example of this would be uploading some large files or doing an 'scp' across said connection, where it opens up large TCP windows and then hogs all available capacity.<br>

<br>In this case, it's not uncommon to see it buffering 512KB+ of data.  Let's say your ADSL was 10/1: it's that 1 Mbps upstream that is the issue. <br><br>ADSL at 1Mbps upstream equates to ~120KB/sec of usable upstream capacity.  512KB of buffering = just over 4 seconds of buffering.  THAT is buffer bloat.<br>
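<br>As a rough sketch of that arithmetic (the function name is made up for illustration, and it assumes a single FIFO queue draining at a constant usable rate, which is a simplification):<br>
<pre>
```python
def queue_delay_seconds(buffer_bytes: float, drain_bytes_per_sec: float) -> float:
    """Worst-case time for a completely full FIFO buffer to drain.

    Assumes one queue emptied at a constant rate; real devices with
    multiple queues or schedulers will behave differently.
    """
    if not drain_bytes_per_sec > 0:
        raise ValueError("drain rate must be positive")
    return buffer_bytes / drain_bytes_per_sec


KB = 1024

# 512KB of buffering in front of ~120KB/sec of usable ADSL upstream.
adsl_delay = queue_delay_seconds(512 * KB, 120 * KB)
print(f"{adsl_delay:.2f} s")  # 4.27 s
```
</pre>
Anything queued behind a full buffer waits that long before it even leaves the CPE, which is why interactive traffic suffers so badly during a big upload.<br>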

<br>100MB of buffer on a 10G link is only about 80msec of buffering.  Hardly 'bloat'. :)<br><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">


    <br>

    Conventional wisdom on the one hand says that for high-volume

    environments (iSCSI storage is a typical example; high-bandwidth

    international links might be another - please correct me if i'm

    wrong), more buffers is better.  </div></blockquote><div><br>Indeed, these two environments have very different RTT characteristics.<br><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div text="#000000" bgcolor="#FFFFFF">On a recent Packet Pushers show

    where Arista were talking about their new switches, they pointed out

    that their buffers seemed overly large, but at the high bandwidths

    they were serving, this was only 250 ms or thereabouts (my memory is

    a bit hazy, but i think it was about 512 MB per 10 Gbps port).  </div></blockquote><div><br>The podcast you're talking about covered switches that have ~125MB / 10G port, and 4x/10x that for 40G/100G ports. In all cases, if you actually had something consuming all those buffers, it's ~100msec.<br>

<br>The reality is that it's not quite that simple: the switches in question are VoQ based, so the buffer that physically sits on ingress represents queueing on output, and is in fact distributed queuing.<br>Switch buffers also never reach 100% effective utilization.  (Silicon stores packets in fixed-size 'cells', not variable-sized ones.)<br>
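<br>To put rough numbers on the cell point, here is a small sketch. The 256-byte cell size and the function name are assumptions picked purely for illustration, not figures from any particular switch ASIC:<br>
<pre>
```python
def cell_utilization(packet_bytes: int, cell_bytes: int) -> float:
    """Fraction of the allocated buffer memory that holds real packet data.

    A packet occupies a whole number of fixed-size cells, so the tail of
    its last cell is wasted.
    """
    cells_used = (packet_bytes + cell_bytes - 1) // cell_bytes  # round up
    return packet_bytes / (cells_used * cell_bytes)


CELL_BYTES = 256  # hypothetical cell size, for illustration only

for size in (64, 257, 1500):
    print(size, round(cell_utilization(size, CELL_BYTES), 3))
# 64 0.25
# 257 0.502
# 1500 0.977
```
</pre>
Under that assumed cell size, small packets and packets just over a cell boundary waste the most memory, so the usable buffer depends on the packet size mix.<br>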

<br>I could talk for days on this topic, having done the analysis and simulation on the 'right' amount of buffer on switches, but suffice to say that what is 'right' depends on the place in the network, the # of simultaneous TCP flows going through the box, and the degree of incast/oversubscription.<br>

<br>In the case of the company I work for, yes, we have done a lot of analysis on this, both from telemetry data of actual buffer queue depths in production environments and from testing various workflows of modern applications and traffic flows.<br>

It's how we determined (for example) to use 2Gbit DDR3-2166 rather than 1Gbit DDR3-2166 parts when building said switch.<br><br>If you're interested in the theory/simulation/practice of this, it's actually something I gave a talk on at CAIA (<a href="http://caia.swin.edu.au/">http://caia.swin.edu.au/</a>) last year. More than happy to share the slides/content if there is no video recording of it.<br>

<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">How does one determine the optimal buffer size (and hence switch

    selection) for a particular environment?  Is there a rule of thumb

    based on the bandwidth of the link and the maximum time one wants a

    packet to queue?  (And how would one even determine what this

    maximum might be?  I would think that it varies depending upon the

    application.)  I guess this paragraph's questions are mostly

    directed to Greg & Lincoln - in the cases you've mentioned, how

    did you know that small buffers were the problem?<br></div></blockquote><div><br>Unfortunately, most Ethernet switches simply haven't provided the telemetry to indicate how close they are to running out of buffers until it's too late: you've fallen off the cliff and have large numbers of drops going on.<br>

(It's actually worse than this: many switches don't even provide accurate drop counters when they are dropping packets.)<br>Historically many switches had 'dedicated' buffers/port and didn't have overflow/shared pools for dealing with <br>

<br>Even if accurate drop counters are available, how many people actually monitor those?<br><br>The thing about TCP is that it still works even if you have drops. It's just that for many environments "working" isn't good enough; they want "working well."  The DataSift blog I pointed to is a good example.<br>
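<br>On the monitoring point, a minimal sketch of turning two samples of a drop counter into a rate. How you fetch the counter (SNMP, CLI scrape, vendor API) is deliberately left out, and the function name and the 64-bit default counter width are assumptions for illustration:<br>
<pre>
```python
def drops_per_second(prev_count: int, curr_count: int,
                     interval_sec: float, counter_bits: int = 64) -> float:
    """Rate of drops between two samples of a monotonically increasing counter.

    The modulo handles a single counter wrap between samples; it cannot
    detect a counter reset or multiple wraps.
    """
    if not interval_sec > 0:
        raise ValueError("interval must be positive")
    delta = (curr_count - prev_count) % (2 ** counter_bits)
    return delta / interval_sec


# Two hypothetical samples taken 30 seconds apart.
print(drops_per_second(100, 250, 30))  # 5.0

# A 32-bit counter that wrapped between samples.
print(drops_per_second(2 ** 32 - 10, 5, 5, counter_bits=32))  # 3.0
```
</pre>
Graphing that rate per port is usually more telling than the raw counter, since a sustained non-zero rate is what separates "working" from "working well."<br>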

<br><br>The short answer is that there is no single 'right' answer for what is appropriate.  It depends on the traffic load, distribution and what the underlying apps/hosts/servers are doing.  Which may change over time too.<br>

<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">

    <br>

    If this is something well-covered in the literature, please feel

    free to point me in that direction.<br></div></blockquote><div><br>There isn't much good literature unfortunately. <br><br><br>cheers,<br><br>lincoln.<br></div></div>