On Mon, Jul 15, 2013 at 11:06 AM, Paul Gear <span dir="ltr"><<a href="mailto:ausnog@libertysys.com.au" target="_blank">ausnog@libertysys.com.au</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br><div text="#000000" bgcolor="#FFFFFF"><blockquote type="cite"><div class="gmail_quote"><div class="im"><div>

          The reality is that it's not quite that simple: the switches in
          question are VoQ based, so the buffer that is physically on
          ingress represents queueing on output and is in fact
          distributed queuing.<br>
          Switch buffers also never reach 100% effective utilization.
          (Silicon stores packets in fixed-size 'cells', not
          variable-sized ones.)<br>
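The cell point above can be made concrete with a little arithmetic. This is a minimal sketch, not a description of any particular ASIC: the 208-byte cell size and the function name cell_utilization are assumptions chosen for illustration, and per-packet metadata overhead is ignored.

```python
import math


def cell_utilization(packet_bytes: int, cell_bytes: int) -> float:
    """Fraction of the allocated buffer memory that holds real packet data.

    A packet occupies a whole number of fixed-size cells, so the unused
    tail of the last cell is wasted.
    """
    cells = math.ceil(packet_bytes / cell_bytes)
    return packet_bytes / (cells * cell_bytes)


# Hypothetical 208-byte cell; real cell sizes differ between chips.
CELL_BYTES = 208
for size in (64, 209, 1500):
    print(size, round(cell_utilization(size, CELL_BYTES), 3))
```

A packet one byte larger than a cell needs two cells, so roughly half of that allocation is wasted, which is why the usable buffer is always somewhat smaller than the raw memory size.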

        </div>

      </div></div>

    </blockquote>

    <br>

    Pardon my ignorance, but VoQ == Virtual Output Queuing?  Is the

    Wikipedia description more or less accurate?

    <a href="https://en.wikipedia.org/wiki/Virtual_Output_Queues" target="_blank">https://en.wikipedia.org/wiki/Virtual_Output_Queues</a></div></blockquote><div><br>Yes, VoQ is Virtual Output Queuing.<br>The Wikipedia article talks about the efficiency of a non-buffered crossbar, which to my mind no crossbar-based switch fabric for an Ethernet switch has ever used, so it's a purely theoretical discussion with 75% efficiency as the starting point.<br>

If you are interested in more on that aspect, there are likely plenty of articles indexed by Google Scholar on "input buffered crossbar switch fabric efficiency".<br>Well beyond the interest of most ausnogians, I expect. :)<br>
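For anyone who wants the basic idea without the papers, here is a minimal sketch of why per-output queues at the ingress help. It is a toy illustration only: the class names FifoIngress and VoqIngress are made up, and a real switch adds a scheduler or credit mechanism between ingress and egress that is not modelled here.

```python
from collections import deque


class FifoIngress:
    """Single FIFO per ingress port: only the head packet can be scheduled."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, out_port):
        self.queue.append(out_port)

    def dequeue(self, free_outputs):
        # Head-of-line blocking: if the head targets a busy output, nothing moves.
        if self.queue and self.queue[0] in free_outputs:
            return self.queue.popleft()
        return None


class VoqIngress:
    """One virtual queue per output port, held at the ingress."""

    def __init__(self, num_outputs):
        self.voqs = [deque() for _ in range(num_outputs)]

    def enqueue(self, out_port):
        self.voqs[out_port].append(out_port)

    def dequeue(self, free_outputs):
        # Any non-empty queue whose output is free may be served.
        for port in sorted(free_outputs):
            if self.voqs[port]:
                return self.voqs[port].popleft()
        return None


fifo, voq = FifoIngress(), VoqIngress(num_outputs=2)
for out_port in (0, 1):  # a packet for output 0 arrives first, then one for output 1
    fifo.enqueue(out_port)
    voq.enqueue(out_port)

free = {1}  # output 0 is congested; only output 1 can accept traffic
print(fifo.dequeue(free))  # None: the packet for output 0 blocks the one behind it
print(voq.dequeue(free))   # 1: the packet for output 1 is served independently
```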

  <br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><div class="im">

    <br>

    <blockquote type="cite">

      <div class="gmail_quote">

        <div>

          In the case of the company I work for, yes, we have done a lot
          of analysis on this, both by collecting telemetry data on actual
          buffer queue depths in production environments and by
          testing various workflows of modern applications and
          traffic flows.<br>
          It's how we determined (for example) to use 2Gbit DDR3-2166
          rather than 1Gbit DDR3-2166 parts when building said switch.<br>

        </div>

      </div>

    </blockquote>

    <br></div>

    What sort of testing process/software did you use for those sorts of

    test flows?  I'm guessing the licensing costs and the time spent

    would vastly outweigh the budget for switching of a project like the

    one for which Joseph started this thread.</div></blockquote><div><br>We built a model of how a switch works in ns3 (<a href="http://www.nsnam.org/">http://www.nsnam.org/</a>) before we had switch silicon back. ns3 is very neat for modelling this kind of thing, as you can run 'real apps' with real TCP stacks on a model and simulate e.g. how a 1000-node Hadoop cluster doing its shuffle phase actually uses TCP. You can model different switches with different buffers, have ns3 capture the TCP output in pcap format, then feed that through tools analyzing loss/retransmission etc.<br>
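To give a flavour of the "different switches with different buffers" comparison without ns3, here is a deliberately crude sketch. It is not ns3 and does not model TCP at all: it is a toy tail-drop queue fed by a synchronized many-to-one burst, and the function name simulate_incast and all its parameters are assumptions for illustration.

```python
def simulate_incast(senders, burst_pkts, buffer_pkts, drain_per_tick=1):
    """Toy many-to-one burst into a single tail-drop egress queue.

    Each tick, every sender offers one packet; the queue then drains
    drain_per_tick packets. Returns (delivered, dropped).
    """
    depth = delivered = dropped = 0
    for _ in range(burst_pkts):
        for _ in range(senders):
            if depth < buffer_pkts:
                depth += 1          # packet fits in the buffer
            else:
                dropped += 1        # tail drop: buffer is full
        sent = min(depth, drain_per_tick)
        depth -= sent
        delivered += sent
    delivered += depth              # whatever is queued drains after the burst
    return delivered, dropped


# Same offered load (4 senders x 10 packets), two hypothetical buffer sizes.
for buffer_pkts in (5, 1000):
    print(buffer_pkts, simulate_incast(senders=4, burst_pkts=10, buffer_pkts=buffer_pkts))
```

With real TCP each drop also triggers retransmission and backoff, which is exactly the behaviour the ns3 model captures and this sketch does not.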

 <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><div class="im"><br>

    <br>

    <blockquote type="cite">

      <div class="gmail_quote">

        <div>If you are interested in theory/simulation/practice on
          this, it's actually something I gave a talk on at CAIA (<a href="http://caia.swin.edu.au/" target="_blank">http://caia.swin.edu.au/</a>)
          last year. More than happy to share the slides/content if
          there is no video recording of it.<br>

        </div>

      </div>

    </blockquote>

    <br></div>

    It appears they don't keep much at all:

    <a href="http://caia.swin.edu.au/seminars/details/120223A.html" target="_blank">http://caia.swin.edu.au/seminars/details/120223A.html</a>  Share away! <span><span> :-) </span></span><br></div></blockquote><div><br><a href="https://arista.box.com/s/52b8c0585034ced70514">https://arista.box.com/s/52b8c0585034ced70514</a> <br>

</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><div class="im">

    <br>

    <blockquote type="cite">

      <div class="gmail_quote">

        <div>

          <br>

        </div>

        <blockquote class="gmail_quote">

          <div>How does one determine the optimal buffer size (and hence

            switch selection) for a particular environment?  Is there a

            rule of thumb based on the bandwidth of the link and the

            maximum time one wants a packet to queue?  (And how would

            one even determine what this maximum might be?  I would

            think that it varies depending upon the application.)  I

            guess this paragraph's questions are mostly directed to Greg

            & Lincoln - in the cases you've mentioned, how did you

            know that small buffers were the problem?<br>

          </div>

        </blockquote>
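The arithmetic behind the rule-of-thumb question above is simple enough to sketch, with the caveat that it only relates buffer size to worst-case queuing time on a single link and says nothing about what a given application actually needs. The function names and the 10 Gbit/s and 1 ms figures are illustrative assumptions, not recommendations.

```python
def buffer_bytes(link_bps: float, max_queue_delay_s: float) -> float:
    """Buffer needed to hold max_queue_delay_s worth of traffic at line rate."""
    return link_bps * max_queue_delay_s / 8


def queue_delay_s(buffered_bytes: float, link_bps: float) -> float:
    """Time a packet waits behind buffered_bytes of queued data on this link."""
    return buffered_bytes * 8 / link_bps


# Hypothetical example: a 10 Gbit/s port and a 1 ms queuing budget.
needed = buffer_bytes(10e9, 1e-3)
print(f"{needed / 1e6:.2f} MB of buffer for 1 ms at 10 Gbit/s")
print(f"{queue_delay_s(needed, 10e9) * 1e3:.2f} ms to drain it again")
```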

        <div><br>

          Unfortunately most Ethernet switches simply haven't provided
          the telemetry to indicate how close they are to running out of
          buffers until it's too late and you've fallen off the cliff and
          have large numbers of drops going on.<br>
          (It's actually worse than this: many switches don't even
          provide accurate drop counters when they are dropping packets.)<br>

          Historically many switches had 'dedicated' buffers/port and

          didn't have overflow/shared pools for dealing with <br>

          <br>

          Even if accurate drop counters are available, how many people

          actually monitor those?<br>

          <br>

          The thing about TCP is that it still works even if you have drops.
          It's just that for many environments "working" isn't good enough;
          they want "working well."  The DataSift blog I pointed to is a good
          example.<br>

        </div>

      </div>

    </blockquote>

    <br></div>

    My fear with all of this is that for most environments working <i>is</i>

    good enough, and working well costs more than they are prepared to

    spend.<div class="im"><br>

    <br>

    <blockquote type="cite">

      <div class="gmail_quote">

        <div>The short answer is that there is no single 'right' answer

          for what is appropriate.  It depends on the traffic load,

          distribution and what the underlying apps/hosts/servers are

          doing.  Which may change over time too.<br>

        </div>

      </div>

    </blockquote>

    <br></div>

    Are there any quick wins available to the budget end of town?  Are

    there rules of thumb or quick estimates that can help determine if

    it's an issue?<span class="HOEnZb"><font color="#888888"><br></font></span></div></blockquote><div><br>Cost that people pay for equipment doesn't necessarily equate to quality. :)<br><br><br>cheers,<br><br>lincoln.<br>

 </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class="HOEnZb"><font color="#888888">

    <br>

    Paul<br>

    <br>

  </font></span></div>

<br>_______________________________________________<br>

AusNOG mailing list<br>

<a href="mailto:AusNOG@lists.ausnog.net">AusNOG@lists.ausnog.net</a><br>

<a href="http://lists.ausnog.net/mailman/listinfo/ausnog" target="_blank">http://lists.ausnog.net/mailman/listinfo/ausnog</a><br>

<br></blockquote></div><br>