[AusNOG] Buffers (was Re: Switching Recommendations)

Lincoln Dale ltd at aristanetworks.com
Mon Jul 15 14:23:48 EST 2013


On Mon, Jul 15, 2013 at 11:06 AM, Paul Gear <ausnog at libertysys.com.au> wrote:

>
> Reality is that it's not quite that simple, as the switches in question are
> VoQ-based, so the buffer that sits physically on ingress represents queueing
> on output; it is in fact distributed queueing.
> Switch buffers also never achieve 100% effective utilisation.  (Silicon
> stores packets in 'cells', and those cells are fixed-size, not variable.)
>
>
> Pardon my ignorance, but VoQ == Virtual Output Queuing?  Is the Wikipedia
> description more or less accurate?
> https://en.wikipedia.org/wiki/Virtual_Output_Queues
>

Yes, VoQ is Virtual Output Queueing.
The Wikipedia article discusses the efficiency of a non-buffered crossbar,
which, to my mind, no crossbar-based switch fabric for an Ethernet switch has
ever used, so its 75% efficiency starting point is a purely theoretical
discussion.
If you were really interested in more of that aspect, there are likely plenty
of articles on Google Scholar under "input buffered crossbar switch fabric
efficiency".
Well beyond the interest of most AusNOGians, I expect. :)
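On the cell-storage point above: because packets occupy whole fixed-size cells, a packet that spills just past a cell boundary wastes most of the last cell. A minimal sketch of that arithmetic (the 64-byte cell size here is an assumed, illustrative figure, not the cell size of any particular silicon):

```python
# Illustrative only: cell size varies by silicon; 64 bytes is an assumption.
CELL_SIZE = 64  # bytes per buffer cell

def buffer_consumed(packet_len: int, cell_size: int = CELL_SIZE) -> int:
    """Bytes of buffer actually consumed when packets are stored
    in fixed-size cells (ceiling division, then scale back up)."""
    cells = -(-packet_len // cell_size)  # ceiling division
    return cells * cell_size

# A 65-byte packet occupies two 64-byte cells: 128 bytes, ~51% efficient.
for length in (64, 65, 1500):
    used = buffer_consumed(length)
    print(f"{length}B packet -> {used}B of buffer ({length / used:.0%} efficient)")
```

This is why a switch's nominal buffer size overstates what a real packet mix can actually use.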



>
>   In the case of the company I work for, yes, we have done a lot of
> analysis on this, both by collecting telemetry data on actual buffer queue
> depths in production environments and by testing various workflows of
> modern applications and traffic flows.
> It's how we determined (for example) to use 2Gbit DDR3-2166 rather than
> 1Gbit DDR3-2166 parts when building said switch.
>
>
> What sort of testing process/software did you use for those sorts of test
> flows?  I'm guessing the licensing costs and the time spent would vastly
> outweigh the switching budget of a project like the one for which Joseph
> started this thread.
>

We built a model of how the switch works in ns-3 (http://www.nsnam.org/)
before we had the switch silicon back.  ns-3 is very neat for modelling this
kind of thing: you can run 'real apps' with real TCP stacks on a model and
simulate, for example, how a 1000-node Hadoop cluster doing its shuffle phase
actually uses TCP; model different switches with different buffers; have ns-3
capture the TCP output in pcap format; and then feed that through tools
analysing loss, retransmission, etc.
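The post-processing step at the end of that pipeline can be sketched as follows. This is a toy heuristic of my own, not the actual tooling: it flags a TCP segment as a retransmission when the same sequence number reappears within a flow, which is roughly what a pcap-analysis tool looks for. The flow IDs and sequence numbers are synthetic:

```python
from collections import defaultdict

def count_retransmissions(segments):
    """Crude retransmission heuristic over decoded pcap data:
    a (flow, seq) pair seen more than once counts as a retransmit.
    `segments` is an iterable of (flow_id, seq_number) tuples."""
    seen = defaultdict(set)
    retx = 0
    for flow, seq in segments:
        if seq in seen[flow]:
            retx += 1
        else:
            seen[flow].add(seq)
    return retx

# Toy trace: flow "A" sends seq 1000 twice, so one retransmission.
trace = [("A", 1000), ("A", 2448), ("A", 1000), ("B", 5000)]
print(count_retransmissions(trace))  # -> 1
```

Run per simulated buffer configuration, a counter like this lets you compare loss behaviour across switch models before any hardware exists.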


>
>
>  If you were interested in theory/simulation/practice on this, it's
> actually something I gave a talk on at CAIA (http://caia.swin.edu.au/) last
> year. More than happy to share the slides/content if there is no video
> recording of it.
>
>
> It appears they don't keep much at all:
> http://caia.swin.edu.au/seminars/details/120223A.html  Share away! :-)
>

https://arista.box.com/s/52b8c0585034ced70514

>
>
>  How does one determine the optimal buffer size (and hence switch
>> selection) for a particular environment?  Is there a rule of thumb based on
>> the bandwidth of the link and the maximum time one wants a packet to
>> queue?  (And how would one even determine what this maximum might be?  I
>> would think that it varies depending upon the application.)  I guess this
>> paragraph's questions are mostly directed to Greg & Lincoln - in the cases
>> you've mentioned, how did you know that small buffers were the problem?
>>
>
> Unfortunately, most Ethernet switches simply haven't provided the telemetry
> to indicate how close they are to running out of buffers until it's too
> late and you've fallen off the cliff, with large numbers of drops going on.
> (It's actually worse than this: many switches don't even provide accurate
> drop counters when they are dropping packets.)
> Historically, many switches had 'dedicated' buffers per port and didn't
> have overflow/shared pools for dealing with bursts.
>
> Even if accurate drop counters are available, how many people actually
> monitor those?
>
> The thing about TCP is that it still works even if you have drops.  It's
> just that for many environments "working" isn't good enough; they want
> "working well."  The DataSift blog I pointed to is a good example.
>
>
> My fear with all of this is that for most environments working *is* good
> enough, and working well costs more than they are prepared to spend.
>
>
>  The short answer is that there is no single 'right' answer for what is
> appropriate.  It depends on the traffic load, its distribution, and what
> the underlying apps/hosts/servers are doing, which may change over time too.
>
>
> Are there any quick wins available to the budget end of town?  Are there
> rules of thumb or quick estimates that can help determine if it's an issue?
>

The price people pay for equipment doesn't necessarily equate to quality. :)
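As for rules of thumb: a commonly cited first-order estimate (not from this thread, and no substitute for the telemetry/modelling discussed above) is the bandwidth-delay product, with the Appenzeller et al. "Sizing Router Buffers" refinement of BDP/sqrt(n) when many long-lived flows share the link. The link speed and RTT below are assumed, illustrative figures:

```python
import math

def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Classic bandwidth-delay product rule of thumb, in bytes."""
    return link_bps * rtt_s / 8

def bdp_many_flows(link_bps: float, rtt_s: float, n_flows: int) -> float:
    """Appenzeller et al. refinement: BDP / sqrt(n) for n long-lived flows."""
    return bdp_bytes(link_bps, rtt_s) / math.sqrt(n_flows)

# Assumed figures: 10 Gb/s link, 200 microsecond RTT inside a data centre.
print(f"BDP:            {bdp_bytes(10e9, 200e-6) / 1024:.0f} KiB")
print(f"BDP/sqrt(100):  {bdp_many_flows(10e9, 200e-6, 100) / 1024:.0f} KiB")
```

Treat the result as a sanity check only; as noted above, the real answer depends on the traffic mix and changes over time.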


cheers,

lincoln.


>
> Paul
>
>
> _______________________________________________
> AusNOG mailing list
> AusNOG at lists.ausnog.net
> http://lists.ausnog.net/mailman/listinfo/ausnog
>
>