[AusNOG] QoS on Internet traffic

Fri Aug 18 14:46:28 EST 2017

Being an advocate of peering, I tend to agree with Mark on this one: the cloud service providers make themselves very accessible by peering with open policies (most of the time).

I'd suggest you might want to find out exactly what your customers are accessing and look at ways of increasing your level of connectivity with them. 

Regards,

Tim Raphael

> On 18 Aug 2017, at 11:08 am, Mark Newton <newton at atdot.dotat.org> wrote:
> 
> It seems to me that this is a problem you’ve created for yourself, by limiting the firewall outside interface to (in your example) 50 Mbps.
> 
> I think you should go back to basics with your product definition: Is what you’re selling fit for purpose? Is a VPN service which is bottlenecked into the cloud an appropriate service offering for 2017?
> 
> If what you’re describing is “a typical example,” then maybe it isn’t an appropriate service offering, and the reason you’re feeling pain is because your business is being disrupted and you haven’t realized it yet. I note that none of the options you’ve considered involve “removing bandwidth limits on the firewall,” yet perhaps that’s what your customers are indicating they require?
> 
> Peering with the big cloud providers is cheap and easy. If you’re reaching them over costly transit, perhaps there are some opportunities to rearchitect your own network so that uncongested access to cloud at full-rate is feasible.
> 
> Ask yourself what your customers want; then design something sustainable that fulfills that need, then price it accordingly. 
> 
>   - mark
> 
> 
> 
>> On Aug 15, 2017, at 10:14 AM, Tony Miles <tmiles42 at gmail.com> wrote:
>> 
>> Hi all,
>> 
>> 
>> I'm not sure if anyone else is having this issue, but we are receiving an increasing number of requests to give priority/preference to specific Internet traffic.
>> 
>> Apologies in advance for the lengthy post.
>> 
>> The typical example might be a customer that has five sites, each with a 20Mbps private WAN tail that we provide, and a centralised hosted firewall through which all sites access the Internet. The speed on the central firewall might be capped to something like 50M (all abbreviations using "M" refer to "Mbps" hereafter). The WAN we provide supports QoS, so that if a client has an application that is important to them it can be tagged, put in an appropriate queue and treated accordingly. Examples of this might be that they have an RDP server at the head office site or VoIP PBX gear at each location. The central Internet access is oversubscribed 2:1 in this example (100M of WAN tails on 50M of Internet). At this point I think this is all fairly standard stuff that a lot of the people on this list would be familiar with (hopefully?). This is just one example; it is of course multiplied by the number of clients we have, who are all broadly similar but each with different specific details (different speeds, different things they consider important).
>> 
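The 2:1 figure in the example above is just the sum of the WAN tails divided by the Internet cap. A minimal sketch of that arithmetic, using only the numbers from the post (the function name is made up for illustration):

```python
def oversubscription_ratio(site_tails_mbps, internet_mbps):
    """Return total WAN tail bandwidth divided by the Internet cap."""
    return sum(site_tails_mbps) / internet_mbps


# Five sites with a 20M tail each, behind a firewall capped at 50M.
tails = [20] * 5
internet_cap = 50

ratio = oversubscription_ratio(tails, internet_cap)
print(f"{sum(tails)}M of tails on {internet_cap}M of Internet = {ratio:g}:1")
```

With the cap raised to 100M the same calculation gives 1:1, i.e. no oversubscription.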
>> With the move to "cloud everything", clients are moving from hosting stuff themselves (i.e. on their own servers/WAN) to things that are hosted generically on the Internet. This might be their accounting application, video conferencing, VoIP services or any number of other things that, for whatever reason, they have chosen to procure "as a service" rather than buying the thing and hosting it locally on premises.
>> 
>> When everything is running normally and there is no excess volume of traffic nobody complains, but the first time $someone_important is on a video conference call to an interstate office and the quality is crap because Windows updates are sucking up all of the Internet bandwidth, the response becomes "please fix this, we purchase a WAN with QoS". The VC one is particularly nasty because the conference bridge is in the cloud, so a VC session between three locations that are all on the same private WAN (with potentially plenty of bandwidth) is effectively 3x VC sessions to the Internet.
>> 
>> Historically our answer has been "it's the Internet, there is no QoS", which has sufficed for a while, but it's gotten to the stage where EVERYTHING is now "in the cloud" and that answer is slowly losing traction. Combine this with the fact that others out there are promising (rightly or wrongly) that they can solve the problem for the client, and we continue to ignore it at our peril.
>> 
>> I should probably add that we DO provide on-net VoIP & VC services for clients that we can (and do) support properly with QoS but clients are free to use or not use them as they wish and there are any number of reasons why they might choose a different Internet based provider of these services (price, features, integration, historical, etc). There is also the whole range of other hosted applications that a client might want to access that we don't host internally and can't get some sort of cross connect or other arrangement in place to bring the traffic in via something other than Internet transit.
>>  
>> Our Internet topology is like this (arrows indicating inbound/downstream traffic flow):
>> 
>> [$transit_provider] ---> [border router] ---> [core router] ---> [firewall] ---> {private WAN}
>> 
>> 
>> Right now we shape outbound/egress on the core router towards the firewall to the speed that is purchased by the client (e.g. 50M in the above example). There is no real policy applied at the moment; it's just a plain "shape default queue to x". We COULD in theory apply a proper QoS policy that puts stuff in queues and provides the required bandwidth to those queues. The only thing preventing this is the classification of the traffic (i.e. how to decide what goes in each queue). To do this effectively would (I imagine) require something that can do L7 inspection of traffic to see that something is "https://important_site.com" and apply appropriate DSCP marking to the packets. L7 classification is of course something that our core routers cannot do.
>> 
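To make the classification step concrete, here is a rough sketch of the mapping such a device would need: hostname in, DSCP value out. It assumes the hostname has already been extracted from the flow (for HTTPS that would typically mean reading the TLS SNI field, which is not shown here). The rule table is purely hypothetical apart from "important_site.com", which comes from the post; the DSCP code points (EF = 46, AF41 = 34, best effort = 0) are the standard values.

```python
# Standard DSCP code points; the class names are the usual PHB labels.
DSCP = {"EF": 46, "AF41": 34, "BE": 0}

# Hypothetical per-client rules: hostname suffix -> DSCP class.
RULES = [
    ("important_site.com", "AF41"),
    ("voip.example.net", "EF"),
]


def classify(hostname):
    """Return the DSCP value for a hostname, defaulting to best effort."""
    host = hostname.lower().rstrip(".")
    for suffix, dscp_class in RULES:
        # Match the domain itself or any subdomain, but not lookalikes.
        if host == suffix or host.endswith("." + suffix):
            return DSCP[dscp_class]
    return DSCP["BE"]


def tos_byte(dscp):
    """DSCP occupies the top six bits of the IP ToS / Traffic Class byte."""
    return dscp << 2


for name in ("www.important_site.com", "voip.example.net", "updates.example.org"):
    value = classify(name)
    print(f"{name}: DSCP {value}, ToS byte 0x{tos_byte(value):02x}")
```

The table lookup is the easy part. The hard part, and the reason a dedicated L7 device is needed, is reliably obtaining the hostname for each flow at line rate.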
>> Options that I've considered:
>> 
>> 1. Continue with "Internet => no QoS" - the whole point of this post is that this position is becoming less viable as everything moves to being "cloud based" or, as we like to call it, "Internet hosted". We can continue this stance at our own peril, but we all know that it is 10x easier to retain existing clients than to try and find new ones.
>> 
>> 2. Increase bandwidth to the firewalls - in the above example the firewall bandwidth is 50M and the total of the WAN tails is 100M. We could (ignoring the screams coming from the accountants for now) simply increase the bandwidth to each firewall so that there is no longer any oversubscription (e.g. 100M in my example). This wouldn't solve the problem, however, as the entirety of the bandwidth to the firewall could still be consumed, leaving not enough for the "important" things. All we've done is give the clients more Internet bandwidth without actually solving the problem. It also doesn't help if there is WAN congestion between the sites, as all Internet traffic is still going to be treated equally in the case of congestion.
>> 
>> 3. Not shape/police to the firewall - instead use a firewall that can classify traffic and shape/queue outbound on its LAN interface (i.e. towards the private WAN cloud). This seems attractive in the first instance, but there are a couple of things going against it. A lot of the firewalls are provided as managed firewalls by us and so we control them, BUT a number of clients (mostly the larger ones with their own IT resources) have their own firewall (hosted in our racks) that they manage. Telling clients that they are required to shape their firewall to <speed> and not shaping it for them (upstream) seems like a very trusting thing to do and I don't think that would go well (surely nobody would abuse it?!). The way of preventing the abuse is simply to police inbound on the core router that the LAN of the firewall is connected to, so that if the client doesn't shape to (e.g.) 50M, then it gets policed to 50M anyway and their QoS becomes broken by the policer.
>> 
>> 4. Find some device to classify traffic - ideally, if we could stick a device of some sort between the border routers and core routers that could do L7 classification of traffic and tag DSCP appropriately, then we could do what we need without too many other changes. Does such a "thing" exist? Can anyone point me in the direction of something that would do this?
>> 
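The remark in option 3 that QoS "becomes broken by the policer" can be illustrated with a toy single-rate token bucket. This is only a sketch, not how any particular router implements policing; the class name, the 3000-byte burst size and the packet list are all made up for illustration. The point is that a plain policer looks only at packet size and arrival time, so once the bucket is empty a voice packet is dropped just as readily as a bulk download packet.

```python
class Policer:
    """Toy single-rate token-bucket policer that ignores DSCP markings."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.burst = burst_bytes     # bucket depth in bytes
        self.tokens = burst_bytes    # start with a full bucket
        self.last = 0.0              # time of the previous packet

    def allow(self, now, size_bytes):
        """Return True if the packet conforms, False if it is dropped."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


# 50M policer with a (hypothetical) 3000-byte burst allowance.
policer = Policer(rate_bps=50_000_000, burst_bytes=3000)

# (arrival time in seconds, size in bytes, DSCP marking) - illustrative only.
packets = [
    (0.0, 1500, "BE"),   # bulk download
    (0.0, 1500, "BE"),   # bulk download, empties the bucket
    (0.0, 200, "EF"),    # voice packet arriving in the same instant
]

for when, size, marking in packets:
    verdict = "forwarded" if policer.allow(when, size) else "dropped"
    print(f"t={when}s {size}B {marking}: {verdict}")
```

Shaping on the firewall itself avoids this because the firewall can queue the bulk traffic and send the voice packet first; the upstream policer then never sees a burst above the contracted rate.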
>> 
>> Having the traffic classified and tagged (DSCP) is the ideal solution as this then allows the QoS on the WAN portion to work as well. No point eliminating the firewall/Internet as the problem only to have the VC session be crappy because there is a file transfer happening between two sites.
>> 
>> 
>> Talking about firewalls, can anyone recommend a firewall that does what is required for option #3 above? We need something that can classify traffic, tag DSCP on it and then shape/queue outbound on the LAN interface appropriately. It needs to be a VM device or something that supports proper virtualisation for separate individual clients (and can manage clients individually as well). This seems like it might be the best option, if we can find an appropriate platform that does what we require and fits all of the other requirements as well.
>> 
>> 
>> I think that's all I've got for now. Thanks for your patience in even reading this far. Happy to discuss privately with people if you don't want to post something publicly.
>> 
>> 
>> Thanks again,
>> Tony.
>> 
>> _______________________________________________
>> AusNOG mailing list
>> AusNOG at lists.ausnog.net
>> http://lists.ausnog.net/mailman/listinfo/ausnog
> 