[AusNOG] High availability options for terminating point-to-point Ethernet (on Cisco CE)

Tue Jun 13 14:54:13 EST 2017

I know I’m a bit late to the conversation here, and this is less relevant to the OP than it is to Dave, but I just wanted to point out that this is kind of what NBN do, at least as far as I understand it.

Quote from the product tech spec:
6.5.1.6.3 Customer Network Restrictions
All service frames exiting the NNI (i.e. from the NBN Co Network to the Customer Network through the NNI) must traverse an IP device before being injected back into the NBN Co Network. This is necessary to avoid CPE MAC addresses from appearing as source addresses on traffic ingress to the NNI. This operating restriction must be observed by Customer even if service frames are being switched between VLANs or forwarded via other service provider networks.

My understanding of this is that any MAC address learned from an NTD/DSLAM can never be seen incoming from the NNI. In their case, I think they shut down the NNI if they see such a MAC address.
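
As a rough illustration of that behaviour (my reading of it only, not NBN's actual implementation), the check could be modelled like this in Python. The class name `NniGuard` and its methods are made up for the example, and the "shut down the NNI" reaction is my assumption, not something the spec quote confirms.

```python
class NniGuard:
    """Toy model: MACs learned on the access side must never arrive from the NNI."""

    def __init__(self):
        self.access_macs = set()  # MACs learned from NTD/DSLAM ports
        self.nni_up = True        # assumed reaction: the NNI is shut on a violation

    def learn_from_access(self, mac):
        """Record a source MAC seen on an access (NTD/DSLAM) port."""
        self.access_macs.add(mac.lower())

    def ingress_from_nni(self, src_mac):
        """Return True if a frame arriving on the NNI is accepted."""
        if not self.nni_up:
            return False
        if src_mac.lower() in self.access_macs:
            # A CPE MAC has come back in through the NNI: treat it as a loop.
            self.nni_up = False
            return False
        return True


if __name__ == "__main__":
    guard = NniGuard()
    guard.learn_from_access("00:00:5E:00:53:10")       # placeholder CPE MAC
    print(guard.ingress_from_nni("00:00:5e:00:53:01"))  # router MAC, not a CPE
    print(guard.ingress_from_nni("00:00:5e:00:53:10"))  # CPE MAC seen on the NNI
    print(guard.nni_up)
```

In this model the CPE MAC arriving on the NNI is what trips the guard, which is why the spec insists on an IP device (and hence a different source MAC) between egress and re-injection.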

Dave – I think your solution is more elegant, in that it won’t drop services if there is an issue, but the overhead is probably too high for a setup on the scale of NBN. However, I know of at least 2 common peering networks used in Australia that limit peers to a single MAC address each (possibly more on negotiation), which is more similar to what you describe. It’s been done before, quite successfully as far as I’m aware, so you’re probably heading down a good path.

From: AusNOG [mailto:ausnog-bounces at lists.ausnog.net] On Behalf Of David Smith
Sent: Friday, 26 May 2017 4:25 PM
To: Michael J. Carmody <michael at opusv.com.au>; Sam Silvester <sam.silvester at gmail.com>; AusNOG at lists.ausnog.net
Subject: Re: [AusNOG] High availability options for terminating point-to-point Ethernet (on Cisco CE)

A timely discussion, I was starting to plan for this myself.

We provide wholesale L2 access for RSPs to customer ports via (state based) NNI. No one has asked for a redundant setup as yet, but it is only a matter of time, and I’d be far more comfortable having two physical handoff devices anyway.

Given that my customer (i.e. the RSP) is generally selling an internet service with one or more aggregation routers on their side, I was planning on putting an ingress and an egress MAC ACL on the NNI, listing the MACs of the RSP aggregation routers. That way I will only accept ingress frames with a src-MAC belonging to the RSP routers, and will allow any src-MAC except those of the RSP routers to egress.

Given that no src-MAC can both enter and leave my network, there shouldn’t be a loop possible under normal operation. Well, that’s my hope anyway.
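
A minimal sketch of why that pair of ACLs rules out a loop, written in Python purely as a model of the logic (it is not switch configuration). The MAC addresses are placeholders from the documentation range and the function names are hypothetical.

```python
# Placeholder MACs standing in for the RSP's aggregation routers (assumed values).
RSP_ROUTER_MACS = frozenset({"00:00:5e:00:53:01", "00:00:5e:00:53:02"})


def permit_ingress(src_mac, rsp_macs=RSP_ROUTER_MACS):
    """Ingress ACL on the NNI: accept only frames sourced from the RSP routers."""
    return src_mac.lower() in rsp_macs


def permit_egress(src_mac, rsp_macs=RSP_ROUTER_MACS):
    """Egress ACL on the NNI: send anything except frames sourced from the RSP routers."""
    return src_mac.lower() not in rsp_macs


def can_cross_both_ways(src_mac, rsp_macs=RSP_ROUTER_MACS):
    """A frame can only loop if the same src-MAC passes both ACLs."""
    return permit_ingress(src_mac, rsp_macs) and permit_egress(src_mac, rsp_macs)


if __name__ == "__main__":
    for mac in ("00:00:5E:00:53:01", "00:00:5e:00:53:99"):
        print(mac, permit_ingress(mac), permit_egress(mac), can_cross_both_ways(mac))
```

Because the egress rule is the exact complement of the ingress rule, `can_cross_both_ways` is false for every address. This only holds while the ACL list is accurate and nobody is spoofing a router MAC, hence "under normal operation".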

Regards,

Dave Smith

Chief Engineer

_____________________________________________________________________

1 / 171 Victoria Road, Gladesville NSW 2111
p. 61 2 9719 0900

e. davids at lbnco.com.au w. www.lbnco.com.au

From: AusNOG [mailto:ausnog-bounces at lists.ausnog.net] On Behalf Of Michael J. Carmody
Sent: Friday, 26 May 2017 3:55 PM
To: Sam Silvester; AusNOG at lists.ausnog.net
Subject: Re: [AusNOG] High availability options for terminating point-to-point Ethernet (on Cisco CE)

In more of a general sense, we get Layer 2 handoffs as VLANs at POIs from PIPE/AAPT/Vocus/Amcom/Intellipath/Megaport.

I just want two of them for redundancy.

Again, assuming the network being the weakest point is not our issue here, I just want to handle switch failure at my end. So I want 2 x POIs going to 2 different switches, with some dumb-as-hell loop prevention, as braindead as (R)STP, in place.

Am I being too KISS here?

-Michael

From: Sam Silvester [mailto:sam.silvester at gmail.com]
Sent: Friday, 26 May 2017 3:48 PM
To: Michael J. Carmody <michael at opusv.com.au>; AusNOG at lists.ausnog.net
Subject: Re: [AusNOG] High availability options for terminating point-to-point Ethernet (on Cisco CE)

Idle curiosity - what's wrong with Layer 3 redundancy & why would you want L2 spanning sites instead?

How would you propose to handle loop prevention between the wholesaler and yourself?

On Friday, 26 May 2017, Michael J. Carmody <michael at opusv.com.au> wrote:
I always wanted to have duplicate POIs and have the layer-2 VLAN appear on both of them, then just different switches for each POI.

This though is a product feature I have never been able to find.

Fear of loops from the wholesaler?

-Michael

From: AusNOG [mailto:ausnog-bounces at lists.ausnog.net] On Behalf Of Matt Selbst
Sent: Friday, 26 May 2017 10:56 AM
To: Paul Holmanskikh <ausnog at pkholm.com>
Cc: AusNOG at lists.ausnog.net
Subject: Re: [AusNOG] High availability options for terminating point-to-point Ethernet (on Cisco CE)

I'm surprised that everyone's default answer is basically "Don't worry about the hardware, the network is the most likely thing to fail". I totally get that and agree. But in a carrier environment you want to be able to honestly say to customers "we're fully redundant". If a point-to-point Ethernet service terminates on a single piece of hardware then you can't really make that statement. How are the bigger carriers handling this? I'm especially interested in this as it relates to a Cisco environment. At what level and what cost can you have a true HA solution?

On Fri, May 26, 2017 at 10:21 AM, Paul Holmanskikh <ausnog at pkholm.com> wrote:

Hi,

ASR seamless fail-over is not as seamless as it is marketed. There are lots of caveats. For PE redundancy we just run two BGP sessions between the CE and two different PEs. But the PE is hardly the weakest link; services usually fail due to the access link.
---
NEXON - I.T. FOR THE DYNAMIC BUSINESS
Paul Holmanskikh
Senior Network Engineer

Disclaimer: The contents of this email represent my own views and not necessarily the views of my employer

On 25/05/2017 21:13, Ryan Tucker wrote:
I'd be interested in an answer to this as well.

The ASR1006 apparently does multiple physical route processors with fast failover for seemingly this purpose, but I'm not aware of anything smaller/cheaper/more vendor agnostic (and VRRP just doesn't scale to "many" interfaces as mentioned above).

On Thu, 25 May 2017 at 21:05 Sam Silvester <sam.silvester at gmail.com> wrote:
Doesn't give you a specific answer so apologies if not useful to your situation but in past teams I've seen the following kind of things done.

- We matched the customer SLA to the 'lowest common denominator' of the access link, or the aggregation router (generally we had 24x7x4 hour hardware replacement, so we doubled that to give time to install and reconfigure e.g. 8 hours restoration ETA). Often there was a switching layer between the assorted backhaul providers and the aggregation PE so the option also existed to re-provision customers but that was never really something we planned to do.

- We ran multiple boxes, so we spread the impact of hardware outages (and upgrades). If a customer wanted higher availability, we provisioned them two links on two different aggregation boxes and ran HSRP or BGP sessions with them.

Single boxes failing wasn't something that kept me up at night, to be honest. It's only empirical, but we had more failures with backhaul providers and customer premises losing power than we ever had routers shit themselves in either a hardware or software fashion. We tended not to run lots of complicated features on the one box; again, we tended to build out at least a pair of aggregation edge devices for each type of service (PPP, colocation, business services etc).

Sam
On Thu, May 25, 2017 at 8:21 PM, Matt Selbst <matt.j.selbst at gmail.com> wrote:
Yes indeed I'm talking about the aggregation router failing.

Perhaps clustering multiple chassis although I don't know any Cisco agg routers that can do that.

On Thu, May 25, 2017 at 8:46 PM, Sam Silvester <sam.silvester at gmail.com> wrote:
Hi Matt,

On Thu, May 25, 2017 at 8:05 PM, Matt Selbst <matt.j.selbst at gmail.com> wrote:
Hi,

Hoping for some advice. What is everyone doing for terminating point-to-point Ethernet services like AAPT's e-Line in a high availability environment? Cisco environment.

With PPPoE, high availability was much easier, as you could just have multiple LNSs and fail over easily when the client re-authed. Terminating a VLAN handoff on a /30 or /31 makes HA much harder. If the customer edge router dies, failover seems pretty hard. VRRP doesn't seem to be an option, especially with hundreds of customer sub-interfaces.

Do you mean HA on the customer side or on your side?

e.g. I assume you mean you want to protect against when your aggregation router dies, as obviously the P2P Ethernet service is kind of a single point of failure in and of itself, as is the CPE...
_______________________________________________
AusNOG mailing list
AusNOG at lists.ausnog.net
http://lists.ausnog.net/mailman/listinfo/ausnog

