[AusNOG] High availability options for terminating point-to-point Ethernet (on Cisco CE)

Fri May 26 10:56:14 EST 2017

I'm surprised that everyone's default answer is basically "Don't worry
about the hardware, the network is the most likely thing to fail". I
totally get that and agree. But in a carrier environment you want to be
able to honestly say to customers "we're fully redundant". If a
point-to-point Ethernet service terminates on a single piece of hardware
then you can't really make that statement. How are the bigger carriers
handling this? I'm especially interested in this as it relates to a Cisco
environment. At what level, and at what cost, can you have a true HA solution?
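
To put rough numbers on that argument, here is a minimal sketch of the
series/parallel availability arithmetic. The availability figures are
made up for illustration (not measured from any real network), and it
assumes failures are independent, which is optimistic in practice.

```python
# Hypothetical availability figures, for illustration only (not measured).
ACCESS_LINK = 0.999   # a single point-to-point Ethernet tail
AGG_ROUTER = 0.9999   # one aggregation router terminating the service


def series(*parts):
    """Availability when every component must be up."""
    result = 1.0
    for availability in parts:
        result *= availability
    return result


def parallel(*parts):
    """Availability when any one component being up is enough."""
    all_down = 1.0
    for availability in parts:
        all_down *= (1.0 - availability)
    return 1.0 - all_down


def downtime_hours_per_year(availability):
    """Expected hours of outage in a 365-day year."""
    return (1.0 - availability) * 365 * 24


# One tail terminating on one router.
single_homed = series(ACCESS_LINK, AGG_ROUTER)
# One tail, but a redundant pair of routers behind it.
router_pair_only = series(ACCESS_LINK, parallel(AGG_ROUTER, AGG_ROUTER))
# Two tails, each terminating on a different router.
dual_homed = parallel(single_homed, single_homed)

for name, value in [
    ("single tail, single router", single_homed),
    ("single tail, router pair", router_pair_only),
    ("two tails, two routers", dual_homed),
]:
    print(f"{name}: {value:.7f} "
          f"({downtime_hours_per_year(value):.2f} h/year down)")
```

Under these assumptions a redundant router pair behind a single tail still
cannot do better than the tail itself, while two tails on two boxes can.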

On Fri, May 26, 2017 at 10:21 AM, Paul Holmanskikh <ausnog at pkholm.com>
wrote:

> Hi,
>
> ASR seamless failover is not as seamless as it is marketed.  There are lots
> of caveats.  For PE redundancy we just run two BGP sessions between the CE
> and two different PEs.  But the PE is hardly the weakest link; services
> usually fail due to the access link.
> ---
> NEXON - I.T. FOR THE DYNAMIC BUSINESS
> Paul Holmanskikh
> Senior Network Engineer
>
> Disclaimer: The contents of this email represent my own views and not
> necessarily the views of my employer
>
>
> On 25/05/2017 21:13, Ryan Tucker wrote:
>
> I'd be interested in an answer to this as well.
>
> The ASR1006 apparently supports multiple physical route processors with fast
> failover for seemingly this purpose, but I'm not aware of anything
> smaller, cheaper or more vendor agnostic (and VRRP just doesn't scale to
> "many" interfaces, as mentioned above).
>
>
> On Thu, 25 May 2017 at 21:05 Sam Silvester <sam.silvester at gmail.com>
> wrote:
>
>> This doesn't give you a specific answer, so apologies if it's not useful to
>> your situation, but in past teams I've seen the following kinds of things
>> done.
>>
>> - We matched the customer SLA to the 'lowest common denominator' of the
>> access link or the aggregation router (generally we had 24x7x4 hour
>> hardware replacement, so we doubled that to give time to install and
>> reconfigure, e.g. an 8 hour restoration ETA). Often there was a switching
>> layer between the assorted backhaul providers and the aggregation PE, so
>> the option also existed to re-provision customers, but that was never
>> really something we planned to do.
>>
>> - We ran multiple boxes, so we spread the impact of hardware outages (and
>> upgrades). If a customer wanted higher availability, we provisioned them
>> two links on two different aggregation boxes and ran HSRP or BGP sessions
>> with them.
>>
>> Single boxes failing wasn't something that kept me up at night, to be
>> honest. It's empirical, but we had more failures with backhaul providers
>> and customer premises losing power than we ever had routers shit
>> themselves in either a hardware or software fashion. We tended not to run
>> lots of complicated features on the one box; again, we tended to build out
>> at least a pair of aggregation edge devices for each type of service (PPP,
>> colocation, business services etc).
>>
>>
>> Sam
>>
>>
>> On Thu, May 25, 2017 at 8:21 PM, Matt Selbst <matt.j.selbst at gmail.com>
>> wrote:
>>
>>> Yes indeed I'm talking about the aggregation router failing.
>>>
>>> Perhaps clustering multiple chassis, although I don't know of any Cisco
>>> agg routers that can do that.
>>>
>>>
>>>
>>> On Thu, May 25, 2017 at 8:46 PM, Sam Silvester <sam.silvester at gmail.com>
>>> wrote:
>>>
>>>> Hi Matt,
>>>>
>>>> On Thu, May 25, 2017 at 8:05 PM, Matt Selbst <matt.j.selbst at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Hoping for some advice. What is everyone doing for
>>>>> terminating point-to-point Ethernet services like AAPT's e-Line in a high
>>>>> availability environment? Cisco environment.
>>>>>
>>>>> With PPPoE, high availability was much easier as you could just have
>>>>> multiple LNSs and fail over easily when the client re-authenticated.
>>>>> Terminating a VLAN handoff on a /30 or /31 makes HA much harder. If the
>>>>> customer edge router dies, failover seems pretty hard. VRRP doesn't seem
>>>>> to be an option, especially with hundreds of customer sub-interfaces.
>>>>>
>>>>>
>>>> Do you mean HA on the customer side or on your side?
>>>>
>>>> e.g. I assume you mean you want to protect against when your
>>>> aggregation router dies, as obviously the P2P Ethernet service is kind of a
>>>> single point of failure in and of itself, as is the CPE...
>>>>
>>> _______________________________________________
>> AusNOG mailing list
>> AusNOG at lists.ausnog.net
>> http://lists.ausnog.net/mailman/listinfo/ausnog
>
>