ECMP for dual ISP internet access: going down the rabbit hole

Linus Raes – December 2019
With help from Frederic De Vlieger

At SecureLink, continuous technical growth via self-learning is highly encouraged. This includes spending time in the lab and getting your hands ‘dirty’.

The following is a technical dive into ECMP on Palo Alto Networks firewalls. Although it started with the intention of finding a useful setup to implement at our customers, it quickly grew into a fun side-project to see which creative ways we could imagine to get it working. Please join me down the rabbit hole…

Setup

Imagine you have 2 ISPs and would like to have:

  • Internet access from the LAN over both lines simultaneously (active-active)
  • 2 IPSec tunnels to a remote site, one from each ISP for redundancy

The logical next step is to configure 3 virtual routers to create a flexible setup: an Internal-vr, an ISP1-vr and an ISP2-vr. Since each ISP-vr has its own default route, we can have IPSec tunnels initiated from both ISPs.

We can even add virtual-routers for more ISPs, and scale the setup.
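To make the idea concrete, here is a minimal Python sketch of why separate virtual routers let each tunnel leave through its own ISP: every VR does its own route lookup against its own table. This is only an illustration, not how PAN-OS is implemented. The VIRTUAL_ROUTERS dictionary, the lookup function and the interface-to-VR mapping are assumptions made for the example.

```python
import ipaddress

# Hypothetical model: each virtual router has its own table of
# (prefix, egress interface). Interface names are illustrative only.
VIRTUAL_ROUTERS = {
    "ISP1-vr": [("0.0.0.0/0", "ethernet1/2")],
    "ISP2-vr": [("0.0.0.0/0", "ethernet1/3")],
}


def lookup(vr_name, destination):
    """Longest-prefix match inside a single virtual router's table."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, interface in VIRTUAL_ROUTERS[vr_name]:
        network = ipaddress.ip_network(prefix)
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, interface)
    return best[1] if best else None


# The same peer address resolves to a different egress interface per VR.
for vr in VIRTUAL_ROUTERS:
    print(vr, "->", lookup(vr, "10.32.25.172"))
```

Because the lookup never leaves the VR in which the tunnel lives, the two tunnels cannot end up sharing one default route.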

However, we will have to add Policy-Based Forwarding policies to use both internet lines for outbound traffic from the LAN. This allows you to have redundancy (in case 1 ISP is down), but not the ability to use both internet lines “simultaneously” and increase your throughput.

Enter ECMP (Equal-Cost Multi-Path), which looks like everything you ever wanted.

Design 1: One Virtual router

In this setup we create just 1 virtual router, add our 2 ISPs, add 2 default routes and enable ECMP. This is by far the simplest setup.


1 virtual router setup


2 default routes: over eth1/2 and eth1/3, ECMP enabled

GOOD:

This setup does work for internet access from the LAN. Both lines will be used for outbound traffic simultaneously. (simultaneously meaning sessions are “evenly” distributed between the 2 internet lines)

Outbound sessions to both eth1/2 and eth1/3
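The per-session distribution described above can be sketched in a few lines of Python. This is a generic hash-based illustration, not the algorithm PAN-OS actually uses. The function name pick_next_hop, the choice of SHA-256 and the example addresses are all assumptions.

```python
import hashlib
from collections import Counter

# The two equal-cost egress interfaces from Design 1.
NEXT_HOPS = ["ethernet1/2", "ethernet1/3"]


def pick_next_hop(src_ip, dst_ip, src_port, dst_port, protocol="tcp"):
    """Map a session's 5-tuple to one next hop (illustrative hashing only)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(NEXT_HOPS)
    return NEXT_HOPS[index]


# Simulate 1000 sessions from one LAN host that differ only in source port.
usage = Counter(
    pick_next_hop("192.168.1.10", "203.0.113.5", port, 443)
    for port in range(40000, 41000)
)
print(usage)
```

Every packet of a given session hashes to the same interface, while different sessions spread over both lines. That is why a single download does not get faster, but the aggregate throughput of many sessions does.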

NO GOOD:

Traffic originating from the firewall itself does not use both ISPs.

IPSec tunnels initiated from the firewall use the routing table to look up a route towards the peer address. This is why, in normal circumstances, we configure 3 virtual routers.

Now, however (with ECMP enabled), we do have 2 default routes, both active. This could – in theory – allow the firewall to bring up the IPSec tunnels over both internet lines.

This is not the case. Only 1 route is used to bring up both tunnels.


The tunnel to 10.32.25.172 originating from eth1/3 still uses eth1/2 as its outbound interface.

Design 2: Three Virtual routers and BGP

For this setup, we are retaining our 3 VRs. This makes sure that the IPSec tunnels can be initiated from both ISPs, since they each have their own VR, with their own default route configured.

GOOD:

The IPSec tunnels originate in their own VR and thus use the correct outbound interface by following their unique default route.

NO GOOD:

We now need to configure internet access from the internal-vr. To be able to use both internet lines we will use ECMP.

First, we try adding 2 static default routes, one with next-vr ISP1 and one with next-vr ISP2 as next hop, and enabling ECMP. This, however, is not supported: an error is displayed upon commit.

Add 2 default routes to next-vr ISP1 and ISP2


This config is invalid

Okay, let’s try and work around this:

Create loopback interfaces in each VR, and set up BGP between the internal-vr and the VRs of ISP1 and ISP2. Now use BGP to import the default routes of ISP1 and ISP2 into the internal-vr.

Set up BGP with loopback IPs in each VR

Internal-vr has 2 BGP neighbors

Default route is imported into local RIB from both ISPs

Finally: enabling ECMP should allow us to use both internet lines.

The routing table of Internal-vr includes both default routes: E (ecmp) flag; B (bgp) flag and A (active) flag

However: the forwarding table has only 1 default route!

This means that only 1 line is used for outbound internet traffic.


Only ethernet1/3 is used for outbound connections from the LAN
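The mismatch between the routing table and the forwarding table can be modelled with a small Python sketch. The Route class, the missing_from_fib helper and the next-hop labels are hypothetical, and the entries are hand-written to mirror the situation above rather than copied from firewall output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    flags: str = ""  # e.g. "A B E" = active, BGP, ECMP


def missing_from_fib(rib, fib):
    """Return active RIB routes that have no matching FIB entry."""
    installed = {(r.prefix, r.next_hop) for r in fib}
    return [
        r for r in rib
        if "A" in r.flags.split() and (r.prefix, r.next_hop) not in installed
    ]


# Routing table: both default routes are active, learned via BGP, ECMP.
rib = [
    Route("0.0.0.0/0", "ISP1-vr", "A B E"),
    Route("0.0.0.0/0", "ISP2-vr", "A B E"),
]
# Forwarding table: only one of them was actually installed.
fib = [Route("0.0.0.0/0", "ISP2-vr")]

for route in missing_from_fib(rib, fib):
    print("in RIB but not in FIB:", route.prefix, "via", route.next_hop)
```

The takeaway is that the ECMP flag in the routing table is not enough: traffic follows the forwarding table, so that is the table to check when verifying a multipath setup.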

NOTE:

A support ticket was opened with Palo Alto Networks TAC for this behavior. After a thorough investigation by the engineering team, the answer was: not supported.

“Based on the analysis this behavior is as expected – our developers confirmed that ECMP is not supported on inter-VR routes – next-hop multipath in similar conditions will not work, and FIB/routing table content will not be proper with inter-VR ECMP configuration.”

NOTE 2:

If we add interfaces and physically “leave” and “re-enter” the firewall between Internal-vr and ISP1-vr and ISP2-vr, we do get this setup to work. Obviously, this is a dirty workaround.


Physically leaving the firewall between the virtual-routers and setting up BGP over these interfaces


Routing table contains both default routes (ACTIVE, BGP, ECMP)


Forwarding table DOES contain both default routes


Both outbound interfaces ethernet1/4 and ethernet1/7 are used for outbound connections from the LAN

Design 3: Three Virtual routers and IPSec + OSPF

GOOD:
Again we start with 3 virtual routers, so IPSec tunnels to the outside world are working.

NO GOOD:
To get internet access for the internal-vr, we will try and use OSPF.

OSPF cannot be set up between loopbacks (loopbacks are /32 and will not be able to reach each other). Solution: add IPSec tunnels between the loopbacks and talk OSPF over the tunnels (add IPs to the tunnel interfaces).

However, this does not work either.
The tunnels do come up, but OSPF itself fails. No neighbors become visible, and even a ping across the VPN tunnels fails.
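The /32 remark above is easy to check with Python's ipaddress module. This sketch only tests whether two interface addresses fall inside each other's subnet, which is why the loopbacks need the tunnel interfaces in between. The addresses and the share_subnet helper are made up for the example.

```python
import ipaddress


def share_subnet(a, b):
    """True if each interface address lies inside the other's subnet."""
    ia = ipaddress.ip_interface(a)
    ib = ipaddress.ip_interface(b)
    return ia.ip in ib.network and ib.ip in ia.network


# Two /32 loopbacks: each one is a network of exactly one address.
print(share_subnet("10.255.0.1/32", "10.255.0.2/32"))

# Two tunnel interfaces numbered out of the same /30.
print(share_subnet("10.254.0.1/30", "10.254.0.2/30"))
```

The first call returns False and the second returns True. Sharing a subnet over the tunnel is of course only a prerequisite; as described above, it did not make OSPF come up between the VRs.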

NOTE:
Since TAC confirms that inter-VR ECMP is not supported, this too is “expected behavior.”
Again, if we add interfaces and physically leave the firewall between the different VRs, OSPF does come up, and the setup works.

Conclusion

Although I was unable to get a working scenario that met all requirements using ECMP, the project gave me more insight into Palo Alto Networks routing mechanisms and challenged me to come up with alternative solutions to a problem. That in itself was a great success.

Palo Alto Networks’ new 9.1.x beta has SD-WAN functionality built-in and may prove to be the solution I was looking for. Want to know more about Palo Alto Networks, and other SD-WAN solutions? Don’t hesitate to contact us!

Want to join the fun? We are always searching for enthusiastic colleagues. Go to our job site to find a list of all our vacancies.