I was trying to set up co-existing gateways as the first step in establishing branch-to-branch connectivity through Azure. The reference architecture is available in the following article.

The full implementation of this scenario is available in my GitHub repository:

Target State Architecture

Observations and Learnings

Refer to the architecture diagram in the previous section while reading through the following observations.

  • Behavior of the co-existing gateways: when an Azure deployment has both a VPN gateway and an ExpressRoute gateway, each connected to a different on-premises site, the routes advertised by the site connected to the VPN gateway are not advertised directly to the Azure virtual networks.
    • Instead, the ExpressRoute gateway (ER-Gw) learns the routes from the VPN gateway (VPN-Gw) and then advertises the combined set (routes learnt from the VPN-Gw plus those from the site it connects to Azure) to the Azure virtual networks.
    • This behavior is one-way: the ER-Gw learns from the VPN-Gw, but the VPN-Gw does not learn from the ER-Gw. The ER-Gw is prioritized over the VPN-Gw, and the prioritized gateway is the one that handles route propagation to the Azure virtual networks.
  • If Azure Route Server is added to the mix, as suggested in the reference architecture, it learns the routes from both gateways through iBGP.
  • My first mistake was advertising only the IP address of the RRAS machine used to set up the S2S connection, i.e.
    • The Azure VPN gateway adds a static route to the VPN device IP(s) (the RRAS machine in this case) to its route list. Routes that are not learnt through eBGP are not advertised to the Azure virtual networks.
    • In our scenario, the ER-Gw therefore could not learn the advertised route from the VPN-Gw, so that route was never advertised to the Azure VNets.
  • To fix this, I had to advertise a broader address space to the VPN gateway, i.e. This way, the VPN gateway learns the route to the advertised address space through eBGP, and the Azure virtual networks can then learn it as well.
  • My second mistake was not establishing separate VPN tunnels to each Azure VPN gateway instance. Because the Azure VPN-Gw must be deployed in active-active mode, creating a separate tunnel from the RRAS machine to each instance is good practice.
    • Without dedicated tunnels between the RRAS machine and each gateway instance, the second VPN gateway instance would:
      • Learn the site routes from the first instance (the one the connection was created to)
      • Use the first instance as its next hop whenever Azure chooses the second instance to send traffic, adding an extra hop
      • Leave Azure with only one usable tunnel to send and receive traffic from the on-premises site, so the active-active deployment's capabilities go unused even though the VPN-Gw is deployed in active-active mode
  • The requirement to deploy the Azure VPN-Gw in active-active mode is currently a design limitation. It ensures high availability and makes the gateway instances compatible with Azure Route Server, which is also deployed in active-active mode.
  • To favor one tunnel over the other, assign a higher weight to the primary VPN connection when configuring it on the RRAS machine.
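To make the propagation rules above concrete, here is a minimal Python sketch that models them as plain set operations. It is a toy model under stated assumptions, not Azure's implementation: the class names (`Route`, `VpnGateway`, `ExpressRouteGateway`), the source labels and the prefixes (`10.1.0.0/16` for the ExpressRoute site, `10.2.0.0/16` for the VPN site, `203.0.113.10/32` for the RRAS machine) are all hypothetical. It encodes only the behavior described in the observations: the ER-Gw advertises its own routes plus whatever the VPN-Gw learnt through eBGP, the static route to the VPN device never leaves the VPN-Gw, and nothing flows back from the ER-Gw to the VPN-Gw.

```python
from dataclasses import dataclass, field

# Hypothetical toy model: class names, source labels and prefixes are
# illustrative and mirror the observations above, not Azure's internals.


@dataclass(frozen=True)
class Route:
    prefix: str
    source: str  # "static", "ebgp" or "expressroute"


@dataclass
class VpnGateway:
    routes: set[Route] = field(default_factory=set)

    def routes_shared_with_er(self) -> set[Route]:
        # Only routes learnt through eBGP leave the VPN gateway; the static
        # route to the VPN device IP (the RRAS machine) stays local.
        return {r for r in self.routes if r.source == "ebgp"}


@dataclass
class ExpressRouteGateway:
    routes: set[Route] = field(default_factory=set)

    def routes_for_vnet(self, vpn: VpnGateway) -> set[Route]:
        # One-way flow: merge the VPN gateway's eBGP-learnt routes with the
        # ER gateway's own circuit routes. vpn.routes is never modified.
        return self.routes | vpn.routes_shared_with_er()


def vnet_prefixes(vpn: VpnGateway, er: ExpressRouteGateway) -> list[str]:
    """Prefixes the Azure virtual network ends up seeing, sorted for display."""
    return sorted(r.prefix for r in er.routes_for_vnet(vpn))


if __name__ == "__main__":
    er = ExpressRouteGateway({Route("10.1.0.0/16", "expressroute")})

    # First attempt: only the RRAS IP, which Azure holds as a static route.
    vpn_before = VpnGateway({Route("203.0.113.10/32", "static")})
    print(vnet_prefixes(vpn_before, er))  # ['10.1.0.0/16']

    # Fix: a broader on-premises range learnt by the VPN gateway via eBGP.
    vpn_after = VpnGateway({Route("203.0.113.10/32", "static"),
                            Route("10.2.0.0/16", "ebgp")})
    print(vnet_prefixes(vpn_after, er))  # ['10.1.0.0/16', '10.2.0.0/16']
```

Running it shows why advertising only the RRAS IP left the VPN site invisible to the virtual networks, and why advertising a broader, eBGP-learnt range fixed it.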
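The effect of missing per-instance tunnels in the active-active setup can be sketched the same way. This is again a hypothetical toy model in Python: the instance names, the `hops_to_onprem` and `preferred_tunnel` helpers and the weight values are illustrative assumptions, and the "higher weight wins" rule simply mirrors the tunnel-preference point above rather than any specific RRAS setting.

```python
# Hypothetical toy model of an active-active VPN gateway: instance names,
# helper names and weights are illustrative, not an Azure or RRAS API.


def hops_to_onprem(ingress: str, tunnels: set[str], instances: list[str]) -> list[str]:
    """Return the hops a packet takes from the instance Azure picked to on-premises."""
    if ingress in tunnels:
        # Dedicated tunnel: straight to the on-premises site.
        return [ingress, "on-premises"]
    # No tunnel of its own: the instance learnt the site routes from a peer
    # instance that has a tunnel and uses it as the next hop (an extra hop).
    peer = next((i for i in instances if i != ingress and i in tunnels), None)
    if peer is None:
        raise ValueError("no tunnel to the on-premises site")
    return [ingress, peer, "on-premises"]


def preferred_tunnel(weights: dict[str, int]) -> str:
    """Pick the tunnel with the highest weight (the 'primary' connection)."""
    return max(weights, key=weights.get)


if __name__ == "__main__":
    instances = ["gw-instance-0", "gw-instance-1"]

    # Single tunnel, created only to the first instance.
    print(hops_to_onprem("gw-instance-1", {"gw-instance-0"}, instances))
    # ['gw-instance-1', 'gw-instance-0', 'on-premises']

    # Dedicated tunnel per instance: no detour through the peer.
    print(hops_to_onprem("gw-instance-1", set(instances), instances))
    # ['gw-instance-1', 'on-premises']

    print(preferred_tunnel({"gw-instance-0": 200, "gw-instance-1": 100}))
    # gw-instance-0
```

With a single tunnel, traffic that Azure hands to the second instance detours through the first; with a tunnel per instance, each instance forwards directly, which is what makes the active-active deployment worthwhile.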

Information Source

The information on the internal behavior of the gateways and the remaining troubleshooting steps came from working with a Microsoft support professional. Overall, it was an amazing experience: trying a fairly complicated scenario, testing different possibilities, failing, troubleshooting alongside Microsoft support and finally closing it out with a lot of learning.

Hope you found this article informative.