PVE VXLAN NXOS Integration Continued Oh My!

Introduction

In part 2 we continue integrating a PVE cluster with a Cisco Nexus 9000v (9kv) running NX-OS.

Figure 1 shows the network diagram.

Fig. 1. Network Diagram

We use the 9000v as an eBGP peer for the PVE cluster. Any BGP-capable device could take the place of the Cisco 9k.

Setup

We have the same PVE cluster as before. This time, however, we connect it to two external tenants: the cluster peers with the 9k over eBGP, and the 9k peers with each tenant over eBGP. This neatly separates the cluster's role (hosting internal machines) from access for external clients.

A few notes:

1) We have two zones configured in the cluster.

a) Zone 1 for tenant A (TNA), 10.10.100.0/24

b) Zone 2 for tenant B (TNB), 10.10.200.0/24

c) Make sure the containers within each zone can ping each other.

2) Once again we use a manual fabric. This time, though, we configure OSPF so that the cluster and the Cisco are reachable without static routes.

Configuration

PVE

Configure a standard PVE cluster.

1) Configure the DATA segment. Add a new vmbr1 bridge on each node and give it that node's DATA address.

2) Create an EVPN controller (DC1) and add each IP configured in step 1 as a peer: 192.168.1.41, 192.168.1.42 and 192.168.1.43. Use ASN 65000, the default.

3) Create a BGP controller with eBGP enabled and add 192.168.1.10 (the Cisco's IP) as its peer. Use PX3 as the node that sources the peering.

4) Create two zones:

a) v100 and v200. Put them under the EVPN controller (DC1) created in step 2.

b) Assign VRF VXLAN tag 10000 to v100 and 20000 to v200.

c) Create two VNets:

1. TEST1: VNI 11000 in zone v100

2. TEST2: VNI 21000 in zone v200

d) Add subnet 10.10.100.0/24 (gateway 10.10.100.1) to TEST1 and 10.10.200.0/24 (gateway 10.10.200.1) to TEST2.

e) Configure SNAT if you wish. IMPORTANT: make PX3 the exit node for both zones. PX3 holds the eBGP session to the Cisco, so this guarantees that the VNet routes reach PX3's default routing table and are advertised correctly over that session.
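Before applying anything, it can help to sanity-check the plan. The Python sketch below is my own addition, not something PVE provides: it copies the zone and VNet values from the steps above and checks that the VXLAN IDs are unique, that each gateway is a usable host in its subnet, that the subnets do not overlap (both end up in the same global table on PX3 and the 9k), and that both zones exit through PX3. The class and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    vrf_vxlan: int  # VRF VXLAN tag of the EVPN zone
    exit_node: str


@dataclass(frozen=True)
class VNet:
    name: str
    zone: str
    vni: int
    subnet: str
    gateway: str


# Values copied from the steps above.
ZONES = [Zone("v100", 10000, "PX3"), Zone("v200", 20000, "PX3")]
VNETS = [
    VNet("TEST1", "v100", 11000, "10.10.100.0/24", "10.10.100.1"),
    VNet("TEST2", "v200", 21000, "10.10.200.0/24", "10.10.200.1"),
]


def check_plan(zones, vnets, ebgp_node="PX3"):
    """Return a list of problems; an empty list means the plan looks consistent."""
    problems = []
    ids = [z.vrf_vxlan for z in zones] + [v.vni for v in vnets]
    if len(ids) != len(set(ids)):
        problems.append("VXLAN IDs must be unique across zones and VNets")
    for z in zones:
        if z.exit_node != ebgp_node:
            problems.append(f"{z.name}: exit node {z.exit_node} is not {ebgp_node}")
    known_zones = {z.name for z in zones}
    nets = []
    for v in vnets:
        if v.zone not in known_zones:
            problems.append(f"{v.name}: unknown zone {v.zone}")
        net = ipaddress.ip_network(v.subnet)
        gw = ipaddress.ip_address(v.gateway)
        if gw not in net or gw in (net.network_address, net.broadcast_address):
            problems.append(f"{v.name}: gateway {gw} is not a usable host in {net}")
        nets.append((v.name, net))
    # Both subnets are leaked into the same global table, so they must not overlap.
    for i, (name_a, net_a) in enumerate(nets):
        for name_b, net_b in nets[i + 1:]:
            if net_a.overlaps(net_b):
                problems.append(f"{name_a} and {name_b} overlap")
    return problems


if __name__ == "__main__":
    print(check_plan(ZONES, VNETS) or "plan looks consistent")
```

An empty list means nothing obvious is wrong; it does not replace applying the settings and watching for errors.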

Create a file named “snd-local” in “/etc/network/interfaces.d” and add the following to it (the values shown are for PX3):

auto dummy_DC1
iface dummy_DC1 inet static
        address 10.10.10.3/32
        link-type dummy
        ip-forward 1

auto vmbr1
iface vmbr1 inet static
        address 192.168.1.43/24
        ip-forward 1

Repeat this on all the nodes, making sure to enter the correct IP addresses for each node. DC1 is the name of the EVPN controller created earlier.
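Typing the same snippet three times invites mistakes, so here is a small Python sketch that renders it per node. It assumes node N uses data IP 192.168.1.4N and dummy address 10.10.10.N, and that the other nodes are called PX1 and PX2; only PX3's values (192.168.1.43 and 10.10.10.3) come from the text. It only prints the files and does not write anything to /etc.

```python
# Assumption: node N uses 192.168.1.4N on vmbr1 and 10.10.10.N on dummy_DC1.
TEMPLATE = """\
auto dummy_{controller}
iface dummy_{controller} inet static
        address {loopback}/32
        link-type dummy
        ip-forward 1

auto vmbr1
iface vmbr1 inet static
        address {data_ip}/24
        ip-forward 1
"""


def render_interfaces(data_ip: str, loopback: str, controller: str = "DC1") -> str:
    """Return the snd-local contents for one node."""
    return TEMPLATE.format(controller=controller, loopback=loopback, data_ip=data_ip)


# Hypothetical node names PX1/PX2; PX3 matches the example in the text.
NODES = {f"PX{n}": (f"192.168.1.4{n}", f"10.10.10.{n}") for n in (1, 2, 3)}

if __name__ == "__main__":
    for node, (data_ip, loopback) in NODES.items():
        print(f"# {node}: /etc/network/interfaces.d/snd-local")
        print(render_interfaces(data_ip, loopback))
```

Copy each printed block into /etc/network/interfaces.d/snd-local on the matching node.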

Create “frr.conf.local” in the “/etc/frr/” directory:

router ospf
 ospf router-id 10.10.10.3
exit
!
interface dummy_DC1
 ip ospf area 0
 ip ospf passive
exit
!
interface vmbr1
 ip ospf area 0
exit

Do the same on each node with that node's own router-id; in this example it matches the node's dummy_DC1 address.
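In the same spirit, a sketch that renders frr.conf.local for every node and refuses duplicate router-ids, since OSPF expects each router-id to be unique. It assumes each router-id equals the node's dummy_DC1 address, as it does for PX3 (10.10.10.3); the names PX1 and PX2 are hypothetical.

```python
import ipaddress

FRR_TEMPLATE = """\
router ospf
 ospf router-id {router_id}
exit
!
interface dummy_{controller}
 ip ospf area 0
 ip ospf passive
exit
!
interface vmbr1
 ip ospf area 0
exit
"""


def render_frr_local(router_id: str, controller: str = "DC1") -> str:
    """Return frr.conf.local for one node; rejects a malformed router-id."""
    ipaddress.IPv4Address(router_id)  # raises ValueError if not dotted IPv4
    return FRR_TEMPLATE.format(router_id=router_id, controller=controller)


def render_all(router_ids: dict) -> dict:
    """Render every node's file, refusing duplicate router-ids."""
    seen = {}
    for node, rid in router_ids.items():
        if rid in seen:
            raise ValueError(f"{node} reuses router-id {rid} from {seen[rid]}")
        seen[rid] = node
    return {node: render_frr_local(rid) for node, rid in router_ids.items()}


# Hypothetical node names; only PX3's router-id (10.10.10.3) is given in the text.
ROUTER_IDS = {"PX1": "10.10.10.1", "PX2": "10.10.10.2", "PX3": "10.10.10.3"}

if __name__ == "__main__":
    for node, text in render_all(ROUTER_IDS).items():
        print(f"# {node}: /etc/frr/frr.conf.local\n{text}")
```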

Create “/etc/default/frr” containing the line “ospfd=yes”. This prevents PVE from disabling “ospfd” in the FRR daemons settings every time you apply new SDN settings.
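If you script the node setup, this edit can be made idempotent. A minimal sketch, assuming /etc/default/frr is a simple KEY=VALUE file: set_key() rewrites the text so exactly one ospfd= line exists, and ensure_setting() (a hypothetical helper) only touches the file when something actually changes.

```python
from pathlib import Path


def set_key(text: str, key: str, value: str) -> str:
    """Return text with exactly one key=value line, replacing any earlier definition."""
    wanted = f"{key}={value}"
    out, found = [], False
    for line in text.splitlines():
        if line.strip().startswith(f"{key}="):
            if not found:  # keep the first definition, rewritten
                out.append(wanted)
                found = True
            # later duplicates of the same key are dropped
        else:
            out.append(line)
    if not found:
        out.append(wanted)
    return "\n".join(out) + "\n"


def ensure_setting(path: str, key: str, value: str) -> bool:
    """Write the file only if needed; return True when it was changed."""
    p = Path(path)
    old = p.read_text() if p.exists() else ""
    new = set_key(old, key, value)
    if new == old:
        return False
    p.write_text(new)
    return True
```

On each node, as root: ensure_setting("/etc/default/frr", "ospfd", "yes").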

Apply the SDN settings and fix any errors that are reported.

Finally, create containers in each zone and make sure they can ping each other. Check that OSPF is running and the adjacencies are up. Use “vtysh” for this; its syntax is similar to Cisco's CLI (for example, “show ip ospf neighbor”).

Instead of creating dummy interfaces, you could also have used the standard Linux loopback interface “lo”. I did it this way to show how to create interfaces outside the GUI.

Cisco 9000v

Configuration

Below are the pertinent parts of the 9k configuration.

nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay

Next comes the VRF part of the setup.

ip prefix-list tna seq 10 permit 10.10.100.0/24 
ip prefix-list tna-export seq 10 permit 172.16.10.0/24 
ip prefix-list tna-export seq 20 permit 192.168.0.0/30 
ip prefix-list tnb seq 10 permit 10.10.200.0/24 
ip prefix-list tnb-export seq 10 permit 172.16.20.0/24 
ip prefix-list tnb-export seq 20 permit 192.168.0.4/30
route-map tna-export permit 10
  match ip address prefix-list tna-export 
route-map tna-import permit 10
  match ip address prefix-list tna 
route-map tnb-export permit 10
  match ip address prefix-list tnb-export 
route-map tnb-import permit 10
  match ip address prefix-list tnb 
vrf context management
vrf context tna
  rd 10.10.10.10:100
  address-family ipv4 unicast
    route-target import 10.10.10.10:100
    route-target export 10.10.10.10:100
    import vrf default map tna-import
    export vrf default map tna-export
vrf context tnb
  rd 10.10.10.10:200
  address-family ipv4 unicast
    route-target import 10.10.10.10:200
    route-target export 10.10.10.10:200
    import vrf default map tnb-import
    export vrf default map tnb-export

We need to leak routes between the global routing table (GRT) and the VRFs tna and tnb. The reason is that the eBGP session with the cluster lives in the GRT: the VNet routes from the cluster arrive there and must be injected into the corresponding tenant VRF, and each tenant's routes must be exported back to the GRT so they can be advertised to the cluster.

At the same time, we must not leak routes from VRF tna into VRF tnb or vice versa. When the tenants' routes are sent back to the cluster, PVE places them in the correct VNet.
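To convince yourself that the maps leak only what they should, here is a toy Python model of the filtering. It is a simplification: it assumes exact-match prefix lists (the lists above use no ge/le), ignores route-targets and BGP best-path selection, and leaves out the /30 link entries for brevity. It starts with the container subnets in the GRT and each tenant's 172.16 prefix in its VRF.

```python
import ipaddress

# Subset of the exact-match prefix lists above (link /30 entries omitted).
PREFIX_LISTS = {
    "tna": ["10.10.100.0/24"],
    "tna-export": ["172.16.10.0/24"],
    "tnb": ["10.10.200.0/24"],
    "tnb-export": ["172.16.20.0/24"],
}
# VRF -> (prefix list matched by its import map, prefix list matched by its export map).
VRF_MAPS = {"tna": ("tna", "tna-export"), "tnb": ("tnb", "tnb-export")}


def permitted(prefix: str, plist: str) -> bool:
    """Exact-match lookup, like a prefix-list entry without ge/le."""
    net = ipaddress.ip_network(prefix)
    return any(net == ipaddress.ip_network(p) for p in PREFIX_LISTS[plist])


def leak(grt: set, vrfs: dict) -> tuple:
    """One round of import/export between the default VRF and each tenant VRF."""
    new_grt = set(grt)
    new_vrfs = {}
    for vrf, table in vrfs.items():
        import_list, export_list = VRF_MAPS[vrf]
        # import vrf default map <vrf>-import: GRT routes allowed into this VRF
        new_vrfs[vrf] = set(table) | {p for p in grt if permitted(p, import_list)}
        # export vrf default map <vrf>-export: VRF routes allowed into the GRT
        new_grt |= {p for p in table if permitted(p, export_list)}
    return new_grt, new_vrfs


if __name__ == "__main__":
    grt = {"10.10.100.0/24", "10.10.200.0/24"}  # learned from PX3
    vrfs = {"tna": {"172.16.10.0/24"}, "tnb": {"172.16.20.0/24"}}  # from TNA/TNB
    grt, vrfs = leak(grt, vrfs)
    print(sorted(grt))
    for name, table in vrfs.items():
        print(name, sorted(table))
```

After one round each VRF holds only its own tenant prefix plus its matching VNet subnet, and running a second round changes nothing: no prefix crosses from one tenant to the other.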

Next we set the correct IP addresses.

interface Ethernet1/1
  no switchport
  mac-address 0000.1111.1111
  ip address 192.168.1.10/24
  no shutdown

interface Ethernet1/2

interface Ethernet1/3

interface Ethernet1/4

interface Ethernet1/5

interface Ethernet1/6
  no switchport
  mac-address 0000.2222.1234
  vrf member tna
  ip address 192.168.0.1/30
  no shutdown

interface Ethernet1/7
  no switchport
  mac-address 0000.3333.1234
  vrf member tnb
  ip address 192.168.0.5/30
  no shutdown
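A quick way to double-check the tenant links is to compute each /30 and confirm that the BGP neighbor configured further down (192.168.0.2 for tna, 192.168.0.6 for tnb) is the other usable host. A small Python sketch using the standard ipaddress module, with the LINKS table copied from the configuration. It also makes plain that the tnb link is 192.168.0.4/30, which is the subnet any tnb-export entry for that link has to match.

```python
import ipaddress

# (interface, VRF, local address, BGP neighbor) copied from the configuration.
LINKS = [
    ("Ethernet1/6", "tna", "192.168.0.1/30", "192.168.0.2"),
    ("Ethernet1/7", "tnb", "192.168.0.5/30", "192.168.0.6"),
]


def check_link(local: str, neighbor: str) -> ipaddress.IPv4Network:
    """Return the link subnet, or raise ValueError if the neighbor is not the far end."""
    iface = ipaddress.ip_interface(local)
    peer = ipaddress.ip_address(neighbor)
    usable = set(iface.network.hosts())  # a /30 has exactly two usable hosts
    if peer == iface.ip or peer not in usable:
        raise ValueError(f"{neighbor} is not the far end of {iface.network}")
    return iface.network


if __name__ == "__main__":
    for name, vrf, local, neighbor in LINKS:
        print(f"{name} (vrf {vrf}): {check_link(local, neighbor)} -> peer {neighbor}")
```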

Finally, we set the management VRF address (if we want SSH access) and the loopback address, then configure eBGP back to the cluster and the per-VRF BGP sessions to the tenants.

interface mgmt0
  vrf member management
  ip address 192.168.100.10/24

interface loopback0
  ip address 10.10.10.10/32
icam monitor scale

line console
  exec-timeout 0
line vty
  exec-timeout 0
boot nxos bootflash:/nxos.9.3.5.bin sup-1
router bgp 65100
  router-id 10.10.10.10
  address-family ipv4 unicast
    network 10.10.200.0/24
  neighbor 192.168.1.43
    remote-as 65000
    address-family ipv4 unicast
      soft-reconfiguration inbound
    address-family l2vpn evpn
      send-community
      send-community extended
  vrf tna
    address-family ipv4 unicast
      network 10.10.100.0/24
    neighbor 192.168.0.2
      remote-as 64000
      address-family ipv4 unicast
        send-community
        send-community extended
  vrf tnb
    address-family ipv4 unicast
      network 10.10.200.0/24
    neighbor 192.168.0.6
      remote-as 63000
      address-family ipv4 unicast
        send-community
        send-community extended
!

Remarks:

1) Two tenants are created: tenant A (TNA) and tenant B (TNB).

2) In Fig. 1, TNA and TNB are routers capable of BGP peering.

3) As stated, two VRFs are created: tna and tnb.

4) By default, the global routing table (GRT) holds only the cluster's routes, and each tenant VRF (tna, tnb) holds only that tenant's routes. We need to leak routes so that each side learns about the other. From a security perspective this is not an issue: as the provider, you need access to all routes.

a) However, you need prefix lists and route maps to prevent accidentally leaking routes between tenants.

b) Pay attention to the VRF definitions and to how the route maps control what is injected into the GRT, tna and tnb.

5) The eBGP peering with PX3 carries the routes to and from the cluster.

Testing

Cisco 9000v

The cluster and the 9k should now have exchanged routes.

On the 9k, the familiar IOS-style show commands display the status of the BGP peering.

nxos-9k# sh ip bgp summary 
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 10.10.10.10, local AS number 65100
BGP table version is 9, IPv4 Unicast config peers 1, capable peers 1
5 network entries and 6 paths using 980 bytes of memory
BGP attribute entries [5/860], BGP AS path entries [3/18]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor        V    AS    MsgRcvd    MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
192.168.1.43    4 65000         41         38        9    0    0 00:01:41 2

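This shows the IPv4 unicast peering with PX3 (192.168.1.43) in the default VRF. The tenant sessions live in VRFs tna and tnb and can be checked with “sh ip bgp summary vrf tna” (or tnb).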
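If you want to check the peering from a script (say, after a reload), the neighbor rows are easy to split. A hedged sketch, assuming the column layout shown above: nine fixed columns followed by State/PfxRcd, which holds a prefix count once the session is up and a state name such as Idle or Active otherwise. NX-OS may also append a note in parentheses, as the EVPN summary later in this section shows with (No Cap).

```python
import re

STATE_PFX = re.compile(r"(\d+)(?:\s+\((.+)\))?")


def parse_neighbor_line(line: str) -> dict:
    """Split one neighbor row of NX-OS `show ... bgp ... summary` output."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"not a neighbor row: {line!r}")
    neighbor, _ver, asn, _rcvd, _sent, _tblver, _inq, _outq, up_down = fields[:9]
    tail = " ".join(fields[9:])  # State/PfxRcd, possibly followed by a note
    m = STATE_PFX.fullmatch(tail)
    return {
        "neighbor": neighbor,
        "remote_as": int(asn),
        "up_down": up_down,
        # A number means the session is up; text means a state such as Idle.
        "prefixes": int(m.group(1)) if m else None,
        "note": m.group(2) if m else None,  # e.g. "No Cap"
        "state": None if m else tail,
    }


if __name__ == "__main__":
    print(parse_neighbor_line("192.168.1.43 4 65000 41 38 9 0 0 00:01:41 2"))
```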

Next, we can see the routes the 9k receives from the cluster and the routes it will send to it.

nxos-9k# sh ip bgp 
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 9, Local Router ID is 10.10.10.10
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2

   Network            Next Hop            Metric     LocPrf     Weight Path
*>e10.10.100.0/24     192.168.1.43             0                     0 65000 ?
  l10.10.200.0/24     0.0.0.0                           100      32768 i
*>e                   192.168.1.43             0                     0 65000 ?
*>e172.16.10.0/24     192.168.0.2              0                     0 64000 i
*>e172.16.20.0/24     192.168.0.6              0                     0 63000 i
*>e192.168.0.0/30     192.168.0.2              0                     0 64000 ?

As you can see, the 10.10.x.0/24 prefixes are the container subnets, and the 172.16.x.0/24 prefixes are the routes injected from the tenants. The tenant routes should also appear within the cluster.
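One detail worth spelling out: a row with an empty Network column is another path for the prefix above it. So 10.10.200.0/24 has two paths: the locally originated one (l, from the network 10.10.200.0/24 statement in the default VRF), shown without the * (valid) and > (best) flags, and the eBGP path from PX3, which is valid and best. A minimal Python sketch that groups rows that way, assuming the NX-OS layout above where the status codes and path type are glued to the prefix. It does not try to separate Metric, LocPrf and Weight from the AS path, because blank columns make that ambiguous once the spacing is lost.

```python
import re

# Optional status codes (*, >, s, x, S, d, h), one path-type letter, optional prefix.
ROW = re.compile(
    r"^\s*(?P<status>[*>sxSdh]*)(?P<ptype>[ieclarI])(?P<prefix>\S+/\d+)?"
    r"\s+(?P<nexthop>\S+)\s+(?P<rest>.*)$"
)


def parse_bgp_table(text: str) -> dict:
    """Group NX-OS `show ip bgp` rows by prefix; a blank prefix continues the previous one."""
    table, current = {}, None
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # headers, legends, blank lines
        current = m.group("prefix") or current
        if current is None:
            continue
        tokens = m.group("rest").split()
        table.setdefault(current, []).append({
            "best": ">" in m.group("status"),
            "type": m.group("ptype"),
            "next_hop": m.group("nexthop"),
            "origin": tokens[-1] if tokens else None,
        })
    return table
```

Feed it the saved output of “sh ip bgp”, for example parse_bgp_table(open("bgp.txt").read()), where bgp.txt is a hypothetical capture file.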

Finally if we do:

nxos-9k# sh bgp l2vpn evpn 
nxos-9k# sh bgp l2vpn evpn summary 
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 10.10.10.10, local AS number 65100
BGP table version is 2, L2VPN EVPN config peers 1, capable peers 0
0 network entries and 0 paths using 0 bytes of memory
BGP attribute entries [0/0], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor        V    AS    MsgRcvd    MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
192.168.1.43    4 65000        220        216        0    0    0 00:10:37 0 (No Cap)

You may think this is an error that needs to be resolved. It is not: we are trying to run EVPN with a peer that is not configured for it (the cluster's eBGP session to the 9k carries only IPv4 unicast), hence the (No Cap) state. Note that we left the address-family l2vpn evpn commands under neighbor 192.168.1.43 in the configuration; they are not needed and should be removed.

PVE Cluster

At a shell prompt, run “vtysh”. If you use FRR (which PVE does), this is a command you should know by heart.

We should see the routes:

pve-3# sh bgp ipv4 unicast
BGP table version is 5, local router ID is 192.168.1.43, vrf id 0
Default local pref 100, local AS 65000
Status codes: s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
 i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *> 10.10.100.0/24   0.0.0.0(pve-3)@10<       0         32768 ?
 *> 10.10.200.0/24   0.0.0.0(pve-3)@15<       0         32768 ?
 *> 172.16.10.0/24   192.168.1.10                           0 65100 64000 i
 *> 172.16.20.0/24   192.168.1.10                           0 65100 63000 i
 *> 192.168.0.0/30   192.168.1.10                           0 65100 64000 ?

Displayed 5 routes and 5 total paths

As you can see, we have the 10.x.x.x routes from our containers and, in addition, the 172.16.x.x routes from the external tenants.
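The odd-looking next hops on the container routes follow the legend printed above the table: @NNN is the next hop's VRF ID and < means announce-nh-self. So 10.10.100.0/24 and 10.10.200.0/24 are reached through VRF IDs 10 and 15, presumably the VRFs PVE created for zones v100 and v200, and PX3 announces itself as the next hop (which is why the 9k sees 192.168.1.43). A small Python decoder for that notation, assuming the ADDRESS(HOST)@VRFID< shape seen here:

```python
import re

# ADDRESS, optional (HOSTNAME), optional @VRF-ID, optional "<" (announce-nh-self).
NEXT_HOP = re.compile(
    r"(?P<addr>[0-9.]+)(?:\((?P<host>[^)]+)\))?(?:@(?P<vrf_id>\d+))?(?P<nh_self><)?"
)


def decode_next_hop(token: str) -> dict:
    """Decode an FRR next-hop column entry such as '0.0.0.0(pve-3)@10<'."""
    m = NEXT_HOP.fullmatch(token)
    if m is None:
        raise ValueError(f"unrecognised next hop: {token!r}")
    return {
        "address": m["addr"],
        "host": m["host"],
        "vrf_id": int(m["vrf_id"]) if m["vrf_id"] else None,
        "announce_nh_self": m["nh_self"] is not None,
    }


if __name__ == "__main__":
    for token in ("0.0.0.0(pve-3)@10<", "0.0.0.0(pve-3)@15<", "192.168.1.10"):
        print(token, decode_next_hop(token))
```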

Tenants

You should now be able to ping the containers from the tenants. Here is tenant B:

tnb#sh ip int bri
Interface                  IP-Address      OK? Method Status                Protocol
FastEthernet0/0            192.168.0.6     YES NVRAM  up                    up
FastEthernet0/1            172.16.20.1     YES NVRAM  up                    up
Loopback0                  172.16.1.2      YES NVRAM  up                    up
tnb#ping 10.10.200.20 source 172.16.20.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.200.20, timeout is 2 seconds:
Packet sent with a source address of 172.16.20.1 
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/12/20 ms
tnb#ping 10.10.200.10 source 172.16.20.1

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.200.10, timeout is 2 seconds:
Packet sent with a source address of 172.16.20.1 
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/10/12 ms
tnb#

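Notice that we source the ping from 172.16.20.1. That address is in 172.16.20.0/24, the TNB prefix advertised to the cluster, so the replies have a route back. And for those from Missouri, the Show-Me State, here is proof that we're not pulling a fast one.

From the tenant A container (test100):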

 test100:~# ifconfig
eth0 Link encap:Ethernet HWaddr BC:24:11:79:34:C1 
 inet addr:10.10.100.10 Bcast:0.0.0.0 Mask:255.255.255.0
 inet6 addr: fe80::be24:11ff:fe79:34c1/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
 RX packets:8 errors:0 dropped:0 overruns:0 frame:0
 TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000 
 RX bytes:656 (656.0 B) TX bytes:726 (726.0 B)

lo Link encap:Local Loopback 
 inet addr:127.0.0.1 Mask:255.0.0.0
 inet6 addr: ::1/128 Scope:Host
 UP LOOPBACK RUNNING MTU:65536 Metric:1
 RX packets:0 errors:0 dropped:0 overruns:0 frame:0
 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:1000 
 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

test100:~# ping 172.16.10.1
PING 172.16.10.1 (172.16.10.1): 56 data bytes
64 bytes from 172.16.10.1: seq=0 ttl=252 time=8.330 ms
64 bytes from 172.16.10.1: seq=1 ttl=252 time=9.146 ms
^C
--- 172.16.10.1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 8.330/8.738/9.146 ms

We can ping router TNA!

Conclusion

This lab shows how to add an eBGP peer so you can connect external clients to the PVE cluster. Hope you enjoyed it.

Take care,

Ciao.

 

 
