yap/README.rst
Ben Tremblay 7ce9202e5a Note minimum version for migrating off YAP
Specifically mention all aggregators carrying space traffic must be
version 6.5 or later to migrate a space off YAP.

DEV-3068
2020-01-06 15:03:25 -08:00


============================
YAP: Yet Another Private WAN
============================
This is an alternative method of providing private WAN in bonding. Instead of
funneling traffic into private WAN routers via GRE tunnels, it peers space
tables directly on VLANs off the aggregators using OSPF. This allows for the
following improvements over standard private WAN:
* Custom, more efficient backhauls can be used, improving speed in most cases
* Tables can be peered with any switches or routers in the data centers
* Reduces processing load on aggregators due to simplified rulesets
If a backhaul is not already set up in a data center, additional "VXR" boxes
can be added to each data center to provide an overlay backhaul using
VXLAN-over-IPSEC.
.. contents::
Installation and setup
----------------------
Initial installation
====================
First, install the software on the bondingadmin server::
make install
.. note:: The rest of the yap commands are run on the management server, unless
otherwise stated.
Then add a read-only user in the Bondingadmin web interface to allow the tool to
query the API. Add the user details using the ``yap`` tool::
yap auth-set user@example.com mypassword
Upgrading
==========
From the directory containing the YAP checkout, usually ~/yap, perform the
following::
git pull
make install
yap upgrade [region]
``region`` can be omitted to upgrade all regions at once.
Setting up regions
==================
Each region will have a series of aggregators and VLAN assignments for the
spaces. To add a region::
yap region-add yvr
Adding spaces
=============
To add the space with key ``foo``::
yap space-add foo
Setting VLAN region associations
================================
If a VLAN is not associated with a space in a region, none of the nodes in that
region will set up peering for the space. To add a VLAN association for space
``foo`` in region ``yvr`` on VLAN ``1234``::
yap vlan-set foo yvr 1234
Enabling IPSEC
==============
To enable IPSEC::
yap ipsec-enable
Setting up a VXR
================
If using VXR hosts to provide a backhaul overlay, install the latest openSUSE
Leap distribution on a host, set up the base networking, then install and set
up salt-minion.
Assuming we are going to call the node ``yvr-vxr01`` and our Bondingadmin host
is bondingadmin.mydomain.com::
zypper in salt-minion
echo yvr-vxr01 > /etc/salt/minion_id
echo "master: bondingadmin.mydomain.com" > /etc/salt/minion.d/yap.conf
echo -e "grains:\n type: vxr" >> /etc/salt/minion.d/yap.conf
systemctl enable --now salt-minion
On the Bondingadmin server, accept the salt key for the box::
salt-key -a yvr-vxr01
Then add a record using ``yap``, with the name, ip, region, and VLAN trunk
port::
yap vxr-add yvr-vxr01 1.2.3.4 yvr eth1
The necessary software will be installed automatically.
If you want to add global OSPF to the VXR in order to transit non-private WAN
traffic::
yap vxr-enable-global yvr-vxr01
If it's enabled and you want to disable it::
yap vxr-disable-global yvr-vxr01
Adding aggregators
==================
To add an aggregator, get the ID from Bondingadmin, select a region for it,
set up a VLAN trunk interface, then add it::
yap agg-add 1 yvr eth1
This will install some software on the aggregator to maintain the VLANs and
OSPF peering on the ``eth1`` trunk port.
To add a space-specific VLAN IP, you need the aggregator ID, the space key,
and the VLAN IP with the subnet mask. If unset, a default address will be used::
yap agg-set-space-ip 1 foo 10.7.7.5/30
Adding custom BIRD configuration
================================
To inject custom BIRD configuration through yap for a specific space on an
aggregator, first write the configuration to a file. To apply the configuration,
specify the aggregator ID, space key, and the filename::
yap agg-set-space-bird-config 1 foo bird.conf
Showing status
==============
On each Aggregator and VXR, there is a ``yap`` command that manages the local
state. To show the state of space ``foo``::
yap status foo
From the bondingadmin server, you can check state on multiple hosts
simultaneously by specifying a node list to the salt ``cmd.run`` command. For
example, to show the state of space ``foo`` on the VXR ``yvr-vxr01`` and the
aggregator with ID 1::
salt -C 'L@yvr-vxr01,node-1' cmd.run "yap status foo"
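
The compound target used above can also be assembled programmatically. The
following is a minimal sketch and not part of YAP: ``salt_status_argv`` is a
hypothetical helper, and the ``node-<ID>`` minion name for aggregators is
inferred from the example above rather than from a documented convention.

.. code-block:: python

    def salt_status_argv(space_key, vxr_names=(), agg_ids=()):
        """Build the argv for: salt -C 'L@<minions>' cmd.run "yap status <key>"."""
        # Assumption: aggregators are registered in salt as node-<ID>.
        minions = list(vxr_names) + [f"node-{agg_id}" for agg_id in agg_ids]
        if not minions:
            raise ValueError("at least one VXR name or aggregator ID is required")
        # "L@" introduces a comma-separated minion list in a compound target.
        target = "L@" + ",".join(minions)
        return ["salt", "-C", target, "cmd.run", f"yap status {space_key}"]


    print(salt_status_argv("foo", ["yvr-vxr01"], [1]))

The returned list can be passed to ``subprocess.run`` on the bondingadmin
server; building an argument list avoids shell quoting issues.
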
Architectural overview
----------------------
The following diagram shows an overview of the various nodes involved in a
typical YAP deployment for a space. This fictional space has a firewall in
YVR only, but bonds in both YVR and TOR.
The red circles denote details and troubleshooting commands that can be run
on each respective node.
.. image:: VXLAN-backhaul.png
:scale: 30 %
:alt: VXLAN backhaul diagram
.. This diagram may be updated at the following link:
https://www.lucidchart.com/invitations/accept/27dfc950-e351-4511-b42a-d1f08fe26833
Adding spaces
-------------
Prerequisites
=============
* All bonds are moved to yap-enabled aggregators.
* A VLAN is designated for each region that will host bonds. For example, for
a space that has bonds on aggregators in two regions, YVR and TOR, you must
designate a VLAN for both regions.
Migrating existing private WAN spaces
=====================================
The following commands are all to be run on the management server.
.. warning:: There will be a brief outage when migrating a space.
1. Add the space::
yap space-add <key>
This can be run in advance as it does not make any runtime changes.
2. To calculate the subnet for each region/space, you can run the following
command. This only returns the network that will be designated for the VLAN
on the aggregators in the region; it does not apply any changes::
yap subnet-get <key> <region>
This will return the base subnet for this space-region pair, as well as the
specific IPs of the aggregators in that region. The first IP in the subnet
is reserved for the firewall::
Subnet: 100.31.88.0/21
Firewall: 100.31.88.1
Aggregators:
agg03: 100.31.88.5
3. Configure the firewall with the IP shown in step 2 on the VLAN interface and
configure OSPF. While the exact settings will be vendor-specific, here are
the general details:
* area 0.0.0.0
* subnet <from step 2>
* redistribute connected
* hello interval 10s
* dead interval 40s
4. Add a VLAN association for each region::
yap vlan-set <key> <region> <vlan_id>
This will start the VLAN interfaces on each yap-enabled aggregator in the
region using the same subnet reflected in step 2.
.. caution:: This is the start of an outage for the space, as the private
WAN router's BGP protocols for the space are brought down to prevent
routing loops/conflicts.
5. Confirm OSPF is up in each region by running this command on the
aggregators::
yap status <key>
If the OSPF protocol is not 'Running', jump to troubleshooting
`B: Aggregator`_.
6. Once OSPF is up and the routes have propagated both ways, you can disable
the outbound gateway configured in the existing space to finish cleanup.
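
When migrating many spaces, it can help to read the ``yap subnet-get`` output
from step 2 with a script. The following is a minimal sketch that assumes the
output has exactly the layout shown in step 2; ``parse_subnet_get`` is a
hypothetical helper, not something YAP provides.

.. code-block:: python

    def parse_subnet_get(output):
        """Parse 'yap subnet-get' output (layout assumed from the step 2 sample)."""
        result = {"subnet": None, "firewall": None, "aggregators": {}}
        in_aggregators = False
        for raw in output.splitlines():
            line = raw.strip()
            if not line:
                continue
            name, _, value = line.partition(":")
            name, value = name.strip(), value.strip()
            if name == "Subnet":
                result["subnet"] = value
            elif name == "Firewall":
                result["firewall"] = value
            elif name == "Aggregators":
                # Lines after this header are "<aggregator>: <IP>" pairs.
                in_aggregators = True
            elif in_aggregators and value:
                result["aggregators"][name] = value
        return result


    sample = """\
    Subnet: 100.31.88.0/21
    Firewall: 100.31.88.1
    Aggregators:
      agg03: 100.31.88.5
    """
    print(parse_subnet_get(sample))
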
Adding new private WAN spaces
=============================
Follow the same steps as for migrating an existing space, with these two
exceptions:
* Enable private WAN on the space through the management server interface.
* An outbound gateway should not be enabled in the space's private WAN tab.
However, you may wish to add a disabled gateway as a record of the
firewall's IP.
Troubleshooting
---------------
A: Bond
=======
While YAP doesn't directly affect bonds, it can be useful to troubleshoot
private WAN routes at the bond level, by inspecting their routing table::
ip route show table bonding-pwan
B: Aggregator
=============
YAP-enabled aggregators have a ``yap`` command installed that can be used
to show information about the spaces currently running on the aggregator.
The most useful command is ``yap status <space key>``, which shows the status
of the bird protocols and the current routing table for that space::
agg:~# yap status bammya
spcbammya BGP krt8251 up 2018-12-06 Established
ospf_bammya OSPF krt8251 up 07:21:22 Running
default via 100.109.152.1 dev vl-bammya proto bird
10.10.1.0/24 via 100.109.152.8 dev vl-bammya proto bird
192.168.33.0/24 via 100.109.152.8 dev vl-bammya proto bird
You can also directly check the status of the systemd service for any given
space::
agg:~# systemctl status yap-space@bammya.service
● yap-space@bammya.service - YAP space bammya
Loaded: loaded (/etc/systemd/system/yap-space@.service; disabled; vendor preset: enabled)
Active: active (exited) since Fri 2019-07-12 21:56:56 UTC; 1s ago
Process: 1210665 ExecStart=/usr/local/bin/yap check-policy-rules %i (code=exited, status=0/SUCCESS)
Process: 1210603 ExecStartPre=/usr/local/bin/yap service-start %i (code=exited, status=0/SUCCESS)
Main PID: 1210665 (code=exited, status=0/SUCCESS)
Jul 12 21:56:56 root-agg yap[1210665]: BIRD 2.0.2 ready.
Jul 12 21:56:56 root-agg yap[1210665]: spcbammya_pwr1_ipv6: disabled
You can also use a wildcard to see the status of all spaces, or perform other
operations on the services::
agg:~# systemctl restart yap-space@*.service
The BGP protocol for the space is controlled by bonding and should be in
'Established' state. The ``ospf_<key>`` protocol is the one managed by YAP and
should be in 'Running' state. If the status is 'Alone' instead, it means there
are no OSPF neighbors.
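
The check described above can be scripted. The following is a minimal sketch
that assumes the ``yap status`` output keeps the column layout of the sample
shown earlier (protocol name first, state last); ``ospf_state`` is a
hypothetical helper.

.. code-block:: python

    def ospf_state(status_output, space_key):
        """Return the state of the ospf_<key> protocol line, or None if absent."""
        wanted = f"ospf_{space_key}"
        for line in status_output.splitlines():
            fields = line.split()
            # Assumption: the protocol name is the first column and the state
            # ('Running', 'Alone', ...) is the last column, as in the sample.
            if fields and fields[0] == wanted:
                return fields[-1]
        return None


    sample = """\
    spcbammya BGP krt8251 up 2018-12-06 Established
    ospf_bammya OSPF krt8251 up 07:21:22 Running
    default via 100.109.152.1 dev vl-bammya proto bird
    """
    state = ospf_state(sample, "bammya")
    print("OSPF is up" if state == "Running" else f"OSPF needs attention: {state}")
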
If you want to, you can show the current OSPF neighbors for a space::
pwanbirdc - show ospf neighbor ospf_<key>
An aggregator has one VLAN interface per space, which follows the naming
convention of ``vl-<key>``. You can use this command to show the VLAN id::
ip -d link show dev vl-bammya
Lastly, you can look at the VLAN interface to see the aggregator's IP, as well
as the subnet designated for the space and routing group::
agg:~# ip address show dev vl-bammya
440: vl-bammya@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether d0:43:1e:c5:1b:44 brd ff:ff:ff:ff:ff:ff
inet 100.109.152.7/21 scope global vl-bammya
In the example above, the firewall would be configured with ``100.109.152.1/21``.
Knowing the subnet, you can test ICMP connectivity to the firewall IP::
ping <gateway IP>
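
The firewall address can be derived from the aggregator's interface address.
The following is a minimal sketch using Python's standard ``ipaddress``
module. It relies on the rule stated in the migration steps that the first IP
in the subnet is reserved for the firewall; ``firewall_address`` is a
hypothetical helper name.

.. code-block:: python

    import ipaddress


    def firewall_address(agg_interface_cidr):
        """Return the first address of the subnet containing the aggregator IP."""
        network = ipaddress.ip_interface(agg_interface_cidr).network
        # network[0] is the network address; network[1] is the first usable IP,
        # which is the one reserved for the firewall.
        return f"{network[1]}/{network.prefixlen}"


    print(firewall_address("100.109.152.7/21"))  # 100.109.152.1/21
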
When troubleshooting OSPF it may be useful to run a packet capture on the VLAN
interface to see which options are set::
tcpdump -ni vl-<key> proto 89 -vvv
D: VXR
======
The most useful command is ``yap status <space key>``, which shows the status
of the bird protocol and the current routing table for that space::
agg:~# yap status bammya
ospf_bammya OSPF bammya up 07:21:23.175 Running
default via 100.109.152.1 dev vl-bammya proto bird metric 32
10.10.1.0/24 via 100.109.152.8 dev vl-bammya proto bird metric 32
Otherwise, the same troubleshooting steps apply as on the aggregator.
If you need to troubleshoot the VXLAN as well, you can view the interface
details with the standard Linux utilities::
agg:~# ip -d l show dev vx-<key>
191: vx-bammya: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 66:da:5c:17:37:38 brd ff:ff:ff:ff:ff:ff promiscuity 0
vxlan id 59 srcport 0 0 dstport 4789 ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
E: Firewall
===========
Out of YAP's control. Here be dragons.
F: bondingadmin
===============
As on all the other nodes, a command in the path called ``yap`` serves
as the entry point for all things backhauled. Most of the commands are
described above in their relevant sections. You can always run ``yap`` with
no arguments to see what actions are available::
root@bondingadmin:~# yap
/usr/local/bin/yap <action> [args]
Actions:
region-list
region-show <region>
region-add <region>
...
Migrating a YAP space to a managed mesh space
-----------------------------------------------
As of 6.5, a successor to YAP is properly available in bonding in the form
of the new private WAN modes (without PWRs) along with aggregator
interfaces, addresses, and protocols.
Migrating to managed mesh or unmanaged private WAN is required for continued
support, and can be done with minimal downtime given the appropriate preparation.
.. note::
To migrate a space off YAP, all aggregators carrying space traffic must be
upgraded to bonding version 6.5 or later.
Preface
============
Recall that YAP has the following sets of objects::
A (aggregators)
D (device names)
R (regions)
S (spaces)
VID (VLAN IDs)
IP (PWAN IPs)
and that these objects are related by the following functions::
r: A → R
d: A → D
v: S x R → VID
i: S x A → IP
Given these sets and maps, YAP works by doing the following for each space *s*
and aggregator *a*:
#. Create a VLAN interface on *d(a)* having VLAN ID *v(s, r(a))*
#. Add address *i(s, a)* to that VLAN interface.
#. Run OSPF on that VLAN interface.
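
The sets and maps above can be modelled with plain dictionaries. The following
is a minimal sketch for illustration only: the aggregator names, VLAN IDs, and
addresses are made-up example data, and ``VlanPlan`` and ``plan`` are
hypothetical names, not part of YAP.

.. code-block:: python

    from typing import NamedTuple


    class VlanPlan(NamedTuple):
        aggregator: str
        space: str
        device: str   # d(a)
        vlan_id: int  # v(s, r(a))
        address: str  # i(s, a)


    def plan(spaces, aggregators, r, d, v, i):
        """List the VLAN interfaces to create for each space and aggregator."""
        plans = []
        for s in spaces:
            for a in aggregators:
                key = (s, r[a])
                if key not in v:
                    # No VLAN association for this space in this region, so
                    # the nodes there do not set up peering for the space.
                    continue
                plans.append(VlanPlan(a, s, d[a], v[key], i[(s, a)]))
        return plans


    # Made-up example data.
    r = {"agg03": "yvr", "agg07": "tor"}              # r: A -> R
    d = {"agg03": "eth1", "agg07": "eth1"}            # d: A -> D
    v = {("foo", "yvr"): 1234, ("foo", "tor"): 2345}  # v: S x R -> VID
    i = {("foo", "agg03"): "100.31.88.5/21",          # i: S x A -> IP
         ("foo", "agg07"): "100.31.96.5/21"}

    for item in plan(["foo"], ["agg03", "agg07"], r, d, v, i):
        print(item)

OSPF would then run on each of the resulting VLAN interfaces.
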
Additionally, optional custom BIRD configuration can be defined for a space on a
particular aggregator, i.e. there is an optional YAP object::
B (Custom space BIRD configuration)
with relation::
b: S x A → B
To migrate from YAP to a managed mesh, we need to recreate the same objects,
i.e. for each space *s* and aggregator *a* we need to:
0. Create trunk interface *d(a)* on aggregator *a*
(this only needs to be done once for *a*).
#. Create a VLAN interface on *d(a)* with VID *v(s, r(a))*.
#. Add interface IP *i(s, a)* to that VLAN interface.
#. Create an OSPF protocol configured to have an area with that VLAN interface.
Preparation
================
The instructions in this section are for preparing to migrate from YAP to
managed mesh for a single private WAN space, one aggregator at a time.
Let **S** be the YAP space to be migrated,
let **A** be the aggregator to be migrated,
and let **R** be the region **A** belongs to.
.. tip::
All YAP commands given are run on the management server,
and all aggregator objects (interfaces, addresses, and protocols)
are created through the management server on the aggregator details
page.
1. Create an Ethernet interface on **A** for the trunk interface configured in
YAP (if it is not already created).
.. tip::
You can find the configured trunk interface for **A** with the YAP command::
yap agg-show <agg ID>
Look for the *trunk* value.
2. Create a VLAN device on aggregator **A** having the interface created in the
previous step as the trunk, and having the VLAN ID configured in YAP for
**S** in **R** as the ID.
Configure the interface to be associated with space **S**.
.. tip::
You can find the configured VLAN ID for **S** in **R** with the following YAP
command::
yap space-show <S key>
Below *VLAN associations*, look for **R** followed by the VLAN ID.
3. Add an address to the VLAN interface created in the previous step,
using the IP configured by YAP for **S** on **A**.
.. tip::
You can find the configured IP for **S** on **A** with the following YAP
command::
yap subnet-get <S key> <R>
Below *Aggregators*, look for **A** followed by the IP.
4. Create an OSPF protocol on aggregator **A** with the following configuration.
Anything not specified should be left to its default value in the form.
- Name: mm_<space key>
- Space: <space>
- Protocol: OSPF
- Enable: Off
- IPv4 import: All
- IPv4 export: All
- Channel: IPv4
- Area:
- Area ID: 0.0.0.0
- Interface:
- Pattern: <name of VLAN created in step 2>
Click 'add area' to open the area form for configuring the Area ID,
and click 'add interface' to open the interface form to add the interface
pattern.
.. warning::
If you do not set *Enable* to *Off*, you may unintentionally affect private
WAN traffic prematurely.
Migration
=========================
Once the preparation steps have been done for every aggregator carrying space
traffic, the space is ready to be migrated to managed mesh.
.. warning::
There will be a brief space outage during the migration.
To actually perform the migration, three things must be done:
#. Delete the space in YAP:
#. For each region, run *yap vlan-remove <space> <region>*
#. Run *yap space-delete <space>*
#. Change the space mode from 'with private WAN routers' to 'managed mesh'
#. Enable all the protocols created during the preparation phase.
Confirm these protocols peer with any upstream neighbors in each region and that
private WAN routes are being propagated.
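
The YAP deletion step can be prepared ahead of time to keep the outage short.
The following is a minimal sketch that only builds the commands listed above
in the right order and does not run them; ``yap_delete_commands`` is a
hypothetical helper, and the space key and regions are example values.

.. code-block:: python

    def yap_delete_commands(space_key, regions):
        """Return the yap commands that delete a space, in execution order."""
        # First remove the VLAN association in every region...
        commands = [["yap", "vlan-remove", space_key, region] for region in regions]
        # ...then delete the space itself.
        commands.append(["yap", "space-delete", space_key])
        return commands


    for argv in yap_delete_commands("foo", ["yvr", "tor"]):
        print(" ".join(argv))

Review the printed commands, then run them on the management server before
changing the space mode and enabling the prepared protocols.
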