Posts Tagged ‘convergence’

OSPF comparison with RIP

August 19, 2012

Problems with RIP

RIP is a very basic routing protocol with slow convergence time and primitive best route computation based on the number of hops. Router configured to use RIP, sends route updates to its neighbors every 30 seconds. If you have many routers in your network, which is quite common with modern Layer 2/3 switches, then each time you reconfigure routes, changes propagate for unacceptable amount of time. In worst case each router waits for 30 seconds to send an update to the next router in a chain. Network failures make things even worse. Router considers link as failed if it doesn’t receive updates from it for 180 seconds. Then RIP uses a number of loop avoidance techniques to advertise the failed route. For the end user it means network is unreachable for ages in networking terms. More or less critical infrastructures cannot tolerate such delays. Additionally, RIP calculates best route depending on the hop count to the network and doesn’t account for link speeds, which sometimes becomes inappropriate.

OSPF Solution

Open Shortest Path First (OSPF) protocol was developed to solve RIP’s problems. Neighbor routers in OSPF send topology changes to each other immediately. It became achievable because OSPF sends only changes, not all routes as RIP does. In OSPF routers maintain a so called Link-State Database (LSDB), which contains Link-State Advertisements (LSA). In fact, LSDB doesn’t contain routes themselves, but topology. LSA is either a link record, which has information about a subnet and routers connected to it, or router record which contains information on router’s IPs and masks. Each link in OSPF has a metric. Metrics are weighted based on link speeds. Then OSPF needs to calculate shortest paths and fill routing table. Dijkstra Shortest Path First (SPF) algorithm is applied to LSDB to find best routes.

Link failures is another story. Link failure timer in OSPF is 40 seconds, in comparison to 180 for RIP. But the main issue is that there are a number of routing loop problems inherent to RIP. On link failures RIP uses loop avoidance features, such as “split horizon”, “route poisoning”, “poison reverse”, as well as holddown timer, which take considerable amount of time for RIP to converge. In OSPF routers avoid loops by first asking its neighbors if they lack any LSAs. If router has all LSAs in its LSDB, neighbors do not exchange any information. This allows OSPF to converge much more quickly.

How STP and RSTP converge

July 20, 2012

In my previous post I described how STP works in normal circumstances. Each 2 seconds root switch sends BPDU Hello packets on all of its ports (since they are all designated) with cost to reach the root which is equal to 0, with root ID (RID) equal to root switch ID and bridge ID equal to ID of the sending switch, which in this case is the same as RID. When non-root switch receives Hello BPDU from its root port (RP) it adds its cost to reach the root, changes BID and send further. Now, what happens if a switch’s link with the shortest path to reach the root fails? STP starts to converge.

STP convergence process

Switch waits for the Max Age time before considering link as failed.  Max Age timer is equal to 10 times of Hello timer. And time between Hellos is usually 2 seconds.  First step in convergence process is re-evaluating a root switch. If the original root switch still has connection to the network, then the switch in question will receive Hello BPDU from it and nothing will change. Otherwise switches will elect a new root.

Next, switch needs to choose new RP. It’s simple. Look through costs to reach the root of all available links and choose the cheapest. Additionally, switch selects which ports are now DPs.

After the port roles are identified, switch transition RP from Blocking state to Forwarding. However, it implies two transitional states: Listening and Learning. Listening state is 15 seconds and is necessary for old MAC table entries to timeout. Otherwise temporary loops are possible. In Learning state switch begins to gather MAC addresses from received packets (for the same 15 seconds). In Listening and Learning states switch do not forward packets. After both transitional states have been finished, port is transitioned to a forwarding state. So during STP convergence, port can be inaccessible for 50 seconds.

RSTP convergence

The key difference between STP and RSTP is rapid convergence of the latter. Hence the name Rapid STP. First of all, RSTP waits for 3 times of Hello timer. So it’s 6 seconds instead of 20. Apart from that, when RP link fails RSTP block all its ports, eliminating loops. It means that Listening state is not needed in this case, which saves us another 15 seconds. And in Learning phase switch sends RSTP proposal message to the neighboring switch right away. And quickly receives agreement, which implies that link is established and is in Forwarding state. As a result, RSTP convergence time is shortened from 50 seconds to 1-10 seconds timeframe.

Spanning Tree Protocol Overview

July 16, 2012

When it comes to switching it is recommended to understand how STP works. STP was developed to prevent loops. For example, you connect 3 switches in a ring, some host sends a broadcast packet. Since broadcast packet is flooded to all ports (forget about VLANs for a moment) it will travel several times around the ring until its TTL is equal to 0. This situation will never happen if you work on Cisco switches. They have STP enabled by default. Some low-budget switches do not support STP at all.

To prevent loops STP disables some ports or in other words put them in a blocking state. Ports that are left to forward traffic are in a forwarding state. To exchange STP information switches use Bridge Protocol Data Units (BPDU). They contain three main fields: root switch ID, sender switch ID and cost to reach the root. ID is almost random and are based on priorities and MACs. Cost depends on link speed. 100Mb port’s priority equals to 19, 1Gb is 4, etc.

STP starts from electing a root switch. All switches exchange their IDs and switch with the lowest ID becomes a root switch. As stated above root switch is almost a random choice, but you can manually assign priority if needed. Then spanning tree algorithm (STA) searches for root ports (RP) and designated ports (DP). RP is a port with the shortest path to the root switch. Shortest path is founded based on link weights and if they are equal on switch IDs. DP is a port with the lowest cost to the root on that Ethernet segment. Ethernet segment here is a collision domain, which in its turn in switched network is simply an Ethernet link between two switches. Basically, that means that you will have one shortest path from each non-root switch to the root switch. On one side of each link will be a RP and on the other a DP port. All non-shortest paths will have DP on one side and non-DP non-RP  (blocked) port on the other side. Traffic will not traverse through this port to prevent loops.

You may ask, what’s the point of such distinction between DP and RP in this concept if the only thing that matters is the shortest path. Even though RP and DP lies on the shortest path to the root, just from the opposite sides, there is one significant distinction between them. DP is the port from which Hello BPDUs are continuously sent. Hello BPDU simply indicates that link between switches is working and contains information which allows switch on the other side of the link to find the new shortest path to the root in case an old link brakes. Another difference is that DPs exist not only on root paths, but on each of the Ethernet links.

Along with STP, there is a RSTP, which stands for Rapid Spanning Tree Protocol. The reason for RSTP is that STP converges slowly. Convergence is a process which happens when network topology changes and switches need to reevaluate port statuses (blocking/forwarding). STP converges for approximately 50 seconds. RSTP convergence time is 1 to 10 seconds.

STP and RSTP have several implementations. Cisco by default uses PVST+ (or simply PVST) which is an abbrevation for Per-VLAN Spanning Tree Plus, instead o pure IEEE’s STP. PVST creates one STP topology per VLAN. Instead of using one link for all VLANs and block all other links, you can use first link for even VLANs and second for odd. PVST allows you to do that. Cisco’s implementation of RSTP is called PVRST (Per-VLAN Rapid Spanning Tree) or RPVST (Rapid Per-VLAN Spanning Tree). There is an IEEE implementation of protocol similar to PVRST. It’s called MIST – Multiple Instances of Spanning Trees. MIST is an implementation of RSTP. MIST’s difference from PVRST is that it doesn’t create separate STP for each VLAN as PVRST does by design, but lets you create one STP for multiple VLANs.