Archive for the ‘Networking’ Category

Force10 MXL Switch: Port Numbering

February 26, 2015

This is a quick cheat sheet for the MXL port numbering schema, which might seem a bit confusing if you see an MXL switch for the first time.

[Photo: Force10 MXL 10/40GbE switch]

Above is a picture of the switches that I’ve worked with. On the right we have a 2-port 40GbE built-in module, and then there are two expansion slots – slot 0 in the middle and slot 1 on the left. Each module has 8 ports allocated to it. The reason is that you can have a 2-port 40GbE QSFP+ module in each of the slots, which can operate in 8x10GbE mode. You will need QSFP+ to 4xSFP+ breakout cables for that, but it’s not the most common scenario anyway.

As we have 8 ports per slot, it would look something like this:

[Diagram: MXL external port mappings]

This picture is more about switch stacking, but the rightmost section should give you a basic idea. One of the typical MXL configurations is a built-in 40GbE module used for stacking and one or two 4-port SFP+ expansion modules in slots 0 and 1. In that case your port numbers will be 33 and 37 for the 40GbE ports, 41 to 44 in expansion slot 0 and 49 to 52 in expansion slot 1.

[Diagram: hybrid QSFP+ and 4-port SFP+ module port numbering]

As you can see, for a QSFP+ module the switch breaks the 8 ports into two sets of 4 and picks the first number in each set for the 40GbE ports. For SFP+ modules it uses consecutive numbers within each slot, followed by a 4-port gap.

Port numbering is described in more detail in the MXL switch configuration guide, which you can use for reference. But this short note might help someone quickly knock that off instead of browsing through a 1000-page document.

Also, I’ve seen pictures of MXL switches with a slightly different port numbering: 41 to 48 in slot 0 and 33 to 40 in slot 1, which looks like a mirrored version of the switch with the built-in module on the opposite side. I’m not sure if it’s just an older version of the same switch, but keep in mind that you might actually have the other variation of the MXL in your blade chassis.


EIGRP enhancements

August 19, 2012

Enhanced Interior Gateway Routing Protocol (EIGRP) is a Cisco proprietary IGP. So if you have several vendors inside your corporate LAN, like HP or Juniper, then it’s probably not your choice. However, EIGRP has several enhancements that give it even faster convergence than OSPF.

One of the main drawbacks of OSPF is that it consumes a considerable amount of memory to maintain the LSDB and CPU power to run Dijkstra on it. EIGRP doesn’t do that. Routers with EIGRP enabled on their interfaces exchange only partial information with their neighbors, as OSPF does. But EIGRP routers don’t maintain the whole topology; in that respect they behave more like RIP. Each router holds information about networks and the next-hop routers to reach them. But unlike RIP, for each network EIGRP finds a primary and, if possible, a secondary route, so that in case of a link failure the router can immediately switch to the backup route. In EIGRP terminology the main route is called the successor route and the alternative is the feasible successor route.

Also, EIGRP has a more sophisticated metric calculation. It considers not only bandwidth, but also delay. The formula is:

metric  = (10^7 / least-bandwidth + cumulative-delay) * 256

Here least-bandwidth is the slowest link speed in kbps along the path, and cumulative-delay is the sum of all delays from the network to the router, in tens of microseconds.
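A quick sanity check of the formula in Python. The T1 bandwidth and delay values below are just illustrative; note that IOS performs integer division, which the sketch mimics:

```python
def eigrp_metric(least_bandwidth_kbps, cumulative_delay_tens_us):
    """Classic EIGRP composite metric with default K values (K1=K3=1).

    least_bandwidth_kbps: slowest link along the path, in kbps.
    cumulative_delay_tens_us: sum of delays along the path, in tens of us.
    Floor division mirrors the integer arithmetic IOS uses.
    """
    return (10**7 // least_bandwidth_kbps + cumulative_delay_tens_us) * 256

# A T1 (1544 kbps) path with 21000 us of total delay (2100 tens of us):
print(eigrp_metric(1544, 2100))  # 2195456
```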

To understand how EIGRP prevents loops we need two more terms. Feasible Distance (FD) is the metric of the best route to reach a subnet, as calculated on the router itself. Reported Distance (RD) is the metric as calculated on a neighboring router and then learned from an EIGRP update. The trick here is that a route can become a feasible successor only if its RD is less than the FD. This guarantees that the route doesn’t pass back through this router, because otherwise its RD would obviously be greater than the FD.
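The feasibility condition itself is a one-line comparison. A minimal sketch, with made-up router names and metrics:

```python
def feasible_successors(fd, candidates):
    """Return the alternate routes that satisfy the feasibility
    condition RD < FD.

    fd: this router's Feasible Distance (metric of its best route).
    candidates: list of (neighbor, reported_distance) tuples.
    """
    return [(nbr, rd) for nbr, rd in candidates if rd < fd]

# Hypothetical example: our best route to a subnet has FD 2195456.
# R2 reports a lower metric than our FD, R3 a higher one.
routes = [("R2", 2000000), ("R3", 2500000)]
print(feasible_successors(2195456, routes))  # [('R2', 2000000)]
```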

Again, EIGRP is the better IGP from almost every perspective. The only barrier restricting its proliferation is the proprietary nature of the protocol.

OSPF comparison with RIP

August 19, 2012

Problems with RIP

RIP is a very basic routing protocol with slow convergence and primitive best-route computation based on the number of hops. A router configured to use RIP sends route updates to its neighbors every 30 seconds. If you have many routers in your network, which is quite common with modern Layer 2/3 switches, then each time you reconfigure routes the changes propagate for an unacceptable amount of time: in the worst case each router waits 30 seconds before sending an update to the next router in the chain. Network failures make things even worse. A router considers a route as failed if it doesn’t receive updates for it for 180 seconds, and then RIP has to use a number of loop avoidance techniques to advertise the failed route. For the end user it means the network is unreachable for ages in networking terms. More or less critical infrastructures cannot tolerate such delays. Additionally, RIP calculates the best route based on the hop count to the network and doesn’t account for link speeds, which is often suboptimal.

OSPF Solution

Open Shortest Path First (OSPF) was developed to solve RIP’s problems. Neighboring routers in OSPF send topology changes to each other immediately. This is achievable because OSPF sends only the changes, not all routes as RIP does. OSPF routers maintain a so-called Link-State Database (LSDB), which contains Link-State Advertisements (LSAs). In fact, the LSDB doesn’t contain routes themselves, but the topology. An LSA is either a link record, which has information about a subnet and the routers connected to it, or a router record, which contains information on the router’s IPs and masks. Each link in OSPF has a metric, weighted based on link speed. OSPF then needs to calculate the shortest paths and fill the routing table: the Dijkstra Shortest Path First (SPF) algorithm is applied to the LSDB to find the best routes.
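The SPF computation can be sketched as a plain Dijkstra run over a toy LSDB. The three-router topology and link costs below are hypothetical:

```python
import heapq

def spf(lsdb, source):
    """Dijkstra shortest-path-first over a toy LSDB.

    lsdb: dict mapping router -> list of (neighbor, link_cost).
    Returns the lowest total cost from source to every reachable router.
    """
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for nbr, cost in lsdb.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(pq, (nd, nbr))
    return dist

# Hypothetical three-router topology: the direct R1-R2 link is slow
# (cost 10), so the best path to R2 goes via R3 (cost 1 + 1).
lsdb = {
    "R1": [("R2", 10), ("R3", 1)],
    "R2": [("R1", 10), ("R3", 1)],
    "R3": [("R1", 1), ("R2", 1)],
}
print(spf(lsdb, "R1"))  # {'R1': 0, 'R2': 2, 'R3': 1}
```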

Link failures are another story. The link failure timer in OSPF is 40 seconds, compared to 180 for RIP. But the main issue is the number of routing loop problems inherent to RIP. On link failures RIP relies on loop avoidance features such as split horizon, route poisoning and poison reverse, as well as the holddown timer, all of which take a considerable amount of time to converge. OSPF routers avoid loops by first asking their neighbors whether they lack any LSAs; if a router already has all LSAs in its LSDB, the neighbors do not exchange any information. This allows OSPF to converge much more quickly.

Routing Basics

August 6, 2012

Interfaces and Default routes

Routers use Layer 3 IP addressing when deciding where packets should go. Hence each router interface should have an IP address, otherwise the interface won’t be used at all. You simply go:

configure terminal
interface Fa0/0
ip address 10.1.1.1 255.255.255.0

Now the router knows about the 10.1.1.0/24 corporate network (this is called a “connected route”) and routes packets destined to it through the Fa0/0 interface. There could be a number of switches behind Fa0/0.

From the opposite side the router is usually connected to the Internet (links between routers are usually /30 networks with 2 usable addresses):

configure terminal
interface Fa0/1
ip address 172.16.3.2 255.255.255.252

To tell the router that Fa0/1 is the outside interface where packets to all other networks go, you configure a default route (which is defined as a route to network 0.0.0.0). Note that the next hop should be the provider’s address on the far end of the /30 link, not the router’s own Fa0/1 address:

ip route 0.0.0.0 0.0.0.0 172.16.3.1

Static routes and RIP

Now the reasonable question here is: what if we have several networks/routers behind the border router? How will they know about each other’s networks?

One answer is static routes. You can tell router1 that router2 has network2 behind it by adding a static route to network2 on router1:

ip route 10.1.2.0 255.255.255.0 10.1.128.254

Here the routers are connected using network 10.1.128.252/30 and router2 has network 10.1.2.0/24 behind it. 10.1.128.254 is router2’s IP address (the next hop) where router1 should send packets for network 10.1.2.0. If you have many networks in an organization, then static routes are obviously not a solution: it’s nearly impossible to configure all routers with static routes to all networks. That is where routing protocols come into the picture.

The most primitive routing protocol, common in LANs, is the Routing Information Protocol or simply RIP. Using RIP, all routers exchange information about the routes they know. As a result of RIP convergence, all routers know about all networks that exist in the corporate LAN. RIP is not meant to be used in WANs due to the excessive amount of traffic: each router sends RIP updates every 30 seconds, and since the receiving router in its turn forwards the update on all its interfaces, it would simply paralyze the Internet. To enable RIP updates do the following:

configure terminal
router rip
version 2
network 199.1.1.0
network 10.0.0.0

This tells the router to send RIP updates about all its networks on the interfaces where networks 199.1.1.0 and 10.0.0.0 are configured.

RIP updates propagate like a broadcast storm, so if a router has redundant links, it can receive RIP information about the same network from several interfaces. RIP uses distance (hop count) in that case: each time a packet comes to a router, the link with the shortest path is used to forward it.
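The selection logic is simply “lowest hop count wins”. A minimal sketch with made-up next-hop addresses, which also honors RIP’s convention that 16 hops means unreachable:

```python
def rip_best_route(advertisements):
    """Pick the next hop with the lowest hop count, RIP style.

    advertisements: list of (next_hop, hop_count) pairs heard on
    different interfaces for the same destination network.
    A hop count of 16 or more means 'unreachable' in RIP.
    """
    reachable = [(hops, nh) for nh, hops in advertisements if hops < 16]
    if not reachable:
        return None
    hops, next_hop = min(reachable)  # ties resolved arbitrarily here
    return next_hop, hops

# The same network heard from two neighbors over redundant links:
print(rip_best_route([("10.1.128.254", 3), ("10.1.128.250", 2)]))
# ('10.1.128.250', 2)
```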

How STP and RSTP converge

July 20, 2012

In my previous post I described how STP works in normal circumstances. Every 2 seconds the root switch sends BPDU Hello packets on all of its ports (since they are all designated) with the cost to reach the root equal to 0, the root ID (RID) equal to the root switch ID and the bridge ID (BID) equal to the ID of the sending switch, which in this case is the same as the RID. When a non-root switch receives a Hello BPDU on its root port (RP), it adds its cost to reach the root, changes the BID and sends it further. Now, what happens if a switch’s link with the shortest path to the root fails? STP starts to converge.

STP convergence process

The switch waits for the Max Age time before considering the link as failed. The Max Age timer is equal to 10 times the Hello timer, and the time between Hellos is usually 2 seconds. The first step in the convergence process is re-evaluating the root switch. If the original root switch still has a connection to the network, then the switch in question will receive a Hello BPDU from it and nothing will change. Otherwise the switches will elect a new root.

Next, the switch needs to choose a new RP. It’s simple: look through the costs to reach the root on all available links and choose the cheapest. Additionally, the switch selects which ports are now DPs.

After the port roles are identified, the switch transitions the RP from the Blocking state to Forwarding. However, this implies two transitional states: Listening and Learning. The Listening state lasts 15 seconds and is necessary for old MAC table entries to time out; otherwise temporary loops are possible. In the Learning state the switch begins to gather MAC addresses from received packets (for another 15 seconds). In the Listening and Learning states the switch does not forward packets. After both transitional states have finished, the port is moved to the Forwarding state. So during STP convergence a port can be inaccessible for 50 seconds.
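The 50 seconds is just the sum of the standard timers, which a few lines make explicit:

```python
HELLO = 2              # seconds between Hello BPDUs
MAX_AGE = 10 * HELLO   # 20 s of missed Hellos before the link is declared dead
LISTENING = 15         # s, lets stale MAC table entries age out
LEARNING = 15          # s, switch relearns MAC addresses before forwarding

stp_worst_case = MAX_AGE + LISTENING + LEARNING
print(stp_worst_case)  # 50
```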

RSTP convergence

The key difference between STP and RSTP is the rapid convergence of the latter, hence the name Rapid STP. First of all, RSTP waits only 3 times the Hello timer, so it’s 6 seconds instead of 20. Apart from that, when the RP link fails RSTP blocks its ports, eliminating loops. It means that the Listening state is not needed in this case, which saves us another 15 seconds. And in the Learning phase the switch sends an RSTP proposal message to the neighboring switch right away and quickly receives an agreement, which means the link is established and in the Forwarding state. As a result, RSTP convergence time is shortened from 50 seconds to a 1-10 second timeframe.

Spanning Tree Protocol Overview

July 16, 2012

When it comes to switching, it is recommended to understand how STP works. STP was developed to prevent loops. For example, you connect 3 switches in a ring and some host sends a broadcast packet. Since a broadcast packet is flooded to all ports (forget about VLANs for a moment), it will travel around the ring indefinitely, because Ethernet frames, unlike IP packets, have no TTL to expire. This situation will never happen if you work on Cisco switches: they have STP enabled by default. Some low-budget switches do not support STP at all.

To prevent loops STP disables some ports, or in other words puts them in a blocking state. Ports that are left to forward traffic are in a forwarding state. To exchange STP information switches use Bridge Protocol Data Units (BPDUs). They contain three main fields: root switch ID, sender switch ID and cost to reach the root. IDs are almost random, being based on priorities and MACs. Cost depends on link speed: a 100Mb port’s cost equals 19, 1Gb is 4, etc.

STP starts by electing a root switch. All switches exchange their IDs and the switch with the lowest ID becomes the root switch. As stated above, the root switch is an almost random choice, but you can manually assign a priority if needed. Then the spanning tree algorithm (STA) searches for root ports (RP) and designated ports (DP). An RP is the port with the shortest path to the root switch. The shortest path is found based on link costs and, if they are equal, on switch IDs. A DP is the port with the lowest cost to the root on that Ethernet segment. An Ethernet segment here is a collision domain, which in a switched network is simply an Ethernet link between two switches. Basically, that means you will have one shortest path from each non-root switch to the root switch. On one side of each such link will be an RP and on the other a DP. All non-shortest paths will have a DP on one side and a non-DP, non-RP (blocked) port on the other side. Traffic will not traverse this port, which prevents loops.
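Both elections boil down to tuple comparisons: lowest bridge ID wins the root election, and the lowest cost (with the neighbor’s bridge ID as tiebreaker) wins the root port. A rough sketch with hypothetical bridge IDs and the IEEE port costs mentioned above:

```python
def elect_root(bridge_ids):
    """Lowest bridge ID wins; an ID is (priority, MAC), compared in order."""
    return min(bridge_ids)

def root_port(paths):
    """Pick the root port: lowest cost to the root, ties broken by the
    neighbor's bridge ID. paths: list of (cost, neighbor_id, port)."""
    return min(paths)[2]

# Hypothetical switches: default priority 32768 except SW1, lowered to 4096.
switches = [(4096, "00:11:22:33:44:01"), (32768, "00:11:22:33:44:02"),
            (32768, "00:11:22:33:44:03")]
print(elect_root(switches))  # (4096, '00:11:22:33:44:01')

# Two uplinks toward the root with IEEE costs (100Mb = 19, 1Gb = 4):
uplinks = [(19, (32768, "00:11:22:33:44:02"), "Fa0/1"),
           (4, (4096, "00:11:22:33:44:01"), "Gi0/1")]
print(root_port(uplinks))  # Gi0/1
```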

You may ask what’s the point of distinguishing between DP and RP in this concept if the only thing that matters is the shortest path. Even though the RP and DP lie on the shortest path to the root, just on opposite sides, there is one significant distinction between them. The DP is the port from which Hello BPDUs are continuously sent. A Hello BPDU simply indicates that the link between switches is working and contains the information which allows the switch on the other side of the link to find a new shortest path to the root in case an old link breaks. Another difference is that DPs exist not only on root paths, but on each of the Ethernet links.

Along with STP there is RSTP, which stands for Rapid Spanning Tree Protocol. The reason for RSTP is that STP converges slowly. Convergence is the process which happens when the network topology changes and switches need to re-evaluate port statuses (blocking/forwarding). STP converges in approximately 50 seconds; RSTP convergence time is 1 to 10 seconds.

STP and RSTP have several implementations. Cisco by default uses PVST+ (or simply PVST), which is an abbreviation for Per-VLAN Spanning Tree Plus, instead of pure IEEE STP. PVST creates one STP topology per VLAN: instead of using one link for all VLANs and blocking all other links, you can use the first link for even VLANs and the second for odd ones. Cisco’s implementation of RSTP is called PVRST (Per-VLAN Rapid Spanning Tree) or RPVST (Rapid Per-VLAN Spanning Tree). There is an IEEE protocol similar to PVRST called MIST – Multiple Instances of Spanning Trees. MIST is an implementation of RSTP, and its difference from PVRST is that it doesn’t create a separate STP for each VLAN as PVRST does by design, but lets you create one STP instance for multiple VLANs.

VLANs, trunking and VTP

July 3, 2012

Virtual LANs

If you think of a pure Layer 2 switch, then all hosts connected to it are considered a single LAN, even though they might belong to several different networks. It means that when a broadcast frame (or a frame to a host with an unknown MAC) comes in, it is flooded to all ports. It’s insecure and it can overwhelm mid-size to large networks. This is the reason why the concept of VLANs, as well as the IEEE 802.1Q, ISL and VTP protocols, was developed. A VLAN confines Ethernet traffic to a particular set of ports. In almost all cases a VLAN consists of hosts from one network. To create a VLAN you run:

configure terminal
interface range FastEthernet 0/15 - 16
switchport access vlan 2

Since VLAN 2 doesn’t exist, it is created, and ports 15 and 16 are included in it. VLAN 1 is the default VLAN where all ports initially are; it is reserved.

VLAN Trunking

Now let’s consider the situation when you have hosts from one network connected to two switches. It’s rare, but possible. For example, you have a network with 100 Mbit devices (tape library, UPS NMC) and 1000 Mbit devices (storage, servers), and you don’t want to waste 1000 Mbit ports on the 100 Mbit devices, so you connect them to a second, 100 Mbit switch. Now when a host on one switch sends a packet to an unknown host on the other switch (or sends a broadcast frame) and the packet is flooded, the switch on the other side needs to know what VLAN it comes from. Otherwise the switch has to discard it, since it floods frames only inside VLANs and the VLAN ID is unknown in this case. Here you need to configure the link between the switches as a trunk. It means that before sending the packet the switch will mark it with a VLAN ID and the other switch will forward it only to ports from this VLAN. There are two VLAN trunking protocols: the proprietary Cisco ISL (outdated) and IEEE 802.1Q (most used). By default Cisco switches are configured to negotiate trunking if asked to do so, but you need to configure the switch on either side to initiate the negotiation:

configure terminal
interface gigabit 0/1
switchport mode dynamic desirable

Rationale behind trunking

Networks split between switches are not that frequent a case. Say you want to use VLANs for security and/or efficiency reasons, but each particular network is bound to one switch. All broadcast and unicast traffic between hosts within the same network stays on the switch where they are connected. And unicast traffic to other networks can travel straight to the router (according to basic routing rules) and from the router down to the particular host. The port where the destination host is connected can be identified using the destination MAC. It seems that nobody needs to know VLAN IDs in this case. So the question is: do you need trunking here? And the answer is yes.

It’s worth starting by saying that ports on a Cisco switch can be either access ports, where end hosts are connected, or trunk ports – links between switches or routers. So when a packet travels through a trunk port it’s marked with a tag by design. There are several reasons behind that. The simplest answer to this question is ARP requests. When a router receives a packet to route to another network, it first needs to know the MAC of the destination host. To find it out, the router sends an ARP request, which is a broadcast packet. If there were no VLAN tag on this ARP request, it would have to be flooded on all ports of all switches along the path to the destination. And that would break the VLAN concept at its core: broadcast traffic has to be limited to the particular VLAN.

Another reason for marking each packet with a VLAN ID is efficiency. When a switch receives a packet and looks up the destination in its MAC address table, it’s faster to find the MAC when addresses are grouped by VLAN ID: the switch doesn’t need to look through all MACs, only those in the same VLAN.

In fact, there are many other reasons for using VLAN tags by default. I gave two, which answer the question without digging into details.

VLAN Trunking Protocol

There is another Cisco proprietary feature called VTP. VTP exchanges information about VLAN IDs and names. It means you configure a particular VLAN once on one switch and then all other switches will pull this information from it. It’s not a frequently used feature, so I won’t describe it in detail.

Initial Cisco switch configuration

June 28, 2012

The first steps you need to take when you unpack your Cisco switch, for example a Catalyst 2960, are configuring passwords and IP access via telnet and ssh. Cisco switches and routers have two primary operation modes: User (unprivileged) and Enable (privileged). In User mode you can simply look around, but in Enable mode you can reboot the switch, change configuration, as well as screw everything up. You are safe in User mode. The switch also has tons of hierarchical configuration modes where you perform the actual configuration.

The switch has three passwords: two for User mode (one for connections from the serial console and one for external telnet and ssh connections) and one for Enable mode. Here is how you configure the passwords after you unpack your switch and connect the serial cable.

Enter configuration mode:

enable
configure terminal

Configure console password:

line console 0
password pass1
login
exit

Configure ssh and telnet password:

line vty 0 15
password pass2
login
exit

Configure Enable password:

enable secret pass3
exit

The ‘login’ command tells the switch to ask for the User mode password; it doesn’t do that by default. The switch has 16 virtual (ssh and telnet) consoles, which is why you see the ‘0 15’ range in the ‘line vty 0 15’ command.

Now to get IP access to the switch you need to configure so-called ‘VLAN 1 interface’:

enable
configure terminal
interface vlan 1
ip address 192.168.1.200 255.255.255.0
no shutdown
exit

ip default-gateway 192.168.1.1
exit

VLANs are not the subject of this post. But to make it a bit clearer, VLAN 1 is a special VLAN to which all switch ports belong by default. It’s done so that you can connect to the switch by telnet/ssh from any port. The ‘no shutdown’ command here brings the interface up; it’s disabled by default.

After you’ve made the initial configuration, your changes are active but not saved, and after a reload you would end up with an empty switch configuration. To save the configuration changes run:

copy running-config startup-config

Cheers!

Switching Logic

June 8, 2012

If you are a junior admin in a small to medium organization, then building a campus network is simple: buy several switches, connect desktops and switches together, and that’s it. You don’t need any additional configuration; all switches work right out of the box. However, it’s important to understand how packet switching works to troubleshoot problems that can show up later in your work.

Switching works at Layer 2, the data link layer. It means that the networking hardware logic operates with MAC addresses. Each time a switch receives a frame from a workstation or server, it remembers the source MAC address and the port it was received on. This is called the MAC address table or switching table. When somebody wants to send a packet to another host with a particular IP address, it sends an ARP request packet – like “tell me who has IP address 12.34.56.78”. The host replies with its MAC address and the sender can form a frame to it.

Initially the switch has an empty switching table and does not know where to send frames. When the switch doesn’t have a particular MAC address in its table, it forwards (floods) the frame out of all ports. If the next switch doesn’t know this MAC either, it forwards the frame further. When the frame finally reaches its destination, the host answers and the switch adds its MAC address to the table.
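The learn-and-flood behavior can be modeled in a few lines. This is a toy model of the logic, not how real switching ASICs implement it:

```python
class Switch:
    """Toy model of Layer 2 forwarding: learn source MACs, flood unknowns."""

    def __init__(self, ports):
        self.ports = ports
        self.mac_table = {}  # MAC -> port it was learned on

    def forward(self, src_mac, dst_mac, in_port):
        """Return the list of ports the frame goes out of."""
        self.mac_table[src_mac] = in_port  # learn the sender's location
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]  # known MAC: single port
        # unknown MAC: flood everywhere except the ingress port
        return [p for p in self.ports if p != in_port]

sw = Switch(ports=[1, 2, 3, 4])
print(sw.forward("aa:aa", "bb:bb", 1))  # [2, 3, 4] - bb:bb unknown, flood
print(sw.forward("bb:bb", "aa:aa", 3))  # [1] - aa:aa was learned on port 1
```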

If you don’t use VLANs, all switches in your network form one broadcast domain. It means that when a host sends a broadcast message – an ARP request, for example – for a host that is powered off, this ARP request will traverse the whole network. It’s important to bear in mind that if you have many hosts in your network, broadcast messages can eventually slow it down. VLANs are usually the solution here.

Basic TCP operation

June 6, 2012

The main purposes of TCP are error recovery and flow control. TCP is a connection-oriented protocol: before sending any data it establishes a connection and terminates it upon completion.

During connection establishment the server and client agree upon sequence and acknowledgment numbers. Implicitly the client also notifies the server of its source port. The sequence number is a characteristic of a TCP data segment. It starts with a random value, and each time a new segment is sent, it is incremented by the number of bytes sent in the previous TCP segment. The acknowledgment number works almost the same way, but from the receiver’s side: an ACK segment carries no data and its number equals the sender’s sequence number incremented by the number of bytes received (you will see an example below). The ACK segment acknowledges that the host has received the sent data.

Client-server handshake is performed in three steps:

  1. The client sends a packet to the server with the SYN flag set, indicating that it’s willing to establish a connection. The client sets its sequence number to a random value and sends the segment to the server.
  2. The server acknowledges that it agrees to establish the connection: it sets its own sequence number to a random value, the acknowledgment number to the client’s sequence + 1, and sends them to the client.
  3. In the third message the client sets its acknowledgment number to the server’s sequence + 1 and sends it back to the server.

Now that both client and server know each other’s sequence and acknowledgment numbers, they can start sending data. Here it’s important to point out that TCP uses “windows” to send data. A window essentially is the number of bytes a host can send before it receives an acknowledgment from the recipient. Let’s say the window equals 3000 and the server sends three segments of 1000 bytes each. Initially we pick a random SEQ number, say 1000, and increment it by 1000, the segment size, with each next segment. When the client has received all three segments, it answers with an ACK equal to the last SEQ number plus the size of the last segment. And so on. If no errors occur the receiver usually increases its window.
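The numbering from the example can be reproduced in a few lines of Python (the initial SEQ of 1000 is the same illustrative value as above; a real stack picks it randomly):

```python
def sequence_numbers(initial_seq, segment_sizes):
    """Compute the SEQ of each segment and the final ACK.

    SEQ grows by the bytes sent in the previous segment; the receiver's
    ACK is the last SEQ plus the last segment's size, i.e. the next
    byte it expects.
    """
    seqs, seq = [], initial_seq
    for size in segment_sizes:
        seqs.append(seq)
        seq += size
    return seqs, seq  # seq is now the ACK number

# Three 1000-byte segments inside a 3000-byte window, as in the example:
seqs, ack = sequence_numbers(1000, [1000, 1000, 1000])
print(seqs)  # [1000, 2000, 3000]
print(ack)   # 4000
```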

Finally, when PC1 wants to close the connection it sends a FIN segment. PC2 on the opposite side notifies the application that the connection is closing. But since it takes some time for the application to complete its operation, PC2 sends an ACK to PC1 to signal its agreement to finish the connection; otherwise, after a timeout, PC1 would retransmit the FIN segment thinking it had been lost. When the application has terminated, PC2 sends its own FIN segment, PC1 replies with an ACK and the connection is closed.

PS: All rights to the pictures go to Wendell Odom