Posts Tagged ‘loop’

Beginner’s Guide to HPE 5000 Series Switches

October 14, 2017

I don’t closely track the popularity of my blog. If what I share helps people in their day-to-day job, that’s good enough for me. But I do look at site statistics now and then out of curiosity, and it seems that network-related posts are quite popular. A blog post I wrote a while ago on Dell N4000 switches has made it into the top five over the last year.

So it seems there is demand for entry-level switch configuration guides. I’ve worked with quite a few switch brands over the years, so I thought I would build on the success of the Dell blog post and this time write about the HPE FlexNetwork/FlexFabric 5000 switch series.

Operating Systems

HPE has several network switch product lines, and I won’t try to cover all of them in this post. But it’s important to know that there are a few different operating systems you can encounter while working with HPE network switches. There is the familiar ProCurve product portfolio (now merged with Aruba), which is based on the ProVision operating system.

The HPE FlexNetwork/FlexFabric 5000 series, on the other hand, is based on the Comware operating system. It has a different CLI command set and can come as a complete surprise if you’ve only worked with ProCurve switches before. So this blog post will be particularly valuable for those dealing with HPE 5000 switches for the first time.

The following guide has been tested on a pair of HPE FlexFabric 5700-series switches. The commands are mostly the same on other switch series, like FlexNetwork 5800, but there might be some minor differences.

Initial Configuration

When the switch boots for the first time, it starts automatic configuration by trying to obtain settings over DHCP. You can interrupt this with Ctrl+C to get straight to the CLI.

You start in user view where you can run display commands to review switch settings. To start the configuration, change to system view:

> system-view

Let’s start by configuring remote access to the switch. There are two ways you can do that. You either use the out-of-band management port:

> interface M-GigabitEthernet 0/0/0
> ip address
> ip route-static

Or you can configure a VLAN interface IP address:

> interface vlan-interface 1
> ip address
> ip route-static

Then configure the switch name, enable SSH and set passwords, and you can start managing the switch over SSH:

> sysname switchname

> public-key local create rsa
> ssh server enable
> user-interface vty 0 15
> authentication-mode scheme
> protocol inbound ssh

> super password simple yourpassword
> local-user admin
> password simple yourpassword
> authorization-attribute user-role level-0
> service-type ssh

User “admin” will have an unprivileged role. Once logged in, run the following command and enter the super password to elevate to network admin rights:

> super

Intelligent Resilient Framework

In small non-business-critical environments one standalone switch is usually sufficient. In larger environments switches are typically deployed in pairs for redundancy. To simplify management and to avoid network loops most switches support some sort of MLAG or stacking. IRF is HPE’s version of it.

Determine which ports you’re going to use for IRF. There are two QSFP+ ports on the 5700 series dedicated to it. Then on the first switch (master) run the following commands (it’s recommended to shut down the ports before you set them up as IRF):

> irf member 1 priority 32
> int range FortyGigE 1/0/41 to FortyGigE 1/0/42
> shutdown
> irf-port 1/1
> port group interface FortyGigE 1/0/41
> irf-port 1/2
> port group interface FortyGigE 1/0/42
> int range FortyGigE 1/0/41 to FortyGigE 1/0/42
> undo shut
> save
> irf-port-configuration active

On the second switch (slave) run the following commands to change the IRF ID to 2:

> irf member 1 renumber 2
> reboot

When the switch comes up, configure IRF ports:

> irf member 2 priority 30
> int range FortyGigE 2/0/41 to FortyGigE 2/0/42
> shutdown
> irf-port 2/1
> port group interface FortyGigE 2/0/41
> irf-port 2/2
> port group interface FortyGigE 2/0/42
> int range FortyGigE 2/0/41 to FortyGigE 2/0/42
> undo shut
> save
> irf-port-configuration active

Now you can connect the physical IRF ports. IRF uses a ring topology, which means that (in my case) port 1/0/41 should connect to 2/0/42 and port 1/0/42 should connect to 2/0/41.

The second switch will reboot automatically and, if everything is configured correctly, you should see both switches join the IRF fabric. Member switch 1 has the highest priority (32) and becomes the master:

> display irf

Firmware Upgrade

A firmware upgrade is the next logical step after you set up IRF. The latest firmware revision for the switches can be downloaded from the HPE website. Keep in mind you will need an HPE Passport account with a valid service agreement (SAID) added to it.

You will also need a TFTP server to upgrade the firmware. There are a few of them out there, but the most commonly used is probably Tftpd64.

Once the TFTP server is up and running and you have copied the firmware file to it, perform the upgrade:

> tftp get 5700-CMW710-R2432P03.ipe
> boot-loader file flash:/5700-CMW710-R2432P03.ipe slot 1 main
> boot-loader file flash:/5700-CMW710-R2432P03.ipe slot 2 main
> irf auto-update enable
> reboot

Confirm firmware has been updated:

> display version

VLANs, Aggregation Groups and Tagging

In Comware the term “aggregation group” describes what is called a “port channel” in the Cisco world. Trunk/access ports are also called tagged/untagged ports throughout the documentation.

In this section we will discuss a few common port configuration scenarios:

  • Untagged ports, which can be your iSCSI storage array ports
  • Tagged ports, such as your VMware host uplinks
  • Aggregation groups, typically used for LAGs to upstream switches

First of all create all VLANs and give them descriptions:

> vlan 10
> description iSCSI
> vlan 20
> description Server
> vlan 30
> description Dev and test

Then specify untagged ports:

> vlan 10
> port te 1/0/1
> port te 2/0/1

To configure tagged ports and allow certain VLANs (ports will be added to the VLANs automatically):

> int te 1/0/2
> description ESX01 vmnic0
> port link-type trunk
> port trunk permit vlan 20 30
> int te 2/0/2
> description ESX02 vmnic0
> port link-type trunk
> port trunk permit vlan 20 30
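
With more than a couple of hosts, typing the same four trunk commands per port gets repetitive. Below is a minimal Python sketch that generates them. The function name `trunk_port_commands` and the `uplinks` list are hypothetical helpers invented for this illustration, and the commands are simply the ones shown above; review the output before pasting it into a switch.

```python
def trunk_port_commands(interface, description, vlans):
    """Return the Comware commands that turn one port into a trunk for the given VLANs."""
    vlan_list = " ".join(str(vlan) for vlan in vlans)
    return [
        f"int {interface}",
        f"description {description}",
        "port link-type trunk",
        f"port trunk permit vlan {vlan_list}",
    ]


# Hypothetical inventory: these two rows mirror the example ports from this post.
uplinks = [
    ("te 1/0/2", "ESX01 vmnic0"),
    ("te 2/0/2", "ESX02 vmnic0"),
]

for interface, description in uplinks:
    for command in trunk_port_commands(interface, description, [20, 30]):
        print(command)
```

Add more rows to `uplinks` to cover the remaining host ports.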

And to create an LACP aggregation group:

> interface bridge-aggregation 1
> description Trunk to upstream switch
> link-aggregation mode dynamic
> port link-type trunk
> port trunk permit vlan 20 30

> interface te 1/0/3
> port link-aggregation group 1
> interface te 2/0/3
> port link-aggregation group 1

Common Commands

Other useful commands that don’t fall under any specific category, but are handy to know.

Display switch configuration:

> display current-configuration

Save switch configuration:

> save

Shut down a port:

> int te 1/0/27
> shutdown

Undo a command:

> undo shutdown


Whether you are a network engineer new to the Comware operating system or a VMware administrator looking for a quick cheat sheet for FlexNetwork/FlexFabric switches, I hope this guide has helped you get the job done.

If this blog post gets the same amount of popularity, maybe it will turn into another series. But for now – over and out.


EIGRP enhancements

August 19, 2012

Enhanced Interior Gateway Routing Protocol (EIGRP) is a Cisco proprietary IGP. So if you have equipment from several vendors, like HP or Juniper, inside your corporate LAN, it’s probably not an option for you. However, EIGRP has several enhancements that let it converge even faster than OSPF.

One of the main drawbacks of OSPF is that it consumes a considerable amount of memory to maintain the LSDB and CPU power to run Dijkstra on it. EIGRP doesn’t do that. Like OSPF routers, routers with EIGRP enabled on their interfaces exchange only partial updates with their neighbors. But EIGRP routers don’t maintain the whole topology. In that respect they behave more like RIP: each router holds information about networks and the next-hop routers used to reach them. Unlike RIP, however, EIGRP finds a primary and, if possible, a secondary route for each network, so that in case of a link failure the router can immediately switch to the backup route. In EIGRP terminology the main route is called the successor route and the alternative route the feasible successor route.

Also, EIGRP has a more sophisticated metric calculation. It considers not only bandwidth, but also delay. With default settings the formula is:

metric = (10^7 / least-bandwidth + cumulative-delay) * 256

Here least-bandwidth is the speed of the slowest link along the path in kbps, and cumulative-delay is the sum of all delays from the network to the router in tens of microseconds.
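
To make the formula concrete, here is a minimal Python sketch of it. The function name and parameter names are my own, and the integer division is an assumption based on how routers are usually described as truncating the bandwidth term.

```python
def eigrp_metric(least_bandwidth_kbps, cumulative_delay_tens_us):
    """Classic EIGRP composite metric with default K values (bandwidth and delay only)."""
    # Assumption: the bandwidth term is truncated to an integer before the delay is added.
    scaled_bandwidth = 10**7 // least_bandwidth_kbps
    return (scaled_bandwidth + cumulative_delay_tens_us) * 256


# A path whose slowest link is a 1544 kbps serial line with 20000 microseconds
# of total delay (2000 tens of microseconds).
print(eigrp_metric(1544, 2000))    # 2169856

# A single 100 Mbps link with 100 microseconds of delay.
print(eigrp_metric(100_000, 10))   # 28160
```

Note how the slow serial link dominates the result: the bandwidth term alone contributes 6476 of the 8476 units before scaling.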

To understand how EIGRP prevents loops, two more terms are needed. Feasible Distance (FD) is the metric of the best route to reach a subnet, as calculated on a router. Reported Distance (RD) is the metric as calculated on a neighboring router and then reported and learned in an EIGRP update. The trick is that a route can be a feasible successor route only if its RD is less than the FD. This guarantees that the route doesn’t pass through this router, because otherwise its RD would necessarily be greater than the FD.
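
The feasibility check can be sketched in a few lines of Python. This is an illustration only: the function name, the neighbor names and the metric values are made up, and real EIGRP keeps this information in its topology table.

```python
def classify_routes(candidates):
    """candidates maps neighbor -> (reported_distance, total_metric_via_neighbor)."""
    # The successor is the neighbor offering the lowest total metric; that metric is the FD.
    successor = min(candidates, key=lambda neighbor: candidates[neighbor][1])
    feasible_distance = candidates[successor][1]
    # A feasible successor must report a distance strictly lower than the FD.
    feasible_successors = sorted(
        neighbor
        for neighbor, (reported_distance, _) in candidates.items()
        if neighbor != successor and reported_distance < feasible_distance
    )
    return successor, feasible_distance, feasible_successors


# Hypothetical routes to one subnet, as seen from a single router.
routes = {
    "R2": (1000, 2000),
    "R3": (1500, 2500),
    "R4": (2200, 2400),
}
print(classify_routes(routes))   # ('R2', 2000, ['R3'])
```

R4 offers a better total metric than R3, but its RD of 2200 is not below the FD of 2000, so the router cannot rule out a loop through itself and does not keep R4 as a backup.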

Again, in the respects covered here (convergence time, resource usage, metric calculation) EIGRP is the better IGP. The only barrier restricting its proliferation is the proprietary nature of the protocol.

OSPF comparison with RIP

August 19, 2012

Problems with RIP

RIP is a very basic routing protocol with slow convergence and a primitive best-route computation based on the number of hops. A router configured to use RIP sends route updates to its neighbors every 30 seconds. If you have many routers in your network, which is quite common with modern Layer 2/3 switches, then each time you reconfigure routes the changes take an unacceptable amount of time to propagate. In the worst case each router waits 30 seconds before sending an update to the next router in the chain. Network failures make things even worse. A router considers a link as failed if it doesn’t receive updates over it for 180 seconds. Then RIP uses a number of loop avoidance techniques to advertise the failed route. For the end user this means the network is unreachable for ages in networking terms. Even moderately critical infrastructures cannot tolerate such delays. Additionally, RIP picks the best route by hop count alone and doesn’t account for link speeds, which is sometimes inappropriate.

OSPF Solution

The Open Shortest Path First (OSPF) protocol was developed to solve RIP’s problems. Neighboring OSPF routers send topology changes to each other immediately. This is possible because OSPF sends only changes, not all routes as RIP does. OSPF routers maintain a so-called Link-State Database (LSDB), which contains Link-State Advertisements (LSAs). In fact, the LSDB doesn’t contain routes themselves, but the topology. An LSA is either a link record, which has information about a subnet and the routers connected to it, or a router record, which contains information on the router’s IPs and masks. Each link in OSPF has a metric (cost), weighted based on link speed. OSPF then needs to calculate the shortest paths and fill the routing table: the Dijkstra Shortest Path First (SPF) algorithm is applied to the LSDB to find the best routes.
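
As a rough illustration of the SPF step, here is a textbook Dijkstra implementation in Python. The `lsdb` dictionary is a heavily simplified, hypothetical stand-in for a real LSDB: it maps each router to its neighbors and link costs, and the router names and costs are invented.

```python
import heapq


def shortest_path_costs(lsdb, source):
    """Return the lowest total cost from source to every reachable router."""
    costs = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, router = heapq.heappop(queue)
        if cost > costs.get(router, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, link_cost in lsdb[router].items():
            new_cost = cost + link_cost
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return costs


# Hypothetical topology: the direct R1-R2 link is slow (cost 10),
# so the best path from R1 to R2 goes through R3.
lsdb = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R3": 1, "R4": 1},
    "R3": {"R1": 1, "R2": 1},
    "R4": {"R2": 1},
}
print(shortest_path_costs(lsdb, "R1"))
```

Every router runs the same calculation on the same database with itself as the source, which is why all routers end up with consistent, loop-free routes.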

Link failures are another story. The link failure timer in OSPF is 40 seconds, compared to 180 for RIP. But the main issue is that a number of routing loop problems are inherent to RIP. On link failures RIP uses loop avoidance features such as “split horizon”, “route poisoning” and “poison reverse”, as well as a holddown timer, all of which make RIP take a considerable amount of time to converge. In OSPF, routers avoid loops by first asking their neighbors whether they lack any LSAs. If a router already has all the LSAs in its LSDB, the neighbors do not exchange any further information. This allows OSPF to converge much more quickly.

Spanning Tree Protocol Overview

July 16, 2012

When it comes to switching, it is worth understanding how STP works. STP was developed to prevent loops. For example, say you connect three switches in a ring and some host sends a broadcast frame. Since a broadcast frame is flooded out of all ports (forget about VLANs for a moment) and Ethernet frames have no TTL field, it will keep circulating around the ring indefinitely, causing a broadcast storm. You are unlikely to run into this situation on Cisco switches, because they have STP enabled by default. Some low-budget switches do not support STP at all.

To prevent loops STP disables some ports, or in other words puts them in a blocking state. Ports that are left to forward traffic are in a forwarding state. To exchange STP information switches use Bridge Protocol Data Units (BPDUs). They contain three main fields: root switch ID, sender switch ID and cost to reach the root. IDs are almost random, since they are based on priorities and MAC addresses. Cost depends on link speed: a 100 Mbps port has a cost of 19, a 1 Gbps port a cost of 4, and so on.

STP starts by electing a root switch. All switches exchange their IDs and the switch with the lowest ID becomes the root switch. As stated above, the root switch is an almost random choice, but you can manually assign a priority if needed. Then the spanning tree algorithm (STA) searches for root ports (RP) and designated ports (DP). An RP is the port with the shortest path to the root switch. The shortest path is found based on link costs and, if they are equal, on switch IDs. A DP is the port with the lowest cost to the root on its Ethernet segment. An Ethernet segment here is a collision domain, which in a switched network is simply an Ethernet link between two switches. Basically, this means that you will have one shortest path from each non-root switch to the root switch. On one side of each link on that path there will be an RP and on the other a DP. All non-shortest paths will have a DP on one side and a non-DP, non-RP (blocked) port on the other side. Traffic will not pass through this blocked port, which prevents loops.
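
The election and the root port choice both boil down to picking the lowest value, which a short Python sketch can show. Everything here is hypothetical: the switch names, MAC addresses and port names are invented, and comparing MAC addresses as strings only works because they are written in the same lowercase format.

```python
def elect_root(bridges):
    """bridges maps switch name -> (priority, mac); the lowest ID wins."""
    return min(bridges, key=lambda name: bridges[name])


def pick_root_port(ports):
    """ports maps port -> (cost_to_root, neighbor_id); lowest cost wins, ties go to the lowest neighbor ID."""
    return min(ports, key=lambda port: ports[port])


# Three switches in a ring, all with the same priority: the lowest MAC decides.
bridges = {
    "SW1": (32768, "00:1a:00:00:00:03"),
    "SW2": (32768, "00:1a:00:00:00:01"),
    "SW3": (32768, "00:1a:00:00:00:02"),
}
print(elect_root(bridges))   # SW2

# Lowering the priority on SW1 makes it the root regardless of its MAC.
bridges["SW1"] = (4096, "00:1a:00:00:00:03")
print(elect_root(bridges))   # SW1

# SW3 with 1 Gbps links (cost 4 each): directly to the root SW1, or via SW2.
sw3_ports = {
    "Gi0/1": (4, bridges["SW1"]),
    "Gi0/2": (4 + 4, bridges["SW2"]),
}
print(pick_root_port(sw3_ports))   # Gi0/1
```

The second election shows why it is worth setting the priority manually: otherwise the root ends up being whichever switch happens to have the lowest MAC address.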

You may ask what the point of distinguishing between DP and RP is if the only thing that matters is the shortest path. Even though an RP and a DP lie on the shortest path to the root, just on opposite sides of a link, there is one significant distinction between them. The DP is the port from which Hello BPDUs are continuously sent. A Hello BPDU simply indicates that the link between the switches is working and contains information that allows the switch on the other side of the link to find a new shortest path to the root in case an old link breaks. Another difference is that DPs exist not only on root paths, but on each of the Ethernet links.

Along with STP, there is RSTP, which stands for Rapid Spanning Tree Protocol. The reason for RSTP is that STP converges slowly. Convergence is the process that happens when the network topology changes and switches need to reevaluate port states (blocking/forwarding). STP takes approximately 50 seconds to converge: with default timers that is 20 seconds of max age plus two 15-second forward delay periods. RSTP convergence time is 1 to 10 seconds.

STP and RSTP have several implementations. Instead of pure IEEE STP, Cisco by default uses PVST+ (or simply PVST), which is an abbreviation for Per-VLAN Spanning Tree Plus. PVST creates one STP topology per VLAN. Instead of using one link for all VLANs and blocking all other links, you can use the first link for even VLANs and the second for odd ones. PVST allows you to do that. Cisco’s implementation of RSTP is called PVRST (Per-VLAN Rapid Spanning Tree) or RPVST (Rapid Per-VLAN Spanning Tree). There is also an IEEE protocol similar to PVRST. It’s called MIST (Multiple Instances of Spanning Trees), more commonly known as MST or MSTP (IEEE 802.1s). MIST is built on RSTP. Its difference from PVRST is that it doesn’t create a separate spanning tree for each VLAN, as PVRST does by design, but lets you create one spanning tree for multiple VLANs.

NetApp Active/Active Cabling

October 9, 2011

Cabling for an active/active NetApp cluster is defined in the Active/Active Configuration Guide. It’s described in detail, but may be rather confusing for beginners.

First of all, we use the old Data ONTAP 7.2.3. Much has changed since its release, particularly in disk shelf design. If the documentation says:

If your disk shelf modules have terminate switches, set the terminate switches to Off on all but the last disk shelf in loop 1, and set the terminate switch on the last disk shelf to On.

You can be pretty confident that you won’t have any “terminate switches”. Just skip this step.

Now to configuration types. We have two NetApp filers and four disk shelves: two FC and two SATA. You can connect them in several ways.

The first way is to make two stacks (formerly loops), each built from shelves of the same type. Each filer will own its stack. This configuration also allows you to implement multipathing. Let’s take a look at the picture from the NetApp flyer:

Solid blue lines show the primary connections. Appliance X (AX) 0a port is connected to Appliance X Disk Shelf 1 (AXDS1) A-In port, and AXDS1 A-Out port is connected to AXDS2 A-In port. This comprises the first stack. Then AY 0c port is connected to AYDS1 A-In port, and AYDS1 A-Out port is connected to AYDS2 A-In port. This comprises the second stack. If you leave it this way you will have two fully separate stacks.

If you want to implement an active/active cluster you should do the same for the B channels. As you can see in the picture, AX 0c port is connected to AYDS1 B-In port, and AYDS1 B-Out port is connected to AYDS2 B-In port. Then AY 0a port is connected to AXDS1 B-In port, and AXDS1 B-Out port is connected to AXDS2 B-In port. Now both filers are connected to both stacks, and if one filer fails the other can take over.

Now we have four free ports left: A-Out and B-Out on AXDS2 and AYDS2. You can use these ports for multipathing. Connect AX 0d to AXDS2 B-Out, AY 0d to AXDS2 A-Out, AX 0b to AYDS2 A-Out and AY 0b to AYDS2 B-Out. Now if a disk shelf module, connection, or host bus adapter fails, there is a redundant path.

The second way, which we implemented, assumes that each filer owns one FC and one SATA disk shelf. It requires four loops instead of two, because FC and SATA shelves can’t be mixed in one loop. The shortcoming of this configuration is the inability to implement multipathing, because each filer has only four ports and each of them will be used for its own loop.

This time the cabling is simpler. AX 0a is connected to AXDS1 A-In, AX 0b to AYDS1 A-In, AY 0a to AXDS2 A-In and AY 0b to AYDS2 A-In. To implement clustering you need to connect AX 0c to AXDS2 B-In, AX 0d to AYDS2 B-In, AY 0c to AXDS1 B-In and AY 0d to AYDS1 B-In.
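
Cabling lists like this are easy to get wrong, so it can help to write the plan down as data and sanity-check it before touching any cables. The Python sketch below encodes the four-loop layout just described. The dictionary layout and the helper names are my own, not anything NetApp provides.

```python
# (filer, port) -> (disk shelf, shelf port), as listed for the four-loop layout.
cabling = {
    ("AX", "0a"): ("AXDS1", "A-In"),
    ("AX", "0b"): ("AYDS1", "A-In"),
    ("AY", "0a"): ("AXDS2", "A-In"),
    ("AY", "0b"): ("AYDS2", "A-In"),
    ("AX", "0c"): ("AXDS2", "B-In"),
    ("AX", "0d"): ("AYDS2", "B-In"),
    ("AY", "0c"): ("AXDS1", "B-In"),
    ("AY", "0d"): ("AYDS1", "B-In"),
}


def shelves_reached(cabling, filer):
    """Return the set of shelves the given filer is cabled to."""
    return {shelf for (name, _), (shelf, _) in cabling.items() if name == filer}


def shelf_ports_unique(cabling):
    """True if no shelf port has two cables plugged into it."""
    targets = list(cabling.values())
    return len(targets) == len(set(targets))


all_shelves = {"AXDS1", "AXDS2", "AYDS1", "AYDS2"}
print(shelves_reached(cabling, "AX") == all_shelves)   # True
print(shelves_reached(cabling, "AY") == all_shelves)   # True
print(shelf_ports_unique(cabling))                     # True
```

Both filers reach all four shelves, which is what takeover requires, and no shelf port is used twice.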

I also need to mention hardware and software disk ownership. In older systems ownership was defined by cable connections. The filer connected to a shelf’s A-In port owned all disks in that shelf, or in the whole stack if other shelves were daisy-chained on channel A. Our FAS3020, for instance, already supports software ownership, where you can assign any disk to any filer in the cluster. This means it no longer matters which port you use for the connection, A-In or B-In. You can reassign disks during configuration.