Posts Tagged ‘RP’

RecoverPoint VE: Common Deployment Issues

April 19, 2016

In one of my previous posts I discussed iSCSI connectivity considerations when deploying RecoverPoint VE. In this post I want to describe common issues you may encounter when deploying RecoverPoint clusters, most of which apply to both the physical appliance and the virtual edition.

VNX MirrorView ports

I already touched on this briefly in my previous post, but it’s worth mentioning again that you can NOT use MirrorView ports for iSCSI connectivity between RPAs and VNX arrays. When you try to use a MirrorView iSCSI port for RecoverPoint, it gets upset and doesn’t communicate with the array.

If you make the mistake of connecting one port per SP and that port is a MirrorView port, you will have no communication with the array at all and get the following error in Unisphere for RecoverPoint:

Error Splitter ARRAYNAME-A is down
Error Splitter ARRAYNAME-B is down

[Screenshot: splitter errors in Unisphere for RecoverPoint]

If you connect two ports per SP, one of which is a MirrorView port, and use two iSCSI network subnets, you may get the following error when running a SAN connectivity test from the RPA boxmgmt interface. In this case the RPA can communicate with the array over only one subnet:

On array ABCD1234567890, all paths for device with UID=0x1234567890abcdef go through RPA Ethernet port eth2 …

[Screenshot: multipathing error in the SAN connectivity test]

The solution is as simple as moving the link from port 0 to port 1 on a 10Gb I/O module, or from port 0 to port 1, 2 or 3 on a 1Gb I/O module.

If you don’t want to lose two iSCSI ports (one per SP), especially if they’re 10Gb, and you’re not using MirrorView, you can uninstall the MirrorView enabler from the array. Just keep in mind that it requires an array reboot. Service processors are rebooted one at a time, so there is no downtime. But if it’s a heavily used storage array, it’s recommended to schedule the uninstallation out of hours to minimize the impact.
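The port rule above can be written down as a small lookup. This is only a sketch: the port counts per module type (two on 10Gb, four on 1Gb) are inferred from the port numbers mentioned in this post, and `usable_rp_ports` is a hypothetical helper name, not part of any EMC tool.

```python
# Port numbering as described in this post: port 0 of an iSCSI I/O module is
# the MirrorView port while the MirrorView enabler is installed (assumption).
MIRRORVIEW_PORT = 0

# Ports per I/O module type, inferred from the post (ports 0-1 and 0-3).
PORTS_PER_MODULE = {"10Gb": 2, "1Gb": 4}


def usable_rp_ports(module_type: str, mirrorview_installed: bool = True) -> list[int]:
    """Return the port numbers on one I/O module that RecoverPoint can use."""
    ports = range(PORTS_PER_MODULE[module_type])
    if mirrorview_installed:
        # The MirrorView port cannot carry RecoverPoint iSCSI traffic.
        return [p for p in ports if p != MIRRORVIEW_PORT]
    return list(ports)


print(usable_rp_ports("10Gb"))                              # [1]
print(usable_rp_ports("1Gb"))                               # [1, 2, 3]
print(usable_rp_ports("10Gb", mirrorview_installed=False))  # [0, 1]
```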

Error when redeploying a cluster

If you’ve made configuration mistakes while deploying a RecoverPoint cluster and want to blow the whole thing away and redeploy it from scratch, you may encounter the following error when deploying for the second time:

VNX path set with IP 10.10.10.1 already exists in a different path set (RP_0x123abc456def789g_0_iSCSI1)

[Screenshot: path set error on redeployment]

The cause of the issue is iSCSI sessions that stayed on the VNX after you deleted the RPA VMs. You need to connect to the VNX and delete them manually in Unisphere by right-clicking the storage array name on the dashboard and selecting iSCSI > Connections Between Storage Systems. This is what duplicate sessions look like:

[Screenshot: duplicate RecoverPoint iSCSI connections]

As you can see, there are three sets of RecoverPoint cluster iSCSI connections after three unsuccessful attempts.

You will need to delete the old sessions before you are able to proceed with the deployment in RecoverPoint Deployment Manager.

Wrong initiator names

I’ve seen RecoverPoint register initiators on the VNX with inconsistent hostnames on multiple occasions.

As you’ve seen in the screenshots above, the hostname field of every initiator consists of the cluster ID and the RPA ID (I’m not sure what the third field means), such as this:

RP_0x123abc456def789g_1_0

In this example you can see that RPA1 has two hostnames with suffixes _0_0 and _1_0.

[Screenshot: initiators registered with inconsistent hostnames]
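If you have many initiators to check, the comparison can be scripted. The sketch below assumes you have already collected (RPA, hostname) pairs by hand from the Unisphere initiator list; the sample data, the regular expression and both function names are illustrative assumptions based only on the hostname format shown above.

```python
import re
from collections import defaultdict

# Pattern inferred from the example hostname in this post:
# RP_<cluster id>_<second field>_<third field>. The third field is unexplained.
HOSTNAME_RE = re.compile(
    r"^RP_(?P<cluster>0x[0-9A-Za-z]+)_(?P<second>\d+)_(?P<third>\d+)$"
)


def parse_hostname(hostname: str) -> tuple[str, int, int]:
    """Split a RecoverPoint initiator hostname into its three fields."""
    match = HOSTNAME_RE.match(hostname)
    if match is None:
        raise ValueError(f"not a RecoverPoint initiator hostname: {hostname!r}")
    return match["cluster"], int(match["second"]), int(match["third"])


def inconsistent_rpas(initiators: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Return the RPAs that are registered under more than one hostname."""
    seen = defaultdict(set)
    for rpa, hostname in initiators:
        parse_hostname(hostname)  # raises if the format is unexpected
        seen[rpa].add(hostname)
    return {rpa: names for rpa, names in seen.items() if len(names) > 1}


# Hypothetical sample data, typed in by hand from the initiator list.
sample = [
    ("RPA1", "RP_0x123abc456def789g_0_0"),
    ("RPA1", "RP_0x123abc456def789g_1_0"),
    ("RPA2", "RP_0x123abc456def789g_1_0"),
]
for rpa, names in inconsistent_rpas(sample).items():
    print(rpa, sorted(names))
```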

This issue is purely cosmetic and doesn’t affect RecoverPoint operation, but if you want to fix it you will need to restart the Management Server on each VNX service processor. It’s a non-disruptive procedure and can be performed by opening http://SP_IP/setup and clicking the “Restart Management Server” button.

After the restart, the array will update the hostnames to reflect the actual configuration.

Joining two clusters with the licences already applied

This is just not going to work. Make sure to join the production and DR clusters before applying RecoverPoint licences, or the Deployment Manager “Connect Cluster” wizard will fail.

It’s one of the prerequisites specified in the RecoverPoint “Installation and Deployment Guide”:

If you plan to connect the new cluster immediately after preparing it for connection,
ensure:

  • You do not install a license in, or modify the settings of, the new cluster before
    connecting it to the existing system.

Conclusion

There are always many more things that can potentially go wrong. But if any of the above helped you solve your RecoverPoint deployment issues, make sure to let me know in the comments below!


How STP and RSTP converge

July 20, 2012

In my previous post I described how STP works in normal circumstances. Every 2 seconds the root switch sends Hello BPDUs on all of its ports (since they are all designated). Each Hello carries a cost to reach the root of 0, a root ID (RID) equal to the root switch ID, and a bridge ID (BID) equal to the ID of the sending switch, which in this case is the same as the RID. When a non-root switch receives a Hello BPDU on its root port (RP), it adds its own cost to reach the root, changes the BID to its own ID and sends the BPDU further. Now, what happens if a switch’s link with the shortest path to the root fails? STP starts to converge.
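The Hello relay described above can be sketched in a few lines. This is a simplified model with only the three fields mentioned in the text; the class and function names are made up for illustration and real BPDUs carry more fields.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HelloBpdu:
    root_id: str    # RID: ID of the switch believed to be the root
    bridge_id: str  # BID: ID of the switch that sent this BPDU
    root_cost: int  # sender's cost to reach the root


def root_hello(root_id: str) -> HelloBpdu:
    """Hello as originated by the root: cost 0 and BID equal to RID."""
    return HelloBpdu(root_id=root_id, bridge_id=root_id, root_cost=0)


def relay_hello(received: HelloBpdu, my_id: str, root_port_cost: int) -> HelloBpdu:
    """Hello as a non-root switch passes it on: add own cost, replace the BID."""
    return replace(
        received,
        bridge_id=my_id,
        root_cost=received.root_cost + root_port_cost,
    )


hello = root_hello("SW1")
print(relay_hello(hello, my_id="SW2", root_port_cost=19))
# HelloBpdu(root_id='SW1', bridge_id='SW2', root_cost=19)
```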

STP convergence process

A switch waits for the Max Age time before considering the link as failed. By default the Max Age timer is equal to 10 times the Hello timer, and the time between Hellos is usually 2 seconds, so Max Age is 20 seconds. The first step in the convergence process is re-evaluating the root switch. If the original root switch still has a connection to the network, then the switch in question will receive a Hello BPDU from it and nothing will change. Otherwise the switches will elect a new root.

Next, the switch needs to choose a new RP. It’s simple: look through the costs to reach the root over all available links and choose the cheapest. Additionally, the switch selects which ports are now DPs.
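As a rough illustration of the RP choice, the sketch below picks the cheapest of the remaining links. The `Candidate` type, port names and costs are invented for the example, and ties are broken on the neighbor's ID only, which is a simplification of the real tie-break rules.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    port: str
    neighbor_root_cost: int  # cost advertised in the neighbor's Hello
    port_cost: int           # cost of the local port
    neighbor_id: int         # used only to break ties


def choose_root_port(candidates: list[Candidate], failed: set[str]) -> str:
    """Pick the live port with the lowest total cost to the root."""
    live = [c for c in candidates if c.port not in failed]
    best = min(live, key=lambda c: (c.neighbor_root_cost + c.port_cost, c.neighbor_id))
    return best.port


ports = [
    Candidate("Fa0/1", neighbor_root_cost=0, port_cost=19, neighbor_id=1),   # direct link to root
    Candidate("Fa0/2", neighbor_root_cost=19, port_cost=19, neighbor_id=2),  # via another switch
]
print(choose_root_port(ports, failed=set()))      # Fa0/1
print(choose_root_port(ports, failed={"Fa0/1"}))  # Fa0/2
```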

After the port roles are identified, the switch transitions the RP from the Blocking state to Forwarding. However, this involves two transitional states: Listening and Learning. The Listening state lasts 15 seconds and is necessary for old MAC table entries to time out. Otherwise temporary loops are possible. In the Learning state the switch begins to gather MAC addresses from received frames (for another 15 seconds). In the Listening and Learning states the switch does not forward frames. After both transitional states have finished, the port is transitioned to the Forwarding state. So during STP convergence a port can be inaccessible for up to 50 seconds: 20 seconds of Max Age plus 15 seconds each of Listening and Learning.
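The 50-second figure is just the sum of the timers mentioned in this section. A small sketch of the arithmetic, using the default values from the text (the function name is hypothetical):

```python
def stp_outage_timeline(hello: int = 2, forward_delay: int = 15) -> list[tuple[str, int]]:
    """Return (phase, seconds elapsed when the phase ends) for an STP reconvergence."""
    max_age = 10 * hello  # default relationship described in the text
    phases = [
        ("wait for Max Age", max_age),
        ("Listening", forward_delay),
        ("Learning", forward_delay),
    ]
    elapsed = 0
    timeline = []
    for name, duration in phases:
        elapsed += duration
        timeline.append((name, elapsed))
    return timeline


for phase, seconds in stp_outage_timeline():
    print(f"{phase}: done after {seconds}s")
# wait for Max Age: done after 20s
# Listening: done after 35s
# Learning: done after 50s
```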

RSTP convergence

The key difference between STP and RSTP is the rapid convergence of the latter, hence the name Rapid STP. First of all, RSTP waits for only 3 times the Hello timer, so it’s 6 seconds instead of 20. Apart from that, when the RP link fails RSTP blocks all its ports, eliminating loops. It means that the Listening state is not needed in this case, which saves us another 15 seconds. And in the Learning phase the switch sends an RSTP proposal message to the neighboring switch right away and quickly receives an agreement, which implies that the link is established and is in the Forwarding state. As a result, RSTP convergence time is shortened from 50 seconds to somewhere between 1 and 10 seconds.
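To put numbers on the first two savings, here is a minimal sketch using the default timers quoted above. It only compares failure detection and the skipped Listening state; it does not model the proposal/agreement exchange.

```python
def failure_detection_seconds(hello: int = 2) -> dict[str, int]:
    """Time each protocol waits for missing Hellos before declaring a link dead."""
    return {"STP": 10 * hello, "RSTP": 3 * hello}


LISTENING_SECONDS = 15  # skipped entirely by RSTP, per the text

detect = failure_detection_seconds()
saved = (detect["STP"] - detect["RSTP"]) + LISTENING_SECONDS

print(detect)  # {'STP': 20, 'RSTP': 6}
print(saved)   # 29
```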

Spanning Tree Protocol Overview

July 16, 2012

When it comes to switching, it is recommended to understand how STP works. STP was developed to prevent loops. For example, you connect 3 switches in a ring and some host sends a broadcast frame. Since a broadcast frame is flooded to all ports (forget about VLANs for a moment), it will keep travelling around the ring: unlike IP packets, Ethernet frames have no TTL, so nothing ever discards the looping copies. You will not run into this situation on Cisco switches with default settings, because they have STP enabled by default. Some low-budget switches do not support STP at all.
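A toy simulation makes the loop easy to see. It models only the inter-switch links of a ring with no STP, where every switch forwards a broadcast out of every ring port except the one it arrived on. Ethernet frames carry no hop counter, so the simulation has nothing that would ever remove a copy. The function name and the model are illustrative assumptions.

```python
def flood(ring_size: int, steps: int) -> list[int]:
    """Return the number of broadcast copies in flight at each step (ring_size >= 3)."""
    # Each copy is (switch it is arriving at, switch it came from).
    # Switch 0 has just flooded a host's broadcast to both ring neighbors.
    in_flight = [(1, 0), (ring_size - 1, 0)]
    history = []
    for _ in range(steps):
        history.append(len(in_flight))
        forwarded = []
        for at, came_from in in_flight:
            for peer in ((at - 1) % ring_size, (at + 1) % ring_size):
                if peer != came_from:  # never send back out of the ingress port
                    forwarded.append((peer, at))
        in_flight = forwarded
    return history


print(flood(ring_size=3, steps=10))
# [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

Two copies circle the ring in opposite directions for as long as the simulation runs, and every pass is also delivered to the hosts attached to each switch.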

To prevent loops STP disables some ports, or in other words puts them in a blocking state. Ports that are left to forward traffic are in a forwarding state. To exchange STP information switches use Bridge Protocol Data Units (BPDUs). They contain three main fields: root switch ID, sender switch ID and cost to reach the root. IDs are almost random, since they are based on priorities and MAC addresses. Cost depends on link speed: the cost of a 100Mb port is 19, a 1Gb port is 4, etc.
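The ID and cost rules can be modelled as follows. This is a sketch: the 19 and 4 values come from the text, the 10Mb and 10Gb values are from the same classic IEEE cost table, 32768 is the usual default priority, and the MAC addresses are made up.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    priority: int  # compared first
    mac: int       # compared only when priorities are equal


def bridge_id(mac: str, priority: int = 32768) -> BridgeId:
    """Build a comparable switch ID from a priority and a MAC address string."""
    return BridgeId(priority, int(mac.replace(":", ""), 16))


# Port cost by link speed in Mb/s (classic short-mode values).
PORT_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}

switches = {
    "SW1": bridge_id("00:1a:2b:00:00:03"),
    "SW2": bridge_id("00:1a:2b:00:00:01"),
    "SW3": bridge_id("00:1a:2b:00:00:02", priority=4096),
}
# The lowest ID wins: SW3 because of its lowered priority,
# otherwise it would be SW2 with the lowest MAC.
print(min(switches, key=switches.get))  # SW3
print(PORT_COST[100], PORT_COST[1000])  # 19 4
```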

STP starts by electing a root switch. All switches exchange their IDs and the switch with the lowest ID becomes the root switch. As stated above, the root switch is almost a random choice, but you can manually assign a priority if needed. Then the spanning tree algorithm (STA) searches for root ports (RP) and designated ports (DP). An RP is the port with the shortest path to the root switch. The shortest path is found based on link costs and, if they are equal, on switch IDs. A DP is the port with the lowest cost to the root on that Ethernet segment. An Ethernet segment here is a collision domain, which in a switched network is simply an Ethernet link between two switches. Basically, that means that you will have one shortest path from each non-root switch to the root switch. On one side of each link on that path there will be an RP and on the other a DP. All non-shortest paths will have a DP on one side and a non-DP, non-RP (blocked) port on the other side. Traffic will not traverse this port, which prevents loops.

You may ask what the point of distinguishing between DP and RP is if the only thing that matters is the shortest path. Even though an RP and a DP lie on the shortest path to the root, just on opposite sides, there is one significant distinction between them. A DP is the port from which Hello BPDUs are continuously sent. A Hello BPDU simply indicates that the link between switches is working and contains information which allows the switch on the other side of the link to find a new shortest path to the root in case the old link breaks. Another difference is that DPs exist not only on root paths, but on each of the Ethernet links.

Along with STP, there is RSTP, which stands for Rapid Spanning Tree Protocol. The reason for RSTP is that STP converges slowly. Convergence is the process which happens when the network topology changes and switches need to re-evaluate port statuses (blocking/forwarding). STP takes approximately 50 seconds to converge. RSTP convergence time is 1 to 10 seconds.
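The whole procedure (root election, RP and DP selection, blocked ports) can be tried out on a small topology. The sketch below is a simplified model, not a protocol implementation: it assumes a connected network, at most one link between any two switches and integer switch IDs, and it computes the result centrally instead of exchanging BPDUs.

```python
import heapq


def spanning_tree(bridge_ids: dict[str, int], links: list[tuple[str, str, int]]):
    """Return (root, roles); roles maps (switch, neighbor) to 'RP', 'DP' or 'blocked'."""
    root = min(bridge_ids, key=bridge_ids.get)  # lowest ID becomes root

    neighbors = {name: {} for name in bridge_ids}
    for a, b, cost in links:
        neighbors[a][b] = cost
        neighbors[b][a] = cost

    # Dijkstra: lowest total cost from every switch to the root.
    root_cost = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, switch = heapq.heappop(queue)
        if cost > root_cost[switch]:
            continue
        for peer, link_cost in neighbors[switch].items():
            new_cost = cost + link_cost
            if new_cost < root_cost.get(peer, float("inf")):
                root_cost[peer] = new_cost
                heapq.heappush(queue, (new_cost, peer))

    roles = {}
    # Root port: cheapest path to the root, ties broken by the neighbor's ID.
    for switch in bridge_ids:
        if switch == root:
            continue
        best = min(
            neighbors[switch],
            key=lambda peer: (root_cost[peer] + neighbors[switch][peer], bridge_ids[peer]),
        )
        roles[(switch, best)] = "RP"

    # Designated port: on each link, the end closer to the root (ties by ID).
    for a, b, _ in links:
        designated, other = sorted((a, b), key=lambda s: (root_cost[s], bridge_ids[s]))
        roles[(designated, other)] = "DP"
        roles.setdefault((other, designated), "blocked")
    return root, roles


ids = {"SW1": 1, "SW2": 2, "SW3": 3}
ring = [("SW1", "SW2", 19), ("SW2", "SW3", 19), ("SW1", "SW3", 19)]
root, roles = spanning_tree(ids, ring)
print(root)
for (switch, peer), role in sorted(roles.items()):
    print(f"{switch} port towards {peer}: {role}")
```

In this three-switch ring SW1 becomes the root, SW2 and SW3 each get an RP towards SW1, and on the SW2-SW3 link the SW3 side ends up blocked because SW2 has the lower ID.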

STP and RSTP have several implementations. Cisco by default uses PVST+ (or simply PVST), which is an abbreviation for Per-VLAN Spanning Tree Plus, instead of pure IEEE STP. PVST creates one STP topology per VLAN. Instead of using one link for all VLANs and blocking all other links, you can use the first link for even VLANs and the second for odd ones. PVST allows you to do that. Cisco’s implementation of RSTP is called PVRST (Per-VLAN Rapid Spanning Tree) or RPVST (Rapid Per-VLAN Spanning Tree); Cisco documentation usually calls it Rapid PVST+. There is an IEEE protocol similar to PVRST. It’s called MSTP (Multiple Spanning Tree Protocol), often shortened to MST. MSTP is built on RSTP. Its difference from PVRST is that it doesn’t create a separate spanning tree for each VLAN as PVRST does by design, but lets you create one spanning tree for multiple VLANs.
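The difference between one tree per VLAN and one tree per group of VLANs is easy to show with a toy model. The uplink names, the VLAN range and the two-instance mapping below are invented for the example and do not correspond to any switch configuration syntax.

```python
def pvst_forwarding_uplink(vlan: int) -> str:
    """Even VLANs forward on the first uplink, odd VLANs on the second."""
    return "uplink-1" if vlan % 2 == 0 else "uplink-2"


def trees_needed(vlans, vlan_to_instance=None) -> int:
    """Count spanning trees: one per VLAN, or one per instance if a mapping is given."""
    if vlan_to_instance is None:
        return len(set(vlans))
    return len({vlan_to_instance[v] for v in vlans})


vlans = range(1, 101)
# Hypothetical multiple-instance mapping: odd VLANs in instance 1, even in instance 2.
mapping = {v: 1 if v % 2 else 2 for v in vlans}

print(pvst_forwarding_uplink(10), pvst_forwarding_uplink(11))  # uplink-1 uplink-2
print(trees_needed(vlans))           # 100
print(trees_needed(vlans, mapping))  # 2
```

The load sharing is the same in both cases, but the per-VLAN approach maintains 100 separate trees while the multiple-instance approach maintains two.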