Posts Tagged ‘IBM’

Windows MPIO with IBM storage

September 17, 2012

IBM mid-range storage systems (like the DS3950) work in active/passive mode. This means that access to each LUN goes through one controller, in contrast to active/active storage, where data between the host and the two controllers can flow in round-robin fashion. The redundant path is therefore used only for failover. The software that provides this failover functionality is called Multipath I/O (MPIO), and implementations exist for all major operating systems. I’ll describe how to configure MPIO on Windows.

Installation

Prior to Windows Server 2008, Microsoft didn’t have its own MPIO implementation, and MPIO was distributed with the IBM DS Storage Manager product. Now you can install MPIO from the “Features” sub-menu of the Windows Server 2008 Server Manager. After the installation is complete, you will find MPIO configuration options under Control Panel and in Administrative Tools.

IBM storage works well with the default Windows MPIO implementation; however, it’s recommended to install the IBM MPIO device-specific module (DSM) from the Storage Manager installation bundle. In my case the MPIO installation file was called SMIA-WSX64-01.03.0305.0608.

Enable multipathing

Initially you will see two hard drives for each LUN in Device Manager. You can enable MPIO for a particular hardware ID (in other words, for a storage system) on the Discover Multi-Paths tab of the MPIO control panel. You can’t do that with LUN granularity. After you add the selected devices and reboot, you will see them on the “MPIO Devices” tab. Each LUN will now be seen as a single hard drive in Device Manager.

Configure preferred path

MPIO supports several load-balancing policies, which are configured per LUN on the MPIO tab of a hard drive’s properties in Device Manager. As the Load Balance Policy, select Fail Over Only. Then, for each path, select whether it is the Active/Optimized path or a Standby path. Also mark the active path as Preferred, so that after a failover the LUN fails back to it.

Don’t be confused by iSCSI in the figure: it’s there just for reference, and the settings are the same for pure FC.

Check configuration

When you configure active and passive paths, you assume that the first path listed leads to controller A and the second to controller B. In fact, the configuration page gives no indication of that, and you can neither confirm nor deny it. The only IDs you see are adapter ports, and they don’t even map to the actual ports on the HBAs.

To check your configuration you need to install the IBM SMdevices utility, which comes with IBM DS Storage Manager. Run the DS SM installer and choose Custom Installation. There you need to select only the Utilities part. In the SMdevices output you can see which path is preferred for each LUN and whether it is currently active (In Use):

C:\Program Files\IBM_DS\util>SMdevices
IBM System Storage DS Storage Manager Devices
. . .
\\.\PHYSICALDRIVE1 [Storage Subsystem ITSO5300, Logical
Drive 1, LUN 0, Logical Drive ID
<600a0b80002904de000005a246d82e17>, Preferred Path
(Controller-A): In Use]
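
If you have many LUNs, reading this output by eye gets tedious. Below is a minimal Python sketch of how it could be parsed. It assumes every entry has the same shape as the sample above (a \\.\PHYSICALDRIVEn name followed by a bracketed description ending in "<something> Path (Controller-X): <state>"); the function name and the returned field names are my own, not part of any IBM tool.

```python
import re

# Sample copied from the SMdevices output above, wrapped exactly as printed.
SAMPLE = r"""\\.\PHYSICALDRIVE1 [Storage Subsystem ITSO5300, Logical
Drive 1, LUN 0, Logical Drive ID
<600a0b80002904de000005a246d82e17>, Preferred Path
(Controller-A): In Use]"""

# One entry: the device name, then a bracketed description that may wrap.
ENTRY = re.compile(r"(\\\\\.\\PHYSICALDRIVE\d+)\s*\[(.*?)\]", re.DOTALL)
# Assumed layout of the tail of the description.
PATH = re.compile(r"(\w+) Path \(Controller-(\w+)\): (.+)$")


def parse_smdevices(text):
    """Return one dict per drive with its path kind, controller and state."""
    drives = []
    for device, body in ENTRY.findall(text):
        flat = " ".join(body.split())  # undo the line wrapping
        match = PATH.search(flat)
        if match:
            drives.append({
                "device": device,
                "path": match.group(1),
                "controller": match.group(2),
                "state": match.group(3).strip(),
            })
    return drives


for drive in parse_smdevices(SAMPLE):
    print(drive["device"], drive["path"], drive["controller"], drive["state"])
```

Any drive whose path is not "Preferred" or whose state is not "In Use" is then a candidate for a closer look.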

References

The best reference I found on this topic is the IBM Midrange System Storage Hardware Guide (SG24-7676-01), from p.453: DS5000 logical drive representation in Windows Server 2008. Another good one is the Installing and Configuring MPIO guide from Microsoft.


IBM DS4700 copyback failed

August 27, 2012

If you have a global hot spare (GHS) drive when one of the active hard drives fails, your data is reconstructed to the GHS. Then, when you replace the failed drive, the storage system automatically initiates a copyback, which moves the data from the GHS back to the replacement drive. Sometimes this doesn’t happen and the replacement drive stays in an Unassigned state. If that is the case, go to DS Storage Manager, right click on the RAID array and select Replace Drives. There you should see the failed drive. Choose a replacement from the unassigned drives and click Replace Drive. Copyback will start immediately.

Keep in mind that copyback can take a long time, depending on the array size. If it is a production system and its performance is critical, right click on the logical drive and choose Change -> Modification Priority. There you can set how resources are split between modification operations (such as copyback, reconstruction, etc.) and host I/O performance. Set the priority to Low for maximum performance; the copyback will then take longer.

Configuring remote access to AIX

May 16, 2012

I work on an old AIX 5.1:

# oslevel -r
5100-03

By default it has only telnet preinstalled, which works out of the box without additional configuration. However, there are several recommended steps to take.

Telnet

First, check that you have a stable network connection. I had problems connecting to the AIX box after a connection timeout: it seemed that the telnet session somehow hung on the OS side and didn’t allow me to reconnect. To prevent that, you have two options. If you use PuTTY, go to Settings->Connection and set the number of seconds between keepalive packets to, say, 60; PuTTY will then maintain the connection automatically. Another workaround is to set the TMOUT variable in /etc/profile. By default AIX uses the ksh shell, which uses this parameter to detect idle sessions. If you set this variable to 120, then after two minutes of inactivity ksh will warn that the session will be closed in 60 seconds. In theory this means that if your telnet session breaks, ksh will terminate its shell automatically. (I checked that, and it turned out that TMOUT doesn’t help here.)

TCP Wrapper

By default, telnet access in AIX is open to everyone, which is surely not what you want. AIX has a built-in firewall (called AIX TCP/IP Filters), but it’s rather cumbersome to use just to restrict telnet access. I prefer TCP Wrapper, which is standard on Linux but optional on AIX. You can get the AIX LPP package from the Bull AIX freeware site here: http://www.bullfreeware.com/index2.php?page=lppaix51. Then simply:

chmod +x tcp_wrappers-7.6.1.0.exe

Extract the package contents by running the executable. Then run smit from the directory where you extracted the files and go to Software Installation and Maintenance -> Install and Update Software -> Install Software. Enter the current directory in “INPUT device / directory for software”. You can view the available software by pressing F4 in the “SOFTWARE to install” field. Change “ACCEPT new license agreements?” to yes and press Enter.

Once the package is installed, edit /etc/inetd.conf. Find the telnet line and change it:

#telnet stream tcp6 nowait root /usr/sbin/telnetd telnetd -a
telnet stream tcp6 nowait root /usr/local/bin/tcpd telnetd -a

And restart inetd service:

stopsrc -s inetd && startsrc -s inetd

Now to limit telnet access create /etc/hosts.allow:

telnetd: 123.234.123.234 234.123.234.123

and /etc/hosts.deny:

ALL:ALL
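
To make the evaluation order explicit: TCP Wrapper grants access if a rule in hosts.allow matches, otherwise denies it if a rule in hosts.deny matches, and otherwise grants it. Here is a small Python sketch of that order. It is a deliberately simplified model that only understands literal addresses and the ALL wildcard, not the full pattern language (host name patterns, netmasks, EXCEPT, shell commands) that tcpd supports, and the function names are mine.

```python
def parse_rules(text):
    """Parse 'daemon_list: client_list' lines into (daemons, clients) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        daemons, _, clients = line.partition(":")
        rules.append((daemons.replace(",", " ").split(),
                      clients.replace(",", " ").split()))
    return rules


def matches(rules, daemon, client):
    """True if any rule covers this daemon and this client address."""
    for daemons, clients in rules:
        daemon_ok = daemon in daemons or "ALL" in daemons
        client_ok = client in clients or "ALL" in clients
        if daemon_ok and client_ok:
            return True
    return False


def is_allowed(allow_text, deny_text, daemon, client):
    """hosts.allow is consulted first, then hosts.deny, then access is granted."""
    if matches(parse_rules(allow_text), daemon, client):
        return True
    if matches(parse_rules(deny_text), daemon, client):
        return False
    return True


HOSTS_ALLOW = "telnetd: 123.234.123.234 234.123.234.123"
HOSTS_DENY = "ALL:ALL"

print(is_allowed(HOSTS_ALLOW, HOSTS_DENY, "telnetd", "123.234.123.234"))
print(is_allowed(HOSTS_ALLOW, HOSTS_DENY, "telnetd", "10.0.0.1"))
```

With the two files above, only the two listed addresses can reach telnetd, and every other daemon started through tcpd is blocked for everyone until you add a line for it to hosts.allow.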

Secure Shell

Telnet is a completely outdated and insecure protocol, so you’d probably prefer ssh on the server side. I believe SSH is bundled with AIX 5.1, but I simply downloaded it from the Bull site. In addition to the OpenSSH package you will have to install its OpenSSL prerequisite. Here are the links:

http://www.bullfreeware.com/affichage.php?id=779
http://sourceforge.net/projects/openssh-aix/files/openssh-aix51/4.1p1/

Install OpenSSL simply by:

rpm -i openssl-0.9.7l-1.aix5.1.ppc.rpm

For OpenSSH you will need to gunzip it, untar it and install it using smit. But if you work on AIX with an old maintenance level (ML3 in my case), you can run into the following error when starting the ssh service:

getnameinfo failed: Invalid argument

You can see it if you run sshd with the -D and -d flags. The solution is to download the AIX 5.1 ML9 and POSTML9 fixes from IBM Fix Central, extract them and install them via Software Installation and Maintenance -> Install and Update Software -> Update Installed Software to Latest Level (Update All).

SSH is a standalone service, so you do not need to edit /etc/inetd.conf. Just add a new sshd line to /etc/hosts.allow and you are good to go. However, if your sshd was built without wrapper support, you have a problem. You can check that by calling:

# dump -H /usr/sbin/sshd

/usr/sbin/sshd:

                        ***Loader Section***
                      Loader Header Information
VERSION#         #SYMtableENT     #RELOCent        LENidSTR
0x00000001       0x00000115       0x00000601       0x00000096

#IMPfilID        OFFidSTR         LENstrTBL        OFFstrTBL
0x00000006       0x00006224       0x0000075a       0x000062ba

                        ***Import File Strings***
INDEX  PATH                          BASE                MEMBER
0      /usr/lib:/lib:/opt/freeware/lib
1                                    libc.a              shr.o
2                                    libpthreads.a       shr_comm.o
3                                    libpthreads.a       shr_xpg5.o
4                                    libcrypto.a         libcrypto.so.0.9.7
5                                    libz.a              libz.so.1
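
What to look for in this output is a libwrap.a entry in the Import File Strings section. If you want to script the check, here is a rough Python sketch. It assumes the dump -H layout shown above, where each library row consists of an index, a BASE name and a MEMBER name; the function names are my own.

```python
# The Import File Strings section from the dump -H output above.
SAMPLE = """\
                        ***Import File Strings***
INDEX  PATH                          BASE                MEMBER
0      /usr/lib:/lib:/opt/freeware/lib
1                                    libc.a              shr.o
2                                    libpthreads.a       shr_comm.o
3                                    libpthreads.a       shr_xpg5.o
4                                    libcrypto.a         libcrypto.so.0.9.7
5                                    libz.a              libz.so.1
"""


def imported_libraries(dump_output):
    """Collect the BASE column of the Import File Strings section."""
    libs = []
    in_imports = False
    for line in dump_output.splitlines():
        if "Import File Strings" in line:
            in_imports = True
            continue
        if not in_imports:
            continue
        fields = line.split()
        # Library rows look like: INDEX BASE MEMBER (the PATH column is empty).
        if len(fields) == 3 and fields[0].isdigit():
            libs.append(fields[1])
    return libs


def has_tcp_wrapper_support(dump_output):
    """True if the binary imports libwrap.a."""
    return "libwrap.a" in imported_libraries(dump_output)


print(imported_libraries(SAMPLE))
print(has_tcp_wrapper_support(SAMPLE))
```

For the sshd binary above the check comes back negative, which is exactly the problematic case.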

If there is no libwrap.a in the list, then the only option you have is to run sshd under tcpd, which in turn is started by inetd. To accomplish that, add the first line below to /etc/services and the second to /etc/inetd.conf:

ssh 22/tcp
ssh stream tcp6 nowait root /usr/local/bin/tcpd sshd -i

The ‘-i’ switch tells sshd that it is being run from inetd. In this mode a new sshd is started for every connection, so you may wait a noticeable amount of time for the login prompt. Also don’t forget to remove the sshd startup and shutdown scripts from /etc/rc.d/rc2.d.

DB2 fails to start after promoting to DC

February 24, 2012

Our backup database server is now also an additional domain controller. After DC promotion DB2 failed to start with error:

No mapping between account names and security IDs was done.

This is expected behavior, since the server removes all local users and groups during promotion, including DB2ADMNS and DB2USERS. These groups are used for extended security, and if it’s enabled (which is the default) you will run into this kind of problem. If you don’t change these groups before promotion, you won’t be able to use db2extsec to change them gracefully afterwards, because the database just won’t start and none of the CLI commands will work.

To solve this problem you need to disable extended security by changing the DB2_EXTSECURITY registry variable to NO in HKLM\SOFTWARE\IBM\DB2\GLOBAL_PROFILE and HKLM\SOFTWARE\IBM\DB2\InstalledCopies\DB2COPY1\GLOBAL_PROFILE. Then create the DB2ADMNS and DB2USERS Active Directory groups and point to them using:

db2extsec -u mydom\db2users -a mydom\db2admns

Bear in mind that using domain groups for extended security is supported starting from DB2 version 9 Fix pack 2. If you’re using an older version then you will have to disable this feature.

shIT happens

February 22, 2012

After we moved our DB2 server from v9.7.3 x86_32 to v9.7.5 x86_64 (with a server replacement), DB2 Storage Management broke. Tasks that take snapshots of tablespaces stopped working with the error:

CALL CAPTURE_STORAGEMGMT_INFO(2, ' ', 'DATA_SPACE')
SQL0443N Routine “SYSPROC.CAPTURE_STORAGEMGMT_INFO” (specific name “CAPT_STGMGMT_INF”) has returned an error SQLSTATE with diagnostic text “SQL0303”. SQLSTATE=38553

SQL0303N A value cannot be assigned to a host variable in the SELECT, VALUES, or FETCH statement because the data types are not compatible.

It looks like a data type incompatibility. I tried dropping all STMG_* tables, recreating them using CREATE_STORAGEMGMT_TABLES and taking a snapshot. Same error.

I’ve googled everything related to this on the Internet. There are only two topics, neither with any suggestions, here and here.

It seems this problem has no solution (an IBM DB2 bug?) and we will just have to live with it.

Rollforward on a backup server to another db name

February 20, 2012

Sometimes it’s handy to restore a production database on a testing server to a particular point in time (PIT). This procedure involves several nuances.

First of all, production databases usually have much more memory and larger buffer pools. If you have less memory on the testing server (as in our case), your database just won’t roll forward: it cannot be activated, because its buffer pools cannot be activated. In this situation you also cannot alter the buffer pools manually, because you cannot connect to the database. It looks like a deadlock. The solution is to set the DB2_OVERRIDE_BPF registry variable to, say, 5000 pages. After an instance restart all buffer pools will be 5000 pages in size, and you will be able to alter them. Don’t forget to unset DB2_OVERRIDE_BPF and restart once you have finished all maintenance procedures. I do that inside a batch script:

db2set DB2_OVERRIDE_BPF=5000
db2stop force
db2start

db2cmd -w -c db2 -t -f db_rf_restore.db2 -z db_rf_restore.err

db2set DB2_OVERRIDE_BPF=
db2stop force
db2start

The actual restore is done inside the db_rf_restore.db2 DB2 CLP script. The important point here is to use the -w flag, which makes db2cmd wait for the script to complete before moving on to the next command, and -c, which automatically closes the Command Window after the script has run. Otherwise you would need to close it by hand.
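
One weakness of a plain batch file is that if the restore step aborts the script, DB2_OVERRIDE_BPF stays set. Here is a sketch of the same sequence in Python, where a try/finally block makes sure the variable is always cleared. The commands are the ones from the batch script above; the function names and the injectable runner parameter are my own additions for illustration.

```python
import subprocess


def run(cmd):
    """Run one command line through the shell and raise if it fails."""
    subprocess.run(cmd, shell=True, check=True)


def restart_instance(runner):
    """Restart the DB2 instance so the registry change takes effect."""
    runner("db2stop force")
    runner("db2start")


def restore_with_small_bufferpools(pages=5000, runner=run):
    """Run the restore script with all buffer pools forced to a small size."""
    runner(f"db2set DB2_OVERRIDE_BPF={pages}")
    try:
        restart_instance(runner)
        runner("db2cmd -w -c db2 -t -f db_rf_restore.db2 -z db_rf_restore.err")
    finally:
        # Always clear the override, even if the restore step failed.
        runner("db2set DB2_OVERRIDE_BPF=")
        restart_instance(runner)


# On the DB2 server you would call: restore_with_small_bufferpools()
```

Passing a different runner (for example one that only prints each command) gives you a dry run without touching the instance.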

Another matter is how to roll forward a database if you changed its name and moved it to another server. To accomplish that, you need to copy the archive logs (don’t confuse them with active logs) from the production server to your testing server and use the OVERFLOW LOG PATH (“D:\backup\tlogs”) clause of the ROLLFORWARD DATABASE command to point to the directory you copied them to. Also use the AND COMPLETE clause to finish the rollforward process and turn off the rollforward pending state.

Here is the complete restore script:

DROP DATABASE db_2;

CREATE DATABASE db_2 ON D:;

RESTORE DATABASE db
FROM "D:\backup"
TO D:
INTO db_2
WITH 2 BUFFERS
BUFFER 1024
PARALLELISM 1
COMPRLIB C:\SQLLIB\BIN\db2compr.dll
WITHOUT PROMPTING;

ROLLFORWARD DATABASE db_2
TO 2012-02-16-12.30.00.000000
AND COMPLETE
OVERFLOW LOG PATH ("D:\backup\tlogs");

CONNECT TO db_2;
ALTER BUFFERPOOL data_pool SIZE 75000;
CONNECT RESET;
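
The point-in-time value has a fixed layout (yyyy-mm-dd-hh.mm.ss.nnnnnn), and it is easy to mistype. If you generate the script for different restore points, a small Python helper can build the ROLLFORWARD statement for you. This is only a sketch; the function names are made up, and the statement text simply mirrors the one in the script above.

```python
from datetime import datetime


def rollforward_timestamp(moment):
    """Format a datetime the way the ROLLFORWARD statement above expects it."""
    return moment.strftime("%Y-%m-%d-%H.%M.%S.%f")


def rollforward_statement(db_name, moment, overflow_path):
    """Build the ROLLFORWARD statement used in the restore script."""
    return (
        f"ROLLFORWARD DATABASE {db_name}\n"
        f"TO {rollforward_timestamp(moment)}\n"
        "AND COMPLETE\n"
        f'OVERFLOW LOG PATH ("{overflow_path}");'
    )


print(rollforward_statement("db_2", datetime(2012, 2, 16, 12, 30), r"D:\backup\tlogs"))
```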

DB2 notifications

February 16, 2012

Notifications are used in two parts of DB2 – Health Center and Task Center. Configuration is pretty simple. Go to Control Center – Task Center – Tools – Contacts and click on the SMTP Server button. Enter the FQDN of your mail server there. Then add contacts. After that you can select them on the Notifications tab when configuring a scheduled task.

To configure Health Center notifications go to Control Center – Health Center – Health Center – Configure – Alert Notification. In the Configure Health Alert Notification window select the instance and add contacts to the Health Notification Contact List.

It may look like that’s all. However, there is one issue we ran into. Sometimes DB2 sends emails with a sender address that contains just a host name, without the domain part, like db2admin@sqldb2. Many mail servers are configured to reject sender addresses whose domain is not an FQDN. So we had to add the DB2 server’s IP address to a whitelist on our Postfix mail server using the check_client_access parameter.
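
To illustrate what the mail server objects to, here is a tiny Python sketch of the kind of test involved: the part after the @ must contain at least two dot-separated labels. It is a simplification of what a restriction such as Postfix’s reject_non_fqdn_sender checks, and the function name is mine.

```python
def sender_domain_is_fqdn(address):
    """True if the domain part of the address has at least two non-empty labels."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return False
    labels = domain.split(".")
    return len(labels) >= 2 and all(labels)


print(sender_domain_is_fqdn("db2admin@sqldb2"))
print(sender_domain_is_fqdn("db2admin@sqldb2.example.com"))
```

The first address is the one DB2 sometimes produces; sqldb2.example.com in the second is just a placeholder domain.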

DB2 Monitor Heap Utilization alert

February 16, 2012

Several times a week we receive alerts from DB2 Health Center concerning utilization of heap memory consumed by Health Monitor. Here is the message from logs:

ADM10500E  Health indicator “Monitor Heap Utilization” (“db2.mon_heap_util”) breached the “upper” alarm threshold of “95 %” with value “200 %” on “instance” “DB2”.  Calculation: “((db2.mon_heap_cur_size/db2.mon_heap_max_size)*100);” = “((655360 / 327680) * 100)” = “200 %”.  History (Timestamp, Value, Formula): “()”
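
The calculation quoted in the message is easy to reproduce, which also shows why the value looks odd: the current size is twice the reported maximum. A minimal Python sketch follows; the function names are mine, while the formula and the numbers come from the message above.

```python
def heap_utilization_percent(cur_size, max_size):
    """(db2.mon_heap_cur_size / db2.mon_heap_max_size) * 100, as in the message."""
    return cur_size / max_size * 100


def breaches_threshold(cur_size, max_size, threshold_percent=95):
    """True if utilization is above the alarm threshold."""
    return heap_utilization_percent(cur_size, max_size) > threshold_percent


print(heap_utilization_percent(655360, 327680))
print(breaches_threshold(655360, 327680))
```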

Even though the MON_HEAP_SZ parameter of our DB2 instance is configured to be managed automatically, the alert still shows up. As it turned out, it’s a minor DB2 bug, which is described here. A short explanation of this bug follows:

Health Monitor is sending Alert to db2 user while Monitor Heap is set to Automatic.

So if you see “AUTOMATIC” in the Value column for MON_HEAP_SZ in the configuration parameters, you can safely ignore this alert. However, it’s not a good idea to have unresolved issues in Health Center. Besides, it’s rather annoying, since it shows up quite regularly and draws our attention. The solution is to switch off monitoring of these health indicators:

db2 update alert cfg for database manager using db2.mon_heap_util set THRESHOLDSCHECKED NO
db2 update alert cfg for databases using db.db_heap_util set THRESHOLDSCHECKED NO

What’s peculiar about this error is that, according to IBM, it was fixed in 9.5, and we work on 9.7 (a regression?). On top of that, there is another bug, JR31509, described here, which is connected with the previous one. Short explanation:

The health indicator calculates the alarm threshold using db2.mon_heap_max_size, but this max size may not be increased even though the MON_HEAP_SZ is configured as AUTOMATIC.

That basically means that AUTOMATIC may not work properly and we might have memory issues in the future. But I’ll keep my fingers crossed.

DB2 transaction logging

February 15, 2012

By default a DB2 database uses circular logging. That means you have a fixed number of log files which are used in rotation: each time the current file fills up, the next file is overwritten and reused. This is normal for data warehousing or OLAP, where you have a fixed data set and only select data from it. In case of failure, the restore procedure simply involves loading this fixed data set again. If you work on an OLTP database, you have to use archival logging; otherwise you won’t be able to restore your database to a particular point in time if a failure occurs. To configure archival logging you need to change several database configuration parameters:

  • LOGRETAIN – change it to Recovery if you want to switch to archival logging
  • LOGARCHMETH1 – point to directory where your log files will be kept

Some other useful parameters:

  • LOGPATH – points to your so-called “active logs”. The database uses them for immediate recovery needs; they aren’t meant to be used for roll-forward recovery.
  • LOGPRIMARY – number of log files that will be used for active logs
  • LOGFILSIZ – size of each log file
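
To get a feeling for how much disk space the active logs need, you can multiply these two parameters. LOGFILSIZ is expressed in 4 KB pages. Here is a small worked example in Python; the values 13 and 1024 are only illustrative and not taken from our configuration, and the function name is mine.

```python
PAGE_SIZE = 4096  # LOGFILSIZ is expressed in 4 KB pages


def primary_log_space_bytes(logprimary, logfilsiz):
    """Disk space taken by the primary active log files."""
    return logprimary * logfilsiz * PAGE_SIZE


# Illustrative values: 13 primary log files of 1024 pages (4 MB) each.
size = primary_log_space_bytes(13, 1024)
print(size, "bytes =", size // (1024 * 1024), "MB")
```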

You can find more information here in detail and here in short.

Increasing DB2 buffer pools

February 15, 2012

Just a small tip on DB2 memory allocation. The topic is very well described in a number of articles, like this one, or in the official IBM DB2 guide Troubleshooting and Tuning Database Performance. What I want to describe here is how to increase buffer pools, probably one of the most important tuning parameters and a very basic one at the same time. The issue you can run into is that when you increase the buffer pool size you get the warning SQL20189W:

The buffer pool operation (CREATE/ALTER) will not take effect until the next database startup due to insufficient memory.

This is not just a warning suggesting a restart. In fact, after a restart your buffer pools won’t activate due to insufficient memory, and the database will work with small system buffer pools, which will drastically decrease performance.

The reason is a global memory cap called INSTANCE_MEMORY, which is set in the instance configuration parameters. It’s the total amount of memory this instance can use for its operations. In order to have bigger buffer pools you must also increase this parameter. After that, SQL20189W goes away and you can tweak buffer pool memory on the fly. To check that the change has taken effect, use:

db2mtrk -d -v

and look for the line like

Buffer Pool Heap (1) is of size 3343450112 bytes
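
If you have several buffer pools, you can pull these lines out of the db2mtrk output with a few lines of Python. The sketch below assumes every line of interest has exactly the format shown above; the function name is mine.

```python
import re

# Matches lines like: Buffer Pool Heap (1) is of size 3343450112 bytes
LINE = re.compile(r"Buffer Pool Heap \((\S+)\) is of size (\d+) bytes")


def buffer_pool_sizes(db2mtrk_output):
    """Map each buffer pool heap identifier to its size in bytes."""
    return {bp_id: int(size) for bp_id, size in LINE.findall(db2mtrk_output)}


sample = "Buffer Pool Heap (1) is of size 3343450112 bytes"
for bp_id, size in buffer_pool_sizes(sample).items():
    print(f"buffer pool heap {bp_id}: {size / 1024 ** 2:.1f} MB")
```

Comparing the numbers before and after the ALTER BUFFERPOOL tells you whether the new size is really in effect.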