Tuesday, November 18, 2008

Recovering EMC PowerPath pseudo device config

This is actually a continuation of my earlier post. I once ran into an issue with PowerPath pseudo devices. Here is the short version of what happened:-

We manage an Oracle 10g RAC cluster on two Linux boxes, each of which sees ~30 EMC devices. About ~25 of those devices are shared between both servers**, and the remaining ~5 are different on each server, which means there are about ~35 LUNs in total from the EMC storage. One bad day, one of the servers was removed from the storage zone (I don't want to go into the details) and we were left with the uphill task of recovering that node.

**The 25 shared devices are used for Oracle data, while each server uses its own 5 LUNs for internal filesystems.

We managed to get our zoning config back and, thankfully, the internal hard drive was still intact. As soon as the devices were re-presented from the storage, someone ran the HP utility to scan for all new LUNs. When the PowerPath services were then started, the device mapping changed. Now we were left with a situation where the device that used to be /dev/emcpowera was actually mapped to /dev/emcpowerc :(. This happened because the EMC PowerPath software assigned the pseudo device IDs based on the new device config (as recognized by the OS), and they issued powermt config without noticing the difference.
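A remapping like this is easier to catch before anyone saves the new config if you compare the pseudo-name-to-LUN mapping of the suspect node against the good node. Below is a minimal sketch of that comparison. It assumes you have captured the text output of `powermt display dev=all` on both nodes and that each device block contains a `Pseudo name=` line followed by a `Logical device ID=` line; the function names are my own and the exact output layout may differ between PowerPath versions, so treat the parsing as an assumption to verify against your own output.

```python
import re


def parse_powermt_display(text):
    """Return {logical_device_id: pseudo_name} from captured powermt output.

    Assumption: each device block has a 'Pseudo name=...' line followed
    later by a 'Logical device ID=...' line.
    """
    mapping = {}
    pseudo = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Pseudo name=(\S+)", line)
        if m:
            pseudo = m.group(1)
            continue
        m = re.match(r"Logical device ID=(\S+)", line)
        if m and pseudo is not None:
            mapping[m.group(1)] = pseudo
            pseudo = None  # wait for the next device block
    return mapping


def find_mismatches(good, recovered):
    """Return LUNs seen on both nodes whose pseudo names differ.

    Result maps logical device ID -> (name on good node, name on recovered node).
    LUNs that exist on only one node are ignored.
    """
    shared = sorted(good.keys() & recovered.keys())
    return {
        lun: (good[lun], recovered[lun])
        for lun in shared
        if good[lun] != recovered[lun]
    }
```

Feeding the two captured outputs through `parse_powermt_display` and then `find_mismatches` gives an empty dict when the shared LUNs carry the same pseudo names on both nodes; any entry in the result is a device that has moved.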

One way to recover from such a scenario would have been to copy the device config from the good node to this node. So we did, and got our config back.

But much to my dismay, apart from the 25 devices which were the same on both nodes, the device IDs of the 5 node-specific devices were skipped by PowerPath.

So effectively, I had /dev/emcpowers on one node representing the 26th device and /dev/emcpowerx on the other. (Notice that due to the config copy, PowerPath reserved emcpowers through emcpowerw for the device IDs of the good server and presented a new device ID on the recovered node.)

If you are fortunate enough to have PowerPath version 3.0 or earlier, you can manually edit the emcpower.conf file to change the device IDs. But if you have a newer version of PowerPath, that workaround is not available.

Riding on my good luck, I decided to restore the config files emcp_devicesDB.idx & emcp_devicesDB.dat and voila, it worked. We were now back to where we were before the node failure.



Here is how it's done:-

For EMC PowerPath version 3.0.6 or earlier:-

Edit the emcpower.conf file and change the device IDs:-

name="emcpoweri" parent="pseudo" instance=8 vidb=00000000000000000000000011********************************* vids=******************************* dev=0x8,0x20,0x8,0x40;

to

name="emcpowers" parent="pseudo" instance=8 vidb=00000000000000000000000011********************************* vids=******************************* dev=0x8,0x20,0x8,0x40;
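If you have several entries to change, doing the rename by hand gets error-prone. Here is a minimal sketch of the same edit done programmatically. It only touches the `name="..."` field and leaves the rest of the line alone; the function name `rename_pseudo` is hypothetical, and the sketch assumes the file uses the one-entry-per-line layout shown above. Work on a copy of emcpower.conf, not the live file.

```python
import re


def rename_pseudo(conf_text, old_name, new_name):
    """Rename one pseudo device in emcpower.conf-style text.

    Only the name="..." field is changed. Raises ValueError if the new
    name is already taken or the old name is not present, so that two
    entries never end up sharing a pseudo device name.
    """
    if re.search(r'\bname="' + re.escape(new_name) + '"', conf_text):
        raise ValueError(new_name + " is already in use")

    pattern = re.compile(r'(\bname=")' + re.escape(old_name) + r'(")')
    if not pattern.search(conf_text):
        raise ValueError(old_name + " not found")

    # A function replacement avoids any backslash handling in new_name.
    return pattern.sub(lambda m: m.group(1) + new_name + m.group(2), conf_text)
```

For example, `rename_pseudo(text, "emcpoweri", "emcpowers")` performs the change shown above on the contents of the file held in `text`.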

Now unmount all the filesystems and stop the volume manager (if any) to release the pseudo devices.

Afterwards, restart the PowerPath software and cross-check the device names with the powermt command.




For EMC PowerPath version 4.x or later:-

If you don't have an emcpower.conf file, you are probably running a newer version. In that case, do the following:-

1. Recover the following files, which hold the older device config, from the last backup:-
/etc/emcp_devicesDB.idx
/etc/emcp_devicesDB.dat

Or copy them from the good node (provided neither server has any exclusive LUNs).
2. Unmount the filesystems and stop the volume manager.
3. Restart the PowerPath services.
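Before overwriting the two database files in step 1, it is worth keeping a copy of whatever is currently in /etc so you can roll back. The sketch below shows one way to do that. The helper name `replace_device_db` and the backup suffix are my own choices, not anything PowerPath provides, and it should only be run once the filesystems are unmounted and the volume manager is stopped.

```python
import shutil
import time
from pathlib import Path

# The two PowerPath device database files named in step 1.
DB_FILES = ("emcp_devicesDB.idx", "emcp_devicesDB.dat")


def replace_device_db(source_dir, etc_dir="/etc", backup_suffix=None):
    """Copy the device DB files from source_dir into etc_dir.

    Any existing file in etc_dir is first copied to <name><backup_suffix>.
    Returns the list of backup paths that were created.
    """
    source_dir = Path(source_dir)
    etc_dir = Path(etc_dir)
    suffix = backup_suffix or time.strftime(".bak-%Y%m%d%H%M%S")

    # Check that both source files exist before touching anything,
    # so we never end up with an .idx and .dat from different configs.
    missing = [n for n in DB_FILES if not (source_dir / n).is_file()]
    if missing:
        raise FileNotFoundError("missing in source: " + ", ".join(missing))

    backups = []
    for name in DB_FILES:
        target = etc_dir / name
        if target.exists():
            backup = target.with_name(name + suffix)
            shutil.copy2(target, backup)  # preserves timestamps and mode
            backups.append(backup)
        shutil.copy2(source_dir / name, target)
    return backups
```

A call such as `replace_device_db("/path/to/restored/files")` would then put the restored pair in place, with the path here being a placeholder for wherever your backup was extracted.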


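Unmounting the filesystems and stopping the volume manager is required so that nothing is using the pseudo devices when PowerPath restarts. Otherwise you can run into the problem where a pseudo device becomes unavailable while still in use, and the server's health is then in question.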

Do let me know if you need any clarification...
