Monday, June 1, 2009

Network Teaming/bonding on Linux

Network Interface Bonding

Interface bonding is one of the simpler ways to ensure high availability on the network. Bonding (or teaming) lets two or more physical NICs work in tandem, so the server stays reachable through a network switch failure, a network cable failure or a NIC failure. The ideal configuration for a high availability server is:-

n/w switch A n/w switch B

NIC card 1 NIC card 2

SERVER

The only requirement is that all participating slave NICs and network switches are in the same subnet/network.


Doing it on Linux is even simpler! Here are the steps to set up bonding:-

1. Ensure that your version of the OS natively supports interface bonding. By default, Redhat provides bonding as a loadable module within the OS itself. If not, there are a lot of source rpms which can be built and loaded into the kernel.

locate bonding ## run updatedb first if the slocate database is out of date

The above command lists all the available bonding modules. Alternatively, you can load the module with modprobe bonding and then unload it again with rmmod bonding.
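
If locate is not available, the same check can be sketched as a small shell function. The function name find_bonding_module and the default search path /lib/modules/<running kernel> are my own assumptions, not something the OS ships.

```shell
#!/bin/sh
# find_bonding_module [DIR]   (hypothetical helper)
# Prints the path of the first bonding kernel module found under DIR and
# returns 0; returns 1 if none is found.
# DIR defaults to the module tree of the running kernel (an assumption).
find_bonding_module() {
    moddir="${1:-/lib/modules/$(uname -r)}"
    # Match bonding.ko as well as compressed variants such as bonding.ko.xz
    found=$(find "$moddir" -type f -name 'bonding.ko*' 2>/dev/null | head -n 1)
    [ -n "$found" ] && printf '%s\n' "$found"
}
```

Call it as find_bonding_module || echo "bonding module not found". On a live system, modinfo bonding is another way to confirm that the module is installed.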

2. Set up the network config files for all enslaved interfaces (i.e. the ones which will form the bonded interface)

vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=none
HWADDR=xx:xx:xx:xx:xx:xx
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no


vi /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=none
HWADDR=yy:yy:yy:yy:yy:yy
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

The boot protocol for each NIC is set to none because we are not assigning an IP address to the slave NICs. HWADDR should NOT be edited. MASTER should be set to bond0 or any alias name (even apple0 will work ;)). As you can see above, MASTER is the same for both eth0 and eth1. In case you want to enslave three physical NICs, you need to keep MASTER the same for all three. SLAVE is set to yes because eth0 and eth1 will be slaves and will work depending upon the availability/config/requirement.

vi /etc/sysconfig/network-scripts/ifcfg-bond0 ## this file has to be created as it's a logical interface
DEVICE=bond0
BOOTPROTO=none
USERCTL=no
IPADDR=.....
NETMASK=.....
GATEWAY=.....
NETWORK=.....
BROADCAST=.....
ONBOOT=yes

All ..... entries need to be replaced with the values you would set for this logical/bonded interface.
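
The three files above can also be generated instead of typed by hand. Here is a minimal sketch, assuming two hypothetical helper functions (write_slave_cfg and write_bond_cfg) that write into a directory you pass in. NETWORK and BROADCAST are left out of the sketch for brevity; add them if your setup needs them.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any distribution.

# write_slave_cfg OUTDIR DEVICE HWADDR MASTER
# Writes an ifcfg file for one slave NIC into OUTDIR.
write_slave_cfg() {
    cat > "$1/ifcfg-$2" <<EOF
DEVICE=$2
BOOTPROTO=none
HWADDR=$3
ONBOOT=yes
MASTER=$4
SLAVE=yes
USERCTL=no
EOF
}

# write_bond_cfg OUTDIR MASTER IPADDR NETMASK GATEWAY
# Writes the ifcfg file for the logical bonded interface into OUTDIR.
write_bond_cfg() {
    cat > "$1/ifcfg-$2" <<EOF
DEVICE=$2
BOOTPROTO=none
USERCTL=no
IPADDR=$3
NETMASK=$4
GATEWAY=$5
ONBOOT=yes
EOF
}
```

For example, write_slave_cfg /tmp/bondtest eth0 00:11:22:33:44:55 bond0 followed by write_bond_cfg /tmp/bondtest bond0 192.0.2.10 255.255.255.0 192.0.2.1 produces files you can review before copying them to /etc/sysconfig/network-scripts. The MAC and IP addresses here are placeholders only.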

3. vi /etc/modprobe.conf

Add the following lines to the modprobe.conf file, right after the slave NICs configuration:-

alias bond0 bonding
options bonding mode=active-backup miimon=100

mode can be given by name or by number, e.g. active-backup (1) or balance-rr (0). Here I am using active-backup, where only one interface carries traffic until a failure is detected on it. I could also use balance-rr, which load balances the packets across the slaves in round-robin (sequential) order. miimon is the interval, in milliseconds, at which the driver checks the link state of each slave for failure.
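
Since the mode option accepts either form, a small lookup helps when a script only knows the name. The function name bond_mode_number is hypothetical. Only the first two values are mentioned above; the remaining five follow the numbering in the kernel bonding driver documentation.

```shell
#!/bin/sh
# bond_mode_number NAME   (hypothetical helper)
# Prints the numeric value the bonding driver accepts for a mode name.
# Returns 1 and prints a message on stderr for an unknown name.
bond_mode_number() {
    case "$1" in
        balance-rr)    echo 0 ;;
        active-backup) echo 1 ;;
        balance-xor)   echo 2 ;;
        broadcast)     echo 3 ;;
        802.3ad)       echo 4 ;;
        balance-tlb)   echo 5 ;;
        balance-alb)   echo 6 ;;
        *) echo "unknown bonding mode: $1" >&2; return 1 ;;
    esac
}
```

So "options bonding mode=active-backup miimon=100" and "options bonding mode=1 miimon=100" ask for the same behaviour.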



By default, a bonded interface takes the MAC address of the first slave NIC. It can, however, be changed via the ifconfig command. Once the bonded logical NIC comes up, ip link and ifconfig will report the bond and all of its slave NICs with the same MAC address:-

# ip link show

2: eth0: mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
link/ether 00:21:5a:d3:eb:b0 brd ff:ff:ff:ff:ff:ff
3: eth1: mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
link/ether 00:21:5a:d3:eb:b0 brd ff:ff:ff:ff:ff:ff
4: bond0: mtu 1500 qdisc noqueue
link/ether 00:21:5a:d3:eb:b0 brd ff:ff:ff:ff:ff:ff

# ifconfig -a
bond0 Link encap:Ethernet HWaddr 00:21:5A:D3:EB:B0
....
....

eth0 Link encap:Ethernet HWaddr 00:21:5A:D3:EB:B0
....
....

eth1 Link encap:Ethernet HWaddr 00:21:5A:D3:EB:B0
.....
.....

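Check either your ifcfg script (the HWADDR line) or the /proc filesystem for the actual MAC address of each slave. Note that the sample below was captured from a bond running in round-robin mode with a 500 ms polling interval, not the active-backup/miimon=100 settings shown in step 3:-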


# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 500
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:21:5a:d3:eb:b0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:21:5a:d3:eb:b1
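
To pull just the per-slave details out of that file, an awk one-off is enough. This is a sketch, assuming the field labels look like the output above; bond_slaves is a hypothetical function name and the file path is only a default.

```shell
#!/bin/sh
# bond_slaves [FILE]   (hypothetical helper)
# Prints one line per slave: interface name, MII status, permanent MAC.
# FILE defaults to /proc/net/bonding/bond0.
bond_slaves() {
    awk -F': ' '
        /^Slave Interface:/   { slave = $2 }
        # The bond itself also has an "MII Status" line; skip it until
        # the first slave section has started.
        /^MII Status:/        { if (slave != "") status = $2 }
        /^Permanent HW addr:/ { print slave, status, $2; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

For the output shown above this prints "eth0 up 00:21:5a:d3:eb:b0" and "eth1 up 00:21:5a:d3:eb:b1", which makes it easy to see that the slaves keep distinct permanent addresses even though ifconfig shows them all with the bond's MAC.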

Monday, February 9, 2009

Making IBM Lotus sametime work on Pidgin (Linux messenger)

Did you ever want to connect to IBM Lotus Sametime using the Linux messenger client (pidgin)? Here is how to do it:-

OS:- Linux redhat 4 U4 (Nahant)
Messaging client:- pidgin (by default included in Redhat).

If you don't have pidgin, I would recommend downloading the rpm or debian package from the Internet depending upon the OS version you have. It's a nice client used to connect to several protocols like yahoo, google talk, myspace IM etc.

Here is how to enable the sametime protocol along with the others in pidgin:-

1. Run pidgin client on command console and check if you have sametime protocol listed in login options.

$ pidgin &

If you are not running GUI on Linux, use export DISPLAY to export the GUI interface to any Linux desktop. If you have Reflection available, you can export it to Windows desktop as well.
2. Download the "meanwhile" rpm from internet. Check out "http://meanwhile.sourceforge.net", you should be able to download the rpm/debian package based on your OS/kernel version.
3. Install the package

rpm -ivh meanwhile*

If you manage to get your hands on the source code, then compile and install the library.
4. Depending upon whether you have a GAIM client or pidgin, the library rpm might differ. For pidgin, I used libpurple-meanwhile** library.
5. Finally, check for the libmeanwhile.so file under the pidgin install directory, "/usr/lib/pidgin/". If you are unable to find the libmeanwhile library there, you will have to copy the file into the pidgin lib folder.
6. Restart the pidgin client and check if sametime protocol appears under the drop-down.
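
The check in step 5 can be scripted if you have several desktops to go through. A minimal sketch: has_sametime_plugin is a hypothetical function name, and the default directory is simply the path from step 5, which may differ on your install (64-bit systems often use a lib64 directory instead).

```shell
#!/bin/sh
# has_sametime_plugin [DIR]   (hypothetical helper)
# Returns 0 if a libmeanwhile shared object exists in DIR, 1 otherwise.
# DIR defaults to /usr/lib/pidgin, the path used in step 5.
has_sametime_plugin() {
    plugdir="${1:-/usr/lib/pidgin}"
    # Accept libmeanwhile.so as well as versioned names like libmeanwhile.so.1
    for f in "$plugdir"/libmeanwhile*.so*; do
        [ -e "$f" ] && return 0
    done
    return 1
}
```

Use it as has_sametime_plugin || echo "copy libmeanwhile into the pidgin lib folder".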

I have a central server from where I can export the display onto my Linux desktop or various other Linux desktops.

My first impression of sametime library was quite nice. It does not hang up while someone tries to contact me and is absolutely flawless in working.

Do not forget to check the port where you would be connecting to your sametime server. Ideally it should be 1533.

Rahul.

Friday, January 30, 2009

How to remove LUN devices from Veritas on Solaris and re-use them on a different server

Ever wanted to get rid of some of the old veritas disk groups from a Sun Solaris box in order to allocate that space onto a new server? Here is a short summary of steps which could be really helpful. As we all have faced these scenarios, I thought of documenting it.

Requirement
To remove about 10 recently allocated storage LUN devices (out of available 18 devices in the veritas disk group) from the server and put them onto a new box.
Summary of work required

1. First of all, unmount all the new filesystems created after the recent addition of 10 devices in the disk group.

2. Stop the new volume within the disk group (if you miss it, it might come back to haunt you when you deport the disk group).
vxvol -g datadg stopall ## in case you are removing the whole disk group
vxvol -g datadg stop {volume} ## in case you are removing only some volumes from the disk group.

3. Remove the new volume to free the disks.
vxassist -g datadg remove volume {volume} ## Step for removing some volumes out of the disk group
If you want to remove the complete disk group you can use the following:-
vxdg deport datadg
vxdg destroy datadg

4. Once the volumes are removed, check if the associated LUN disk drives are empty.
vxdg free
It should report an OFFSET of 0 for the devices. If it does not, DO NOT remove that drive; try to shrink the volume instead. Since the new drives were the latest additions to the disk group, the resizing will free those up first.
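
With many LUNs it is easy to miss a non-zero OFFSET by eye, so here is a sketch that filters the listing. It assumes the output starts with a header row containing DISK, DEVICE and OFFSET columns; nonzero_offset_disks is a hypothetical function name.

```shell
#!/bin/sh
# nonzero_offset_disks   (hypothetical helper)
# Reads "vxdg free"-style output on stdin and prints the DISK and DEVICE
# columns of every row whose OFFSET column is not 0.
# Assumes the first line is a header naming the columns.
nonzero_offset_disks() {
    awk '
        NR == 1 {
            # Locate the columns by header name rather than by position
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        NF && $(col["OFFSET"]) != 0 { print $(col["DISK"]), $(col["DEVICE"]) }
    '
}
```

Run it as vxdg -g datadg free | nonzero_offset_disks. No output means every listed device reports OFFSET 0; any disk that is printed should be left alone for now.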

5. If 4 holds true for you (for me it did :D) then go ahead and mark the disk drives offline.
Use vxdiskadm command for the same, its CLI is pretty nice.

6. Once the drives are marked offline, you can safely login to the Storage and unmap the associated drives.
Ideally, while using Sun storage, the devices appear in the order of generation/detection. Unlike Linux, where you can use devlabel, Solaris offers controller based device naming conventions. The only issue is that this convention is not visible on the OS. For Sun storage, all LUN devices have a WWN number, but the server only prefixes the HBA card's WWN number. Hence new LUNs should be added one by one and not in a batch ;).

7. After I unmapped the drives, I used the format command to check if the OS still sees those LUNs. Afterwards, use the following steps to remove the device config from Veritas:-
vxdisk rm {lun}
devfsadm -C
I do not like to use cfgadm to unconfigure the path, but this step is required if format still shows the removed LUN devices:-
cfgadm -c unconfigure cx::dsk/cxtydz

8. Now the devices are out of Solaris/Veritas control. Run vxdctl enable to cross-check that the devices do not show up again.

Well, the hard/tricky part is over. Go ahead and destroy the unmapped LUNs and re-create the device on storage. Afterwards, map it to the new server where it will finally be used. Sounds easy, doesn't it? Well, if you are lucky enough, just like me, you will run into some more weird stuff.

When I presented the newly created 300 GB device (6X50 GB old LUNs), the device showed up with veritas plexes/pre configuration on the new server. In other words, the device data was still intact :(. And to my surprise it was reported as a 50 GB device rather than the actual 300 GB.

So I had to finally use the good old "dd" companion for flushing out the first 128 bits on the storage LUN. This was enough for Veritas volume manager to treat the device as a new one.
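
For reference, this is roughly what the dd step looks like. It is only a sketch: zero_head is a hypothetical wrapper, and the number of bytes to clear is left as a parameter because the right size depends on your setup. Try it on a scratch file first, since pointing it at the wrong device destroys data.

```shell
#!/bin/sh
# zero_head TARGET BYTES   (hypothetical helper)
# Overwrites the first BYTES bytes of TARGET with zeros and leaves the
# rest of TARGET untouched (conv=notrunc stops dd truncating a file).
zero_head() {
    dd if=/dev/zero of="$1" bs="$2" count=1 conv=notrunc 2>/dev/null
}
```

For example, zero_head /tmp/scratch.img 16 clears the first 16 bytes of a test file. On a real LUN, TARGET would be the device path, and raw devices generally want a block size that is a multiple of the sector size.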

If you run into this issue, better remove the LUN device, re-create it on storage, remove the config from solaris kernel and then add it to the OS.

Hope it helps,
Rahul.