Tuesday, November 25, 2008

Tryst with Dell PowerEdge servers (BCM or TG3 network Driver)

Thought of sharing this vital info and challenge which I faced while setting up Linux on Dell PowerEdge server. I am a firm admirer of HP hardware machines and there ease of deployment. The thing which I liked about Dell's motherboard was the presence of two onboard NIC cards compatible with Broadcomm redundancy module. Although not required but this could very well be a boon for NIC bonding for HA servers.

As long as the boot server setup and OS installation goes, things were looking good but the problem started to occur when the server was live. While trouble-shooting I came across issues which were unique to the hardware:-

We were experiencing intermittent loss of link and network connectivity that could not be restored. Nothing appeared on the NIC cards which indicated loss of packets or collisions.

Tackling the problem, I first went into check what hardware config was on the server.

# lspci | grep Ethernet
08:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 14)
08:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 14)

Indeed it was a 1000MB/s NIC but I doubt my switch would support that ;). So I manually set the auto-net to off and put it as 100MBPS full duplex.

To do the same, simply add this line to ifcfg-ethX file in sysconfig:-
ETHTOOL_OPTS="speed 100 duplex full autoneg off"

But even with this, the network packets were getting dropped. After a bit of googling :D, I figured out that the NetXtreme type NICs does not work well with the default tg3 driver in Redhat. So I decided to go with BCM drivers instead. Here is how to configure NICs with BCM instead of tg3 driver.

step 1:- Download the rpm bcm5700-7.1.22-1 and install it on the server.
step 2:- Edit the modules.conf file and change the enteries for NIC cards:-

alias eth0 bcm5700
alias eth1 bcm5700
options bcm5700 auto_flow_control=0,0 tx_flow_control=0,0 rx_flow_control=0,0

Load the modules correctly alongwith network services restart or just reboot the damn thing!!!

My take on this is that I feel its better to switch to BCM drivers ONLY if the NICs are of Nextreme type. I am not going into the details of whether tg3 is good or bcm. But in this case, BCM out-performed the tg3 driver for the older Dell box ;). I recently heard from one of my friend that even Nextreme is looking to go for tg3, not sure if its just a hoax.

Cheers!!!
Rahul.

2 comments:

Srini said...

thanksss.. got to admit it was a deep dive into the subject. It took me a while to resolve one pending issue with my dell box. your solution hit the nail on the head.

Anonymous said...
This comment has been removed by a blog administrator.