Wikipedia on Infiniband: “a switched fabric communications link used in high-performance computing and enterprise data centers”. The most useful feature of this type of link, in the context of a cluster interconnect, is very low latency.
This post describes the SUSE settings needed for a very simple setup for Oracle Real Application Cluster with two database nodes. The hardware:
- QLogic QLE7340 Host Channel Adapters (HCA). Good to know: Intel acquired QLogic’s InfiniBand business, so Intel now provides support for these cards
- Mellanox IS5022 switch
The switch does not need any configuration. Getting the cards to work on the servers takes a bit more work, because these particular cards are not officially supported by SUSE, even though SUSE ships the compiled kernel driver module. According to Intel, SP3 for SLES11 is supposed to add support. Until it is released, these are the steps to take.
Install the Infiniband (OFED) pattern, in YaST Software Management
Open YaST2/Software Management, select View/Patterns and install the Infiniband (OFED) pattern.
Install the drivers and packages for QLogic QLE7340 HCAs
- in YaST2/Software repositories, enable the SLES11-Extras repository. Then install the kernel-default-extra package that’s available in it. This package contains the kernel driver module for the QLogic QLE7340 HCAs. It’s called “ib_qib”.
- configure openibd to load the drivers by editing /etc/infiniband/openibd.conf and changing this line from no to yes:
- also, you have to install these two packages with userland “verbs” that are specific to these HCAs: libipathverbs, libipathverbs-32bit
If you don’t do these steps and just try to set an IP in YaST2 Networking Devices, you will get this error:
Unable to configure the network card because the kernel device (eth0, wlan0) is not present. This is mostly caused by missing firmware (for wlan devices). See dmesg output for details.
Start openibd and enable it for boot
/etc/init.d/openibd start
chkconfig --add openibd
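Once openibd has started, the ib_qib driver should be loaded. A minimal sketch of a check, assuming the module shows up under that name in lsmod output; the helper name has_module is made up for this example:

```shell
#!/bin/sh
# Succeeds if the named kernel module appears in the first column of
# lsmod-style output read from stdin. "has_module" is a hypothetical helper.
has_module() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage on a live system:
#   lsmod | has_module ib_qib && echo "ib_qib is loaded"
```

The function reads from stdin rather than calling lsmod itself, so it can also be tried against saved output from another machine.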
Start Open Subnet Manager and enable it for boot
This needs to run on only one of the servers connected to the switch. Managed switches have the subnet manager embedded in their firmware, but this particular entry-level switch does not, so it has to run on a server. Without a subnet manager, there will be no ping response. Edit /etc/sysconfig/opensm and change the last line to ONBOOT=yes.
/etc/init.d/opensmd start
chkconfig --add opensmd
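If you prefer to script the ONBOOT change instead of editing the file by hand, something like the following could work. It is only a sketch: the function name enable_onboot is hypothetical, and it assumes GNU sed (for the -i option) and that the setting sits on a line starting with ONBOOT=.

```shell
#!/bin/sh
# Set ONBOOT=yes in the given file: replace an existing ONBOOT= line,
# or append one if the file has none. "enable_onboot" is a made-up name.
# Assumes GNU sed, which supports in-place editing with -i.
enable_onboot() {
    file="$1"
    if grep -q '^ONBOOT=' "$file"; then
        sed -i 's/^ONBOOT=.*/ONBOOT=yes/' "$file"
    else
        echo 'ONBOOT=yes' >> "$file"
    fi
}

# Usage (as root):
#   enable_onboot /etc/sysconfig/opensm
```

Make a copy of the file before running this on a real server.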
Verify that HCAs work
You can use this command: ibv_devinfo. It should display something like this:
hca_id: qib0
    transport: InfiniBand (0)
    fw_ver: 0.0.0
    node_guid: 0011:7500:****:****
    sys_image_guid: 0011:7500:****:****
    vendor_id: 0x1175
    vendor_part_id: 29474
    hw_ver: 0x2
    board_id: InfiniPath_QLE7340
    phys_port_cnt: 1
        port: 1
            state: PORT_INIT (2)
            max_mtu: 4096 (5)
            active_mtu: 4096 (5)
            sm_lid: 65535
            port_lid: 65535
            port_lmc: 0x00
            link_layer: IB
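The line to watch in this output is the port state. The sample shows PORT_INIT; as far as I know, a port normally reports PORT_ACTIVE once a subnet manager has configured it, so that is the state to expect before ping works. A small sketch that pulls the state out of the output; the function names are made up for this example:

```shell
#!/bin/sh
# Print the state name of every port found in ibv_devinfo output on stdin.
# "port_states" and "all_ports_active" are hypothetical helper names.
port_states() {
    awk '$1 == "state:" { print $2 }'
}

# Succeeds only if at least one port is listed and every port is PORT_ACTIVE.
all_ports_active() {
    awk '$1 == "state:" { n++; if ($2 != "PORT_ACTIVE") bad = 1 }
         END { exit (n > 0 && !bad) ? 0 : 1 }'
}

# Usage on a live system:
#   ibv_devinfo | port_states
#   ibv_devinfo | all_ports_active || echo "some port is not active yet"
```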
Assign an IP to ib0
Now you can assign an IP as you usually do for any network interface, in YaST Network Devices. Do it on both servers.
Verify that the link works
You can use ping between the IPs you have assigned. Also, ibnetdiscover should show something like this:
#
# Topology file: generated on
#
# Initiated from node 00117500******** port 001175000******

vendid=0x2c9
devid=0xbd36
sysimgguid=0x2c902********
switchguid=0x2c902********(2c902********)
Switch 8 "S-0002c902********" # "Infiniscale-IV Mellanox Technologies" base port 0 lid 2 lmc 0
 "H-001175000*******"(1175000*******) # "server1 HCA-1" lid 1 4xDDR
 "H-001175000*******"(1175000*******) # "server2 HCA-1" lid 3 4xQDR

vendid=0x1175
devid=0x7322
sysimgguid=0x117500********
caguid=0x1175000*******
Ca 1 "H-00117500********" # "server2 HCA-1"
(117500*******) "S-0002c9020*******" # lid 3 lmc 0 "Infiniscale-IV Mellanox Technologies" lid 2 4xQDR

vendid=0x1175
devid=0x7322
sysimgguid=0x117500*********
caguid=0x11750********
Ca 1 "H-001175000******" # "server1 HCA-1"
(1175000*******) "S-0002c9020********" # lid 1 lmc 0 "Infiniscale-IV Mellanox Technologies" lid 2 4xDDR
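For a quick sanity check, you can count the channel adapters that ibnetdiscover reports; with the two-node setup described here you would expect two. This is a rough sketch that assumes each adapter is introduced by a line starting with "Ca", as in the output above. The function name count_hcas is made up.

```shell
#!/bin/sh
# Count channel adapter entries ("Ca ..." lines) in ibnetdiscover output
# read from stdin. "count_hcas" is a hypothetical helper name.
count_hcas() {
    awk '$1 == "Ca" { n++ } END { print n + 0 }'
}

# Usage on a live system:
#   ibnetdiscover | count_hcas
```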