How to configure an Infiniband interconnect for Oracle RAC on SUSE Linux Enterprise Server 11

Wikipedia describes Infiniband as “a switched fabric communications link used in high-performance computing and enterprise data centers”. In the context of a cluster interconnect, its most useful feature is very low latency.

This post describes the SUSE settings needed for a very simple setup for Oracle Real Application Cluster with two database nodes. The hardware:

  • QLogic QLE7340 Host Channel Adapters (HCA). Good to know: Intel acquired the Infiniband business from QLogic, so Intel now provides support for these cards
  • Mellanox IS5022 switch
  • Cables

The switch does not need any configuration. Getting the cards to work on the servers takes a bit of effort, because these particular cards are not officially supported by SUSE, even though SUSE ships the compiled kernel driver module. According to Intel, SLES 11 SP3 will add official support. Until it is released, these are the steps to take.

Install the Infiniband (OFED) pattern, in YaST Software Management

Open YaST2/Software Management, select View/Patterns and install the Infiniband (OFED) pattern.
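
If you prefer the command line over YaST, the same pattern can be installed with zypper. A sketch, assuming the pattern is named ofed on SLES 11 (verify the name first with zypper search -t pattern):

```shell
# Install the Infiniband (OFED) software pattern; "ofed" is the assumed
# pattern name on SLES 11 -- check with: zypper search -t pattern
zypper install -t pattern ofed
```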

Install the drivers and packages for QLogic QLE7340 HCAs

  • In YaST2/Software Repositories, enable the SLES11-Extras repository, then install the kernel-default-extra package from it. This package contains the kernel driver module for the QLogic QLE7340 HCAs, called “ib_qib”.
  • Configure openibd to load the driver: edit /etc/infiniband/openibd.conf and change the ib_qib load setting from no to yes.
  • Also install the two packages with the userland “verbs” libraries specific to these HCAs: libipathverbs and libipathverbs-32bit.
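
The same steps sketched on the command line. The QIB_LOAD variable name is an assumption about how this OFED version names the ib_qib switch in its config file; check the file for the actual setting before running the sed line:

```shell
# Install the ib_qib kernel module and the QLE7340 userland verbs
# (assumes the SLES11-Extras repository is already enabled).
zypper install kernel-default-extra
zypper install libipathverbs libipathverbs-32bit

# Flip the driver-load setting from no to yes; QIB_LOAD is the assumed
# name of the ib_qib load switch in this file -- verify before running.
sed -i 's/^QIB_LOAD=no/QIB_LOAD=yes/' /etc/infiniband/openibd.conf
```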

If you skip these steps and just try to set an IP in YaST2 Network Devices, you will get this error:

Unable to configure the network card because the kernel device (eth0, wlan0) is not present.
This is mostly caused by missing firmware (for wlan devices). See dmesg output for details.

Start openibd and enable it for boot

/etc/init.d/openibd start
chkconfig --add openibd
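
A quick sanity check that the driver actually loaded and the service is registered for boot:

```shell
# The ib_qib module should appear once openibd has started.
lsmod | grep ib_qib

# chkconfig --list shows the runlevels in which openibd will start.
chkconfig --list openibd
```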

Start Open Subnet Manager and enable it for boot

Only one server connected to the switch needs to run the subnet manager. Managed switches have a subnet manager embedded in their firmware, but this particular entry-level switch does not, so it has to run on a server. Without a subnet manager, there will be no ping response over the link. Edit /etc/sysconfig/opensm and change the last line to ONBOOT=yes.

/etc/init.d/opensmd start
chkconfig --add opensmd
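
To confirm a subnet manager is actually serving the fabric, you can query it with sminfo, one of the infiniband-diags tools installed with the OFED pattern:

```shell
# With opensm running, sminfo reports the master SM's LID and GUID;
# without a subnet manager on the fabric, the query fails.
sminfo
```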

Verify that HCAs work

You can use the ibv_devinfo command. It should display something like this:

hca_id: qib0
        transport:                      InfiniBand (0)
        fw_ver:                         0.0.0
        node_guid:                      0011:7500:****:****
        sys_image_guid:                 0011:7500:****:****
        vendor_id:                      0x1175
        vendor_part_id:                 29474
        hw_ver:                         0x2
        board_id:                       InfiniPath_QLE7340
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_INIT (2)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 65535
                        port_lid:               65535
                        port_lmc:               0x00
                        link_layer:             IB
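
Note that the port above reports PORT_INIT (2): the physical link is up, but no subnet manager has configured the port yet. Once opensm is running, ibv_devinfo should report PORT_ACTIVE (4). A small shell sketch of how a script could check this, run here against a sample line of the output above (in real use you would pipe ibv_devinfo itself into awk):

```shell
# Extract the port state from ibv_devinfo-style output. A sample line
# from the output above stands in for the real command.
sample='state:                  PORT_INIT (2)'
state=$(printf '%s\n' "$sample" | awk '/state:/ {print $2}')
echo "Port state: $state"
if [ "$state" != "PORT_ACTIVE" ]; then
    echo "Port not active yet -- is opensm running on the fabric?"
fi
```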

Assign an IP to ib0

Now you can assign an IP as you usually do for any network interface, in YaST Network Devices. Do it on both servers.
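
For reference, YaST writes these settings to an ifcfg file. A sketch of what /etc/sysconfig/network/ifcfg-ib0 could look like; the address below is a placeholder I chose for illustration, not from the original setup:

```shell
# /etc/sysconfig/network/ifcfg-ib0 -- sample only; the address is a
# placeholder, use your own private interconnect subnet.
BOOTPROTO='static'
IPADDR='192.168.10.1/24'
STARTMODE='auto'
```

The second node gets a different address in the same subnet (e.g. 192.168.10.2/24); ifup ib0 or rcnetwork restart applies the change.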

Verify that the link works

You can use ping between the IPs you have assigned. Also, ibnetdiscover should show something like this:

# Topology file: generated on 
# Initiated from node 00117500******** port 001175000******

Switch  8 "S-0002c902********"          # "Infiniscale-IV Mellanox Technologies" base port 0 lid 2 lmc 0
[1]     "H-001175000*******"[1](1175000*******)                 # "server1 HCA-1" lid 1 4xDDR
[2]     "H-001175000*******"[1](1175000*******)                 # "server2 HCA-1" lid 3 4xQDR

Ca      1 "H-00117500********"          # "server2 HCA-1"
[1](117500*******)     "S-0002c9020*******"[2]         # lid 3 lmc 0 "Infiniscale-IV Mellanox Technologies" lid 2 4xQDR

Ca      1 "H-001175000******"          # "server1 HCA-1"
[1](1175000*******)     "S-0002c9020********"[1]         # lid 1 lmc 0 "Infiniscale-IV Mellanox Technologies" lid 2 4xDDR
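
Since low latency is the main reason for this interconnect, it is worth measuring. One way, sketched here with placeholder addresses, is qperf, which ships in the OFED package set: run it with no arguments on one node as the server, then point the other node at it.

```shell
# On server1: start the qperf measurement server (listens until killed).
qperf

# On server2: measure RDMA (rc_lat) and IPoIB/TCP (tcp_lat) latency
# against server1's ib0 address (192.168.10.1 is a placeholder).
qperf 192.168.10.1 rc_lat tcp_lat
```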