Introduction
InfiniBand (IB) networking offers high throughput and low latency. To use IB natively, network applications must use the IB verbs API. Traditional IP network applications cannot run on an IB network directly; they need a layer that encapsulates and transmits IP packets over InfiniBand (IPoIB) so that they can run without code changes. Note that because IPoIB emulates the IP layer on top of IB, IP applications perform worse over IPoIB than they would if they were written to use InfiniBand natively.
This document describes how to configure the IPoIB layer on IB systems equipped with Intel® Xeon Phi™ coprocessors. You will need the Intel® Manycore Platform Software Stack (Intel® MPSS) to work with the coprocessors. You will also need to install the OpenFabrics Enterprise Distribution* (OFED*) software to configure IPoIB.
The IPoIB driver in the OFED stack allows TCP/IP applications to run over the IB network. This driver implements the IP over InfiniBand protocol (RFC 4391, RFC 4392, and RFC 4755). The IPoIB driver supports two operation modes: Unreliable Datagram (UD) and Reliable Connected (RC). UD mode matches the IP protocol, which is itself an unreliable datagram service. In RC mode, a connection must be established before transmission can start.
Preparation
The following tests were run on two systems, each equipped with an Intel® Xeon® processor E5-2670 2.6 GHz and two Intel Xeon Phi coprocessors 7120. Both systems were running Red Hat Enterprise Linux* 64-bit 6.6 (kernel 2.6.32-504). On each system, a Mellanox ConnectX*-3 VPI IB adapter was installed into a PCIe* slot. The ports of the adapters were connected directly with no intervening switch.
After the system is rebooted, verify that the Mellanox ConnectX-3 VPI IB adapter is properly identified:
# lspci | grep Mellanox
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
On both systems (called knightscorner4 and knightscorner5), I downloaded and installed Intel MPSS 3.6.1 from the Intel MPSS download page and the OFED stack OFED-3.18-1 from the open source OpenFabrics Alliance.
Instructions for installing the OFED stack are included in the “Intel® MPSS User’s Guide,” Section 3.6. After the OFED stack is installed successfully, you can verify the version of the OFED installed:
# ofed_info -s
OFED-3.18-1:
The following instructions (see the “Intel® MPSS User’s Guide,” Section 3.6.9) are necessary to bring OFED up.
As root, bring the MPSS service up:
# service mpss start
Loading MIC module: [ OK ]
Configuring IPoIB on the Intel® Xeon® Processor Host and Intel Xeon Phi Coprocessor
This section shows which configuration files need to be modified to enable the IPoIB interface.
In this example, I create four IPoIB nodes on the subnet 192.168.100.0/24 (netmask 255.255.255.0). The host knightscorner4 and the connected coprocessor knightscorner4-mic0 are assigned the IP addresses 192.168.100.1 and 192.168.100.100, respectively. Similarly, host knightscorner5 and the connected coprocessor knightscorner5-mic0 are assigned the IP addresses 192.168.100.2 and 192.168.100.200, respectively.
You can use a configuration file to configure the IPoIB interface on the hosts.
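The addressing plan can be sanity-checked before any configuration file is edited. The following is a minimal Python sketch, not part of the original procedure; the helper name `check_plan` is hypothetical, and the /24 prefix is assumed from the netmask 255.255.255.0 used in the host configuration files.

```python
import ipaddress

# Addressing plan from the text. The /24 prefix is an assumption taken from
# the netmask 255.255.255.0 in the ifcfg-ib0 files.
SUBNET = ipaddress.ip_network("192.168.100.0/24")
NODES = {
    "knightscorner4": "192.168.100.1",
    "knightscorner4-mic0": "192.168.100.100",
    "knightscorner5": "192.168.100.2",
    "knightscorner5-mic0": "192.168.100.200",
}


def check_plan(nodes, subnet):
    """Return a list of problems found in the plan (empty if none)."""
    problems = []
    seen = {}
    for name, addr in nodes.items():
        ip = ipaddress.ip_address(addr)
        if ip not in subnet:
            problems.append(f"{name}: {addr} is outside {subnet}")
        elif ip in (subnet.network_address, subnet.broadcast_address):
            problems.append(f"{name}: {addr} is reserved in {subnet}")
        if ip in seen:
            problems.append(f"{name}: {addr} duplicates {seen[ip]}")
        seen[ip] = name
    return problems
```

Calling `check_plan(NODES, SUBNET)` returns an empty list when every node has a unique, usable address inside the subnet.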
To configure the device ib0 in knightscorner4, edit the /etc/sysconfig/network-scripts/ifcfg-ib0 host configuration file. The following configuration shows that the IP address 192.168.100.1 is assigned to the ib0 device on the host.
[knightscorner4 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
TYPE=InfiniBand
UUID=296da2e8-6193-4ec2-9122-4a7ca1f3fcc0
ONBOOT=yes
BOOTPROTO=none
NETWORK=192.168.100.0
NETMASK=255.255.255.0
IPADDR=192.168.100.1
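A host configuration file like the one above can be checked for internal consistency. This is a hedged sketch in Python: `parse_ifcfg` and `ipaddr_matches_network` are hypothetical helpers, and the parser assumes only plain KEY=VALUE lines as they appear in this example.

```python
import ipaddress


def parse_ifcfg(text):
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def ipaddr_matches_network(settings):
    """True if IPADDR lies inside the network given by NETWORK/NETMASK."""
    network = ipaddress.ip_network(
        f"{settings['NETWORK']}/{settings['NETMASK']}"
    )
    return ipaddress.ip_address(settings["IPADDR"]) in network


# Sample content mirroring the knightscorner4 file (UUID omitted).
SAMPLE = """\
DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=none
NETWORK=192.168.100.0
NETMASK=255.255.255.0
IPADDR=192.168.100.1
"""
```

For the sample, `ipaddr_matches_network(parse_ifcfg(SAMPLE))` is true; an address outside 192.168.100.0/255.255.255.0 would make it false.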
On host knightscorner4, edit the /etc/mpss/ipoib.conf coprocessor configuration file to configure the interface ib0 on mic0. The ib0 interface on knightscorner4-mic0 is assigned the IP address 192.168.100.100:
[knightscorner4 ~]# cat /etc/mpss/ipoib.conf
# to start ipoib on the mic automatically, uncomment the following
#
ipoib_enabled=yes
#
# to assign ip addresses to ib devices on the mic, specify the ip address
# using the following example for setting ib0 on mic0 address
#
mic0_ib0=192.168.100.100
#
# if netmask needs to tbe set or other ifconfig option, add them to
# the ip address (quoted)
#
# mic0_ib1="192.168.100.101 netmask 255.255.0.0"
#
# to pass options to ib_ipoib module on the mic, use the following line
#
# ipoib_parms="send_queue_size=2048 recv_queue_size=4096"
Similarly, to configure the device ib0 on host knightscorner5, edit the /etc/sysconfig/network-scripts/ifcfg-ib0 host configuration file. A typical configuration file for the device ib0 looks similar to the one below. This configuration file assigns the IP address 192.168.100.2 to device ib0:
[knightscorner5 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
TYPE=InfiniBand
UUID=a371c52b-fa6c-4666-b060-ec04ceaa2382
ONBOOT=yes
BOOTPROTO=none
NETWORK=192.168.100.0
NETMASK=255.255.255.0
IPADDR=192.168.100.2
On host knightscorner5, edit the file /etc/mpss/ipoib.conf to configure the interface ib0 on mic0. The ib0 interface on knightscorner5-mic0 is assigned the IP address 192.168.100.200:
[knightscorner5 ~]# cat /etc/mpss/ipoib.conf
# to start ipoib on the mic automatically, uncomment the following
#
ipoib_enabled=yes
#
# to assign ip addresses to ib devices on the mic, specify the ip address
# using the following example for setting ib0 on mic0 address
#
mic0_ib0=192.168.100.200
#
# if netmask needs to tbe set or other ifconfig option, add them to
# the ip address (quoted)
#
# mic0_ib1="192.168.100.101 netmask 255.255.0.0"
#
# to pass options to ib_ipoib module on the mic, use the following line
#
# ipoib_parms="send_queue_size=2048 recv_queue_size=4096"
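Because most lines in ipoib.conf are comments, it is easy to leave an address line commented out by mistake. The sketch below, in Python, lists the coprocessor addresses that are actually active. The helper `mic_ib_addresses` is hypothetical, and the key pattern micN_ibM is an assumption inferred from the two examples in the file, not from an MPSS specification.

```python
import re

# Assumed key shape, inferred from "mic0_ib0" and "mic0_ib1" in the file.
KEY_RE = re.compile(r"^(mic\d+)_(ib\d+)$")


def mic_ib_addresses(text):
    """Map (mic, ib device) to the IP address on uncommented lines."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or blank line: setting is not active
        key, sep, value = line.partition("=")
        match = KEY_RE.match(key.strip())
        if not sep or match is None:
            continue
        # A quoted value may carry extra ifconfig options after the address.
        parts = value.strip().strip('"').split()
        if parts:
            result[(match.group(1), match.group(2))] = parts[0]
    return result


# Sample mirroring the active and commented lines on knightscorner5.
SAMPLE = """\
ipoib_enabled=yes
mic0_ib0=192.168.100.200
# mic0_ib1="192.168.100.101 netmask 255.255.0.0"
"""
```

For the sample, only the mic0/ib0 entry is reported, since the mic0_ib1 line is still commented out.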
Bringing the OFED Stack Up
After changing the configuration files to enable the IPoIB protocol, you can bring the OFED stack up on both knightscorner4 and knightscorner5.
1. First, start the OFED stack on the host systems:
# service openibd start
Loading HCA driver and Access Layer: [ OK ]
You may verify the IPoIB driver is now loaded:
[knightscorner5 ~]# lsmod | grep ib_ipoib
ib_ipoib 80814 0
ib_cm 36932 3 rdma_cm,ib_ipoib
. . . . . . . . .
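The same check can be scripted. The following Python sketch parses `lsmod` output and tests for the ib_ipoib module by exact name, which avoids matching lines where ib_ipoib only appears in the "Used by" column. The helper names are hypothetical.

```python
import subprocess


def loaded_modules(lsmod_output):
    """Return the set of module names (first column) from lsmod output."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":  # skip the header row
            names.add(fields[0])
    return names


def ipoib_loaded():
    """Run lsmod on this machine and report whether ib_ipoib is loaded."""
    result = subprocess.run(
        ["lsmod"], capture_output=True, text=True, check=True
    )
    return "ib_ipoib" in loaded_modules(result.stdout)


# Sample based on the output shown in the text.
SAMPLE = """\
Module                  Size  Used by
ib_ipoib               80814  0
ib_cm                  36932  3 rdma_cm,ib_ipoib
"""
```

On a host, `ipoib_loaded()` should return true once the openibd service has started.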
2. Start the IB subnet manager to configure the fabric:
# service opensmd start
Starting IB Subnet Manager. [ OK ]
At this point, the interface ib0 is created. To verify the operation mode of the interface ib0 in your system, type the following command:
# cat /sys/class/net/ib0/mode
connected
You can change the mode to datagram by typing:
# echo datagram > /sys/class/net/ib0/mode
Or switch to connected mode by typing:
# echo connected > /sys/class/net/ib0/mode
The default mode can be configured by editing the SET_IPOIB_CM parameter in /etc/infiniband/openib.conf. For example, setting SET_IPOIB_CM=yes makes connected the default mode.
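Reading and switching the mode through /sys/class/net/ib0/mode can also be wrapped in a small script. This is a minimal Python sketch under stated assumptions: the function names and the `sysfs_root` parameter are hypothetical (the parameter exists only so the code can be exercised against a scratch directory), and writing the real file requires root privileges.

```python
from pathlib import Path

# The two modes named in the text.
VALID_MODES = ("datagram", "connected")


def mode_path(ifname="ib0", sysfs_root="/sys/class/net"):
    """Path of the mode file for an IPoIB interface."""
    return Path(sysfs_root) / ifname / "mode"


def get_mode(ifname="ib0", sysfs_root="/sys/class/net"):
    """Return the current mode, e.g. 'connected' or 'datagram'."""
    return mode_path(ifname, sysfs_root).read_text().strip()


def set_mode(mode, ifname="ib0", sysfs_root="/sys/class/net"):
    """Write a new mode, rejecting anything but the two known values."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    mode_path(ifname, sysfs_root).write_text(mode + "\n")
```

For example, `set_mode("datagram")` has the same effect as the echo command shown earlier, and `get_mode()` corresponds to the cat command.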
To check whether the network interface ib0 on host knightscorner5 is available:
[knightscorner5 ~]# ifconfig ib0
Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.100.2  Bcast:192.168.100.255  Mask:255.255.255.0
          inet6 addr: fe80::f652:1403:7d:2b91/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:56 errors:0 dropped:0 overruns:0 frame:0
          TX packets:59 errors:0 dropped:10 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:3692 (3.6 KiB)  TX bytes:5008 (4.8 KiB)
To show the complete 20-byte hardware address of the interface, use the ip command:
[root@knightscorner5 ~]# ip addr show ib0
11: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7d:2b:91 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.100.2/24 brd 192.168.100.255 scope global ib0
    inet6 fe80::f652:1403:7d:2b91/64 scope link
       valid_lft forever preferred_lft forever
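The 20-byte hardware address has a defined layout. RFC 4391 describes the IPoIB link-layer address as one reserved octet (used for flags by the connected-mode extension in RFC 4755), a 3-octet queue pair number (QPN), and a 16-octet port GID. The Python sketch below splits an address into those fields; the function name is hypothetical.

```python
import ipaddress


def decode_ipoib_hwaddr(hwaddr):
    """Split a colon-separated 20-octet IPoIB address into its fields."""
    octets = bytes(int(part, 16) for part in hwaddr.split(":"))
    if len(octets) != 20:
        raise ValueError(f"expected 20 octets, got {len(octets)}")
    return {
        "flags": octets[0],                         # reserved/flags octet
        "qpn": int.from_bytes(octets[1:4], "big"),  # queue pair number
        # The GID is 128 bits wide, so it prints conveniently as IPv6 text.
        "gid": str(ipaddress.IPv6Address(octets[4:])),
    }


# Address reported by "ip addr show ib0" on knightscorner5.
HWADDR = "80:00:00:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7d:2b:91"
```

For the address above, the function yields flags 0x80, QPN 0x48, and GID fe80::f452:1403:7d:2b91. The truncated ifconfig output, by contrast, loses the last 12 octets of the GID.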
3. Start the ofed-mic service. This also loads the ib_ipoib driver on the Intel Xeon Phi coprocessor:
# service ofed-mic start
Starting OFED Stack:
host [ OK ]
mic0 : ib0 [ OK ]
mic1 [ OK ]
Verify that the 192.168.100 subnet is configured and routed through the interface ib0 on the host:
[knightscorner5 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   *               255.255.255.0   U     0      0        0 ib0
10.23.3.0       *               255.255.255.0   U     0      0        0 eth0
172.31.1.0      *               255.255.255.0   U     0      0        0 mic0
192.0.2.0       *               255.255.255.0   U     0      0        0 mic0
172.31.2.0      *               255.255.255.0   U     0      0        0 mic1
link-local      *               255.255.0.0     U     1002   0        0 eth0
link-local      *               255.255.0.0     U     1004   0        0 mic0
link-local      *               255.255.0.0     U     1005   0        0 mic1
link-local      *               255.255.0.0     U     1010   0        0 ib0
default         jf311-lfw-a_vl5 0.0.0.0         UG    0      0        0 eth0
Verify that the 192.168.100 subnet is configured and routed through the interface ib0 on the coprocessor:
[knightscorner5 ~]# ssh mic0 route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   *               255.255.255.0   U     0      0        0 ib0
172.31.1.0      *               255.255.255.0   U     0      0        0 mic0
192.0.2.0       *               255.255.255.0   U     0      0        0 mic0
default         host            0.0.0.0         UG    0      0        0 mic0
If necessary, bring the interface ib0 down and back up:
# ifconfig ib0 down
# ifconfig ib0 up
# ifconfig ib0
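The routing check can be automated by parsing the `route` output. The Python sketch below is illustrative only: the helper `routes_by_destination` is hypothetical, and it assumes the eight-column layout shown in the tables in this section.

```python
def routes_by_destination(route_output):
    """Map each destination to the list of interfaces it is routed through."""
    routes = {}
    for line in route_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns; skip the title and the column header.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        routes.setdefault(fields[0], []).append(fields[-1])
    return routes


# Sample mirroring the coprocessor routing table shown in the text.
SAMPLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   *               255.255.255.0   U     0      0        0 ib0
172.31.1.0      *               255.255.255.0   U     0      0        0 mic0
192.0.2.0       *               255.255.255.0   U     0      0        0 mic0
default         host            0.0.0.0         UG    0      0        0 mic0
"""
```

For the sample, `routes_by_destination(SAMPLE)["192.168.100.0"]` is `["ib0"]`, which confirms that the IPoIB subnet is routed through ib0.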
Verification
To verify that IPoIB is working, you can use the ping utility to ping all IPoIB devices. For example, from host knightscorner5, you can ping knightscorner4 (192.168.100.1), knightscorner4-mic0 (192.168.100.100), and knightscorner5-mic0 (192.168.100.200):
[knightscorner5 ~]# ping -c 3 192.168.100.1
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.
64 bytes from 192.168.100.1: icmp_seq=1 ttl=64 time=0.142 ms
64 bytes from 192.168.100.1: icmp_seq=2 ttl=64 time=0.176 ms
64 bytes from 192.168.100.1: icmp_seq=3 ttl=64 time=0.178 ms
--- 192.168.100.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.142/0.165/0.178/0.019 ms

[knightscorner5 ~]# ping -c 3 192.168.100.100
PING 192.168.100.100 (192.168.100.100) 56(84) bytes of data.
64 bytes from 192.168.100.100: icmp_seq=1 ttl=64 time=14.0 ms
64 bytes from 192.168.100.100: icmp_seq=2 ttl=64 time=2.24 ms
64 bytes from 192.168.100.100: icmp_seq=3 ttl=64 time=0.943 ms
--- 192.168.100.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.943/5.761/14.094/5.916 ms

[knightscorner5 ~]# ping -c 3 192.168.100.200
PING 192.168.100.200 (192.168.100.200) 56(84) bytes of data.
64 bytes from 192.168.100.200: icmp_seq=1 ttl=64 time=18.9 ms
64 bytes from 192.168.100.200: icmp_seq=2 ttl=64 time=7.56 ms
64 bytes from 192.168.100.200: icmp_seq=3 ttl=64 time=5.89 ms
--- 192.168.100.200 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2008ms
rtt min/avg/max/mdev = 5.896/10.817/18.994/5.822 ms
Similarly, from coprocessor knightscorner5-mic0, you can ping knightscorner4 (192.168.100.1), knightscorner4-mic0 (192.168.100.100), and knightscorner5 (192.168.100.2):
[knightscorner5 ~]# ssh knightscorner5-mic0
[knightscorner5-mic0 ~]# ping -c 3 192.168.100.1
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.
64 bytes from 192.168.100.1: icmp_req=1 ttl=64 time=4.89 ms
64 bytes from 192.168.100.1: icmp_req=2 ttl=64 time=9.99 ms
64 bytes from 192.168.100.1: icmp_req=3 ttl=64 time=9.77 ms
--- 192.168.100.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2025ms
rtt min/avg/max/mdev = 4.899/8.220/9.991/2.352 ms

[knightscorner5-mic0 ~]# ping -c 3 192.168.100.100
PING 192.168.100.100 (192.168.100.100) 56(84) bytes of data.
64 bytes from 192.168.100.100: icmp_req=1 ttl=64 time=14.8 ms
64 bytes from 192.168.100.100: icmp_req=2 ttl=64 time=9.99 ms
64 bytes from 192.168.100.100: icmp_req=3 ttl=64 time=9.67 ms
--- 192.168.100.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2025ms
rtt min/avg/max/mdev = 9.676/11.496/14.817/2.351 ms

[knightscorner5-mic0 ~]# ping -c 3 192.168.100.2
PING 192.168.100.2 (192.168.100.2) 56(84) bytes of data.
64 bytes from 192.168.100.2: icmp_req=1 ttl=64 time=5.11 ms
64 bytes from 192.168.100.2: icmp_req=2 ttl=64 time=9.98 ms
64 bytes from 192.168.100.2: icmp_req=3 ttl=64 time=9.66 ms
--- 192.168.100.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2025ms
rtt min/avg/max/mdev = 5.115/8.256/9.989/2.224 ms
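The manual ping checks can be scripted so that all IPoIB nodes are tested in one pass. The following Python sketch is a hypothetical helper, not part of the original procedure; it assumes the summary line format "N packets transmitted, M received" shown in the output in this section.

```python
import re
import subprocess

# Matches the summary line printed by the ping runs shown in the text.
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def parse_ping_summary(output):
    """Return (transmitted, received) from ping output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return int(match.group(1)), int(match.group(2))


def reachable(address, count=3):
    """Run `ping -c COUNT ADDRESS`; True only if every reply arrived."""
    result = subprocess.run(
        ["ping", "-c", str(count), address],
        capture_output=True,
        text=True,
    )
    try:
        sent, received = parse_ping_summary(result.stdout)
    except ValueError:
        return False
    return sent > 0 and sent == received


# Sample summary copied from one of the runs in the text.
SAMPLE = """\
--- 192.168.100.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.142/0.165/0.178/0.019 ms
"""
```

A loop such as `all(reachable(a) for a in ("192.168.100.1", "192.168.100.100", "192.168.100.200"))` would then cover the three peers pinged from knightscorner5.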
Conclusion
Configuring IPoIB on Intel Xeon Phi coprocessors requires the Intel MPSS stack and OFED stack. This article showed the necessary steps to configure IPoIB and assign IP addresses on two host systems equipped with Intel Xeon Phi coprocessors and connected directly via IB host channel adapters. Finally, a simple test was done to verify that IPoIB is working correctly.
References
“Intel® Manycore Platform Software Stack (Intel® MPSS) User’s Guide,” December 2015, Revision 3.6.1
OpenFabrics Enterprise Distribution (OFED) Version 3.18-1
Notices
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.
Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.
Intel, the Intel logo, Intel Xeon Phi, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2016 Intel Corporation.
This sample source code is released under the Intel Sample Source Code License Agreement.