This document shows how to install the Intel® Omni-Path Host Fabric Interface (Intel® OP HFI) card and the Intel® Omni-Path Architecture (Intel® OPA) software on the host. It also describes how to verify the link and port status, perform the necessary steps to bring the stack up, and run a test on the fabric.
Introduction to Intel® Omni-Path Architecture
Intel OPA is the latest generation of Intel’s high-performance fabric technology. It adds new capabilities that enhance high-performance computing performance, scalability, and quality of service. The Intel OPA components include the Intel OP HFI, which provides fabric connectivity; switches that connect a scalable number of endpoints; copper and optical cables; and a Fabric Manager (FM) that identifies all nodes, provides centralized provisioning, and monitors fabric resources. The Intel OP HFI is the Intel OPA interface card that provides host-to-switch connectivity; it can also connect directly to another HFI (back-to-back connectivity).
This paper focuses on how to install the Intel OP HFI, configure the IP over Fabric, and test the fabric using a prebuilt program. Two systems, each equipped with a preproduction Intel® Xeon® E5 processor, were used in this example. Both systems were running Red Hat Enterprise Linux* 7.2 and were equipped with Gigabit Ethernet adapters connected through a Gigabit Ethernet router.
Intel® Omni-Path Host Fabric Interface
The Intel OP HFI is a standard PCIe* card that interfaces with an Intel OPA switch or with another HFI. There are two current models of Intel OP HFI: a PCIe x16 card, which supports 100 Gbps, and a PCIe x8 card, which supports 56 Gbps. Designed for low latency and high bandwidth, the HFI can be configured with up to 8 data Virtual Lanes plus one management Virtual Lane, and its MTU size is configurable as 2, 4, 6, 8, or 10 KB.
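Once the hfi1 driver is installed (covered in the Host Software Installation section below), you can see which of these settings are exposed as driver tunables. This is a minimal sketch, assuming the stock hfi1 kernel module is present; the exact parameter names vary by driver release, so use it to discover them rather than as a fixed list:

# List the hfi1 kernel module parameters and their descriptions
modinfo -p hfi1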
Below is a picture of the Intel OP HFI PCIe x16 card that was used in this test:
Hardware Installation
Two Intel® Xeon® processor-based servers running Red Hat Enterprise Linux 7.2 were used. They have Gigabit Ethernet adapters, are connected through a Gigabit Ethernet router, and have the IP addresses 10.23.3.27 and 10.23.3.148. In this example, we install an Intel OP HFI PCIe x16 card in each server and use an Intel OPA cable to connect them in a back-to-back configuration.
First, power down the systems and install an Intel OP HFI in an x16 PCIe slot in each system. Connect the two Intel OP HFIs with the Intel OPA cable and power up the systems. Verify that the solid green LED on each Intel OP HFI is on; this indicates that the Intel OP HFI link is up.
Next, verify that the OS detects the Intel OP HFI by using the lspci command:
# lspci -vv | grep Omni
18:00.0 Fabric controller: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 11)
        Subsystem: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete]
The first field of the output (18:00.0) shows the PCI bus, device, and function numbers; the second field shows the device class (“Fabric controller”); and the last field shows the device name, which identifies the Intel OP HFI.
To verify the Intel OP HFI PCIe speed, type “lspci -vv” to display more details and search for the device address found previously (18:00.0):
# lspci -vv
...................................................
18:00.0 Fabric controller: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 11)
        Subsystem: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 38
        NUMA node: 0
        Region 0: Memory at a0000000 (64-bit, non-prefetchable) [size=64M]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <4us, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
...................................................
LnkCap and LnkSta indicate a speed of 8 GT/s and a width of x16, which is a PCIe Gen3 x16 link. This confirms that the Intel OP HFI is running at its optimal PCIe speed.
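To pull out just these two lines without scanning the full listing, lspci can be pointed at the device address directly. This is a small convenience sketch; the 18:00.0 address is the one reported on this system and will differ on other machines:

# Show only the link capability and link status lines of the HFI
lspci -vv -s 18:00.0 | grep -E 'LnkCap|LnkSta'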
Host Software Installation
The Intel® Omni-Path Fabric host software is available in two versions:
- Basic bundle: This software is usually installed on the compute nodes.
- Intel® Omni-Path Fabric Suite (IFS): A superset of the Basic bundle, usually installed on a head/management node, which additionally contains the following packages:
- FastFabric: A collection of utilities for installation, testing, and monitoring the fabric
- Intel® Omni-Path Fabric Manager
- Fabric Manager GUI (FMGUI)
Each fabric requires at least one FM; if multiple FMs coexist, one acts as the master FM and the others are standby FMs. The FM identifies all nodes, switches, and routers; it assigns local IDs (LIDs), maintains routing tables for all nodes, and scans for changes and reprograms the fabric automatically. It is recommended to reserve 500 MB of host memory on compute nodes and 1 GB per FM instance. The following figure shows all components of the Fabric Host Software Stack (source: “Intel® Omni-Path Fabric Host Software User Guide, Rev. 5.0”).
Both versions can be downloaded from https://downloadcenter.intel.com/search?keyword=Omni-Path. You need to install the required OS RPMs before installing the host software; refer to Section 1.1.1.1, OS RPMs Installation Prerequisites, in the document “Intel® Omni-Path Fabric Software Installation Guide (Rev 5.0)”.
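The exact prerequisite list depends on the OS release and is given in that section of the Installation Guide. As an illustration only (the package names below are common examples, not the authoritative list), the prerequisites can be installed with yum:

# Illustrative only: take the authoritative package list from the
# "OS RPMs Installation Prerequisites" section of the Installation Guide
yum install -y libibverbs librdmacm pciutils tcl tcsh expect bc \
    rpm-build redhat-rpm-config kernel-devel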
In this example, we download the IFS package IntelOPA-IFS.RHEL72-x86_64.10.3.0.0.81.tgz, extract it, and then run the installation script on both machines (both machines run the same OS and the same IFS package):
# tar -xvf IntelOPA-IFS.RHEL72-x86_64.10.3.0.0.81.tgz
# cd IntelOPA-IFS.RHEL72-x86_64.10.3.0.0.81/
# ./INSTALL -a
Installing All OPA Software
Determining what is installed on system...
-------------------------------------------------------------------------------
Preparing OFA 10_3_0_0_82 release for Install...
...
A System Reboot is recommended to activate the software changes
Done Installing OPA Software.
Rebuilding boot image with "/usr/bin/dracut -f"...done.
This requires a reboot on the host:
# reboot
After the system is rebooted, load the Intel OP HFI driver, run lsmod, and then check for the Intel OPA modules:
# modprobe hfi1
# lsmod | grep hfi1
hfi1                  633634  1
rdmavt                 57992  1 hfi1
ib_mad                 51913  5 hfi1,ib_cm,ib_sa,rdmavt,ib_umad
ib_core                98787  14 hfi1,rdma_cm,ib_cm,ib_sa,iw_cm,xprtrdma,ib_mad,ib_ucm,rdmavt,ib_iser,ib_umad,ib_uverbs,ib_ipoib,ib_isert
i2c_algo_bit           13413  2 ast,hfi1
i2c_core               40582  6 ast,drm,hfi1,ipmi_ssif,drm_kms_helper,i2c_algo_bit
For installation errors or additional information, you can refer to the /var/log/opa.log file.
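To double-check that the driver initialized cleanly and the link trained, the kernel log can also be searched for hfi1 messages. This is a minimal sketch; the exact message text varies with the driver version:

# Look for hfi1 driver messages in the kernel ring buffer
dmesg | grep -i hfi1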
Post Installation
To configure IP over Fabric from the Intel OPA software, run the script again:
# ./INSTALL
Then choose option 2) Reconfigure OFA IP over IB. In this example, we configure the IP over Fabric (IPoFabric) address of the host as 192.168.100.101. You can verify the IP over Fabric interface configuration:
# more /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
BOOTPROTO=static
IPADDR=192.168.100.101
BROADCAST=192.168.100.255
NETWORK=192.168.100.0
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no
CONNECTED_MODE=yes
MTU=65520
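On the second host, the same file looks identical except for the IPADDR line. This is a sketch based on the address assigned to that host later in this example (192.168.100.102):

# /etc/sysconfig/network-scripts/ifcfg-ib0 on the second host (illustrative)
DEVICE=ib0
BOOTPROTO=static
IPADDR=192.168.100.102
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no
CONNECTED_MODE=yes
MTU=65520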
Bring the IP over Fabric Interface up, and then verify its IP address:
# ifup ib0
# ifconfig ib0
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
        inet 192.168.100.101  netmask 255.255.255.0  broadcast 192.168.100.255
        inet6 fe80::211:7501:179:311  prefixlen 64  scopeid 0x20<link>
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
        infiniband 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 16  bytes 2888 (2.8 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
Note that common system information is captured in the /var/log/messages file, and Intel OPA-related information is captured in /var/log/opa.log.
So far, we have installed the host software package on one host (10.23.3.27). We repeat the same procedure on the second host (10.23.3.148) and configure its IP over Fabric address as 192.168.100.102. Verify that you can ping 192.168.100.102 from the first server:
# ping 192.168.100.102
PING 192.168.100.102 (192.168.100.102) 56(84) bytes of data.
64 bytes from 192.168.100.102: icmp_seq=1 ttl=64 time=1.34 ms
64 bytes from 192.168.100.102: icmp_seq=2 ttl=64 time=0.303 ms
64 bytes from 192.168.100.102: icmp_seq=3 ttl=64 time=0.253 ms
^C
And from the second server, verify that you can ping the first host’s IP over Fabric interface:
# ping 192.168.100.101
PING 192.168.100.101 (192.168.100.101) 56(84) bytes of data.
64 bytes from 192.168.100.101: icmp_seq=1 ttl=64 time=0.024 ms
64 bytes from 192.168.100.101: icmp_seq=2 ttl=64 time=0.043 ms
64 bytes from 192.168.100.101: icmp_seq=3 ttl=64 time=0.023 ms
^C
The opainfo command can be used to verify the fabric:
# opainfo
hfi1_0:1   PortGUID:0x0011750101790311
   PortState: Init (LinkUp)
   LinkSpeed Act: 25Gb En: 25Gb
   LinkWidth Act: 4 En: 4
   LinkWidthDnGrd ActTx: 4 Rx: 4 En: 1,2,3,4
   LCRC Act: 14-bit En: 14-bit,16-bit,48-bit
   Xmit Data: 0 MB Pkts: 0
   Recv Data: 0 MB Pkts: 0
   Link Quality: 5 (Excellent)
This confirms that the Intel OP HFI speed is 100 Gbps (25 Gbps x 4 lanes).
Next, enable the Intel OPA Fabric Manager on one host and start it. Note that the FM goes through many steps, including physical subnet establishment, subnet discovery, information gathering, LID assignment, path establishment, port configuration, switch configuration, and subnet activation.
# opaconfig -E opafm
# service opafm start
Redirecting to /bin/systemctl start opafm.service
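Once the FM has configured the local port, opainfo reports the port state as Active. A small shell loop can wait for that transition before continuing; this is a minimal sketch whose grep pattern matches the PortState line of the opainfo output shown below:

# Poll opainfo until the local port reports Active
until opainfo | grep -q "PortState:.*Active"; do
    sleep 2
done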
You can query the status of the Intel OPA Fabric Manager at any time:
# service opafm status
Redirecting to /bin/systemctl status opafm.service
● opafm.service - OPA Fabric Manager
   Loaded: loaded (/usr/lib/systemd/system/opafm.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-01-09 23:42:41 EST; 21min ago
  Process: 6758 ExecStart=/usr/lib/opa-fm/bin/opafmd -D (code=exited, status=0/SUCCESS)
 Main PID: 6759 (opafmd)
   CGroup: /system.slice/opafm.service
           ├─6759 /usr/lib/opa-fm/bin/opafmd -D
           └─6760 /usr/lib/opa-fm/runtime/sm -e sm_0

# opainfo
hfi1_0:1   PortGID:0xfe80000000000000:0011750101790311
   PortState: Active
   LinkSpeed Act: 25Gb En: 25Gb
   LinkWidth Act: 4 En: 4
   LinkWidthDnGrd ActTx: 4 Rx: 4 En: 3,4
   LCRC Act: 14-bit En: 14-bit,16-bit,48-bit
   Mgmt: True
   LID: 0x00000001-0x00000001 SM LID: 0x00000001 SL: 0
   Xmit Data: 0 MB Pkts: 21
   Recv Data: 0 MB Pkts: 22
   Link Quality: 5 (Excellent)
The port state is now “Active”, which indicates the normal operating state of a fully functional link. To display Intel OP HFI port information and to monitor the link quality, run the opaportinfo command:
# opaportinfo
Present Port State:
Port 1 Info
   Subnet: 0xfe80000000000000   GUID: 0x0011750101790311
   LocalPort: 1                 PortState: Active
   PhysicalState: LinkUp        OfflineDisabledReason: None
   IsSMConfigurationStarted: True   NeighborNormal: True
   BaseLID: 0x00000001          SMLID: 0x00000001
   LMC: 0                       SMSL: 0
   PortType: Unknown            LimtRsp/Subnet: 32 us, 536 ms
   M_KEY: 0x0000000000000000    Lease: 0 s   Protect: Read-only
   LinkWidth      Act: 4        En: 4        Sup: 1,2,3,4
   LinkWidthDnGrd ActTx: 4 Rx: 4   En: 3,4   Sup: 1,2,3,4
   LinkSpeed      Act: 25Gb     En: 25Gb     Sup: 25Gb
   PortLinkMode   Act: STL      En: STL      Sup: STL
   PortLTPCRCMode Act: 14-bit   En: 14-bit,16-bit,48-bit   Sup: 14-bit,16-bit,48-bit
   NeighborMode   MgmtAllowed: No   FWAuthBypass: Off   NeighborNodeType: HFI
   NeighborNodeGuid: 0x00117501017444e0   NeighborPortNum: 1
   Capability: 0x00410022: CN CM APM SM
   Capability3: 0x0008: SS
   SM_TrapQP: 0x0   SA_QP: 0x1
   IPAddr IPV6/IPAddr IPv4: ::/0.0.0.0
   VLs Active: 8+1
   VL: Cap 8+1   HighLimit 0x0000   PreemptLimit 0x0000
   VLFlowControlDisabledMask: 0x00000000
   ArbHighCap: 16   ArbLowCap: 16
   MulticastMask: 0x0   CollectiveMask: 0x0
   P_Key Enforcement: In: Off Out: Off
   MulticastPKeyTrapSuppressionEnabled: 0   ClientReregister 0
   PortMode ActiveOptimize: Off PassThru: Off VLMarker: Off 16BTrapQuery: Off
   FlitCtrlInterleave Distance Max: 1 Enabled: 1
   MaxNestLevelTxEnabled: 0   MaxNestLevelRxSupported: 0
   FlitCtrlPreemption MinInitial: 0x0000 MinTail: 0x0000 LargePktLim: 0x00
      SmallPktLimit: 0x00 MaxSmallPktLimit 0x00 PreemptionLimit: 0x00
   PortErrorActions: 0x172000: CE-UVLMCE-BCDCE-BTDCE-BHDR-BVLM
   BufferUnits:VL15Init 0x0110   VL15CreditRate 0x00   CreditAck 0x0   BufferAlloc 0x3
   MTU Supported: (0x6) 8192 bytes
   MTU Active By VL:
   00: 8192  01:    0  02:    0  03:    0  04:    0  05:    0  06:    0  07:    0
   08:    0  09:    0  10:    0  11:    0  12:    0  13:    0  14:    0  15: 2048
   16:    0  17:    0  18:    0  19:    0  20:    0  21:    0  22:    0  23:    0
   24:    0  25:    0  26:    0  27:    0  28:    0  29:    0  30:    0  31:    0
   StallCnt/VL: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   HOQLife VL[00,07]: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
   HOQLife VL[08,15]: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
   HOQLife VL[16,23]: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
   HOQLife VL[24,31]: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
   ReplayDepth Buffer 0x80; Wire 0x0c
   DiagCode: 0x0000   LedEnabled: Off
   LinkDownReason: None   NeighborLinkDownReason: None
   OverallBufferSpace: 0x0880
   Violations M_Key: 0   P_Key: 0   Q_Key: 0
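To keep an eye on the port counters and link quality over time, opainfo can simply be re-run periodically. A minimal sketch using the standard watch utility:

# Refresh the opainfo output every 5 seconds; press Ctrl+C to stop
watch -n 5 opainfo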
To display information about the Intel OP HFI:
# hfi1_control -i
Driver Version: 0.9-294
Driver SrcVersion: A08826F35C95E0E8A4D949D
Opa Version: 10.3.0.0.81
0: BoardId: Intel Omni-Path Host Fabric Interface Adapter 100 Series
0: Version: ChipABI 3.0, ChipRev 7.17, SW Compat 3
0: ChipSerial: 0x00790311
0,1: Status: 5: LinkUp 4: ACTIVE
0,1: LID=0x1 GUID=0011:7501:0179:0311

Before running the MPI test, we also stop the firewall on both hosts so that MPI traffic is not blocked:

# systemctl stop firewalld
# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2017-01-12 21:35:18 EST; 1s ago
  Process: 137597 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 137597 (code=exited, status=0/SUCCESS)

Jan 12 21:34:13 ebi2s28c01.jf.intel.com firewalld[137597]: 2017-01-12 21:34:1...
Jan 12 21:35:17 ebi2s28c01.jf.intel.com systemd[1]: Stopping firewalld - dyna...
Jan 12 21:35:18 ebi2s28c01.jf.intel.com systemd[1]: Stopped firewalld - dynam...
Hint: Some lines were ellipsized, use -l to show in full.
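Note that systemctl stop only keeps the firewall off until the next reboot. For a dedicated test setup like this one (a judgment call, not a general recommendation), it can also be prevented from starting again at boot:

# Keep firewalld from starting at boot (test environments only)
systemctl disable firewalld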
Finally, you need to set up passwordless Secure Shell (SSH) for the MPI test. When a program is launched on the target machine, SSH normally requires the target password to log on and execute it. To enable this transaction without manual intervention, enable ssh login without a password. To do this, first generate a pair of authentication keys on the host without entering a passphrase:
[host-device ~]$ ssh-keygen -t rsa
Then copy the host machine’s new public key into the target machine’s authorized keys using the ssh-copy-id command:
[host-device ~]$ ssh-copy-id <user>@192.168.100.102
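To confirm that passwordless login works, a remote command that returns without prompting for a password is enough. A quick sanity check, where <user> is whichever account will run the MPI jobs:

# Should print the remote hostname without asking for a password
ssh <user>@192.168.100.102 hostname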
Running an Intel® MPI Benchmarks Program
In this section, we use a benchmark program to observe the IP over Fabric performance. Note that the numbers are for reference and illustration purposes only, as the tests were run on preproduction servers.
On both systems, Intel® Parallel Studio XE 2017 Update 1 was installed. First, we run the Sendrecv benchmark between the servers using the TCP protocol. Sendrecv is a parallel transfer benchmark in the Intel® MPI Benchmarks suite (IMB-MPI1), which is available with Intel Parallel Studio. To use the TCP protocol in this benchmark, specify “-genv I_MPI_FABRICS shm:tcp”:
[root@ebi2s28c01 ~]# mpirun -genv I_MPI_FABRICS shm:tcp -host ebi2s28c01 -n 1 /opt/intel/impi/2017.1.132/bin64/IMB-MPI1 Sendrecv : -host ebi2s28c02 -n 1 /opt/intel/impi/2017.1.132/bin64/IMB-MPI1
Source Parallel Studio
Intel(R) Parallel Studio XE 2017 Update 1 for Linux*
Copyright (C) 2009-2016 Intel Corporation. All rights reserved.
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2017, MPI-1 part
#------------------------------------------------------------
# Date                  : Fri Jan 13 21:02:36 2017
# Machine               : x86_64
# System                : Linux
# Release               : 3.10.0-327.el7.x86_64
# Version               : #1 SMP Thu Oct 29 17:29:29 EDT 2015
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:
# /opt/intel/impi/2017.1.132/bin64/IMB-MPI1 Sendrecv

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:
# Sendrecv

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000        63.55        63.57        63.56         0.00
            1         1000        62.30        62.32        62.31         0.03
            2         1000        53.16        53.17        53.16         0.08
            4         1000        69.16        69.16        69.16         0.12
            8         1000        63.27        63.27        63.27         0.25
           16         1000        62.46        62.47        62.46         0.51
           32         1000        57.75        57.76        57.76         1.11
           64         1000        62.57        62.60        62.58         2.04
          128         1000        45.21        45.23        45.22         5.66
          256         1000        45.04        45.08        45.06        11.36
          512         1000        50.28        50.28        50.28        20.37
         1024         1000        60.76        60.78        60.77        33.69
         2048         1000        81.36        81.38        81.37        50.33
         4096         1000       121.30       121.37       121.33        67.50
         8192         1000       140.51       140.63       140.57       116.50
        16384         1000       232.06       232.14       232.10       141.16
        32768         1000       373.63       373.74       373.69       175.35
        65536          640       799.55       799.92       799.74       163.86
       131072          320      1473.76      1474.09      1473.92       177.83
       262144          160      2806.43      2808.14      2807.28       186.70
       524288           80      6031.64      6033.80      6032.72       173.78
      1048576           40      9327.35      9330.27      9328.81       224.77
      2097152           20     19665.44     19818.81     19742.13       211.63
      4194304           10     50839.90     52294.80     51567.35       160.41

# All processes entering MPI_Finalize
Next, edit the /etc/hosts file and add an alias for each of the IP over Fabric addresses above (192.168.100.101, 192.168.100.102).
# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.23.3.27      ebi2s28c01
10.23.3.148     ebi2s28c02
192.168.100.101 ebi2s28c01-opa
192.168.100.102 ebi2s28c02-opa
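A quick way to confirm that the new aliases resolve and that the fabric interface answers by name (assuming the same /etc/hosts entries exist on both hosts):

# The alias should resolve to the IPoFabric address and reply over the fabric
ping -c 3 ebi2s28c02-opa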
To use Intel OPA, specify “-genv I_MPI_FABRICS shm:tmi -genv I_MPI_TMI_PROVIDER psm2” or simply “-PSM2”. Note that PSM2 stands for Intel® Performance Scaled Messaging 2, a high-performance, vendor-specific protocol that provides a low-level communications interface for the Intel® Omni-Path family of products. By using Intel OPA, the performance increases significantly:
[root@ebi2s28c01 ~]# mpirun -PSM2 -host ebi2s28c01-opa -n 1 /opt/intel/impi/2017.1.132/bin64/IMB-MPI1 Sendrecv : -host ebi2s28c02-opa -n 1 /opt/intel/impi/2017.1.132/bin64/IMB-MPI1
Source Parallel Studio
Intel(R) Parallel Studio XE 2017 Update 1 for Linux*
Copyright (C) 2009-2016 Intel Corporation. All rights reserved.
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2017, MPI-1 part
#------------------------------------------------------------
# Date                  : Fri Jan 27 22:31:23 2017
# Machine               : x86_64
# System                : Linux
# Release               : 3.10.0-327.el7.x86_64
# Version               : #1 SMP Thu Oct 29 17:29:29 EDT 2015
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:
# /opt/intel/impi/2017.1.132/bin64/IMB-MPI1 Sendrecv

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:
# Sendrecv

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         1.09         1.09         1.09         0.00
            1         1000         1.05         1.05         1.05         1.90
            2         1000         1.07         1.07         1.07         3.74
            4         1000         1.08         1.08         1.08         7.43
            8         1000         1.02         1.02         1.02        15.62
           16         1000         1.24         1.24         1.24        25.74
           32         1000         1.21         1.22         1.22        52.63
           64         1000         1.26         1.26         1.26       101.20
          128         1000         1.20         1.20         1.20       213.13
          256         1000         1.31         1.31         1.31       392.31
          512         1000         1.36         1.36         1.36       754.03
         1024         1000         2.16         2.16         2.16       948.54
         2048         1000         2.38         2.38         2.38      1720.91
         4096         1000         2.92         2.92         2.92      2805.56
         8192         1000         3.90         3.90         3.90      4200.97
        16384         1000         8.17         8.17         8.17      4008.37
        32768         1000        10.44        10.44        10.44      6278.62
        65536          640        17.15        17.15        17.15      7641.97
       131072          320        21.90        21.90        21.90     11970.32
       262144          160        32.62        32.62        32.62     16070.33
       524288           80        55.30        55.30        55.30     18961.18
      1048576           40        99.05        99.05        99.05     21172.45
      2097152           20       187.19       187.30       187.25     22393.31
      4194304           10       360.39       360.39       360.39     23276.25

# All processes entering MPI_Finalize
The graph below summarizes the results of running the benchmark over TCP and over Intel OPA. The x-axis represents the message length in bytes, and the y-axis represents the throughput in Mbytes/sec. For example, at the largest message size (4 MB), the average throughput rises from about 160 Mbytes/sec over TCP to about 23,276 Mbytes/sec over Intel OPA, while the zero-byte latency drops from roughly 64 µs to about 1 µs.
Summary
This document showed how the Intel OP HFI cards were installed in two systems and connected back-to-back with an Intel OPA cable, and how the Intel Omni-Path Fabric host software was then installed on each host. All the configuration and verification steps needed to bring the necessary services up were shown in detail. Finally, a simple Intel MPI Benchmarks test was run to illustrate the benefit of using Intel OPA.
References
- Intel® Omni-Path Fabric 100 Series
- Publications and Release Notes for Intel® Omni-Path Software
- Intel OP HFI Products
- https://downloadcenter.intel.com
- Intel® Omni-Path Fabric Host Software User Guide (Rev 5.0)
- Intel® Omni-Path Fabric Software Installation Guide (Rev 5.0)
- Intel® Performance Scaled Messaging 2 Programmer’s Guide (December 2016)
- Intel® Omni-Path Fabric Manager User Guide
- Intel OP HFI Platform Configuration Guide
- Selecting Fabric section in Intel® MPI Library Development Guide for Linux* OS