Using Intel® Omni-Path Architecture


This document shows how to install the Intel® Omni-Path Host Fabric Interface (Intel® OP HFI) card and the Intel® Omni-Path Architecture (Intel® OPA) software on the host. It also describes how to verify the link and port status, perform the necessary steps to bring the stack up, and run a test on the fabric.

Introduction to Intel® Omni-Path Architecture

Intel OPA is the latest generation of Intel’s high-performance fabric technology. It adds new capabilities that enhance high-performance computing performance, scalability, and quality of service. The Intel OPA components include the Intel OP HFI, which provides fabric connectivity; switches that connect a scalable number of endpoints; copper and optical cables; and a Fabric Manager (FM) that identifies all nodes, provides centralized provisioning, and monitors fabric resources. The Intel OP HFI is the Intel OPA interface card that provides host-to-switch connectivity; it can also connect directly to another HFI (back-to-back connectivity).

This paper focuses on how to install the Intel OP HFI, configure the IP over Fabric, and test the fabric using a prebuilt program. Two systems, each equipped with a preproduction Intel® Xeon® E5 processor, were used in this example. Both systems were running Red Hat Enterprise Linux* 7.2 and were equipped with Gigabit Ethernet adapters connected through a Gigabit Ethernet router.

Intel® Omni-Path Host Fabric Interface

The Intel OP HFI is a standard PCIe* card that connects to a switch or to another HFI. There are two current models of Intel OP HFI: a PCIe x16 card, which supports 100 Gbps, and a PCIe x8 card, which supports 56 Gbps. Designed for low latency and high bandwidth, it can be configured with up to 8 data Virtual Lanes plus one management Virtual Lane, and the MTU size is configurable as 2, 4, 6, 8, or 10 KB.

Below is a picture of the Intel OP HFI PCIe x16 card that was used in this test:

Intel OP HFI PCIe x16

Hardware Installation

Two Intel® Xeon® processor-based servers running Red Hat Enterprise Linux 7.2 were used. They have Gigabit Ethernet adapters and are connected through a router. Their IP addresses were 10.23.3.27 and 10.23.3.148. In this example, we will install an Intel OP HFI PCIe x16 card on each server and use an Intel OPA cable to connect them in a back-to-back configuration.

First, power down the systems and then install an Intel OP HFI in a PCIe x16 slot in each system. Connect the two Intel OP HFIs with the Intel OPA cable and power up the systems. Verify that the solid green LED on each Intel OP HFI is on; this indicates that the Intel OP HFI link is up.

Intel OP HFI Link

Next, verify that the OS detects the Intel OP HFI by using the lspci command:

# lspci -vv | grep Omni
18:00.0 Fabric controller: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 11)
        Subsystem: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete]

The first field of the output (18:00.0) is the device’s PCI bus:device.function address, the second field is the device class (“Fabric controller”), and the last field is the device name, which identifies the Intel OP HFI.

To verify the Intel OP HFI link speed, type “lspci -vv” to display more details and search for the address noted above (18:00.0):

# lspci -vv

...................................................

18:00.0 Fabric controller: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 11)
        Subsystem: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 38
        NUMA node: 0
        Region 0: Memory at a0000000 (64-bit, non-prefetchable) [size=64M]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <4us, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-

...................................................

LnkCap and LnkSta both report a speed of 8 GT/s and a width of x16, which corresponds to PCIe Gen3. This confirms that the Intel OP HFI is running at its optimal PCIe speed.
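
Rather than scanning the full lspci output, you can also query just this device and filter for the link fields (the 18:00.0 address is specific to this example and will differ on other systems):

# lspci -s 18:00.0 -vv | grep -E 'LnkCap|LnkSta'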

Host Software Installation

The Intel® Omni-Path Fabric host software is available in two versions:

  • Basic bundle: this software is usually installed on the compute nodes.
  • Intel® Omni-Path Fabric Suite (IFS): a superset of the Basic bundle, usually installed on a head/management node. It additionally contains the following packages:
    • FastFabric: a collection of utilities for installing, testing, and monitoring the fabric
    • Intel® Omni-Path Fabric Manager
    • Intel® Fabric Manager GUI (FMGUI)

Each fabric requires at least one FM; if multiple FMs coexist, one acts as the master FM and the others act as standby FMs. The FM identifies all nodes, switches, and routers; it assigns local IDs (LIDs), maintains the routing tables, and automatically scans for changes and reprograms the fabric. It is recommended to reserve 500 MB of host memory on compute nodes and 1 GB per FM instance. The following figure shows all components of the Fabric Host Software Stack (source: “Intel® Omni-Path Fabric Host Software User Guide, Rev. 5.0”).

Host Software Stack Components

Both versions can be downloaded from https://downloadcenter.intel.com/search?keyword=Omni-Path. You need to install the required OS RPMs before installing the host software; refer to Section 1.1.1.1, “OS RPMs Installation Prerequisites,” in the “Intel® Omni-Path Fabric Software Installation Guide (Rev 5.0)”.
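
For example, on Red Hat Enterprise Linux 7.2 the prerequisite packages can be installed with yum; the package names below are only an illustrative subset, so consult the installation guide for the authoritative list:

# yum install -y libibverbs librdmacm libibumad libibmad pciutils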

In this example, we download the IFS package IntelOPA-IFS.RHEL72-x86_64.10.3.0.0.81.tgz, extract the package, and then run the installation script on both machines (both machines run the same OS and the same IFS package):

# tar -xvf IntelOPA-IFS.RHEL72-x86_64.10.3.0.0.81.tgz
# cd IntelOPA-IFS.RHEL72-x86_64.10.3.0.0.81/
# ./INSTALL -a
Installing All OPA Software
Determining what is installed on system...
-------------------------------------------------------------------------------
Preparing OFA 10_3_0_0_82 release for Install...
...
A System Reboot is recommended to activate the software changes
Done Installing OPA Software.
Rebuilding boot image with "/usr/bin/dracut -f"...done.

As the installer recommends, reboot the host:

# reboot

After the system reboots, load the Intel OP HFI driver and then use lsmod to verify that the Intel OPA modules are present:

# modprobe hfi1
# lsmod | grep hfi1
hfi1                  633634  1
rdmavt                 57992  1 hfi1
ib_mad                 51913  5 hfi1,ib_cm,ib_sa,rdmavt,ib_umad
ib_core                98787  14 hfi1,rdma_cm,ib_cm,ib_sa,iw_cm,xprtrdma,ib_mad,ib_ucm,rdmavt,ib_iser,ib_umad,ib_uverbs,ib_ipoib,ib_isert
i2c_algo_bit           13413  2 ast,hfi1
i2c_core               40582  6 ast,drm,hfi1,ipmi_ssif,drm_kms_helper,i2c_algo_bit
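
If the hfi1 driver should load automatically at boot and it is not already being loaded for you (the Intel installer normally takes care of this), a minimal sketch using systemd’s modules-load mechanism is:

# echo hfi1 > /etc/modules-load.d/hfi1.conf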

For installation errors or additional information, you can refer to the /var/log/opa.log file.

Post Installation

To configure IP over Fabric from the Intel OPA software, run the script again:

# ./INSTALL

Then choose option 2) Reconfigure OFA IP over IB. In this example, we configure the IPoFabric address of the host as 192.168.100.101. You can then verify the IP over Fabric interface configuration:

# more /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
BOOTPROTO=static
IPADDR=192.168.100.101
BROADCAST=192.168.100.255
NETWORK=192.168.100.0
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no
CONNECTED_MODE=yes
MTU=65520

Bring the IP over Fabric Interface up, and then verify its IP address:

# ifup ib0
# ifconfig ib0
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
        inet 192.168.100.101  netmask 255.255.255.0  broadcast 192.168.100.255
        inet6 fe80::211:7501:179:311  prefixlen 64  scopeid 0x20<link>
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
        infiniband 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 16  bytes 2888 (2.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Note that general system messages are captured in the /var/log/messages file, and Intel OPA-related information is captured in /var/log/opa.log.
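
For a quick look at recent driver and fabric messages, you can filter those logs directly:

# grep hfi1 /var/log/messages | tail
# tail /var/log/opa.log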

So far, we have installed the host software package on one host (10.23.3.27). We repeat the same procedure on the second host (10.23.3.148) and configure its IP over Fabric address as 192.168.100.102.
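
On the second host, the resulting ifcfg-ib0 should mirror the file shown earlier and differ only in its address (a sketch based on the first host’s configuration):

# more /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
BOOTPROTO=static
IPADDR=192.168.100.102
BROADCAST=192.168.100.255
NETWORK=192.168.100.0
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no
CONNECTED_MODE=yes
MTU=65520

With both interfaces up, verify that you can ping 192.168.100.102 from the first server: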

# ping 192.168.100.102
PING 192.168.100.102 (192.168.100.102) 56(84) bytes of data.
64 bytes from 192.168.100.102: icmp_seq=1 ttl=64 time=1.34 ms
64 bytes from 192.168.100.102: icmp_seq=2 ttl=64 time=0.303 ms
64 bytes from 192.168.100.102: icmp_seq=3 ttl=64 time=0.253 ms
^C

And from the second server, verify that you can ping the IP over Fabric interface:

# ping 192.168.100.101
PING 192.168.100.101 (192.168.100.101) 56(84) bytes of data.
64 bytes from 192.168.100.101: icmp_seq=1 ttl=64 time=0.024 ms
64 bytes from 192.168.100.101: icmp_seq=2 ttl=64 time=0.043 ms
64 bytes from 192.168.100.101: icmp_seq=3 ttl=64 time=0.023 ms
^C

The opainfo command can be used to verify the fabric:

# opainfo
hfi1_0:1                           PortGUID:0x0011750101790311
   PortState:     Init (LinkUp)
   LinkSpeed      Act: 25Gb         En: 25Gb
   LinkWidth      Act: 4            En: 4
   LinkWidthDnGrd ActTx: 4  Rx: 4   En: 1,2,3,4
   LCRC           Act: 14-bit       En: 14-bit,16-bit,48-bit
   Xmit Data:                  0 MB Pkts:                    0
   Recv Data:                  0 MB Pkts:                    0
   Link Quality: 5 (Excellent)

This confirms that the Intel OP HFI speed is 100 Gbps (25 Gbps × 4 lanes).

Next, enable the Intel OPA Fabric Manager on one host and start it. Note that the FM goes through many steps, including physical subnet establishment, subnet discovery, information gathering, LID assignment, path establishment, port configuration, switch configuration, and subnet activation.

# opaconfig -E opafm
# service opafm start
Redirecting to /bin/systemctl start  opafm.service

You can query the status of the Intel OPA Fabric Manager at any time:

# service opafm status
Redirecting to /bin/systemctl status  opafm.service
● opafm.service - OPA Fabric Manager
   Loaded: loaded (/usr/lib/systemd/system/opafm.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-01-09 23:42:41 EST; 21min ago
  Process: 6758 ExecStart=/usr/lib/opa-fm/bin/opafmd -D (code=exited, status=0/SUCCESS)
 Main PID: 6759 (opafmd)
   CGroup: /system.slice/opafm.service
           ├─6759 /usr/lib/opa-fm/bin/opafmd -D
           └─6760 /usr/lib/opa-fm/runtime/sm -e sm_0

# opainfo
hfi1_0:1                           PortGID:0xfe80000000000000:0011750101790311
   PortState:     Active
   LinkSpeed      Act: 25Gb         En: 25Gb
   LinkWidth      Act: 4            En: 4
   LinkWidthDnGrd ActTx: 4  Rx: 4   En: 3,4
   LCRC           Act: 14-bit       En: 14-bit,16-bit,48-bit       Mgmt: True
   LID: 0x00000001-0x00000001       SM LID: 0x00000001 SL: 0
   Xmit Data:                  0 MB Pkts:                   21
   Recv Data:                  0 MB Pkts:                   22
   Link Quality: 5 (Excellent)

The port state is “Active”, which indicates the normal operating state of a fully functional link. To display Intel OP HFI port information and monitor the link quality, run the opaportinfo command:

# opaportinfo
Present Port State:
Port 1 Info
   Subnet:   0xfe80000000000000       GUID: 0x0011750101790311
   LocalPort:     1                 PortState:        Active
   PhysicalState: LinkUp
   OfflineDisabledReason: None
   IsSMConfigurationStarted: True   NeighborNormal: True
   BaseLID:       0x00000001        SMLID:            0x00000001
   LMC:           0                 SMSL:             0
   PortType: Unknown                LimtRsp/Subnet:     32 us, 536 ms
   M_KEY:    0x0000000000000000     Lease:       0 s  Protect: Read-only
   LinkWidth      Act: 4            En: 4             Sup: 1,2,3,4
   LinkWidthDnGrd ActTx: 4  Rx: 4   En: 3,4           Sup: 1,2,3,4
   LinkSpeed      Act: 25Gb         En: 25Gb          Sup: 25Gb
   PortLinkMode   Act: STL          En: STL           Sup: STL
   PortLTPCRCMode Act: 14-bit       En: 14-bit,16-bit,48-bit Sup: 14-bit,16-bit,48-bit
   NeighborMode   MgmtAllowed:  No  FWAuthBypass: Off NeighborNodeType: HFI
   NeighborNodeGuid:   0x00117501017444e0   NeighborPortNum:   1
   Capability:    0x00410022: CN CM APM SM
   Capability3:   0x0008: SS
   SM_TrapQP: 0x0  SA_QP: 0x1
   IPAddr IPV6/IPAddr IPv4:  ::/0.0.0.0
   VLs Active:    8+1
   VL: Cap 8+1   HighLimit 0x0000   PreemptLimit 0x0000
   VLFlowControlDisabledMask: 0x00000000   ArbHighCap: 16  ArbLowCap: 16
   MulticastMask: 0x0    CollectiveMask: 0x0
   P_Key Enforcement: In: Off Out: Off
   MulticastPKeyTrapSuppressionEnabled:  0   ClientReregister  0
   PortMode ActiveOptimize: Off PassThru: Off VLMarker: Off 16BTrapQuery: Off
   FlitCtrlInterleave Distance Max:  1  Enabled:  1
     MaxNestLevelTxEnabled: 0  MaxNestLevelRxSupported: 0
   FlitCtrlPreemption MinInitial: 0x0000 MinTail: 0x0000 LargePktLim: 0x00
     SmallPktLimit: 0x00 MaxSmallPktLimit 0x00 PreemptionLimit: 0x00
   PortErrorActions: 0x172000: CE-UVLMCE-BCDCE-BTDCE-BHDR-BVLM
   BufferUnits:VL15Init 0x0110 VL15CreditRate 0x00 CreditAck 0x0 BufferAlloc 0x3
   MTU  Supported: (0x6) 8192 bytes
   MTU  Active By VL:
   00: 8192 01:    0 02:    0 03:    0 04:    0 05:    0 06:    0 07:    0
   08:    0 09:    0 10:    0 11:    0 12:    0 13:    0 14:    0 15: 2048
   16:    0 17:    0 18:    0 19:    0 20:    0 21:    0 22:    0 23:    0
   24:    0 25:    0 26:    0 27:    0 28:    0 29:    0 30:    0 31:    0
   StallCnt/VL:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   HOQLife VL[00,07]: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
   HOQLife VL[08,15]: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
   HOQLife VL[16,23]: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
   HOQLife VL[24,31]: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
   ReplayDepth Buffer 0x80; Wire 0x0c
   DiagCode: 0x0000    LedEnabled: Off
   LinkDownReason: None    NeighborLinkDownReason: None
   OverallBufferSpace: 0x0880
   Violations    M_Key: 0         P_Key: 0        Q_Key: 0
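
To monitor the link quality over time, you can wrap opainfo in watch (a simple convenience sketch based on the output shown above):

# watch -n 5 "opainfo | grep 'Link Quality'"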

To display information about the Intel OP HFI:

# hfi1_control -i
Driver Version: 0.9-294
Driver SrcVersion: A08826F35C95E0E8A4D949D
Opa Version: 10.3.0.0.81
0: BoardId: Intel Omni-Path Host Fabric Interface Adapter 100 Series
0: Version: ChipABI 3.0, ChipRev 7.17, SW Compat 3
0: ChipSerial: 0x00790311
0,1: Status: 5: LinkUp 4: ACTIVE
0,1: LID=0x1 GUID=0011:7501:0179:0311


In this example, the firewall service was also stopped so that it does not block the MPI test traffic:

# systemctl stop firewalld
# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2017-01-12 21:35:18 EST; 1s ago
  Process: 137597 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 137597 (code=exited, status=0/SUCCESS)

Jan 12 21:34:13 ebi2s28c01.jf.intel.com firewalld[137597]: 2017-01-12 21:34:1...
Jan 12 21:35:17 ebi2s28c01.jf.intel.com systemd[1]: Stopping firewalld - dyna...
Jan 12 21:35:18 ebi2s28c01.jf.intel.com systemd[1]: Stopped firewalld - dynam...
Hint: Some lines were ellipsized, use -l to show in full.
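
To keep the firewall from starting again after a reboot, you can optionally disable the service as well (not part of the original run):

# systemctl disable firewalld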

Finally, you need to set up password-less Secure Shell (SSH) login for the MPI test. When launching a program on the target machine, SSH normally prompts for that machine’s password. To enable the test to run without manual intervention, configure SSH login without a password. First, generate a pair of authentication keys on the host without entering a passphrase:

[host-device ~]$ ssh-keygen -t rsa

Then append the host machine’s new public key to the target machine’s list of authorized keys using the ssh-copy-id command:

[host-device ~]$ ssh-copy-id <user>@192.168.100.102
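
You can confirm that password-less login now works by running a remote command; it should complete without prompting for a password:

[host-device ~]$ ssh <user>@192.168.100.102 hostname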

Running an Intel® MPI Benchmarks Program

In this section, we use a benchmark program to observe the IP over Fabric performance. Note that the numbers are for reference and illustration purposes only, as the tests were run on preproduction servers.

On both systems, Intel® Parallel Studio 2017 Update 1 was installed. First, we run the Sendrecv benchmark between the servers using the TCP protocol. Sendrecv is a parallel transfer benchmark in the Intel® MPI Benchmarks suite (IMB-MPI1), which is included with Intel Parallel Studio.
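
Before invoking mpirun, the Intel MPI runtime environment must be set up in the shell; a minimal sketch, assuming the default Intel MPI 2017 installation path used in the commands below (sourcing this script prints the Parallel Studio banner seen in the listings):

# source /opt/intel/impi/2017.1.132/bin64/mpivars.sh

To use the TCP protocol in this benchmark, specify “-genv I_MPI_FABRICS shm:tcp”: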

[root@ebi2s28c01 ~]# mpirun -genv I_MPI_FABRICS shm:tcp -host ebi2s28c01 -n 1 /opt/intel/impi/2017.1.132/bin64/IMB-MPI1 Sendrecv : -host ebi2s28c02 -n 1 /opt/intel/impi/2017.1.132/bin64/IMB-MPI1
(banner printed after sourcing the Intel Parallel Studio environment)
Intel(R) Parallel Studio XE 2017 Update 1 for Linux*
Copyright (C) 2009-2016 Intel Corporation. All rights reserved.
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2017, MPI-1 part
#------------------------------------------------------------
# Date                  : Fri Jan 13 21:02:36 2017
# Machine               : x86_64
# System                : Linux
# Release               : 3.10.0-327.el7.x86_64
# Version               : #1 SMP Thu Oct 29 17:29:29 EDT 2015
# MPI Version           : 3.1
# MPI Thread Environment:


# Calling sequence was:

# /opt/intel/impi/2017.1.132/bin64/IMB-MPI1 Sendrecv

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Sendrecv

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000        63.55        63.57        63.56         0.00
            1         1000        62.30        62.32        62.31         0.03
            2         1000        53.16        53.17        53.16         0.08
            4         1000        69.16        69.16        69.16         0.12
            8         1000        63.27        63.27        63.27         0.25
           16         1000        62.46        62.47        62.46         0.51
           32         1000        57.75        57.76        57.76         1.11
           64         1000        62.57        62.60        62.58         2.04
          128         1000        45.21        45.23        45.22         5.66
          256         1000        45.04        45.08        45.06        11.36
          512         1000        50.28        50.28        50.28        20.37
         1024         1000        60.76        60.78        60.77        33.69
         2048         1000        81.36        81.38        81.37        50.33
         4096         1000       121.30       121.37       121.33        67.50
         8192         1000       140.51       140.63       140.57       116.50
        16384         1000       232.06       232.14       232.10       141.16
        32768         1000       373.63       373.74       373.69       175.35
        65536          640       799.55       799.92       799.74       163.86
       131072          320      1473.76      1474.09      1473.92       177.83
       262144          160      2806.43      2808.14      2807.28       186.70
       524288           80      6031.64      6033.80      6032.72       173.78
      1048576           40      9327.35      9330.27      9328.81       224.77
      2097152           20     19665.44     19818.81     19742.13       211.63
      4194304           10     50839.90     52294.80     51567.35       160.41


# All processes entering MPI_Finalize

Next, edit the /etc/hosts file and add aliases for the IP over Fabric addresses configured above (192.168.100.101 and 192.168.100.102).

# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.23.3.27  ebi2s28c01
10.23.3.148 ebi2s28c02
192.168.100.101 ebi2s28c01-opa
192.168.100.102 ebi2s28c02-opa
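
A quick check from the first host confirms that the new alias resolves and that the remote fabric interface responds (hostnames as defined above):

# ping -c 3 ebi2s28c02-opa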

To use Intel OPA, users can specify “-genv I_MPI_FABRICS shm:tmi -genv I_MPI_TMI_PROVIDER psm2” or simply “-PSM2”. Note that PSM2 stands for Intel® Performance Scaled Messaging 2, a high-performance, vendor-specific protocol that provides a low-level communication interface for the Intel® Omni-Path family of products. With Intel OPA, the performance increases significantly:

[root@ebi2s28c01 ~]# mpirun -PSM2 -host ebi2s28c01-opa -n 1 /opt/intel/impi/2017.1.132/bin64/IMB-MPI1 Sendrecv : -host ebi2s28c02-opa -n 1 /opt/intel/impi/2017.1.132/bin64/IMB-MPI1
(banner printed after sourcing the Intel Parallel Studio environment)
Intel(R) Parallel Studio XE 2017 Update 1 for Linux*
Copyright (C) 2009-2016 Intel Corporation. All rights reserved.
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2017, MPI-1 part
#------------------------------------------------------------
# Date                  : Fri Jan 27 22:31:23 2017
# Machine               : x86_64
# System                : Linux
# Release               : 3.10.0-327.el7.x86_64
# Version               : #1 SMP Thu Oct 29 17:29:29 EDT 2015
# MPI Version           : 3.1
# MPI Thread Environment:


# Calling sequence was:

# /opt/intel/impi/2017.1.132/bin64/IMB-MPI1 Sendrecv

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Sendrecv

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         1.09         1.09         1.09         0.00
            1         1000         1.05         1.05         1.05         1.90
            2         1000         1.07         1.07         1.07         3.74
            4         1000         1.08         1.08         1.08         7.43
            8         1000         1.02         1.02         1.02        15.62
           16         1000         1.24         1.24         1.24        25.74
           32         1000         1.21         1.22         1.22        52.63
           64         1000         1.26         1.26         1.26       101.20
          128         1000         1.20         1.20         1.20       213.13
          256         1000         1.31         1.31         1.31       392.31
          512         1000         1.36         1.36         1.36       754.03
         1024         1000         2.16         2.16         2.16       948.54
         2048         1000         2.38         2.38         2.38      1720.91
         4096         1000         2.92         2.92         2.92      2805.56
         8192         1000         3.90         3.90         3.90      4200.97
        16384         1000         8.17         8.17         8.17      4008.37
        32768         1000        10.44        10.44        10.44      6278.62
        65536          640        17.15        17.15        17.15      7641.97
       131072          320        21.90        21.90        21.90     11970.32
       262144          160        32.62        32.62        32.62     16070.33
       524288           80        55.30        55.30        55.30     18961.18
      1048576           40        99.05        99.05        99.05     21172.45
      2097152           20       187.19       187.30       187.25     22393.31
      4194304           10       360.39       360.39       360.39     23276.25


# All processes entering MPI_Finalize

The graph below summarizes the results when running the benchmark using TCP and Intel OPA. The x-axis represents the message length in bytes, and the y-axis represents the throughput (Mbytes/sec).

Sendrecv Benchmark Chart

Summary

This document showed how the Intel OP HFI cards were installed on two systems and connected back-to-back with an Intel OPA cable, and how the Intel Omni-Path Fabric host software was installed on each host. All the configuration and verification steps needed to bring the necessary services up were shown in detail. Finally, a simple Intel MPI Benchmarks test was run to illustrate the benefit of using Intel OPA.

References

  • Intel® Omni-Path Fabric Host Software User Guide, Rev. 5.0
  • Intel® Omni-Path Fabric Software Installation Guide, Rev. 5.0
  • Intel® Omni-Path software downloads: https://downloadcenter.intel.com/search?keyword=Omni-Path
