Intel® Xeon® Processor D Product Family Technical Overview

1. Form Factor Overview
2. Intel® Xeon® Processor D Product Family Overview
3. Intel® Xeon® Processor D Product Family Feature Overview
4. Intel® Xeon® processor D Product Family introduces new instructions as well as enhancements of previous instructions⁴
5. Intel® Advanced Vector Extensions 2 (Intel® AVX2) Instructions
6. VT Cache QoS Monitoring/Enforcement and Memory Bandwidth Monitoring⁴
7. A/D Bits for EPT
8. Intel® Virtual Machine Control Structure Shadowing (Intel® VMCS Shadowing)
9. APICv
10. Supervisor Mode Access Protection (SMAP)
11. RDSEED⁴
12. Intel® Trusted Execution Technology (Intel® TXT)
13. Intel® Node Manager
14. RAS – Reliability Availability Serviceability
15. Intel® Processor Trace⁴
16. Non-Transparent Bridge (NTB)
17. Asynchronous DRAM Refresh (ADR)
18. Intel® QuickData Technology
19. Resources

1. Form Factor Overview

Microservers are an emerging form of servers designed to process lightweight, scale out workloads for hyper-scale data centers. They’re a good form factor example to use to describe the design considerations when implementing an Intel® SoC. Typical workloads suited for microservers include dynamic and static web page serving, entry dedicated hosting, cold and warm storage, and basic content delivery, among others. A microserver consists of a collection of nodes that share a common backplane. Each node contains a system-on-chip (SoC), local memory for the SoC, and ideally all required IO components for the desired implementation. Because of the microserver’s high-density and energy-efficient design, its infrastructure (including the fan and power supply) can be shared by tens or even hundreds of SoCs, eliminating the space and power consumption demands of duplicate infrastructure components. Even within the microserver category, there is no one-size-fits-all answer to system design or processor choice. Some microservers may have high-performing single-socket processors with robust memory and storage, while others may have a far higher number of miniature dense configurations with lower power and relatively lower compute capacity per SoC.

Figure 1. Comparison of server form factors

To meet the full breadth of these requirements, Intel offers a range of processors covering a spectrum of performance options, so companies can select what’s appropriate for their lightweight scale-out workloads. The Intel® Xeon® processor D product family offers new options for infrastructure optimization by bringing the performance and advanced intelligence of Intel® Xeon® processors into dense, lower-power SoCs. The Intel® Xeon® processor E3 family offers a choice of integrated graphics, node performance, performance per watt, and flexibility. The Intel® Atom™ processor C2000 product family provides extreme low power and higher density.

The Intel® Xeon® processor D-1500 product family is Intel’s first-generation SoC based on the Intel Xeon processor line and is manufactured using Intel’s low-power 14nm process. This SoC adds performance capabilities to Intel’s SoC lineup with features such as Intel® Hyper-Threading Technology, larger caches, DDR4 memory support, the Intel® 10GbE Network Adapter, and more. Power enhancements are also a point of focus, with a SoC thermal design power of 20-45 watts and additional power management capabilities such as Intel® Node Manager. Multiple redundancy features are also available to help mitigate memory and storage failures.

The data center environment is diversifying both in terms of the infrastructure and the market segments including storage, network, and cloud. Each area has unique requirements, providing opportunities for targeted solutions to best cover these needs. The Intel Xeon processor D-1500 product family extends market segment coverage beyond Intel’s previous microserver product line based on the Intel Atom processor C2000 product family. Cloud service providers can benefit from the SoC with compute-focused workloads associated with hyper scale out such as distributed memcaching, web frontend, content delivery, and dedicated hosting. The Intel Xeon processor D-1500 product family is also beneficial for mid-range network-focused workloads such as those associated with compact PCI advanced mezzanine cards (AMC) found in router mid-range control. For storage-focused workloads it can also provide benefit with entry enterprise SAN/NAS, cloud storage nodes, or warm cloud storage.

These SoCs offer a significant step up from the Intel® Atom™ SoC C2750, delivering up to 3.4x the performance per node¹,³ and up to an estimated 1.7x better performance per watt.²,³ With exceptional node performance, up to 12 MB of last level cache, and support for up to 128 GB of high-speed DDR4 memory, these SoCs are ideal for emerging lightweight hyper-scale workloads, including memory caching, dynamic web serving, and dedicated hosting.
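The headline ratios can be reproduced from the test configurations listed in footnotes 1 and 2 at the end of this paper. The sketch below simply redoes that arithmetic in C. The function names are illustrative only and do not come from any Intel tool, and the figures are the simultaneous-user counts and wall-power values quoted in those footnotes.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of two throughput figures (simultaneous users served). */
double speedup(double users_new, double users_old) {
    assert(users_old > 0.0);
    return users_new / users_old;
}

/* Performance per watt: simultaneous users per watt of wall power. */
double perf_per_watt(double users, double watts) {
    assert(watts > 0.0);
    return users / watts;
}

/* Prints the comparison using the figures quoted in footnotes 1 and 2. */
void print_comparison(void) {
    double xeon_d = perf_per_watt(43844.0, 90.0); /* estimated wall power */
    double atom   = perf_per_watt(12896.0, 46.0); /* maximum wall power */
    printf("per-node speedup: %.2fx\n", speedup(43844.0, 12896.0));
    printf("perf/W: %.2f vs %.2f users/W (%.2fx)\n",
           xeon_d, atom, xeon_d / atom);
}
```

Calling print_comparison() from a main function shows how the per-node and per-watt ratios follow from the footnoted user counts and power figures.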

2. Intel® Xeon® Processor D Product Family Overview

Table 1 provides a high-level summary of the hardware differences between the Intel Xeon processor D-1500 product family and the Intel Atom SoC C2000 product family. Some of the more notable changes introduced with the Intel Xeon processor D-1500 product family include Intel® Hyper-Threading Technology (Intel® HT Technology), an L3 cache, greater memory capacity and speed, C-states, and more.

Table 1. Comparison of the Intel® Atom™ Processor C2000 Product Family to the Intel® Xeon® Processor D Product Family

	Intel® Atom™ Processor C2000 Product Family on the Edisonville platform	Intel® Xeon® Processor D-1500 Product Family on the Grangeville platform
Silicon Core Process technology	22nm	14nm
Core / Thread Count	Up to 8 cores / 8 threads	Up to 8 cores / 16 threads
Core Frequency	Up to 2.4GHz (2.6GHz with Turbo)	Up to 2.0GHz (2.6GHz with Turbo)
L1 Cache	32KB Data, 24KB Instruction per core	32KB Data, 32KB Instruction per core
L2 Cache	1MB shared per 2 cores	256KB per core
L3 Cache	None	1.5MB per core
SoC Thermal Design Power	5W - 20W	~20W - 45W
C-states	No	Yes
Memory Addressing	38 bits physical / 48 bits virtual	48 bits physical / 48 bits virtual
Memory	2 Channels 2 DIMMs per ch 1600 DDR3/L	2 Channels 2 DIMMs per ch 1600 DDR3/L 2133 DDR4
	64GB Max capacity	128GB Max capacity
	SODIMM, UDIMM, VLP UDIMM ECC	RDIMM, UDIMM, SODIMM ECC
IO: PCI Express* (PCIe) lanes	16x PCIe G2	24x Gen3, 8x Gen2
IO: GbE	4x 1GbE/2.5GbE	2x 1GbE / 2.5GbE / 10GbE
IO: SATA ports	4x SATA2, 2x SATA3	6x SATA3
IO: USB ports	4x USB 2.0	4x USB 2.0, 4x USB 3.0

Figure 2. A block diagram of the Intel® Xeon® processor D-1500 product family

3. Intel® Xeon® Processor D Product Family Feature Overview

The rest of this paper discusses some of the new features in the Intel Xeon processor D-1500 product family. In Table 2, the items marked with ⁴ are newly introduced with this version of the silicon. The other features previously existed on other Intel Xeon processor product families but are new to Intel’s SoC product line, which previously contained only Intel Atom processors.

Table 2. Features and associated workload segments

Features/Technologies	COMPUTE: Hyper Scale Out, Distributed Memcaching, Web Frontend, Content Delivery, Dedicated Hosting	NETWORK: Router Mid Control such as with high density, compact PCI Advanced Mezzanine Cards (AMC)
New or Enhanced Instructions (ADC, SBB, ADCX, ADOX, PREFETCHW, MWAIT)⁴	√	√
Intel® Advanced Vector Extensions 2 (Intel® AVX2)	√	√
VT Cache QoS Monitoring/Enforcement⁴	√	√
Memory Bandwidth Monitoring⁴	√	√
A/D Bits for EPT	√	√
Intel® Virtual Machine Control Structure Shadowing (Intel® VMCS Shadowing)	√	√
Posted Interrupts	√	√
APICv	√	√
RDSEED⁴	√	√
Supervisor Mode Access Protection (SMAP)⁴	√	√
Intel® Trusted Execution Technology	√	√
Intel® Node Manager	√	√
RAS	√	√
Intel® Processor Trace⁴	√	√
Intel® QuickAssist Technology		√
Intel® QuickData Technology
Non-Transparent Bridge
Asynchronous DRAM Refresh

4. Intel® Xeon® processor D Product Family introduces new instructions as well as enhancements of previous instructions⁴

ADCX (unsigned integer add with carry) and ADOX (unsigned integer add with overflow) have been introduced for Asymmetric Crypto Assist⁵, in addition to faster ADC/SBB instructions (no recompilation is required for the ADC/SBB benefits). ADCX and ADOX extend the ADC (add with carry) instruction for large integer arithmetic on operands wider than 64 bits: ADCX propagates its carry through the carry flag, and ADOX propagates its carry through the overflow flag. Performance improves because two parallel carry chains can be supported at the same time. ADOX/ADCX can be combined with MULX for additional performance improvements in public key encryption such as RSA. Large integer arithmetic is also used for Elliptic Curve Cryptography (ECC) and Diffie-Hellman (DH) Key Exchange. Beyond cryptography, there are many use cases in complex research and high performance computing (HPC). The demand for this functionality is high enough to warrant a number of commonly used optimized libraries, such as the GNU Multi-Precision (GMP) library (used by Mathematica, for example); see New Instructions Supporting Large Integer Arithmetic on Intel® Architecture Processors. To take advantage of ADCX and ADOX, you need an updated software library and must recompile (Intel® Compiler 14.1+, GCC 4.7+, or Microsoft Visual Studio* 2013+).
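To make the idea of a carry chain concrete, here is a minimal portable C sketch of multi-word addition, the kind of loop that ADC accelerates one 64-bit limb at a time. It is not Intel's implementation and does not use the new instructions directly; the function name bigint_add and the little-endian limb layout are assumptions made for illustration. A library tuned for ADCX/ADOX would typically run two such chains at once (one through the carry flag, one through the overflow flag), for example inside a multiplication routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Adds two n-limb unsigned integers stored least-significant limb first.
 * Writes the n-limb result to sum and returns the final carry (0 or 1).
 * Each loop iteration is one link in the carry chain.
 */
uint64_t bigint_add(uint64_t *sum, const uint64_t *a, const uint64_t *b,
                    size_t n) {
    assert(sum != NULL && a != NULL && b != NULL);
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = a[i] + carry;      /* may wrap around to 0 */
        uint64_t c1 = (t < carry);      /* 1 if adding the carry wrapped */
        uint64_t s = t + b[i];          /* may wrap again */
        uint64_t c2 = (s < t);          /* 1 if adding b[i] wrapped */
        sum[i] = s;
        carry = c1 | c2;                /* at most one of the two can be set */
    }
    return carry;
}
```

With four limbs this adds 256-bit values, so adding 1 to a value whose two low limbs are all ones ripples the carry into the third limb.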

MWAIT extensions for advanced power management can be used by the Operating System to implement power management policy.

PREFETCHW, which prefetches data into cache in anticipation of a write, helps optimize the network stack.

For more information about these instructions, see the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM). Currently, Intel® Compiler 14.1+, GCC 4.7+, and Microsoft Visual Studio* 2013+ support these instructions.

5. Intel® Advanced Vector Extensions 2 (Intel® AVX2) Instructions

With Intel® AVX, all the floating point vector instructions were extended from 128 bits to 256 bits. The Intel Xeon processor D product family further improves performance by reducing floating point multiply (MULPS, MULPD) latency to 3 cycles, versus 5 cycles on the previous generation of Intel Xeon processors. Intel® AVX2 also extends the integer vector instructions to 256 bits. Intel AVX2 uses the same 256-bit YMM registers as Intel AVX. Intel AVX2 instructions benefit high performance computing (HPC) applications, databases, and audio and video applications. Intel AVX2 instructions include fused multiply add (FMA), gather, shifts, and permute instructions.

The FMA instruction computes ±(a×b)±c with only one rounding. The intermediate result a×b is not rounded, which increases accuracy compared to separate MUL and ADD instructions. FMA increases the performance and accuracy of many floating point computations, such as matrix multiplication, dot product, and polynomial evaluation. With 256-bit registers, a single FMA instruction operates on 8 single precision or 4 double precision elements. Since FMA combines 2 operations into one, floating point operations per second (FLOPS) are increased. Additionally, because there are 2 FMA units, the peak FLOPS are doubled.
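The peak-throughput reasoning in this paragraph can be written out as a short calculation. This is a sketch under the assumptions stated in the text (256-bit vectors, one multiply and one add counted per FMA lane, two FMA units per core); the function name is illustrative, and real sustained throughput depends on the workload and on frequency behavior.

```c
#include <assert.h>

/*
 * Peak floating point operations per cycle for one core:
 *   lanes per vector * 2 operations per FMA (multiply + add) * FMA units.
 */
unsigned peak_flops_per_cycle(unsigned vector_bits, unsigned element_bits,
                              unsigned fma_units) {
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    unsigned lanes = vector_bits / element_bits;
    return lanes * 2u * fma_units;
}
```

For example, peak_flops_per_cycle(256, 32, 2) gives 32 single precision operations per cycle, and peak_flops_per_cycle(256, 64, 2) gives 16 double precision operations per cycle.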

The gather instruction loads sparse elements to a single vector. It can gather 8 single precision (Dword) or 4 double precision (Qword) data elements into a vector register in a single operation. A base address points to the data structure in memory, and an Index (offset) gives the offset of each element from the base address. The mask register tracks which elements need to be gathered. Gather is complete when the mask register is all zeros. The gather instruction enables vectorization for workloads that could previously not be vectorized for various reasons.
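The semantics described above can be modeled in scalar C. The sketch below is a software model of an 8-lane single precision gather, not the hardware instruction or a compiler intrinsic. The name gather8_model is hypothetical, and the mask is modeled as one 32-bit word per lane whose most significant bit selects the lane, which is how I understand the AVX2 mask convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GATHER_LANES 8
#define LANE_SELECTED 0x80000000u  /* most significant bit of a mask lane */

/*
 * Scalar model of a masked gather of 32-bit floats.
 * For every lane whose mask bit is set, loads base[index[lane]] into
 * dst[lane]. Unselected lanes of dst are left unchanged. Each mask lane
 * is cleared as it is processed, so the mask is all zeros on completion.
 */
void gather8_model(float dst[GATHER_LANES], const float *base,
                   const int32_t index[GATHER_LANES],
                   uint32_t mask[GATHER_LANES]) {
    assert(dst != NULL && base != NULL && index != NULL && mask != NULL);
    for (size_t lane = 0; lane < GATHER_LANES; lane++) {
        if (mask[lane] & LANE_SELECTED) {
            dst[lane] = base[index[lane]];
        }
        mask[lane] = 0;
    }
}
```

The loop shows why sparse accesses of the form dst[i] = table[idx[i]] can be vectorized: the indices live in one vector, and the hardware performs the per-lane loads that this model does one at a time.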

Intel Xeon processor D product family adds additional hardware capability with a gather index table (GIT) to improve performance (Figure 3). No recompiling is required to take advantage of this new feature. The GIT provides storage for full width indices near the address generation unit. A special load grabs the correct index, simplifying the index handling. Loaded elements are merged directly into the destination.

Figure 3. Gather Index Table Conceptual Block Diagram

Other new operations in Intel AVX2 include integer versions of permute instructions, new broadcast instructions, and blend instructions. A 1,024 radix divider for reduced latency, along with a "split" operation for scalar divides, where two scalar divides occur simultaneously, improve performance over previous generations of Intel Xeon processors.

Currently, the Intel Compiler 14.1+, GCC 4.7+, and Microsoft Visual Studio 2013+ support these instructions.

6. VT Cache QoS Monitoring/Enforcement and Memory Bandwidth Monitoring⁴

The Intel Xeon processor D product family has the ability to monitor the last level of processor cache on a per-thread, application, or VM basis. This allows the VMM or OS scheduler to make changes based on policy enforcement. One scenario where this can be of benefit is if you have a multi-tenant environment and a VM is causing a lot of thrash with the cache. This feature allows the VMM or OS to migrate this “noisy neighbor” to a different location where it may have less of an impact on other VMs. This product family also introduces a new capability to manage the processor LLC based on pre-defined levels of service, independent of the OS or VMM. A QoS mask can be used to provide 16 different levels of enforcement to limit the amount of cache that a thread can consume.

The Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM), Volume 3, Section 17.14 provides the CQM and MBM programming details, and Section 17.15 provides the CQE programming details. To convert a raw value read from the IA32_QM_CTR register to bytes, multiply it by the factor given in the CPUID field CPUID.0xF.1:EBX.
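The conversion can be sketched as follows. Reading the MSR and executing CPUID are left out because they require privileged or platform-specific code, so the function takes both values as parameters. The bit layout used here (data in bits 61:0, an Unavailable flag in bit 62, an Error flag in bit 63) reflects my reading of the SDM and should be checked against the current manual; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QM_CTR_ERROR     (1ULL << 63)        /* assumed: Error flag */
#define QM_CTR_UNAVAIL   (1ULL << 62)        /* assumed: Unavailable flag */
#define QM_CTR_DATA_MASK ((1ULL << 62) - 1)  /* assumed: data in bits 61:0 */

/*
 * Converts a raw IA32_QM_CTR value to bytes using the upscaling factor
 * reported in CPUID.0xF.1:EBX. Returns 0 and stores the result in *bytes,
 * or returns -1 if the counter value is flagged as invalid.
 */
int qm_ctr_to_bytes(uint64_t raw, uint32_t upscale, uint64_t *bytes) {
    assert(bytes != NULL);
    if (raw & (QM_CTR_ERROR | QM_CTR_UNAVAIL)) {
        return -1;
    }
    *bytes = (raw & QM_CTR_DATA_MASK) * (uint64_t)upscale;
    return 0;
}
```

As an illustration with made-up numbers, a raw count of 100 and an upscaling factor of 65536 would correspond to 6,553,600 bytes of cache occupancy.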

For additional resources see: Benefits of Intel Cache Monitoring Technology in the Intel® Xeon® Processor E5 v3 Family, Intel’s Cache Monitoring Technology Software-Visible Interfaces, Intel's Cache Monitoring Technology: Use Models and Data, or Intel's Cache Monitoring Technology: Software Support and Tools.

Figure 4. Cache and memory bandwidth monitoring and enforcement vectors.

Another new capability enables the OS or VMM to monitor memory bandwidth. This allows scheduling decisions to be made based on memory bandwidth usage on a per core or thread basis. An example of this situation is when one core is being heavily utilized by two applications, while another core is being underutilized by two other applications. With memory bandwidth monitoring the OS or VMM now has the ability to schedule a VM or an application to a different core to balance out memory bandwidth utilization. In Figure 5 two high memory bandwidth applications are competing for the same resource. The OS or VMM can move one of the high bandwidth memory applications to another resource to balance out the load on the cores.

Figure 5. Memory Bandwidth Monitoring use case

7. A/D Bits for EPT

In the previous generation, accessed and dirty bits (A/D bits) were emulated in VMM and accessing them caused VM exits. EPT A/D bits are implemented in hardware to reduce VM exits. This enables efficient live migration of VMs and fault tolerance.
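As an illustration of why hardware dirty bits help live migration, the sketch below models one pre-copy pass over a flat array of simulated EPT leaf entries: pages whose dirty bit is set are collected for re-sending, and the bit is cleared for the next pass. This is a simplified model, not VMM code. The bit positions (accessed in bit 8, dirty in bit 9) are my understanding of the SDM layout when EPT A/D is enabled, and collect_dirty_pages is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EPT_ACCESSED (1ULL << 8)  /* assumed: set by hardware on access */
#define EPT_DIRTY    (1ULL << 9)  /* assumed: set by hardware on write */

/*
 * Scans n simulated EPT leaf entries, records the index of every entry
 * whose dirty bit is set in dirty_out, clears that bit, and returns the
 * number of dirty entries found. dirty_out must have room for n indices.
 * A real VMM would also have to invalidate cached EPT translations
 * after clearing the bits.
 */
size_t collect_dirty_pages(uint64_t *entries, size_t n, size_t *dirty_out) {
    assert(entries != NULL && dirty_out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (entries[i] & EPT_DIRTY) {
            dirty_out[count++] = i;
            entries[i] &= ~EPT_DIRTY;
        }
    }
    return count;
}
```

Because the hardware sets the bits itself, the guest keeps running without VM exits between passes, and the VMM only pays for the scan.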

Figure 6. VM exits with EPT A/D in hardware vs emulation

This feature requires enabling VT-x at the BIOS level. Currently it is supported by KVM with Linux kernel 3.6+ and Xen* 4.3+. For other VM providers please contact them to find out when this feature will be supported.

8. Intel® Virtual Machine Control Structure Shadowing (Intel® VMCS Shadowing)

Nested virtualization allows a root Virtual Machine Monitor (VMM) to support guest VMMs. However, additional Virtual Machine (VM) exits can impact performance. As shown in Figure 7, Intel® VMCS Shadowing directs the guest VMM VMREAD/VMWRITE to a VMCS shadow structure. This reduces nesting induced VM exits. Intel VMCS Shadowing increases efficiency by reducing virtualization latency.

Figure 7. VM exits with Intel® VMCS Shadowing vs software-only

This feature requires enabling VT-x at the BIOS level. Currently it is supported by KVM with Linux Kernel 3.10+ and Xen 4.3+. For other VM providers please contact them to find out when this feature will be supported.

9. APICv

The Virtual Machine Monitor emulates most guest accesses to interrupts and the Advanced Programmable Interrupt Controller (APIC) in a virtual environment. This causes VM exits, creating overhead on the system. APICv offloads this task to the hardware, eliminating VM exits and increasing I/O throughput.

Figure 8. VM exits with APICv vs without APICv

This feature requires enabling VT-x at the BIOS level. Currently it is supported by KVM with Linux kernel 3.10+ and ESX(i)* 4.0+. For other VM providers please contact them to find out when this feature will be supported.

10. Supervisor Mode Access Protection (SMAP)⁴

Supervisor Mode Access Protection (SMAP) is a new CPU-based mechanism for user-mode address-space protection. It extends the protection that previously was provided by Supervisor Mode Execution Prevention (SMEP). SMEP prevents supervisor mode execution from user pages, while SMAP prevents unintended supervisor mode accesses to data on user pages. There are legitimate instances where the operating system needs to access user pages, and SMAP does provide support for those situations.

Figure 9. SMAP conceptual diagram

SMAP was developed with the Linux community and is supported on kernel 3.12+ and KVM version 3.15+. Support for this feature depends on which operating system or VMM you are using.

11. RDSEED⁴

The RDSEED instruction is intended for seeding a Pseudorandom Number Generator (PRNG) of arbitrary width, which can be useful when you want to create stronger cryptography keys. If you do not need to seed another PRNG, use the RDRAND instruction instead. For more information see Table 3, Figure 10, and The Difference Between RDRAND and RDSEED.

Table 3. RDSEED and RDRAND compliance and source information

Instruction	Source	NIST Compliance
RDRAND	Cryptographically secure pseudorandom number generator	SP 800-90A
RDSEED	Non-deterministic random bit generator	SP 800-90B & C (drafts)

Figure 10. RDSEED and RDRAND conceptual block diagram

Currently the Intel® Compiler 15+, GCC 4.8+, and Microsoft Visual Studio* 2013+ support RDSEED.

RDSEED loads a hardware-generated random value and stores it in the destination register. The random value is generated from an Enhanced NRBG (Non-Deterministic Random Bit Generator) that is compliant with NIST SP800-90B and NIST SP800-90C in the XOR construction mode.

In order for the hardware design to meet its security goals, the random number generator continuously tests itself and the random data it is generating. The self-test hardware detects run-time failures in the random number generator circuitry or statistically anomalous data occurring by chance and flags the resulting data as bad. In such extremely rare cases, the RDSEED instruction will return no data instead of bad data.

Intel C/C++ Compiler Intrinsic Equivalent:

RDSEED int _rdseed16_step( unsigned short * );
RDSEED int _rdseed32_step( unsigned int * );
RDSEED int _rdseed64_step( unsigned __int64 * );

As with RDRAND, RDSEED will avoid any OS or library enabling dependencies and can be used directly by any software at any protection level or processor state.
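Because RDSEED can return no data, callers usually wrap the step intrinsic in a bounded retry loop. The sketch below shows that pattern under some assumptions: the step function is passed in as a pointer with the same shape as _rdseed64_step (it returns 1 and writes the value on success, 0 otherwise), so the code compiles and runs on machines without RDSEED. The names seed64_retry and fake_seed_step are hypothetical, and fake_seed_step is only a stand-in that returns a fixed, non-random value.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as _rdseed64_step: 1 on success, 0 if no data was returned. */
typedef int (*seed_step_fn)(unsigned long long *out);

/*
 * Calls step up to max_tries times. Returns 1 and leaves the seed in *out
 * on the first success, or returns 0 if every attempt came back empty.
 */
int seed64_retry(seed_step_fn step, unsigned max_tries,
                 unsigned long long *out) {
    assert(step != NULL && out != NULL);
    for (unsigned i = 0; i < max_tries; i++) {
        if (step(out)) {
            return 1;
        }
        /* Real code might pause or back off here before retrying. */
    }
    return 0;
}

/* Hypothetical stand-in for the hardware: fails twice, then succeeds. */
unsigned fake_seed_calls = 0;
int fake_seed_step(unsigned long long *out) {
    if (++fake_seed_calls <= 2) {
        return 0;
    }
    *out = 0x0123456789abcdefULL;  /* fixed value, NOT random */
    return 1;
}
```

On hardware that supports RDSEED, a small wrapper around _rdseed64_step (from immintrin.h, built with the compiler's RDSEED option) could be passed in place of fake_seed_step. If the retries are exhausted, the caller should treat that as an error rather than fall back to a weaker source silently.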

For more information see section 7.3.17.2 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM).

12. Intel® Trusted Execution Technology (Intel® TXT)

Intel® TXT is the hardware basis for mechanisms that validate platform trustworthiness during boot and launch, which enables reliable evaluation of the computing platform and its protection level. Intel TXT is compact and difficult to defeat or subvert, and it allows for flexibility and extensibility to verify the integrity during boot and launch of platform components, including BIOS, operating system loader, and hypervisor. Because of the escalating sophistication of malicious threats, mainstream organizations must employ ever-more stringent security requirements and scrutinize every aspect of the execution environment.

Intel TXT reduces the overall attack surface for both individual systems and compute pools. The technology provides a signature that represents the state of an intact system’s launch environment. The corresponding signature at the time of future launches can then be compared against that known-good state to verify a trusted software launch, to execute system software, and to ensure that cloud infrastructure as a service (IaaS) has not been tampered with. Security policies based on a trusted platform or pool status can then be set to restrict (or allow) the deployment or redeployment of virtual machines (VMs) and data to trusted platforms with known security profiles. Rather than relying on the detection of malware, Intel TXT builds trust into a known software environment and thus ensures that the software being executed hasn’t been compromised. This advances security to address key stealth attack mechanisms used to gain access to parts of the data center in order to access or compromise information. Intel TXT works with Intel® Virtualization Technology (Intel® VT) to create a trusted, isolated environment for VMs.

Figure 11. Simplified Intel® TXT Component diagram

For more details on Intel TXT and its implementation, see the Intel® TXT Enabling Guide.

13. Intel® Node Manager

Intel® Node Manager is a core set of power management features providing a smart way to optimize and manage power, cooling, and compute resources in the data center. This server management technology extends component instrumentation to the platform level and can be used to make the most of every watt consumed in the data center. First, Intel Node Manager reports vital platform information, such as power, temperature, and resource utilization, using standards-based, out-of-band communications. Second, it provides fine-grained controls to limit platform power in compliance with IT policy. This feature can be found across Intel product segments, providing consistency within the data center.

Table 4. Intel® Node Manager features


To use this feature you must enable the BMC LAN and the associated BMC user configuration at the BIOS level, which should be available under the server management menu. The Programmer’s Reference Kit is very simple to use and requires no additional external libraries to compile or run. All that is needed is a C/C++ compiler and to then run the configuration and compilation scripts.

Intel® Node Manager website

Intel® Node Manager Programmer’s Reference Kit

Open Source Reference Kit

How to set up Intel® Node Manager

14. RAS – Reliability Availability Serviceability

Server reliability, availability, and serviceability (RAS) are crucial issues for modern enterprise IT data centers that deliver mission-critical applications and services, as application delivery failures can be extremely costly per hour of system downtime. Furthermore, the likelihood of such failures increases statistically with the size of the servers, data, and memory required for these deployments. The Intel Xeon processor D product family offers a set of RAS features in silicon to provide error detection, correction, containment, and recovery. This feature set is a powerful foundation for hardware and software vendors to build higher-level RAS layers and provide overall server reliability across the entire hardware-software stack from silicon to application delivery and services. Table 5 shows a comparison of the RAS features available on the Intel Xeon processor D product family vs the Intel Atom processor C2000 series.

Table 5. Comparison of RAS features

Category	Feature	Intel® Atom™ Processor C2000 Product Family on the Edisonville platform	Intel® Xeon® Processor D-1500 Product Family on the Grangeville platform
Memory	ECC	√	√
Memory	Error detection and correction coverage	√	√
Memory	Failed DIMM Identification	√	√
Memory	Memory Address Parity Protection on Reads/Writes	No	√
Memory	Memory Demand and Patrol Scrubbing	√	√
Memory	Memory Thermal Throttling	√	√
Memory	Memory BIST including Error Injection	No	√
Memory	Data Scrambling with address	√	√
Memory	SDDC	No	√
Platform	PCIe* Device Surprise Removal	No	√
Platform	PCIe and GbE Advanced Error Reporting (AER)	√	√
Platform	PCIe Device Hot Add / Remove / Swap	No	√
Platform	ECRC on PCIe	No	√
Platform	Data Poisoning - Containment	Via parity	√
Platform	Corrected Error Cloaking from OS	√	√
Platform	Disable CMCI	No CMCI support	√
Platform	Uncorrected error signaling to SMI (dual-signaling)	√	√
Platform	Intel® Silicon View Technology	No	√

15. Intel® Processor Trace⁴

Intel® Processor Trace enables low-overhead instruction tracing of workloads to memory. This can be of value for low-level debugging, fine tuning performance, or post-mortem analysis (core dumps, save on crash, etc.). The output includes control flow details, enabling precise reconstruction of the path of software execution. It also provides timing information, software context details, processor frequency indication, and more. Intel Processor Trace has a sampling mode to estimate the number of function calls and loop iterations in an application being profiled. It has limited impact on system execution and does not require any enabling; you simply need Intel® VTune™ Amplifier 2015 Update 1 or newer.

For additional information see the Intel® Processor Trace lecture or pdf given at IDF14.

Figure 12. Overview of Intel® Processor Trace

16. Non-Transparent Bridge (NTB)

Non-Transparent Bridge (NTB) reduces data loss by allowing a secondary system to take over the PCIe* storage devices in the event of a CPU failure, providing high availability for your storage devices.

Figure 13. Overview of Non-Transparent Bridge with a local and remote host on the Intel® Xeon® processor D product family

17. Asynchronous DRAM Refresh (ADR)

Asynchronous DRAM Refresh (ADR) preserves key data in the battery-backed DRAM in the event of AC power supply failure.

Figure 14. Overview of Asynchronous DRAM Refresh

18. Intel® QuickData Technology

Intel® QuickData Technology is a platform solution designed to maximize the throughput of server data traffic across a broader range of configurations and server environments to achieve faster, scalable, and more reliable I/O. It enables the chipset instead of the CPU to copy data, which allows data to move more efficiently through the server. This technology is supported on Linux kernel 2.6.18+ and Windows* Server 2008 R2 and will require enabling within the BIOS.

For more information, see the Intel® QuickData Technology Software Guide for Linux.

Figure 15. Overview of Intel® QuickData Technology

19. Resources

Intel® Xeon® processor D product family performance comparisons for general compute, cloud, storage and network.

Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM)

Intel® Processor Trace IDF 2014 Video Presentation

Intel® Processor Trace IDF 2014 PDF Presentation

Benefits of Intel Cache Monitoring Technology in the Intel® Xeon® Processor E5 v3 Family

Intel’s Cache Monitoring Technology Software-Visible Interfaces

Intel's Cache Monitoring Technology: Use Models and Data

Intel's Cache Monitoring Technology: Software Support and Tools

The Difference Between RDRAND and RDSEED

Intel® Node Manager website

Intel® Node Manager Programmer’s Reference Kit

Open Source Reference Kit

How to set up Intel® Node Manager

Intel® QuickData Technology Software Guide for Linux

Haswell Cryptographic Performance

Intel® TXT Enabling Guide

Intel® Atom™ processor C2000 product family

Intel® Xeon® processor E3 family

¹ Up to 3.4x better performance on dynamic web serving. Intel® Xeon® processor D-based reference platform with one Intel Xeon processor D (8C, 1.9GHz, 45W, ES2), Intel® Turbo Boost Technology enabled, Intel® Hyper-Threading Technology enabled, 64GB memory (4x16GB DDR4-2133 RDIMM ECC), 2x10GBase-T X552, 3x S3700 SATA SSD, Fedora* 20 (3.17.8-200.fc20.x86_64), Nginx* 1.4.4, Php-fpm* 15.4.14, Memcached* 1.4.14, Simultaneous users=43844. Supermicro SuperServer* 5018A-TN4 with one Intel® Atom™ processor C2750 (8C, 2.4GHz, 20W), Intel Turbo Boost Technology enabled, 32GB memory (4x8GB DDR3-1600 SO-DIMM ECC), 1x10GBase-T X520, 2x S3700 SATA SSD, Ubuntu* 14.10 (3.16.0-23 generic), Nginx 1.4.4, Php-fpm 15.4.14, Memcached 1.4.14, Simultaneous users=12896.
² Up to 1.7x (estimated) better performance per watt on dynamic web serving. Intel® Xeon® processor D-based reference platform with one Intel Xeon processor D (8C, 1.9GHz, 45W, ES2), Intel® Turbo Boost Technology enabled, Intel® Hyper-Threading Technology enabled, 64GB memory (4x16GB DDR4-2133 RDIMM ECC), 2x10GBase-T X552, 3x S3700 SATA SSD, Fedora* 20 (3.17.8-200.fc20.x86_64), Nginx* 1.4.4, Php-fpm* 15.4.14, Memcached* 1.4.14, Simultaneous users=43844, Estimated wall power based on microserver chassis, power=90W, Perf/W=487.15 users/W. Supermicro SuperServer* 5018A-TN4 with one Intel® Atom™ processor C2750 (8C, 2.4GHz, 20W), Intel® Turbo Boost Technology enabled, 32GB memory (4x8GB DDR3-1600 SO-DIMM ECC), 1x10GBase-T X520, 2x S3700 SATA SSD, Ubuntu* 14.10 (3.16.0-23 generic), Nginx 1.4.4, Php-fpm 15.4.14, Memcached 1.4.14, Simultaneous users=12896, Maximum wall power=46W, Perf/W=280.3 users/W.
³ Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.
⁴ New feature introduced with the Intel® Xeon® processor D product family. Intel technologies may require enabled hardware, specific software, or services activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer.
⁵ Intel® processors do not contain crypto algorithms, but support math functionality that accelerates the sub-operations.
