Getting Started with Parallel STL

Parallel STL is an implementation of the C++ standard library algorithms with support for execution policies, as specified in the working draft N4659 for the next version of the C++ standard, commonly called C++17. The implementation also supports the unsequenced execution policy specified in the ISO* C++ working group paper P0076R3.

Parallel STL offers efficient support for both parallel and vectorized execution of algorithms for Intel® processors. For sequential execution, it relies on an available implementation of the C++ standard library.

Parallel STL is available as a part of Intel® Parallel Studio XE and Intel® System Studio.

Prerequisites

To use Parallel STL, you must have the following software installed:

- C++ compiler with:
  - Support for C++11
  - Support for OpenMP* 4.0 SIMD constructs
- Intel® Threading Building Blocks (Intel® TBB) 2018

The latest version of the Intel® C++ Compiler is recommended, because it gives better performance for Parallel STL algorithms than previous compiler versions.

To build an application that uses Parallel STL on the command line, you need to set the environment variables for compilation and linkage. You can do this by calling suite-level environment scripts such as compilervars.{sh|csh|bat}, or you can set just the Parallel STL environment variables by running pstlvars.{sh|csh|bat} in <install_dir>/{linux|mac|windows}/pstl/bin.

<install_dir> is the installation directory. By default, it is:

For Linux* and macOS*:

For super-users: /opt/intel/compilers_and_libraries_<version>
For ordinary users: $HOME/intel/compilers_and_libraries_<version>

For Windows*:

<Program Files>\IntelSWTools\compilers_and_libraries_<version>

Using Parallel STL

Follow these steps to add Parallel STL to your application:

Add the <install_dir>/pstl/include folder to the compiler include paths. You can do this by calling the pstlvars script.
Add #include "pstl/execution" to your code. Then add a subset of the following set of lines, depending on the algorithms you intend to use:
- #include "pstl/algorithm"
- #include "pstl/numeric"
- #include "pstl/memory"
When using algorithms and execution policies, specify the namespace std::execution if there is no vendor implementation of the C++17 standard library, or pstl::execution otherwise. See the 'Examples' section below.
For any of the implemented algorithms, pass one of the values seq, unseq, par or par_unseq as the first parameter in a call to the algorithm to specify the desired execution policy. The policies have the following meaning:

Execution policy	Meaning
`seq`	Sequential execution.
`unseq`	Try to use SIMD. This policy requires that all functions provided are SIMD-safe.
`par`	Use multithreading.
`par_unseq`	Combined effect of `unseq` and `par`.
Compile the code as C++11 (or later) with compiler options that enable vectorization:
- For the Intel® C++ Compiler:
  - For Linux* and macOS*: -qopenmp-simd or -qopenmp
  - For Windows*: /Qopenmp-simd or /Qopenmp
- For other compilers, find a switch that enables OpenMP* 4.0 SIMD constructs.
To get good performance, specify the target platform. For the Intel C++ Compiler, some of the relevant options are:
- For Linux* and macOS*: -xHOST, -xSSE4.1, -xCORE-AVX2, -xMIC-AVX512.
- For Windows*: /QxHOST, /QxSSE4.1, /QxCORE-AVX2, /QxMIC-AVX512.
If using a different compiler, see its documentation.
Link with the Intel TBB dynamic library for parallelism. For the Intel C++ Compiler, use the options:
- For Linux* and macOS*: -tbb
- For Windows*: /Qtbb (optional, this should be handled by #pragma comment(lib, <libname>))

Version Macros

The following macros relate to versioning. You should not redefine these macros.

PSTL_VERSION

Current Parallel STL version. The value is a decimal numeral of the form xyy where x is the major version number and yy is the minor version number.

PSTL_VERSION_MAJOR

PSTL_VERSION/100; that is, the major version number.

PSTL_VERSION_MINOR

PSTL_VERSION - PSTL_VERSION_MAJOR * 100; that is, the minor version number.
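
The arithmetic behind these three macros can be sketched in plain C++. The helper names below (pstl_major_of, pstl_minor_of, pstl_at_least) and the value 104 are hypothetical and chosen only for illustration; they are not part of Parallel STL and say nothing about any real release.

```cpp
// Hypothetical helpers that mirror the documented xyy encoding.
// Major version: everything above the last two decimal digits.
constexpr int pstl_major_of(int version) { return version / 100; }

// Minor version: the last two decimal digits.
constexpr int pstl_minor_of(int version) {
    return version - pstl_major_of(version) * 100;
}

// True when `version` is at least major.minor.
constexpr bool pstl_at_least(int version, int major, int minor) {
    return version >= major * 100 + minor;
}

// 104 is an illustrative value only: it decodes to major 1, minor 4.
static_assert(pstl_major_of(104) == 1, "major is the leading part of xyy");
static_assert(pstl_minor_of(104) == 4, "minor is the last two digits of xyy");
static_assert(pstl_at_least(104, 1, 0), "1.4 is at least 1.0");
```

In real code, a check such as pstl_at_least(PSTL_VERSION, 1, 0) could guard features that depend on a minimum version, assuming a Parallel STL header that defines PSTL_VERSION has already been included.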

Macros

PSTL_USE_PARALLEL_POLICIES

This macro controls the use of parallel policies.

When set to 0, it disables the par and par_unseq policies, making their use a compilation error. This setting is recommended for code that uses only vectorization with the unseq policy, because it avoids a dependency on the Intel® TBB runtime library.

When the macro is not defined (default) or evaluates to a non-zero value, all execution policies are enabled.

PSTL_USE_NONTEMPORAL_STORES

This macro enables the use of #pragma vector nontemporal in the algorithms std::copy, std::copy_n, std::fill, std::fill_n, std::generate, std::generate_n with the unseq policy. For further details about the pragma, see the User and Reference Guide for the Intel® C++ Compiler at https://software.intel.com/en-us/node/524559.

If the macro evaluates to a non-zero value, the use of #pragma vector nontemporal is enabled.

When the macro is not defined (default) or set to 0, the pragma is not used.
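
A minimal sketch of how PSTL_USE_PARALLEL_POLICIES might be used in a vectorization-only translation unit. It assumes the macro has to be defined before the first Parallel STL header is included, which this page does not state explicitly. The helper name copy_vectorized and the HAVE_PSTL_HEADERS macro are hypothetical, and the __has_include guard exists only so that the sketch still compiles, with a sequential fallback, when the Parallel STL headers are not on the include path.

```cpp
// Assumption: the macro must be visible before the first Parallel STL header.
#define PSTL_USE_PARALLEL_POLICIES 0

#include <algorithm>
#include <cassert>
#include <cstddef>

#if defined(__has_include)
#  if __has_include("pstl/execution") && __has_include("pstl/algorithm")
#    include "pstl/execution"
#    include "pstl/algorithm"
#    define HAVE_PSTL_HEADERS 1  // hypothetical helper macro for this sketch
#  endif
#endif

// Copies n floats from src to dst (hypothetical helper name).
void copy_vectorized(const float* src, float* dst, std::size_t n) {
    assert(src != nullptr && dst != nullptr);
#ifdef HAVE_PSTL_HEADERS
    // Only seq and unseq remain usable here; par and par_unseq are
    // documented to be compilation errors when the macro is 0.
    std::copy(pstl::execution::unseq, src, src + n, dst);
#else
    // Fallback when the Parallel STL headers are not available.
    std::copy(src, src + n, dst);
#endif
}
```

Because no parallel policy is used, a build of this kind should not need to link against the Intel® TBB runtime library.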

Examples

Example 1

The following code calls vectorized copy:

#include "pstl/execution"
#include "pstl/algorithm"
void foo(float* a, float* b, int n) {
    std::copy(pstl::execution::unseq, a, a+n, b);
}

Example 2

This example calls the parallelized version of fill_n:

#include <vector>
#include "pstl/execution"
#include "pstl/algorithm"

int main()
{
    std::vector<int> data(10000000);
    std::fill_n(pstl::execution::par_unseq, data.begin(), data.size(), -1);  // Fill the vector with -1

    return 0;
}

Implemented Algorithms

Parallel STL supports all of the aforementioned execution policies only for the algorithms listed in the following table. Adding a policy argument to any other C++ standard library algorithm results in sequential execution.

Algorithm	Algorithm page at cppreference.com
`adjacent_find`	http://en.cppreference.com/w/cpp/algorithm/adjacent_find
`all_of`	http://en.cppreference.com/w/cpp/algorithm/all_any_none_of
`any_of`	http://en.cppreference.com/w/cpp/algorithm/all_any_none_of
`copy`	http://en.cppreference.com/w/cpp/algorithm/copy
`copy_if`	http://en.cppreference.com/w/cpp/algorithm/copy
`copy_n`	http://en.cppreference.com/w/cpp/algorithm/copy_n
`count`	http://en.cppreference.com/w/cpp/algorithm/count
`count_if`	http://en.cppreference.com/w/cpp/algorithm/count
`destroy`	http://en.cppreference.com/w/cpp/memory/destroy
`destroy_n`	http://en.cppreference.com/w/cpp/memory/destroy_n
`equal`	http://en.cppreference.com/w/cpp/algorithm/equal
`exclusive_scan`	http://en.cppreference.com/w/cpp/algorithm/exclusive_scan
`fill`	http://en.cppreference.com/w/cpp/algorithm/fill
`fill_n`	http://en.cppreference.com/w/cpp/algorithm/fill_n
`find`	http://en.cppreference.com/w/cpp/algorithm/find
`find_end`	http://en.cppreference.com/w/cpp/algorithm/find_end
`find_first_of`	http://en.cppreference.com/w/cpp/algorithm/find_first_of
`find_if`	http://en.cppreference.com/w/cpp/algorithm/find
`find_if_not`	http://en.cppreference.com/w/cpp/algorithm/find
`for_each`	http://en.cppreference.com/w/cpp/algorithm/for_each
`for_each_n`	http://en.cppreference.com/w/cpp/algorithm/for_each_n
`generate`	http://en.cppreference.com/w/cpp/algorithm/generate
`generate_n`	http://en.cppreference.com/w/cpp/algorithm/generate_n
`inclusive_scan`	http://en.cppreference.com/w/cpp/algorithm/inclusive_scan
`is_heap`	http://en.cppreference.com/w/cpp/algorithm/is_heap
`is_heap_until`	http://en.cppreference.com/w/cpp/algorithm/is_heap_until
`is_partitioned`	http://en.cppreference.com/w/cpp/algorithm/is_partitioned
`is_sorted`	http://en.cppreference.com/w/cpp/algorithm/is_sorted
`is_sorted_until`	http://en.cppreference.com/w/cpp/algorithm/is_sorted_until
`lexicographical_compare`	http://en.cppreference.com/w/cpp/algorithm/lexicographical_compare
`max_element`	http://en.cppreference.com/w/cpp/algorithm/max_element
`merge`	http://en.cppreference.com/w/cpp/algorithm/merge
`min_element`	http://en.cppreference.com/w/cpp/algorithm/min_element
`minmax_element`	http://en.cppreference.com/w/cpp/algorithm/minmax_element
`mismatch`	http://en.cppreference.com/w/cpp/algorithm/mismatch
`move`	http://en.cppreference.com/w/cpp/algorithm/move
`none_of`	http://en.cppreference.com/w/cpp/algorithm/all_any_none_of
`partial_sort`	http://en.cppreference.com/w/cpp/algorithm/partial_sort
`partition_copy`	http://en.cppreference.com/w/cpp/algorithm/partition_copy
`reduce`	http://en.cppreference.com/w/cpp/algorithm/reduce
`remove_copy`	http://en.cppreference.com/w/cpp/algorithm/remove_copy
`remove_copy_if`	http://en.cppreference.com/w/cpp/algorithm/remove_copy
`replace`	http://en.cppreference.com/w/cpp/algorithm/replace
`replace_copy`	http://en.cppreference.com/w/cpp/algorithm/replace_copy
`replace_copy_if`	http://en.cppreference.com/w/cpp/algorithm/replace_copy
`replace_if`	http://en.cppreference.com/w/cpp/algorithm/replace
`search`	http://en.cppreference.com/w/cpp/algorithm/search
`search_n`	http://en.cppreference.com/w/cpp/algorithm/search_n
`sort`	http://en.cppreference.com/w/cpp/algorithm/sort
`stable_sort`	http://en.cppreference.com/w/cpp/algorithm/stable_sort
`swap_ranges`	http://en.cppreference.com/w/cpp/algorithm/swap_ranges
`transform`	http://en.cppreference.com/w/cpp/algorithm/transform
`transform_exclusive_scan`	http://en.cppreference.com/w/cpp/algorithm/transform_exclusive_scan
`transform_inclusive_scan`	http://en.cppreference.com/w/cpp/algorithm/transform_inclusive_scan
`transform_reduce`	http://en.cppreference.com/w/cpp/algorithm/transform_reduce
`uninitialized_copy`	http://en.cppreference.com/w/cpp/memory/uninitialized_copy
`uninitialized_copy_n`	http://en.cppreference.com/w/cpp/memory/uninitialized_copy_n
`uninitialized_default_construct`	http://en.cppreference.com/w/cpp/memory/uninitialized_default_construct
`uninitialized_default_construct_n`	http://en.cppreference.com/w/cpp/memory/uninitialized_default_construct_n
`uninitialized_fill`	http://en.cppreference.com/w/cpp/memory/uninitialized_fill
`uninitialized_fill_n`	http://en.cppreference.com/w/cpp/memory/uninitialized_fill_n
`uninitialized_move`	http://en.cppreference.com/w/cpp/memory/uninitialized_move
`uninitialized_move_n`	http://en.cppreference.com/w/cpp/memory/uninitialized_move_n
`uninitialized_value_construct`	http://en.cppreference.com/w/cpp/memory/uninitialized_value_construct
`uninitialized_value_construct_n`	http://en.cppreference.com/w/cpp/memory/uninitialized_value_construct_n
`unique_copy`	http://en.cppreference.com/w/cpp/algorithm/unique_copy
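
As an illustration of one algorithm from the table, the sketch below computes a dot product with transform_reduce and the unseq policy. It assumes that "pstl/numeric" provides the C++17 overload taking a policy, two input ranges, and an initial value. The helper name dot and the HAVE_PSTL_HEADERS macro are hypothetical, and the code falls back to the sequential std::inner_product when the Parallel STL headers are not found.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

#if defined(__has_include)
#  if __has_include("pstl/execution") && __has_include("pstl/numeric")
#    include "pstl/execution"
#    include "pstl/numeric"
#    define HAVE_PSTL_HEADERS 1  // hypothetical helper macro for this sketch
#  endif
#endif

// Dot product of two arrays of length n (hypothetical helper name).
float dot(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
#ifdef HAVE_PSTL_HEADERS
    // The policy comes first; the remaining arguments are the two input
    // ranges and the initial value of the sum.
    return std::transform_reduce(pstl::execution::unseq, a, a + n, b, 0.0f);
#else
    // Sequential fallback with the same result for this input.
    return std::inner_product(a, a + n, b, 0.0f);
#endif
}
```

Note that a vectorized reduction may add the products in a different order than the sequential fallback, so floating-point results can differ slightly for inputs that are not exactly representable.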

Known limitations

Parallel and vector execution is supported only for a subset of the algorithms listed above, and only when random access iterators are provided. For the remaining algorithms, execution remains serial.
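
Whether a container supplies random access iterators can be checked at compile time with standard C++11 facilities. The trait name is_random_access below is hypothetical and not part of Parallel STL. The check covers only the iterator requirement; it does not tell you whether a particular algorithm belongs to the supported subset.

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical trait: true when It is a random access iterator.
template <typename It>
struct is_random_access
    : std::is_base_of<std::random_access_iterator_tag,
                      typename std::iterator_traits<It>::iterator_category> {};

// std::vector iterators and raw pointers are random access iterators.
static_assert(is_random_access<std::vector<int>::iterator>::value,
              "vector iterators are random access");
static_assert(is_random_access<float*>::value,
              "raw pointers are random access");

// std::list iterators are bidirectional, so the requirement is not met.
static_assert(!is_random_access<std::list<int>::iterator>::value,
              "list iterators are not random access");
```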

Legal Information

Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
© Intel Corporation
