
Performance Optimization of Intel® Xeon® Processor Using Intel® Data Analytics Acceleration Library


Abstract

This article presents a comparative study of the performance of the Intel® Xeon® Gold processor when the Naive Bayes algorithm from the textbook Artificial Intelligence: A Modern Approach (AIMA) by Stuart Russell and Peter Norvig, from scikit-learn* (SkLearn), and from the PyDAAL programming interface is run, demonstrating the advantage of the Intel® Data Analytics Acceleration Library (Intel® DAAL). The accuracy of these variants of the Naive Bayes classifier on the Intel® Xeon® processor was calculated and compared. Naive Bayes performed considerably better with PyDAAL (multinomial) than with SkLearn or AIMA, and better with SkLearn than with AIMA.

Test and System Configuration 

Environment setup

We used the following environment setup to run the code and determine the test processor performance.

Processor: Intel® Xeon® Gold 6128 processor 3.40 GHz
System: CentOS* (7.4.1708)
Cores: 24
Storage (RAM): 92 GB
Python* Version: 3.6.2
PyDAAL Version: 2018.0.0.20170814

Test setup

We used the following conventions and methods to perform the test and compare the values:

  • To run the Naive Bayes classifier from PyDAAL, we used the Conda* virtual environment.
  • The Naive Bayes classifier described in AIMA is available in the learning_apps.ipynb file from the GitHub* code.
  • We calculated the average execution time and accuracy of learning_apps.ipynb (converted to .py) with the Naive Bayes learner from AIMA.
  • We calculated the average execution time and accuracy of learning_apps.ipynb (converted to .py) with the Naive Bayes classifiers from SkLearn and PyDAAL.
  • To calculate the average execution time, the Linux* time command is used:
    • Example: time(cmd="python learning_apps.py"; for i in $(seq 10); do $cmd; done)
    • Average execution time = time/10.
  • To calculate accuracy, the accuracy_score method in SkLearn is used in all cases.
  • Performance gain percentage = ((AIMA - PyDAAL)/AIMA) × 100 or ((SkLearn - PyDAAL)/SkLearn) × 100, where each term is the average execution time in seconds (see the sketch after this list).
  • Performance improvement (x) = AIMA (s)/PyDAAL (s) or SkLearn (s)/PyDAAL (s).
  • The higher the value of the performance gain percentage, the better the performance of PyDAAL.
  • Performance improvement (x) value greater than 1 indicates better performance for PyDAAL.
  • Only the Naive Bayes part of the learning_apps.ipynb file is compared.
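
The following is a minimal sketch of how these metrics are derived from the measured times. The timing values below are placeholders for illustration only, not measured results.

# Placeholder timings (seconds); the real values are the measured averages described above.
aima_time_s = 100.0
pydaal_time_s = 10.0

gain_percent = (aima_time_s - pydaal_time_s) / aima_time_s * 100
improvement_x = aima_time_s / pydaal_time_s

print("Performance gain: %.1f%%" % gain_percent)         # higher is better for PyDAAL
print("Performance improvement: %.2fx" % improvement_x)  # > 1 means PyDAAL is faster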

Code and conditional probability

The Naive Bayes learner part of the code given in AIMA was compared to the corresponding implementation from SkLearn (Gaussian and multinomial) and PyDAAL (multinomial). The following are the relevant code samples:

AIMA
import numpy as np
from learning import *  # DataSet, NaiveBayesLearner, and manhattan_distance come from the aima-python code

temp_train_lbl = train_lbl.reshape((60000,1))
training_examples = np.hstack((train_img, temp_train_lbl))

MNIST_DataSet = DataSet(examples=training_examples, distance=manhattan_distance)
nBD = NaiveBayesLearner(MNIST_DataSet, continuous=False)
y_pred = np.empty(len(test_img), dtype=np.int)
for i in range(0, len(test_img) - 1):
    y_pred[i] = nBD(test_img[i])

temp_test_lbl = test_lbl.reshape((10000,1))
temp_y_pred_np = y_pred.reshape((10000,1))
SkLearn (Gaussian)
from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()
classifier = classifier.fit(train_img, train_lbl)

churn_predicted_target = classifier.predict(test_img)
SkLearn (multinomial)
from sklearn.naive_bayes import MultinomialNB

classifier = MultinomialNB()
classifier = classifier.fit(train_img, train_lbl)

churn_predicted_target = classifier.predict(test_img)
PyDAAL (multinomial)
import numpy as np
from daal.data_management import HomogenNumericTable, BlockDescriptor_Float64, readOnly
from daal.algorithms import classifier
from daal.algorithms.multinomial_naive_bayes import training as nb_training
from daal.algorithms.multinomial_naive_bayes import prediction as nb_prediction

def getArrayFromNT(table, nrows=0):
    bd = BlockDescriptor_Float64()
    if nrows == 0:
        nrows = table.getNumberOfRows()
    table.getBlockOfRows(0, nrows, readOnly, bd)
    npa = np.copy(bd.getArray())
    table.releaseBlockOfRows(bd)
    return npa

temp_train_lbl = train_lbl.reshape((60000,1))
train_img_nt = HomogenNumericTable(train_img)
train_lbl_nt = HomogenNumericTable(temp_train_lbl)
temp_test_lbl = test_lbl.reshape((10000,1))
test_img_nt = HomogenNumericTable(test_img)
nClasses=10
nb_train = nb_training.Online(nClasses)

# Pass new block of data from the training data set and dependent values to the algorithm
nb_train.input.set(classifier.training.data, train_img_nt)
nb_train.input.set(classifier.training.labels, train_lbl_nt)
# Update the multinomial Naive Bayes training model
nb_train.compute()
model = nb_train.finalizeCompute().get(classifier.training.model)

nb_Test = nb_prediction.Batch(nClasses)
nb_Test.input.setTable(classifier.prediction.data,  test_img_nt)
nb_Test.input.setModel(classifier.prediction.model, model)
predictions = nb_Test.compute().get(classifier.prediction.prediction)

predictions_np = getArrayFromNT(predictions)

The learning_apps.ipynb notebook from aima-python-master is used as the reference code for this experiment. It classifies the MNIST dataset with a Naive Bayes classifier in the conventional way, but this approach takes a long time to classify the data.

To check for better performance, the same experiment was implemented using PyDAAL, a high-performance data analytics library for Python*. In PyDAAL, data is mainly held in NumericTables, a generic data type for representing data in memory.

In the code, the data is loaded as train_img, train_lbl, test_img, and test_lbl using the load_MNIST() function. train_img and test_img hold the training and test data, while train_lbl and test_lbl hold the corresponding labels. These arrays are converted into HomogenNumericTable objects after verifying that they are C-contiguous, because the conversion only works on C-contiguous input data.
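
A minimal sketch of this check and conversion is shown below. The load_MNIST() helper and the variable names come from the notebook; the np.ascontiguousarray() fallback is an assumption for illustration and is not part of the original code.

import numpy as np
from daal.data_management import HomogenNumericTable

# load_MNIST() is assumed to come from the aima-python notebook environment.
train_img, train_lbl, test_img, test_lbl = load_MNIST()

# HomogenNumericTable requires C-contiguous input, so copy the arrays if needed.
if not train_img.flags['C_CONTIGUOUS']:
    train_img = np.ascontiguousarray(train_img)
if not test_img.flags['C_CONTIGUOUS']:
    test_img = np.ascontiguousarray(test_img)

train_img_nt = HomogenNumericTable(train_img)
test_img_nt = HomogenNumericTable(test_img)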

An algorithm object (nb_train) is created to train the multinomial Naive Bayes model in online processing mode. The two inputs, data and labels, are set using the input.set() member method of the nb_train algorithm object, and the compute() method updates the partial model. After the model is created, a test object (nb_Test) is defined. The test data set and the trained model are passed to the algorithm using the input.setTable() and input.setModel() methods, respectively. After the predictions are obtained with the compute() method, the accuracy and the time taken for the experiment are calculated; the SkLearn library and the Linux time command are used for these calculations.
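
A minimal sketch of the accuracy calculation is shown below. The variable names follow the PyDAAL sample above and the use of accuracy_score matches the test setup, but the exact call in the original script is not reproduced in the article.

from sklearn.metrics import accuracy_score

# predictions_np is the NumPy array returned by getArrayFromNT(predictions) above.
# getArrayFromNT reads the table as float64, so cast the predicted labels to int.
accuracy = accuracy_score(test_lbl.ravel(), predictions_np.ravel().astype(int))
print("PyDAAL multinomial Naive Bayes accuracy:", accuracy)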

Another implementation of the same code was done using the multinomial Naive Bayes classifier from SkLearn, for comparison with the conventional method and PyDAAL.

On analyzing the time taken for the experiments, it is clear that PyDAAL has better time performance compared to the other methods.

  • The conditional probability distribution assumption made in AIMA is
    • A probability distribution formed by observing and counting examples.
    • If p is an instance of this class and o is an observed value, there are three main operations (illustrated in the sketch after this list):
      • p.add(o) increments the count for observation o by 1.
      • p.sample() returns a random element from the distribution.
      • p[o] returns the probability for o (as in a regular ProbDist).
  • The conditional probability distribution assumption made in Gaussian Naive Bayes is Gaussian/normal distribution.
  • The conditional probability distribution assumption made in multinomial Naive Bayes is multinomial distribution.
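
The following is a minimal, self-contained sketch of such a counting distribution. It only illustrates the three operations listed above; it is not the actual probability-distribution class used in the AIMA code.

import random

class SimpleCountingProbDist:
    """Illustrative counting distribution: probabilities come from observed counts."""

    def __init__(self):
        self.counts = {}
        self.total = 0

    def add(self, o):
        # Increment the count for observation o by 1.
        self.counts[o] = self.counts.get(o, 0) + 1
        self.total += 1

    def sample(self):
        # Return a random element, weighted by how often it was observed.
        r = random.uniform(0, self.total)
        cumulative = 0
        for o, count in self.counts.items():
            cumulative += count
            if r <= cumulative:
                return o

    def __getitem__(self, o):
        # p[o]: probability of o as its relative frequency among observations.
        return self.counts.get(o, 0) / self.total if self.total else 0.0

p = SimpleCountingProbDist()
for observation in ['spam', 'ham', 'spam']:
    p.add(observation)
print(p['spam'])   # 0.666... (2 of 3 observations)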

Introduction

During the test, the Intel® Xeon® Gold processor was used to run Naive Bayes from AIMA, SkLearn (Gaussian and multinomial), and PyDAAL (multinomial). To determine the performance improvement on the processor, we compared the accuracy percentage for all relevant scenarios and calculated the performance improvement (x) of PyDAAL relative to the others. Naive Bayes (Gaussian) was not included in this calculation because it was considered more appropriate to compare the multinomial versions of SkLearn and PyDAAL.

Observations

Intel® DAAL helps to speed up big data analysis by providing highly optimized algorithmic building blocks for all stages of data analytics (preprocessing, transformation, analysis, modeling, validation, and decision making) in batch, online, and distributed processing modes of computation.

  • Helps applications deliver better predictions faster
  • Analyzes larger data sets with the same compute resources
  • Optimizes data ingestion and algorithmic compute together for the highest performance
  • Supports offline, streaming, and distributed usage models to meet a range of application needs
  • Provides priority support: connect privately with Intel engineers for technical questions

Accuracy

We ran the Naive Bayes classifiers from AIMA, SkLearn, and PyDAAL and observed that PyDAAL and SkLearn (multinomial) had the same accuracy percentage (refer to Test and System Configuration).

Figure 1 provides a graph of the accuracy values of Naive Bayes.

Figure 1. Intel® Xeon® Gold 6128 processor—graph of accuracy values.

Benchmark results were obtained prior to the implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, see Performance Benchmark Test Disclosure.

Configuration: Intel® Xeon® Gold 6128 processor 3.40 GHz; System CentOS* (7.4.1708); Cores 24; Storage (RAM) 92 GB; Python* Version 3.6.2; PyDAAL Version 2018.0.0.20170814.
Benchmark Source: Intel Corporation. See below for further notes and disclaimers.1

Performance improvement

The performance improvement (x) with respect to time was calculated for Naive Bayes (AIMA versus PyDAAL, and SkLearn versus PyDAAL), and it was observed that the performance (refer to Test and System Configuration) was better with PyDAAL.

Figures 2 and 3 provide graphs of the performance improvement speedup values.

Figure 2. Intel® Xeon® Gold 6128 processor—graph of AIMA versus PyDAAL performance improvement.


Figure 3. Intel® Xeon® Gold 6128 processor—graph of SkLearn versus PyDAAL performance improvement.


Summary

The optimization test on the Intel Xeon Gold processor illustrates that PyDAAL takes less time (see Figure 4) and hence provides better performance (refer to Test and System Configuration) than AIMA and SkLearn. In this scenario, both SkLearn (multinomial) and PyDAAL had the same accuracy. The conditional probability distribution in AIMA is a simple frequency distribution built by counting observed examples, whereas SkLearn and PyDAAL assume a Gaussian or multinomial distribution; this difference in assumptions explains the difference in accuracy observed.

Figure 4. Intel® Xeon® Gold 6128 processor—graph of performance time.


References

  1. AIMA code:
    https://github.com/aimacode/aima-python
  2. The AIMA data folder:
    https://github.com/aimacode/aima-python (download separately)
  3. Book:
    Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig

1Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, visit www.intel.com/benchmarks.

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804

