Channel: Intel Developer Zone Articles

Code Example of Power/Performance Optimization on Android* Using Intel® Intrinsics


Introduction

Battery life is critically important to users of mobile devices. We've all lost power right when we needed it most: navigating a new city, in the middle of an important call, and so on. It may not be intuitive, but optimizing application performance typically reduces power consumption as well, because code that finishes its work sooner lets the processor return to a low-power idle state earlier.

Analyzing Apps with a Combination of Intel® Graphics Performance Analyzers + VTune™ Amplifier

The first step in improving the power/performance of your application is to understand whether it is CPU bound or GPU bound. You can determine this using a combination of Intel® tools:

Intel® Graphics Performance Analyzers or GPA is a tool for graphics analysis and optimization of Microsoft DirectX* applications and Android* OpenGL ES* applications. You can find more about it here: https://software.intel.com/en-us/articles/gpa-which-version

For purposes of Android optimization I prefer the GPA console client. You can read about it here: https://software.intel.com/en-us/android/articles/using-intel-graphics-performance-analyzers-console-client-for-android-application

VTune™ Amplifier helps you analyze the algorithm choices and identify where and how your application can benefit from available hardware resources. Use VTune Amplifier to locate or determine the following:

  • The most time-consuming functions (hotspots) in your application and/or on the whole system
  • Sections of code that do not effectively utilize available processor time
  • The best sections of code to optimize for sequential performance and for threaded performance
  • Synchronization objects that affect the application performance
  • Whether, where, and how your application spends time on input/output operations
  • The performance impact of different synchronization methods, different numbers of threads, or different algorithms
  • Thread activity and transitions
  • Hardware-related bottlenecks in your code

Configure the data collection on the host system (Linux*, OS X*, or Windows*) and run the analysis on a remote system (Linux or Android). Remote analysis on Android and embedded Linux systems is supported only by VTune Amplifier for Systems.

You can read more here: https://software.intel.com/en-us/node/496918

The figure below shows how to use a combination of GPA and VTune Amplifier to analyze and optimize your application.

What are Intel® Intrinsics

Intel® intrinsics are assembly-coded functions that allow you to use C/C++ function calls and variables instead of assembly instructions. Intrinsics provide access to instructions that cannot be generated using the standard constructs of the C and C++ languages.

Intrinsics are expanded inline, eliminating function call overhead. While providing the same benefit as inline assembly, intrinsics also improve code readability, assist instruction scheduling, and help reduce debugging effort.

You can read more here: https://software.intel.com/en-us/node/523351
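As a minimal sketch of what this looks like in practice, the snippet below adds four pairs of floats with one SSE intrinsic call. The helper name add4 is hypothetical and used only for illustration; the intrinsics themselves (_mm_loadu_ps, _mm_add_ps, _mm_storeu_ps) are declared in the standard SSE header xmmintrin.h.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3.
// Each pointer must reference at least four floats; no alignment is required
// because the unaligned load/store variants are used.
inline void add4(const float* a, const float* b, float* out)
{
    const __m128 va = _mm_loadu_ps(a);        // load four floats from a
    const __m128 vb = _mm_loadu_ps(b);        // load four floats from b
    _mm_storeu_ps(out, _mm_add_ps(va, vb));   // one packed add, then store
}
```

The code reads like ordinary C++ function calls on ordinary variables, yet each call corresponds to a specific SSE instruction, and the compiler remains free to allocate registers and schedule the instructions.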

How to Find and Connect the Intel® C++ Compiler for Android* OS to Your Project

Intel® C++ Compiler for Android* OS is included in the Intel® INDE suite. It integrates into the Android NDK and provides an optimized alternative for compiling x86 libraries.

Download and install Intel C++ Compiler for Android. During installation, provide the path to the NDK directory so that the compiler can be integrated into the Android NDK.

After the successful installation, the Intel® C++ Compiler for Android will be automatically integrated into the Android NDK toolchain and will compile optimized libraries for x86 architecture.

Example

To demonstrate the usage of Intel intrinsics, let’s look at the C++ code:

float x = 1.0f / sqrtf( y );

This type of code (especially in physics algorithms) often shows up in hotspots.
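To make the pattern concrete, here is a small sketch of the kind of loop where the 1.0f / sqrtf( y ) expression tends to appear: normalizing an array of 3D vectors. The Vec3 type and the normalize_all function are hypothetical names introduced for this illustration and are not taken from the original example.

```cpp
#include <math.h>    // sqrtf
#include <stddef.h>  // size_t

// Hypothetical 3D vector type used only for this illustration.
struct Vec3 { float x, y, z; };

// Scales every vector in the array to unit length, in place.
// Assumes no vector has zero length (that would divide by zero).
inline void normalize_all(Vec3* v, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        const float len2 = v[i].x * v[i].x + v[i].y * v[i].y + v[i].z * v[i].z;
        const float inv  = 1.0f / sqrtf(len2);  // the expression discussed in the text
        v[i].x *= inv;
        v[i].y *= inv;
        v[i].z *= inv;
    }
}
```

When a loop like this runs over thousands of vectors per frame, the square root and the division inside it can account for a noticeable share of the time a profiler attributes to the function.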

Profiling the 1.0f / sqrtf( y ) expression in VTune Amplifier shows that the compiler generates sqrt + div instead of rsqrt.

The way to fix this is to use Intel intrinsics:

float x = rsqrt( y );

Where rsqrt is:

         #include <xmmintrin.h>

         …

         inline float rsqrt(const float x)
         {
             float r;
             _mm_store_ss(&r, _mm_rsqrt_ss( _mm_load_ss(&x)));
             return r;
         }

References

For more information, watch my video: https://videoportal.intel.com/media/0_qgvcof5s

 

About the Author

Stanislav Pavlov works in the Software & Service Group at Intel Corporation. He has 10+ years of experience in technologies. His main interests are optimization of performance and power consumption, and parallel programming. In his current role as a Senior Application Engineer providing technical support for Intel®-based devices, Stanislav works closely with software developers and SoC architects to help them achieve the best possible performance on Intel® platforms. Stanislav holds a Master's degree in Mathematical Economics from the National Research University Higher School of Economics. He is currently pursuing an MBA at the Moscow Business School.

Notices

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license.

Intel, the Intel logo, and VTune are trademarks of Intel Corporation in the U.S. and/or other countries.

Copyright © 2014 Intel Corporation. All rights reserved.

*Other names and brands may be claimed as the property of others.

