
OpenStack App Developer Survey


As part of a long-term commitment to enhance ease of use, the OpenStack UX project, with the support of the OpenStack Foundation and the Technical Committee, is now building a community of application and software developers interested in providing their feedback at key points during development. Your time investment is minimal: the survey is limited to no more than 10 questions. No familiarity with OpenStack (openstack.org) is needed to participate. In fact, the aim is to recruit a number of application and software developers with no personal interest in OpenStack.

 

Complete the survey: https://www.surveymonkey.com/r/appdev-survey (only 10 questions)

 

Please forward this survey to your peers—this is a community-wide effort and not limited to any specific company.

 

As thanks for your time, three (3) lucky participants will be randomly selected to win a $100 gift card, one each in June, July and August.


[Series] Know Your Customer: Pick Your Channels


Know Your Customer! Step Two: Pick Your Channels

Once you know who your target audience is, the next step is figuring out where to find them. What Websites do they read? Where do they go for news about the latest games?

When you’re unknown, bringing customers to you can be a difficult challenge, which is why—especially when you’re young and just getting started—it’s important to figure out where they are, and go to them.

In this second article in our three-part series called Know Your Customer, we’ll look at how your customer persona informs where you concentrate your marketing efforts.

Understand Their Journey

We’ve already posted an article focused specifically on the customer journey, and we started talking about it in the last article in this series—it’s really important. Not only does it help you understand the customer even better, it allows you to start identifying their behavior so you can better anticipate their needs, and meet them where they are.

There are five main stages in the customer journey:

  • Awareness— Potential customers find out about your app
  • Discovery— Interested customers seek more information about your app
  • Consideration— If your product description, ratings, and imagery look good, customers will download your app
  • Conversion— Customers go through the setup or login process and begin using the app
  • Retention— Customers continue to use the app regularly

To create really targeted marketing, you’ll need to determine your customer’s wants and actions at each stage of the journey. It can be helpful to create a visual of your customer’s journey. For a starting point, download the worksheet below.

What Do They Want?

The customer embarks on the journey because they want something. Begin fleshing out your customer journey by putting yourself in the customer’s shoes and thinking about the questions they would have at each stage of the journey. Do they need information, reassurance, a preview of the gameplay? The answer will be different depending on where they are in the customer journey.

Think back to the persona we created in Part 1, the 28-year-old fish owner named John. During the discovery phase, he might want to know how well the app works, and during the conversion phase, he might want to know whether its features fit his lifestyle. For instance, he might want to know if he can log in with his Facebook account or easily integrate his existing calendar.

What Do They Do?

Once you've considered what the customer wants at each stage of the journey, think about their behavior. What action will they take? When John our fish owner wants to know how well the app works, does he rely on ratings, customer reviews, or expert reviews? Where does he go to learn about specific features? Think about the persona you created, and walk through the journey with them.

How You Can Meet Them

Once you understand what your customer wants along the journey, and what actions they'll take—you can identify the right channels and meet them there. In the above example, if John wants to know how well the app works, and he's going to look for expert reviews, then you need to find an expert who will write a review. If you're developing a trivia game and your customer spends time on Reddit to learn about new games, then you need to have a presence in that community. We’ll talk more about this—and how you can get creative about getting on those channels—in Part 3 of this series.

Unless you have really deep pockets, you’ll want to research and understand all the potential channels, and then choose the few you think will be most effective with this target audience.

Here are some places to consider:

  • YouTube
  • Google search for top new games or apps
  • Social media
  • Reddit communities
  • Blogs or magazines
  • Gaming portals
  • Tech blogs and magazines
  • Industry blogs and magazines
  • Tech or gaming events
  • App stores
  • Your own website

Look for Trusted Spaces

Keep in mind that the more authentic the channel, the more well-received. We’ll talk more about this as it relates to customer acquisition in the next article, but a good rule of thumb is to find a balance between paid placements and personal testimonials. Social content, blogs, and other types of original content all tend to have a high believability factor. Things like search engine marketing—while effective—are understood as more straight-forward advertising, and therefore less trusted. That doesn’t mean you don’t want to use those channels, just be smart about finding a balance.
 

What’s something you’ve purchased recently—an app or something else? What was your customer journey? Tell us about it in the comments!

 

Fluid Simulation for Video Games (Part 20)


Download PDF of Fluid Simulation for Video Games (Part 20) [PDF 1.1MB]


Figure 1: Comparison of fluid simulations with the same interior vorticity but different solvers and boundary values. The top row uses a direct integral method, where the domain boundary has no explicit influence. The middle row uses a Poisson solver, with boundary conditions of vector potential calculated from a treecode integral method. The bottom row uses a Poisson solver, with boundary conditions of vector potential set to zero.

Assigning Vector Potential at Boundaries

Fluid simulation entails computing flow velocity everywhere in the fluid domain. This article places the capstone on a collection of techniques that combine to provide a fast way to compute velocity from vorticity. This is the recipe:

  1. Within some domain, use vortex particles to track where fluid rotates.
  2. At domain boundaries, integrate vorticity to compute the vector potential.
  3. Throughout the domain interior, solve a vector Poisson equation to compute vector potential everywhere else.
  4. Everywhere in the domain, compute the curl of the vector potential to obtain the velocity field (see the sketch after this list).
  5. Advect particles according to the velocity field.
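
Step 4 deserves a concrete illustration. Here is a minimal, self-contained sketch (not the article's UniformGrid code; the flat-array layout and function names are my assumptions) that computes velocity as the curl of the vector potential using central differences at interior grid points:

#include <cstddef>
#include <vector>

struct Vec3 { float x , y , z ; } ;

inline size_t Offset( unsigned ix , unsigned iy , unsigned iz , const unsigned dims[3] )
{   // Flatten a 3D grid index into a 1D array offset.
    return ix + dims[0] * ( iy + size_t( dims[1] ) * iz ) ;
}

// velocity = curl( vectorPotential ), central differences, interior points only.
void CurlOfVectorPotential( std::vector< Vec3 > & velocity
                          , const std::vector< Vec3 > & vectorPotential
                          , const unsigned dims[3] , const float spacing[3] )
{
    for( unsigned iz = 1 ; iz + 1 < dims[2] ; ++ iz )
    for( unsigned iy = 1 ; iy + 1 < dims[1] ; ++ iy )
    for( unsigned ix = 1 ; ix + 1 < dims[0] ; ++ ix )
    {   // Neighboring values of the vector potential A.
        const Vec3 & axp = vectorPotential[ Offset( ix + 1 , iy , iz , dims ) ] ;
        const Vec3 & axm = vectorPotential[ Offset( ix - 1 , iy , iz , dims ) ] ;
        const Vec3 & ayp = vectorPotential[ Offset( ix , iy + 1 , iz , dims ) ] ;
        const Vec3 & aym = vectorPotential[ Offset( ix , iy - 1 , iz , dims ) ] ;
        const Vec3 & azp = vectorPotential[ Offset( ix , iy , iz + 1 , dims ) ] ;
        const Vec3 & azm = vectorPotential[ Offset( ix , iy , iz - 1 , dims ) ] ;
        Vec3 & v = velocity[ Offset( ix , iy , iz , dims ) ] ;
        // curl(A) = ( dAz/dy - dAy/dz , dAx/dz - dAz/dx , dAy/dx - dAx/dy )
        v.x = ( ayp.z - aym.z ) / ( 2.0f * spacing[1] ) - ( azp.y - azm.y ) / ( 2.0f * spacing[2] ) ;
        v.y = ( azp.x - azm.x ) / ( 2.0f * spacing[2] ) - ( axp.z - axm.z ) / ( 2.0f * spacing[0] ) ;
        v.z = ( axp.y - axm.y ) / ( 2.0f * spacing[0] ) - ( ayp.x - aym.x ) / ( 2.0f * spacing[1] ) ;
    }
}

(The article's actual implementation also handles boundary points, for example with one-sided differences.)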

This system has some nice properties:

  • Most game engines already support particle systems. This technique builds on such systems.
  • The treecode accelerates the integration of vorticity, running in O(N log N) time, which is pretty fast.
  • Solving a vector Poisson equation can be even faster: O(N).
  • Computing curl is mathematically simple and fast: O(N).
  • Most particle systems already support advecting particles according to a velocity field.
  • The same velocity field can advect both vortex and tracer particles.
  • All the computation routines have fairly simple mathematical formulae.
  • The algorithms are numerically stable.
  • You can parallelize all the computation routines above by using Intel® Threading Building Blocks (Intel® TBB).
  • Using particles to represent vorticity means that only the most interesting aspects of the flow cost resources.

The scheme is well-suited to fire and smoke simulations, but it has at least one drawback: It’s not well suited to simulating liquid–air interfaces. If you want to simulate pouring, splashes, or waves, other techniques are better. Smoothed-particle hydrodynamics (SPH) and shallow-water wave equations will give you better results.

This article complements part 19, describing how to improve fidelity while reducing computational cost by combining techniques described in earlier articles in this series. In particular, this article describes the following steps:

  1. Compute vector potential at boundaries only by integrating vorticity with a treecode.
  2. Enable the multigrid Poisson solver described in part 6.
  3. Modify UpSample to skip overwriting values at boundaries.
  4. Use Intel TBB to parallelize UpSample and DownSample.

The code accompanying this article provides a complete fluid simulation using the vortex particle method. You can switch between various techniques, including integral and differential techniques, to compare their performance and visual aesthetics. At the end of the article, I offer some performance profiles that demonstrate that the method this article describes runs the fastest of all those presented in this series. To my eye, it also offers the most visually pleasing motion.

Part 1 and part 2 summarized fluid dynamics and simulation techniques. Part 3 and part 4 presented a vortex-particle fluid simulation with two-way fluid–body interactions that runs in real time. Part 5 profiled and optimized that simulation code. Part 6 described a differential method for computing velocity from vorticity. Figure 1 shows the relationships between the various techniques and articles. Part 7 showed how to integrate a fluid simulation into a typical particle system. Part 8, part 9, part 10, and part 11 explained how to simulate density, buoyancy, heat, and combustion in a vortex-based fluid simulation. Part 12 explained how improper sampling caused unwanted jerky motion and described how to mitigate it. Part 13 added convex polytopes and lift-like forces. Part 14, part 15, part 16, part 17, and part 18 added containers, SPH, liquids, and fluid surfaces.

Integrate Vorticity to Compute Vector Potential at Boundaries

Part 19 provided details of using a treecode algorithm to integrate vorticity and compute vector potential. Use the treecode algorithm to compute vector potential only at boundaries, as shown in Figure 2. (Later, the vector Poisson solver will “fill in” vector potentials through the domain interior.)


Figure 2: Vector potential computed at domain boundaries only for a vortex ring

The treecode algorithm has an asymptotic time complexity of O(N log N), where N is the number of points where the computation occurs. At first glance, that seems more expensive than the O(N) Poisson solver, but you can confine the treecode computation to the boundaries (a two-dimensional manifold) that have N_b points, so the treecode algorithm costs O(N_b log N_b). In contrast, the Poisson algorithm runs in the three-dimensional interior, which has N_i points. (Figure 3 shows how the ratio of the numbers of boundary-to-interior points diminishes as the problem grows.) For a domain that has N_i ∝ N_p^3 points, N_b ∝ N_p^2 = N_i^(2/3), so the overall cost of this algorithm is:

O(N_i^(2/3) log N_i^(2/3)) + O(N_i)

The first term grows more slowly than the second, so the algorithm overall has asymptotic time complexity O(N_i).


Figure 3: Ratio of face to volume points on a cubic grid. As the number of grid points increases, the relative cost of computing boundary values diminishes compared to computing interior values.

Retain two code paths to compute the integral either throughout the entire domain or just at boundaries. You can accommodate that by adding logic to conditionally skip the domain interior grid points. The incrementXForInterior logic in the code below shows that modification.

The treecode algorithm takes many conditional branches, and its memory access pattern has poor spatial locality: It jumps around a lot, which makes the algorithm run slowly. Fortunately, vector potential values at boundaries have properties you can exploit to reduce the cost of computing them: They don’t vary much spatially, and they’re far from most of the "action" in the domain interior. You can compute boundary values with lower spatial granularity to save compute time. To do so, compute vector potential on boundaries at every other point, then copy those values to their neighbors. That cuts the cost roughly in half. The even/odd index test in the code below shows that modification.

(If you’re curious about how much this decimated computation affects the final result, try computing with and without decimation. See if you can tell the difference.)

void VortonSim::ComputeVectorPotentialAtGridpoints_Slice( size_t izStart , size_t izEnd , bool boundariesOnly
                     , const UniformGrid< VECTOR< unsigned > > & vortonIndicesGrid , const NestedGrid< Vorton > & influenceTree )
{
    const size_t            numLayers               = influenceTree.GetDepth() ;
    UniformGrid< Vec3 > &   vectorPotentialGrid     = mVectorPotentialMultiGrid[ 0 ] ;
    const Vec3 &            vMinCorner              = mVelGrid.GetMinCorner() ;
    static const float      nudge                   = 1.0f - 2.0f * FLT_EPSILON ;
    const Vec3              vSpacing                = mVelGrid.GetCellSpacing() * nudge ;
    const unsigned          dims[3]                 =   { mVelGrid.GetNumPoints( 0 )
                                                        , mVelGrid.GetNumPoints( 1 )
                                                        , mVelGrid.GetNumPoints( 2 ) } ;
    const unsigned          numXY                   = dims[0] * dims[1] ;
    unsigned                idx[ 3 ] ;
    const unsigned          incrementXForInterior   = boundariesOnly ? ( dims[0] - 1 ) : 1 ;

    // Compute fluid flow vector potential at each boundary grid point, due to all vortons.
    for( idx[2] = static_cast< unsigned >( izStart ) ; idx[2] < izEnd ; ++ idx[2] )
    {   // For subset of z index values...
        Vec3 vPosition ;
        vPosition.z = vMinCorner.z + float( idx[2] ) * vSpacing.z ;
        const unsigned  offsetZ     = idx[2] * numXY ;
        const bool      topOrBottom = ( 0 == idx[2] ) || ( dims[2]-1 == idx[2] ) ;
        for( idx[1] = 0 ; idx[1] < dims[1] ; ++ idx[1] )
        {   // For every grid point along the y-axis...
            vPosition.y = vMinCorner.y + float( idx[1] ) * vSpacing.y ;
            const unsigned  offsetYZ    = idx[1] * dims[0] + offsetZ ;
            const bool      frontOrBack = ( 0 == idx[1] ) || ( dims[1]-1 == idx[1] ) ;
            const unsigned  incX        = ( topOrBottom || frontOrBack ) ? 1 : incrementXForInterior ;
            for( idx[0] = 0 ; idx[0] < dims[0] ; idx[0] += incX )
            {   // For every grid point along the x-axis...
                vPosition.x = vMinCorner.x + float( idx[0] ) * vSpacing.x ;
                const unsigned offsetXYZ = idx[0] + offsetYZ ;

                    if( 0 == ( idx[0] & 1 ) )
                    {   // Even x indices.  Compute value.
                        static const unsigned zeros[3] = { 0 , 0 , 0 } ; /* Starter indices for recursive algorithm */
                        if( numLayers > 1 )
                        {
                        vectorPotentialGrid[ offsetXYZ ] = ComputeVectorPotential_Tree( vPosition , zeros , numLayers - 1
                                                                                , vortonIndicesGrid , influenceTree ) ;
                        }
                        else
                        {
                            vectorPotentialGrid[ offsetXYZ ] = ComputeVectorPotential_Direct( vPosition ) ;
                        }
                    }
                    else
                    {   // Odd x indices. Copy value from preceding grid point.
                        vectorPotentialGrid[ offsetXYZ ] = vectorPotentialGrid[ offsetXYZ - 1 ] ;
                    }
            }
        }
    }
}

Retain Boundary Values When Up-Sampling

Interleaved between solver steps, multigrid algorithms down-sample values from finer to coarser grids, then up-sample values from coarser to finer grids (as shown in Figure 4 and explained in part 6). This resampling creates a problem: Lower-fidelity information up-sampled from coarser grids replaces boundary values originally computed on finer grids.


Figure 4: A fine grid, a medium grid, and a coarse grid in a multigrid solver

Because using the treecode to compute vector potential values is expensive, you want to avoid recomputing them. So, modify UniformGrid::UpSample to avoid overwriting values at boundaries. Use a flag in that routine to indicate whether to omit or include boundary points in the destination grid. To omit writing at boundaries, change the for loop begin and end values to cover only the interior. (A sketch of that loop-bounds change appears after the snippet below.) Then, in the multigrid algorithm, during the up-sampling phase, pass the flag to omit up-sampling at boundaries. This code snippet is a modification of the version of VortonSim::ComputeVectorPotential originally presented in part 6; the INTERIOR_ONLY argument is the modification:

// Coarse-to-fine stage of V-cycle: Up-sample from coarse to fine, running iterations of Poisson solver for each up-sampled grid.
for( unsigned iLayer = maxValidDepth ; iLayer >= 1 ; -- iLayer )
{
  // Retain boundary values as they were computed initially (above) in finer grids.
  vectorPotentialMultiGrid.UpSampleFrom( iLayer , UniformGridGeometry::INTERIOR_ONLY ) ;
  SolveVectorPoisson( vectorPotentialMultiGrid[ iLayer - 1 ] , negativeVorticityMultiGrid[ iLayer - 1 ]
					, numSolverSteps , boundaryCondition , mPoissonResidualStats ) ;
}
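
For reference, here is what the loop-bounds change inside the resampling routine itself can look like. This is a minimal, self-contained sketch (not the article's UniformGrid code; the names and the nearest-point mapping are my assumptions):

#include <vector>

enum BoundaryTreatmentE { INCLUDE_BOUNDARIES , INTERIOR_ONLY } ;

// Up-sample a coarse scalar grid into a fine one using nearest points.
// With INTERIOR_ONLY, the loops skip the fine grid's boundary so that
// boundary values computed earlier (e.g., by the treecode) survive.
void UpSampleNearest( std::vector< float > & fine , const unsigned fineDims[3]
                    , const std::vector< float > & coarse , const unsigned coarseDims[3]
                    , BoundaryTreatmentE boundaryTreatment )
{
    const unsigned skip = ( INTERIOR_ONLY == boundaryTreatment ) ? 1 : 0 ;
    unsigned idx[3] ;
    for( idx[2] = skip ; idx[2] < fineDims[2] - skip ; ++ idx[2] )
    for( idx[1] = skip ; idx[1] < fineDims[1] - skip ; ++ idx[1] )
    for( idx[0] = skip ; idx[0] < fineDims[0] - skip ; ++ idx[0] )
    {   // Map each fine point to its nearest coarse point and copy the value.
        const unsigned src[3] = { idx[0] * ( coarseDims[0] - 1 ) / ( fineDims[0] - 1 )
                                , idx[1] * ( coarseDims[1] - 1 ) / ( fineDims[1] - 1 )
                                , idx[2] * ( coarseDims[2] - 1 ) / ( fineDims[2] - 1 ) } ;
        fine[ idx[0] + fineDims[0] * ( idx[1] + fineDims[1] * idx[2] ) ]
            = coarse[ src[0] + coarseDims[0] * ( src[1] + coarseDims[1] * src[2] ) ] ;
    }
}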

Avoid Superfluous and Expensive Intermediate Fidelity When Down-Sampling

The down-sampling routine provided in part 6 accumulates values from multiple grid points in the finer source grid to compute values in the coarser destination grid. That provides higher-fidelity results, but because the Poisson solver overwrites those values with refinements, the additional fidelity is somewhat superfluous. It’s computationally cheaper to down-sample using nearest values (instead of accumulating) and then run more iterations of the Poisson solver if you want additional fidelity in the solution. So, modify DownSample to use a faster but less accurate down-sampling technique. This code snippet is a modification of the version of VortonSim::ComputeVectorPotential originally presented in part 6; the FASTER_LESS_ACCURATE arguments are the modification:

// Fine-to-coarse stage of V-cycle: down-sample from fine to coarse, running some iterations of the Poisson solver for each down-sampled grid.
  for( unsigned iLayer = 1 ; iLayer < negativeVorticityMultiGrid.GetDepth() ; ++ iLayer )
  {
      const unsigned minDim = MIN3( negativeVorticityMultiGrid[ iLayer ].GetNumPoints( 0 )
          , negativeVorticityMultiGrid[ iLayer ].GetNumPoints( 1 ) , negativeVorticityMultiGrid[ iLayer ].GetNumPoints( 2 ) ) ;
      if( minDim > 2 )
      {
          negativeVorticityMultiGrid.DownSampleInto( iLayer , UniformGridGeometry::FASTER_LESS_ACCURATE ) ;
          vectorPotentialMultiGrid.DownSampleInto( iLayer , UniformGridGeometry::FASTER_LESS_ACCURATE ) ;
          SolveVectorPoisson( vectorPotentialMultiGrid[ iLayer ] , negativeVorticityMultiGrid[ iLayer ] , numSolverSteps
                            , boundaryCondition , mPoissonResidualStats ) ;
      }
      else
      {
          maxValidDepth = iLayer - 1 ;
          break ;
      }
  }

Parallelize Resampling Algorithms with Intel® Threading Building Blocks

Even with the above changes, the resampling routines cost significant time. You can use Intel TBB to parallelize the resampling algorithms. The approach follows the familiar recipe:

  • Write a worker routine that operates on a slice of the problem.
  • Write a functor class that wraps the worker routine.
  • Write a wrapper routine that directs Intel TBB to call a functor.

The worker, functor, and wrapper routines for DownSample and UpSample are sufficiently similar that I only include DownSample in this article. You can see the entire code in the archive that accompanies this article.

This excerpt from the worker routine for DownSample shows the slicing logic and modifications made to implement nearest sampling, described above:

void DownSampleSlice( const UniformGrid< ItemT > & hiRes , AccuracyVersusSpeedE accuracyVsSpeed , size_t izStart , size_t izEnd )
        {
            UniformGrid< ItemT > &  loRes        = * this ;
            const unsigned  &       numXhiRes           = hiRes.GetNumPoints( 0 ) ;
            const unsigned          numXYhiRes          = numXhiRes * hiRes.GetNumPoints( 1 ) ;
            static const float      fMultiplierTable[]  = { 8.0f , 4.0f , 2.0f , 1.0f } ;

            // number of cells in each grid cluster
            const unsigned pClusterDims[] = {   hiRes.GetNumCells( 0 ) / loRes.GetNumCells( 0 )
                                            ,   hiRes.GetNumCells( 1 ) / loRes.GetNumCells( 1 )
                                            ,   hiRes.GetNumCells( 2 ) / loRes.GetNumCells( 2 ) } ;

            const unsigned  numPointsLoRes[3]   = { loRes.GetNumPoints( 0 ) , loRes.GetNumPoints( 1 ) , loRes.GetNumPoints( 2 ) };
            const unsigned  numXYLoRes          = loRes.GetNumPoints( 0 ) * loRes.GetNumPoints( 1 ) ;
            const unsigned  numPointsHiRes[3]   = { hiRes.GetNumPoints( 0 ) , hiRes.GetNumPoints( 1 ) , hiRes.GetNumPoints( 2 ) };
            const unsigned  idxShifts[3]        = { pClusterDims[0] / 2 , pClusterDims[1] / 2 , pClusterDims[2] / 2 } ;

            // Since this loop iterates over each destination cell, it parallelizes without contention.
            unsigned idxLoRes[3] ;
            for( idxLoRes[2] = unsigned( izStart ) ; idxLoRes[2] < unsigned( izEnd ) ; ++ idxLoRes[2] )
            {
                const unsigned offsetLoZ = idxLoRes[2] * numXYLoRes ;
                for( idxLoRes[1] = 0 ; idxLoRes[1] < numPointsLoRes[1] ; ++ idxLoRes[1] )
                {
                    const unsigned offsetLoYZ = idxLoRes[1] * loRes.GetNumPoints( 0 ) + offsetLoZ ;
                    for( idxLoRes[0] = 0 ; idxLoRes[0] < numPointsLoRes[0] ; ++ idxLoRes[0] )
                    {   // For each cell in the loRes layer...
                        const unsigned  offsetLoXYZ   = idxLoRes[0] + offsetLoYZ ;
                        ItemT        &  rValLoRes  = loRes[ offsetLoXYZ ] ;
                        unsigned clusterMinIndices[ 3 ] ;
                        unsigned idxHiRes[3] ;

                        if( UniformGridGeometry::FASTER_LESS_ACCURATE == accuracyVsSpeed )
                        {
                            memset( & rValLoRes , 0 , sizeof( rValLoRes ) ) ;
                            NestedGrid::GetChildClusterMinCornerIndex( clusterMinIndices , pClusterDims , idxLoRes ) ;
                            idxHiRes[2] = clusterMinIndices[2] ;
                            idxHiRes[1] = clusterMinIndices[1] ;
                            idxHiRes[0] = clusterMinIndices[0] ;
                            const unsigned offsetZ      = idxHiRes[2] * numXYhiRes ;
                            const unsigned offsetYZ     = idxHiRes[1] * numXhiRes + offsetZ ;
                            const unsigned offsetXYZ    = idxHiRes[0] + offsetYZ ;
                            const ItemT &  rValHiRes    = hiRes[ offsetXYZ ] ;
                            rValLoRes = rValHiRes ;
                        }
                        else
                        { ... see archive for full code listing...
                        }
                    }
                }
            }
        }

These routines have an interesting twist compared to others in this series: They are methods of a templated class. That means that the functor class must also be templated. The syntax is much easier when the functor class is nested within the UniformGrid class. Then, the fact that it is templated is implicit: The syntax is formally identical to a nontemplated class.

Here is the functor class for DownSample. Note that it is defined inside the UniformGrid templated class:

class UniformGrid_DownSample_TBB
{
			  UniformGrid &                         mLoResDst               ;
		const UniformGrid &                         mHiResSrc               ;
		UniformGridGeometry::AccuracyVersusSpeedE   mAccuracyVersusSpeed    ;
	public:
		void operator() ( const tbb::blocked_range<size_t> & r ) const
		{   // Perform subset of down-sampling
			SetFloatingPointControlWord( mMasterThreadFloatingPointControlWord ) ;
			SetMmxControlStatusRegister( mMasterThreadMmxControlStatusRegister ) ;
			mLoResDst.DownSampleSlice( mHiResSrc , mAccuracyVersusSpeed , r.begin() , r.end() ) ;
		}
		UniformGrid_DownSample_TBB( UniformGrid & loResDst , const UniformGrid & hiResSrc
								  , UniformGridGeometry::AccuracyVersusSpeedE accuracyVsSpeed )
			: mLoResDst( loResDst )
			, mHiResSrc( hiResSrc )
			, mAccuracyVersusSpeed( accuracyVsSpeed )
		{
			mMasterThreadFloatingPointControlWord = GetFloatingPointControlWord() ;
			mMasterThreadMmxControlStatusRegister = GetMmxControlStatusRegister() ;
		}
	private:
		WORD        mMasterThreadFloatingPointControlWord   ;
		unsigned    mMasterThreadMmxControlStatusRegister   ;
} ;

Here is the wrapper routine for DownSample:

void DownSample( const UniformGrid & hiResSrc , AccuracyVersusSpeedE accuracyVsSpeed )
  {
      const size_t numZ = GetNumPoints( 2 ) ;
# if USE_TBB
      {
          // Estimate grain size based on size of problem and number of processors.
          const size_t grainSize =  Max2( size_t( 1 ) , numZ / gNumberOfProcessors ) ;
          parallel_for( tbb::blocked_range<size_t>( 0 , numZ , grainSize )
                      , UniformGrid_DownSample_TBB( * this , hiResSrc , accuracyVsSpeed ) ) ;
      }
# else
      DownSampleSlice( hiResSrc , accuracyVsSpeed , 0 , numZ ) ;
# endif
  }

Performance

Table 1 shows the duration (in milliseconds per frame) of various routines run on a computer with a 3.50 GHz Intel® Core™ i7-3770K processor with four physical cores and two logical cores per physical core.

No. of Threads   Frame   Vorton Sim   Vector Potential   Up-Sample   Poisson   Render
1                29.8    6.64         2.37               0.0554      0.222     13.8
2                17.9    5.11         1.36               0.0205      0.148     7.53
3                13.1    4.69         1.28               0.0196      0.153     4.99
4                13.0    4.55         1.22               0.0116      0.148     5.04
8                11.1    4.44         1.13               0.0023      0.141     3.97

Table 1: Duration of Routines Run on an Intel® Core™ i7 Processor

Notice that Vorton Sim does not speed up linearly with the number of cores: it takes 6.64 ms with 1 thread but only improves to 4.44 ms with 8. Perhaps the algorithms have reached the point where data access (not instructions) is the bottleneck.

Summary and Options

This article presented a fluid simulation that combines integral and differential numerical techniques to achieve an algorithm that takes time linear in the number of grid points or particles. The overall simulation can’t be faster than that because each particle has to be accessed to be rendered. The hybrid also provides better results than the treecode alone: the treecode uses approximations everywhere in the computational domain, whereas the Poisson solver yields an inherently smoother and more globally accurate solution.

More work could be done to improve this algorithm. Currently, the numerical routines are broken up logically so that they’re easier to understand, but this causes the computer to revisit the same data repeatedly. After data-parallelizing the routines, their run times become bound by memory access instead of instructions. So, if instead all the fluid simulation operations were consolidated into a single monolithic routine that accessed the data only once and that super-routine were parallelized, it might lead to even greater speed.

About the Author

Dr. Michael J. Gourlay works at Microsoft as a principal development lead on HoloLens in the Environment Understanding group. He previously worked at Electronic Arts (EA Sports) as the software architect for the Football Sports Business Unit, as a senior lead engineer on Madden NFL*, and as an original architect of FranTk* (the engine behind Connected Careers mode). He worked on character physics and ANT* (the procedural animation system that EA Sports uses), on Mixed Martial Arts*, and as a lead programmer on NASCAR. He wrote Lynx* (the visual effects system used in EA games worldwide) and patented algorithms for interactive, high-bandwidth online applications.

He also developed curricula for and taught at the University of Central Florida, Florida Interactive Entertainment Academy, an interdisciplinary graduate program that teaches programmers, producers, and artists how to make video games and training simulations.

Prior to joining EA, he performed scientific research using computational fluid dynamics and the world’s largest massively parallel supercomputers. His previous research also includes nonlinear dynamics in quantum mechanical systems and atomic, molecular, and optical physics. Michael received his degrees in physics and philosophy from Georgia Tech and the University of Colorado at Boulder.

Follow Michael on Twitter: @MiJaGourlay.

Intel® Advisor Roofline feature Q&A


This document contains frequently asked questions from our customers about the Roofline feature of Intel® Advisor.

How do I use Roofline analysis?

How do you count FLOPS?

How do I do Roofline analysis for a multithreaded application?

How is KNL high-bandwidth memory treated in Roofline?

Does the ITT* API work with the Roofline feature?

How do I store Roofline analysis data to a file?

Does the command-line option to analyze only particular loops work for Roofline analysis?

How do I use Roofline analysis?

It is described in the article https://software.intel.com/en-us/articles/getting-started-with-intel-advisor-roofline-feature

How do you count FLOPS?

We use the classic approach to computing FLOPS.

We count the fp-operations +, -, *, / as 1 each and FMA as 2 operations.

If a SIMD fp-operation is used, the operation count is multiplied by the number of vector elements used in the operation.

Single- and double-precision FLOPS are not counted as separate metrics; a single cumulative metric is computed.

We count the actual instructions that were executed.

So, if a math function is a single instruction (exp, rsqrt, sqrt), it is counted as 1. If it is a math library call (sin, cos, ...), we count the actual number of compute instructions in the Chebyshev/Taylor decomposition in the routine's implementation, but the FLOPS data will be assigned to the function call itself and not to the loop that called the function. We call this self-time-based FLOPS counting. More information on self-time-based FLOPS counting can be found in the article https://software.intel.com/en-us/articles/selftime-based-flops-computing-vectorization-advisor.
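
As an illustration of these counting rules (my example, not Advisor output), consider a simple SAXPY loop:

// Worked example of the FLOPS counting rules above.
void saxpy( int n , float a , const float * x , float * y )
{
    for( int i = 0 ; i < n ; ++ i )
    {
        // One multiply + one add = 2 fp-operations per iteration,
        // or one FMA instruction, which is also counted as 2.
        // If the compiler vectorizes this with 256-bit AVX
        // (8 single-precision lanes), each SIMD FMA counts as
        // 2 * 8 = 16 fp-operations; the loop still totals 2*n FLOPs.
        y[ i ] = a * x[ i ] + y[ i ] ;
    }
}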

How do I do Roofline analysis for a multithreaded application?

...

How is KNL high-bandwidth memory treated in Roofline?

...

How do I store Roofline analysis data to a file?

This is not currently possible. Contact vector_advisor@intel.com if you feel this is a critical feature for you.

 

Does the ITT* API work with the Roofline feature?

It should work in general, though there could be problems with Trip Counts support. It does work for Survey, so FLOPS will not be computed for loops excluded via the API.

 

Does the command-line option to analyze only particular loops work for Roofline analysis?

No. It is not implemented. Contact vector_advisor@intel.com if you feel this is a critical feature for you.

 

 

 

French Hospital Uses Trusted Analytics Platform to Predict Emergency Department Visits and Admissions


For hospital administrators, predicting the number of patient visits to emergency departments, along with their admission rates, is critical for optimizing resources at all levels of staff. Ultimately, this reduces wait times in emergency departments and improves the quality of patient care.

Intel and the Assistance Publique-Hôpitaux de Paris (AP-HP), the largest university hospital in Europe, worked together to build a cloud-based solution for predicting the expected number of patient visits and hospital admissions using advanced data science methodologies and the Trusted Analytics Platform (TAP).

 

Open Source Downloads


This article makes available third-party libraries, executables, and sources that were used in the creation of Intel® Software Development Products or are required for their operation. Intel provides this software pursuant to the applicable licenses.

 

Required for Operation of Intel® Software Development Products

The following products require additional third-party software for operation.

Intel® Parallel Studio XE 2015 Composer Edition for C++ Windows* and
Intel® System Studio 2015 Composer Edition for Windows*:

The following binutils package is required for operation with Intel® Graphics Technology:
Download (ZIP)
Please see Release Notes of the product for detailed instructions on using the binutils package.

The above binutils package is subject to various licenses. Please see the corresponding sources for more information:
Download (ZIP)
 

Required for use of offload with Open Source Media Kernel Runtime for Intel® HD Graphics

The following products require additional third-party software for the mentioned operation.

Intel® Parallel Studio XE 2016 Composer Edition for C++ Linux* and
Intel® System Studio 2016 Composer Edition for Linux*:

The following installation guide together with the build and installation script are required:

Download (PDF)

Download (ZIP)
This file contains the otc_cmrt_build_and_install.sh script. Please unpack it.

Used within Intel® Software Development Products

The following products contain Intel® Application Debugger, Intel® Debugger for Heterogeneous Compute, Intel® Many Integrated Core Debugger (Intel® MIC Debugger), Intel® JTAG Debugger, and/or Intel® System Debugger tools, which use third-party libraries as listed below.

Products and Versions:

Intel® System Studio IoT Edition

Intel® System Studio 2016

  • Intel® System Studio 2016 Composer Edition
    (Initial Release and higher)

Intel® Parallel Studio XE 2016 for Linux*

  • Intel® Parallel Studio XE 2016 Composer Edition for C++ Linux*/Intel® Parallel Studio XE 2016 Composer Edition for Fortran Linux*
    (Initial Release and higher)

Intel® Parallel Studio XE 2015 for Linux*

  • Intel® Parallel Studio XE 2015 Composer Edition for C++ Linux*/Intel® Parallel Studio XE 2015 Composer Edition for Fortran Linux*
    (Initial Release and higher)

Intel® Composer XE 2013 SP1 for Linux*

  • Intel® C++ Composer XE 2013 SP1 for Linux*/Intel® Fortran Composer XE 2013 SP1 for Linux*
    (Initial Release and higher; 13.0 Intel® Application Debugger)

Intel® Composer XE 2013 for Linux*

  • Intel® C++ Composer XE 2013 for Linux*/Intel® Fortran Composer XE 2013 for Linux*
    (Initial Release and higher; 13.0 Intel® Application Debugger)

Intel® Composer XE 2011 for Linux*

  • Intel® C++ Composer XE 2011 for Linux*/Intel® Fortran Composer XE 2011 for Linux*
    (Update 6 and higher; 12.1 Intel® Application Debugger)
  • Intel® C++ Composer XE 2011 for Linux*/Intel® Fortran Composer XE 2011 for Linux*
    (Initial Release and up to Update 5; 12.0 Intel® Application Debugger)

Intel® Compiler Suite Professional Edition for Linux*

  • Intel® C++ Compiler for Linux* 11.1/Intel® Fortran Compiler for Linux* 11.1
  • Intel® C++ Compiler for Linux* 11.0/Intel® Fortran Compiler for Linux* 11.0
  • Intel® C++ Compiler for Linux* 10.1/Intel® Fortran Compiler for Linux* 10.1

Intel® Embedded Software Development Tool Suite for Intel® Atom™ Processor:

  • Version 2.3 (Initial Release and up to Update 2)
  • Version 2.2 (Initial Release and up to Update 2)
  • Version 2.1
  • Version 2.0

Intel® Application Software Development Tool Suite for Intel® Atom™ Processor:

  • Version 2.2 (Initial Release and up to Update 2)
  • Version 2.1
  • Version 2.0

Intel® C++ Software Development Tool Suite for Linux* OS supporting Mobile Internet Devices (Intel® MID Tools):

  • Version 1.1
  • Version 1.0

Intel AppUp™ SDK Suite for MeeGo*

  • Initial Release (Version 1.0)

Used third-party libraries:
Please see the attachments for a complete list of third-party libraries.

Note: The packages posted here are unmodified copies from the respective distributor/owner and are made available for ease of access. Download or installation of those is not required to operate any of the Intel® Software Development Products. The packages are provided as is, without warranty or support.

[Series] Know Your Customers: Customer Acquisition




Know Your Customer! Step Three: Customer Acquisition

We’ve discussed how to find your target audience, and how to pick the right channels to reach them. That brings us to the third and final part of this series—how to use all of this knowledge about your potential customer in order to convert them into an actual customer.

The key thing to remember with user acquisition is that you want to move people from anonymous to known. With a structured content and channel approach you can guide them through this transition—being in the right place, with the right message, at the right time to win them over.

Read on to learn more about that all-important stage in the process—making the sale.

Remember the Journey

We talked a lot about the overall customer journey in the last post in this series, and about how to use your knowledge of the customer to pick the right channels. Now that you know which channels they use, it’s time to think about how to get onto those channels. It’s especially important for you to be strategic about how you’re responding to your customers’ needs. As they move along the journey, how will you respond? Let’s say you paid for search-engine marketing, and they clicked through. What now? What’s the next thing you want to communicate with them? How will you help move them along the funnel? Remember to think back to your persona, and consider their wants and actions.

If You Create a Buzz, They Will Come

Beyond building a great app, what else can you do to encourage conversion? You can create content to support your game or app, and you can make sure you have a presence in the places where your customers are spending time. There are several approaches worth considering, depending on how much money and time you have. Look for things you can do for free, be strategic about using limited resources, and also look into partnerships with trusted names to get even more traction and attention.

Spend Time, Not Money

No matter your budget, free is always a good sell. In this case, it also makes good business sense—the efforts that don’t cost money also tend to be the ones that people trust the most, making your message or content hold a lot more weight.

  • Content Creation. Publish your own content. This includes anything you can create yourself, including blog posts and images. Think about shareable content such as GIFs and infographics.
  • Testimonials. Get established bloggers or influencers to feature your product or write a review. Think about who you know …
  • Scarcity. Create a sense of scarcity as you launch or roll out features. The more limited the content or downloads feel, the more people will be interested.
  • Gamification. To build momentum, incentivize participation, such as encouraging users to post on social media or invite their friends to play.
  • Social. Create a social presence, and make sure you're sharing all of the above content. Join existing communities and get involved in the conversation. Again, credibility is key, and one of the most powerful tools you can have is word-of-mouth. The more your customers share, the more opportunities for new customers to find you.         

Make A Big Splash with a Little Coin

Not everything in life is free, of course. When you do need to spend money in order to acquire new customers, be smart about where you choose to focus your efforts.

  • Content Creation. With a small budget, you can create even more content, like trailers for YouTube, gaming sites and video channels. Depending on your product—and your target customers—these can be really crucial in gaining users.
  • Social media advertising. With effective targeting, and a message that’s crafted for your audience, these ads don’t have to cost a lot and can have a lot of impact.

Friends in High Places

One of the most powerful tools in your arsenal may be partnerships. Joining forces with a bigger business can be a great way to get access to more resources than you’d have otherwise and reach a wider audience. By joining forces with a bigger corporation, like Intel, you can collaborate on promotions to increase visibility and traffic, and you can also access their creative teams to create new assets. Take a look at our case study with Canadian independent gaming studio Torn Banner to learn more about their partnership with Intel and how that's worked for them.

Putting it All Together

Not all of these marketing efforts are necessary for every product. You may be partial to videos, but if your audience isn’t, or your interface is extremely simple, a video may not be worth your time or budget.

Return to the customer journey tool you created earlier. What actions can you take to make sure your message gets in front of your customer at each step of his or her particular journey? Take some time to identify a specific marketing effort you can take for each customer action you identified.

For instance, we determined that our fish lover, John, values expert opinions on new products. To capture his attention, we’ll want to make sure the experts he trusts are talking about our app. We can reach out to bloggers in the tropical fish community and ask them to try out our product and write a review. When John reads the review, he moves forward in the customer journey. We might also use our small budget to create a professional demo video for the app store, ensuring that John will get to see it in action—making him more likely to download it after he goes searching for more information. Knowing your customer allows you to make a plan you can follow—with an understanding of your customer's likely behavior, you have a much better idea of how they act, where you can find them, and how you can appeal to them.

Always Be Optimizing!

It’s important to tag everything you’re doing so you can understand which efforts are working best and what isn’t worth your time. Essentially, if you can’t measure it, you shouldn’t do it. As you gain more actual customers, and more sales, continue to pay attention to the ROI of various efforts so you can continue to fine-tune and make sure you’re spending the bulk of your time and money in the places that yield the best results.

Have you seen a free marketing effort that knocked your socks off? Tell us in the comments!

Intel® Quark™ Microcontroller Developer Kit D2000: User Guide


This document describes Intel® Quark™ Microcontroller Developer Kit D2000 including the board, the hardware contained, and the toolchain required for software development and debugging. (v.3, May 2016)


Intel® System Studio for Microcontrollers: Product Brief


Development Environment for Intel® Quark™ Microcontroller Software Developers

Intel® System Studio for Microcontrollers, an Eclipse*-integrated software suite, is designed specifically to empower Intel® Quark™ microcontroller developers to create fast, intelligent things.

Using UARTs on Intel® Quark™ Microcontroller D2000



Introduction

The Intel® Quark™ Microcontroller D2000 features two UART interfaces. This document describes how to program these UART interfaces using the Intel® Quark™ Microcontroller Software Interface (QMSI).

Prerequisites

This document assumes that the reader is familiar with the Intel® System Studio for Microcontrollers suite – an integrated tool set for developing, optimizing and debugging applications for Intel® Quark™ Microcontrollers. The Intel® System Studio for Microcontrollers can be downloaded here: https://software.intel.com/en-us/intel-system-studio-microcontrollers (make sure to select D2000 as the target). For more information about setting up and using Intel® System Studio for Microcontrollers, please refer to the Getting Started with Intel® System Studio 2016 for Microcontrollers guide.

The document uses the Intel® Quark™ Microcontroller Developer Kit D2000 board as a reference, but the information in this document can be applied to any Intel® Quark™ Microcontroller D2000 based project.

The document discusses Intel® Quark™ Microcontroller Software Interface (QMSI) functions related to UART programming. It does not provide complete QMSI documentation. For more information, please refer to the Intel® Quark™ Microcontroller Software Interface guide.

Intel® Quark™ Microcontroller D2000 – UART Hardware Information

UART Capabilities

The UART interfaces integrated in Intel® Quark™ Microcontroller D2000 are software compatible with the 16550 standard. Each UART has 16-byte TX and RX FIFOs and supports 5- to 9-bit data formats and baud rates from 300 bps to 2 Mbps. CTS/RTS hardware flow control is available. RS485 mode is also supported.

Please refer to Intel® Quark™ Microcontroller D2000 datasheet, section 14 “UART” for more information about UART capabilities.

Intel® Quark™ Microcontroller Developer Kit D2000 Details

On the Intel® Quark™ Microcontroller Developer Kit D2000 board, the UART interface signals are connected as follows:

  • UART_A RXD and TXD signals are available on the Arduino breakout pins 0 and 1 respectively. These pins are marked as (1) on the picture below.
  • UART_A RTS and CTS signals are available on the Arduino breakout pins A2 and A3 respectively.
  • UART_B signals are connected to the FTDI TTL-232R-3V3 compatible header J2. This header is marked as (3) on the picture below. These signals can be also connected to the on-board USB to UART/JTAG FT232H interface IC using the jumper group marked as (2) on the picture below.
    • The UART signals can be connected to the on-board FT232H IC by setting jumpers J9, J10, and J11 in the CTS, TXD, and N/C positions respectively.
    • When using an FTDI cable connected to the J2 header, the UART signals need to be disconnected from the on-board FT232H IC by removing jumpers J15 – RTS/TMS, J17 – RXD/TCK, and moving jumper J11 to the N/C position.
    • To use the on-board FT232H IC in JTAG mode, jumpers J9, J10, and J11 need to be set to the TDO, TDI, and TRST positions respectively, and jumpers J15 and J17 need to be connected.

I/O Pin Multiplexing

To enable multiple interfaces given a limited number of I/O pins, Intel® Quark™ Microcontroller D2000 multiplexes the functions of I/O pins. Each I/O pin can be assigned one of up to 3 different functions.

By default, only the TXD signal of the first UART interface (UART_A) is multiplexed to an I/O pin. The second UART interface (UART_B) signals are not connected to I/O pins by default; these pins are used for the JTAG interface instead. The table below lists the I/O pins relevant to UART interfaces and their functions.

MCU Pin Number and Name   QMSI Pin Name   Function 0   Function 1   Function 2               Developer Kit D2000 Pin
4 – F_12                  QM_PIN_ID_12    GPIO12       AI12         UART_A_TXD*              Arduino breakout pin 1
5 – F_13                  QM_PIN_ID_13    GPIO13*      AI13         UART_A_RXD               Arduino breakout pin 0
6 – F_14                  QM_PIN_ID_14    GPIO14*      AI14         UART_A_RTS / UART_A_DE   Arduino breakout pin A2
7 – F_15                  QM_PIN_ID_15    GPIO15*      AI15         UART_A_CTS / UART_A_RE   Arduino breakout pin A3
13 – F_20                 QM_PIN_ID_20    TRST_N*      GPIO20       UART_B_TXD               UART_B header pin 5; USB – FT232H (J11)
14 – F_21                 QM_PIN_ID_21    TCK*         GPIO21       UART_B_RXD               UART_B header pin 4; USB – FT232H (J17)
15 – F_22                 QM_PIN_ID_22    TMS*         GPIO22       UART_B_RTS / UART_B_DE   UART_B header pin 2; USB – FT232H (J15)
16 – F_23                 QM_PIN_ID_23    TDI*         GPIO23       UART_B_CTS / UART_B_RE   UART_B header pin 6; USB – FT232H (J9)

* default mode

Programming I/O Pin Multiplexing in QMSI

QMSI provides the following functions to set up pin multiplexing:

qm_rc_t qm_pmux_select(qm_pin_id_t pin, qm_pmux_fn_t fn)

The qm_pmux_select function sets the I/O pin specified by the pin parameter to the function specified by the fn parameter. For example, the following call selects function 2 (UART_A_RXD) on pin QM_PIN_ID_13:

qm_pmux_select(QM_PIN_ID_13, QM_PMUX_FN_2);

qm_rc_t qm_pmux_input_en(qm_pin_id_t pin, bool enable)

The qm_pmux_input_en function enables or disables input mode on the I/O pin specified by the pin parameter, according to the value of the enable parameter. For example, the following call enables input mode on pin QM_PIN_ID_13:

qm_pmux_input_en(QM_PIN_ID_13, true);

Snippet for configuring pin multiplexing on UART_A

qm_pmux_select(QM_PIN_ID_12, QM_PMUX_FN_2);

qm_pmux_select(QM_PIN_ID_13, QM_PMUX_FN_2);

qm_pmux_input_en(QM_PIN_ID_13, true);

/* Following 3 calls only needed when using RTS/CTS handshake */

qm_pmux_select(QM_PIN_ID_14, QM_PMUX_FN_2);

qm_pmux_select(QM_PIN_ID_15, QM_PMUX_FN_2);

qm_pmux_input_en(QM_PIN_ID_14, true);

Snippet for configuring pin multiplexing on UART_B

qm_pmux_select(QM_PIN_ID_20, QM_PMUX_FN_2);

qm_pmux_select(QM_PIN_ID_21, QM_PMUX_FN_2);

qm_pmux_input_en(QM_PIN_ID_21, true);

/* Following 3 calls only needed when using RTS/CTS handshake */

qm_pmux_select(QM_PIN_ID_22, QM_PMUX_FN_2);

qm_pmux_select(QM_PIN_ID_23, QM_PMUX_FN_2);

qm_pmux_input_en(QM_PIN_ID_22, true);

Important Note

Once UART_B signals are multiplexed to the I/O pins, the JTAG functionality is no longer available, and the microcontroller cannot be controlled (flashed or debugged) using the JTAG interface. The boot ROM code provides a hook to allow reprogramming the microcontroller in this case. Use the following procedure to reprogram the microcontroller:

  1. Temporarily connect Intel® Quark™ Microcontroller D2000 pin number 5, package pin name F_13 to the ground.
    • On the Intel® Quark™ Microcontroller Developer Kit D2000, connect Arduino breakout pin 0 – RX to the GND pin using a jumper wire.
  2. Reboot the microcontroller.
    • On the Intel® Quark™ Microcontroller Developer Kit D2000, simply press the RESET button.
  3. Restart OpenOCD. In the Intel® System Studio for Microcontrollers, navigate to the Debug Perspective, find the OpenOCD Session window, click the red Stop OpenOCD button in the top right corner of that window, and then click the green Start OpenOCD button.
  4. Flash the microcontroller normally.
  5. Disconnect the pin connected in step 1 (otherwise the firmware will not run).

UART Clock Gating

To reduce power consumption, Intel® Quark™ Microcontroller D2000 allows enabling or disabling the clock for on-chip peripherals, including the UART interfaces. The clock can be enabled for the UART interfaces using the clk_periph_enable QMSI function as follows:

/* enable clock for UART_A */

clk_periph_enable(CLK_PERIPH_CLK | CLK_PERIPH_UARTA_REGISTER);

/* enable clock for UART_B */

clk_periph_enable(CLK_PERIPH_CLK | CLK_PERIPH_UARTB_REGISTER);

Default UART Configuration in QMSI BSP

By default, the QMSI BSP (board support package) configures UART_A as follows:

  • Only the TXD signal is multiplexed to the I/O pin (output only)
  • The baud rate is set to 115200 bps
  • The UART data format is configured for 8 data bits, no parity, and 1 stop bit

The following QMSI BSP functions handle UART setup:

  • _start() in sys/app_entry.c
  • stdout_uart_setup() in sys/newlib-syscalls.c

Writing data to UART Using QM_PRINTF

QMSI provides the QM_PRINTF function - a simplified version of the C standard library printf function that can be used to output data to the UART.

Default UART

By default, the QM_PRINTF function uses UART_A for the output. This is defined using the STDOUT_UART macro in include/qm_common.h. To use UART_B for the QM_PRINTF output, define STDOUT_UART_1 as 1. This should be done before the #include <qm_common.h> directive. For example:

#define STDOUT_UART_1 (1)
#include <qm_common.h>

QM_PRINTF Limitations

Due to code size constraints QM_PRINTF supports only a subset of the printf format specifiers:

  • The following format specifiers are supported:
    • %d – signed integer; long “l” sub-specifier is ignored
    • %u – unsigned integer; long “l” sub-specifier is ignored
    • %X and %x – integer in hexadecimal format
    • %s – string
  • Padding is not supported. Format specifiers like %02d will result in an incorrect output.
  • The character format specifier %c is not supported.

Example using QM_PRINTF

The following code will print a message on the default UART.

QM_PRINTF("Welcome to Intel Quark Microcontroller D2000\r\n");

Configuring UART Parameters

Reading Current UART Configuration

QMSI uses the qm_uart_config_t structure to store the UART configuration. The current UART configuration can be read using the qm_uart_get_config function:

qm_rc_t qm_uart_get_config(const qm_uart_t uart, qm_uart_config_t *cfg)

The uart parameter specifies the UART interface: QM_UART_0 for UART_A, and QM_UART_1 for UART_B. The cfg parameter specifies the location of the qm_uart_config_t structure to store the configuration data into. For example, the following code will read the configuration of UART_A.

qm_uart_config_t uart0_cfg;
qm_uart_get_config(QM_UART_0, &uart0_cfg);

Setting New UART Configuration

The new UART configuration can be set using the qm_uart_set_config function:

qm_rc_t qm_uart_set_config(const qm_uart_t uart, const qm_uart_config_t *cfg)

The uart parameter specifies the UART interface, and the cfg parameter specifies the location of the qm_uart_config_t structure with the new configuration parameters.

Setting Parameters in the qm_uart_config_t Structure

The qm_uart_config_t contains the following members:

line_control

The line_control variable contains the settings for the UART Line Control Register (LCR). This register configures the UART data format: number of data bits, number of stop bits, and parity settings. The include/qm_uart.h file provides definitions for commonly used LCR values:

/**
 * UART Line control.
 */
typedef enum {
	QM_UART_LC_5N1 = 0x00,   /**< 5 data bits, no parity, 1 stop bit */
	QM_UART_LC_5N1_5 = 0x04, /**< 5 data bits, no parity, 1.5 stop bits */
	QM_UART_LC_5E1 = 0x18,   /**< 5 data bits, even parity, 1 stop bit */
	QM_UART_LC_5E1_5 = 0x1c, /**< 5 data bits, even parity, 1.5 stop bits */
	QM_UART_LC_5O1 = 0x08,   /**< 5 data bits, odd parity, 1 stop bit */
	QM_UART_LC_5O1_5 = 0x0c, /**< 5 data bits, odd parity, 1.5 stop bits */
	QM_UART_LC_6N1 = 0x01,   /**< 6 data bits, no parity, 1 stop bit */
	QM_UART_LC_6N2 = 0x05,   /**< 6 data bits, no parity, 2 stop bits */
	QM_UART_LC_6E1 = 0x19,   /**< 6 data bits, even parity, 1 stop bit */
	QM_UART_LC_6E2 = 0x1d,   /**< 6 data bits, even parity, 2 stop bits */
	QM_UART_LC_6O1 = 0x09,   /**< 6 data bits, odd parity, 1 stop bit */
	QM_UART_LC_6O2 = 0x0d,   /**< 6 data bits, odd parity, 2 stop bits */
	QM_UART_LC_7N1 = 0x02,   /**< 7 data bits, no parity, 1 stop bit */
	QM_UART_LC_7N2 = 0x06,   /**< 7 data bits, no parity, 2 stop bits */
	QM_UART_LC_7E1 = 0x1a,   /**< 7 data bits, even parity, 1 stop bit */
	QM_UART_LC_7E2 = 0x1e,   /**< 7 data bits, even parity, 2 stop bits */
	QM_UART_LC_7O1 = 0x0a,   /**< 7 data bits, odd parity, 1 stop bit */
	QM_UART_LC_7O2 = 0x0e,   /**< 7 data bits, odd parity, 2 stop bits */
	QM_UART_LC_8N1 = 0x03,   /**< 8 data bits, no parity, 1 stop bit */
	QM_UART_LC_8N2 = 0x07,   /**< 8 data bits, no parity, 2 stop bits */
	QM_UART_LC_8E1 = 0x1b,   /**< 8 data bits, even parity, 1 stop bit */
	QM_UART_LC_8E2 = 0x1f,   /**< 8 data bits, even parity, 2 stop bits */
	QM_UART_LC_8O1 = 0x0b,   /**< 8 data bits, odd parity, 1 stop bit */
	QM_UART_LC_8O2 = 0x0f    /**< 8 data bits, odd parity, 2 stop bits */
} qm_uart_lc_t;

baud_divisor

The baud_divisor variable configures the UART baud rate. It contains the packed settings for UART divisor registers. Each UART has three divisor registers:

  • DLH – divisor latch high – 8 bits
  • DLL – divisor latch low – 8 bits
  • DLF – divisor latch fraction – 4 bits

These divisor registers define the ratio by which the system clock frequency is divided to obtain the UART clock frequency (baud rate). The divisor registers’ values can be calculated as follows:

  1. Calculate the divisor, round to the nearest integer:
    divisor = system clock frequency / baud rate
  2. Calculate the divisor latch high value, round to the nearest smaller or equal integer (floor):
    dlh = divisor / 4096
  3. Calculate the divisor latch low value, round to the nearest smaller or equal integer (floor):
    dll = (divisor - dlh * 4096) / 16
  4. Calculate the divisor fraction value:
    dlf = divisor - dlh * 4096 - dll*16

The packed baud_divisor value can then be obtained using the QM_UART_CFG_BAUD_DL_PACK(dlh, dll, dlf) macro.

For example, to get a 9600 baud rate, assuming a system clock frequency of 32 MHz (as used in the Intel® Quark™ Microcontroller Developer Kit D2000), the following calculation is used:

  1. divisor = 32000000 / 9600 = 3333 (rounded to the nearest integer from 3333.3333)
  2. dlh = divisor / 4096 = 3333 / 4096 = 0
  3. dll = (divisor - dlh * 4096) / 16 = (3333 - 0 * 4096) / 16 = 208
  4. dlf = divisor - dlh * 4096 - dll * 16 = 3333 - 0 * 4096 - 208 * 16 = 5
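
The same steps can be expressed in code. The following sketch uses a hypothetical helper name (calc_baud_divisor is not part of QMSI) and packs the result with the QM_UART_CFG_BAUD_DL_PACK macro described above:

/* Computes the packed divisor value for a given system clock and baud
 * rate, following steps 1-4 above. For example,
 * calc_baud_divisor(32000000, 9600) yields the same value as
 * QM_UART_CFG_BAUD_DL_PACK(0, 208, 5). */
static uint32_t calc_baud_divisor(uint32_t sys_clk_hz, uint32_t baud)
{
	uint32_t divisor = (sys_clk_hz + baud / 2) / baud; /* step 1: round to nearest */
	uint32_t dlh = divisor / 4096;                     /* step 2: floor */
	uint32_t dll = (divisor - dlh * 4096) / 16;        /* step 3: floor */
	uint32_t dlf = divisor - dlh * 4096 - dll * 16;    /* step 4: fraction */

	return QM_UART_CFG_BAUD_DL_PACK(dlh, dll, dlf);
}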

The table below gives divisor values for commonly used baud rates, assuming a 32 MHz system clock:

Baud rate     dlh (divisor latch high)   dll (divisor latch low)   dlf (divisor latch fraction)
115200 bps    0                          17                        6
57600 bps     0                          34                        12
38400 bps     0                          52                        1
19200 bps     0                          104                       3
9600 bps      0                          208                       5
2400 bps      3                          65                        5
1200 bps      6                          130                       11

hw_fc

The hw_fc member is a Boolean variable that enables or disables hardware flow control: setting it to 0 (false) disables hardware flow control, while setting it to 1 (true) enables it.

Configuring UART Example

The following snippet configures UART_A for 9600 bps, 8 data bits, no parity, 1 stop bit, and no hardware flow control. The calculation of the divisor values is given in the baud_divisor section above.

qm_uart_config_t uart0_cfg;

uart0_cfg.baud_divisor = QM_UART_CFG_BAUD_DL_PACK(0, 208, 5);
uart0_cfg.line_control = QM_UART_LC_8N1;
uart0_cfg.hw_fc = 0;

qm_uart_set_config(QM_UART_0, &uart0_cfg);

QMSI UART Input / Output Functions

Getting UART Status

QMSI provides the qm_uart_get_status function to obtain the current status of a UART.

qm_uart_status_t qm_uart_get_status(const qm_uart_t uart)

The uart parameter specifies the UART to get the status for.

The return value is a bitwise OR value of one or more of the following:

  • QM_UART_IDLE – UART TX FIFO and transmit shift registers are empty
  • QM_UART_LSR_OE – Receiver overrun error
  • QM_UART_LSR_PE – Receiver parity error
  • QM_UART_LSR_FE – Receiver framing error
  • QM_UART_LSR_BI – "Break interrupt" condition
  • QM_UART_TX_BUSY – UART TX FIFO or transmit shift register are not empty
  • QM_UART_RX_BUSY – Received data is ready
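
Because these values are bit flags, individual conditions should be tested with a bitwise AND rather than a logical operator. For example, the following sketch waits for the transmitter to drain and then checks for received data:

/* Wait until UART_A's TX FIFO and transmit shift register are empty. */
while (qm_uart_get_status(QM_UART_0) & QM_UART_TX_BUSY)
	;

/* Check whether received data is ready on UART_A. */
if (qm_uart_get_status(QM_UART_0) & QM_UART_RX_BUSY) {
	/* ... read the data ... */
}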

Sending and Receiving Data

QMSI provides several functions with slightly different behavior to send and receive data. They can be broken down into the following categories:

  • Blocking I/O – These functions wait until the data has been sent or received.
  • Non-blocking I/O – These functions return immediately. They do not check for errors, for data availability, or for space in the TX FIFO; the caller should check these conditions using the qm_uart_get_status function.
  • Interrupt-driven I/O – These functions send or receive data in the background, using UART interrupts to manage the send and receive FIFOs.

Blocking I/O Functions

qm_rc_t qm_uart_write(const qm_uart_t uart, const uint8_t data)

The qm_uart_write function writes a single byte of data specified by the data parameter to the UART specified by the uart parameter. This is a blocking I/O function and it will wait (will not return) until the data is sent. Currently the qm_uart_write function always returns the QM_RC_OK value.

qm_uart_status_t qm_uart_read(const qm_uart_t uart, uint8_t *data)

The qm_uart_read function reads a single byte of data from the UART specified by the uart parameter. It stores the data in the location specified by the data pointer. This is a blocking I/O function and it will wait (will not return) until the data is available and can be read, or until an error occurs. The return value of this function is a bitwise OR value of one or more of the following:

  • QM_UART_OK – Data was read successfully
  • QM_UART_LSR_OE – Receiver overrun error
  • QM_UART_LSR_PE – Receiver parity error
  • QM_UART_LSR_FE – Receiver framing error
  • QM_UART_LSR_BI – "Break interrupt" condition

qm_rc_t qm_uart_write_buffer(const qm_uart_t uart, const uint8_t *const data, uint32_t len)

The qm_uart_write_buffer function is a blocking I/O function. It writes multiple bytes from a buffer specified by the data parameter. The number of bytes to be written is specified by the len parameter. Currently the qm_uart_write_buffer function always returns the QM_RC_OK value.
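
As a short usage sketch (assuming UART_A has already been configured and its pins multiplexed as shown earlier), the blocking calls combine naturally:

static const uint8_t greeting[] = "hello\r\n";
uint8_t byte;

/* Send a buffer, then block until one byte arrives and echo it back. */
qm_uart_write_buffer(QM_UART_0, greeting, sizeof(greeting));
if (qm_uart_read(QM_UART_0, &byte) == QM_UART_OK) {
	qm_uart_write(QM_UART_0, byte);
}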

Non-blocking I/O Functions

qm_rc_t qm_uart_write_non_block(const qm_uart_t uart, const uint8_t data)

The qm_uart_write_non_block function writes a single byte of data specified by the data parameter to the UART specified by the uart parameter. This is a non-blocking I/O function. It will return immediately after writing the data to the UART’s transmitter holding register. It does not check whether space is available in the TX FIFO. Currently the qm_uart_write_non_block function always returns the QM_RC_OK value.

uint8_t qm_uart_read_non_block(const qm_uart_t uart)

The qm_uart_read_non_block function reads and returns a single byte of data from the UART specified by the uart parameter. This is a non-blocking I/O function. It returns immediately and does not check whether received data is available, or whether any receive errors have occurred. The return value is the content of the UART’s receive buffer register.
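
Because the non-blocking calls perform no checks of their own, a typical pattern (sketched below) is to poll the UART status first:

/* Echo a byte without ever blocking. */
if (qm_uart_get_status(QM_UART_0) & QM_UART_RX_BUSY) {
	uint8_t byte = qm_uart_read_non_block(QM_UART_0);

	if (!(qm_uart_get_status(QM_UART_0) & QM_UART_TX_BUSY)) {
		qm_uart_write_non_block(QM_UART_0, byte);
	}
}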

Interrupt-driven I/O – Transfer Structure

The interrupt-driven I/O functions use the qm_uart_transfer_t structure to pass the parameters for the data transfer. This structure contains the following members:

data

The data variable is a uint8_t* pointer to the buffer. For the write function, the buffer should contain the data to be sent. For the read function, it specifies the buffer to store the received data in.

data_len

The data_len variable is a uint32_t value specifying the number of bytes to transfer.

fin_callback

The fin_callback variable contains the pointer to the callback function to be called once the data transfer is complete. The application must define this callback (the fin_callback cannot have a NULL value).

The signature of the callback function is void fin_callback(uint32_t id, uint32_t len), where the id parameter specifies the transfer identifier (see id variable below), and the len parameter specifies the number of bytes successfully transferred.

err_callback

The err_callback variable contains the pointer to the callback function to be called in case of receive errors. Currently, it is not used for write operations. The application must define this callback for both read and write operations (the err_callback cannot have a NULL value).

The signature of this callback function is void err_callback(uint32_t id, qm_uart_status_t status), where the id parameter specifies the transfer identifier (see id variable below), and the status parameter contains the error bits from line status register (LSR) – a bitwise OR value of one or more of the following:

  • QM_UART_LSR_OE – Receiver overrun error
  • QM_UART_LSR_PE – Receiver parity error
  • QM_UART_LSR_FE – Receiver framing error
  • QM_UART_LSR_BI – "Break interrupt" condition

id

The id parameter is a uint32_t transfer identifier. The application can use this parameter to identify transfers in callback functions. Note that QMSI supports only one active read and one active write transfer per UART.

Registering Interrupt Service Routine for Interrupt-driven I/O

Prior to using interrupt-driven I/O functions, it is necessary to register the QMSI UART interrupt service routine (ISR). This is done by calling the qm_irq_request function with the following parameters:

  • For UART_A
    qm_irq_request(QM_IRQ_UART_0, qm_uart_0_isr);
  • For UART_B:
    qm_irq_request(QM_IRQ_UART_1, qm_uart_1_isr);

Interrupt-driven I/O Functions

qm_uart_status_t qm_uart_irq_write(const qm_uart_t uart, const qm_uart_transfer_t *const xfer)

The qm_uart_irq_write function initiates an interrupt-driven UART write transfer. The uart parameter specifies the UART to be used for the write transfer. The xfer parameter specifies the location of the transfer structure populated with the data location, length, callback function pointers, and transfer id.

The function returns one of the following values:

  • QM_UART_OK – Transfer was initiated successfully
  • QM_UART_TX_BUSY – UART TX FIFO or UART transmitter holding registers are busy. The transfer is not initiated in this case.

qm_uart_status_t qm_uart_irq_read(const qm_uart_t uart, const qm_uart_transfer_t *const xfer)

The qm_uart_irq_read function initiates an interrupt-driven UART read transfer. The uart parameter specifies the UART to be used for the read transfer. The xfer parameter specifies the location of the transfer structure populated with the data location, length, callback function pointers, and transfer id.

The function returns one of the following values:

  • QM_UART_OK – Transfer was initiated successfully
  • QM_UART_RX_BUSY – Previous transfer has not been completed yet. The data length specified in previous request has not been read yet. The transfer is not initiated in this case.
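
Putting the pieces together, the following is a minimal sketch of an interrupt-driven read. The buffer size, the transfer id value, and the rx_done and rx_error callback names are illustrative assumptions; the callback signatures follow the descriptions above.

static uint8_t rx_buf[32];

/* Hypothetical completion callback: len bytes are now in rx_buf. */
static void rx_done(uint32_t id, uint32_t len)
{
}

/* Hypothetical error callback: status carries the LSR error bits. */
static void rx_error(uint32_t id, qm_uart_status_t status)
{
}

static void start_rx(void)
{
	/* Kept static on the assumption that the driver may reference the
	 * transfer structure for the duration of the transfer. */
	static qm_uart_transfer_t rx_xfer;

	rx_xfer.data = rx_buf;
	rx_xfer.data_len = sizeof(rx_buf);
	rx_xfer.fin_callback = rx_done;
	rx_xfer.err_callback = rx_error;
	rx_xfer.id = 2;

	/* The UART ISR must be registered before initiating the transfer. */
	qm_irq_request(QM_IRQ_UART_0, qm_uart_0_isr);

	if (qm_uart_irq_read(QM_UART_0, &rx_xfer) == QM_UART_RX_BUSY) {
		/* A previous read transfer is still pending. */
	}
}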

qm_rc_t qm_uart_write_terminate(const qm_uart_t uart)

The qm_uart_write_terminate function terminates the current interrupt-driven write transfer for the UART specified by the uart parameter. It calls the transfer complete callback function, indicating the number of bytes that were actually written. Currently, qm_uart_write_terminate always returns QM_RC_OK.

qm_rc_t qm_uart_read_terminate(const qm_uart_t uart)

The qm_uart_read_terminate function terminates the current interrupt-driven read transfer for the UART specified by the uart parameter. It calls the transfer complete callback function, indicating the number of bytes that were actually read. Currently, the qm_uart_read_terminate function always returns the QM_RC_OK value.
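
For example, an application that implements its own timeout might simply abandon a pending read (a sketch):

/* Give up on a pending interrupt-driven read; the transfer's
 * fin_callback reports the number of bytes received so far. */
qm_uart_read_terminate(QM_UART_0);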

Direct Access to UART Registers

To access UART functionality that is not exposed through the QMSI UART input/output functions, you can access the UART registers directly. The UART registers in the Intel® Quark™ Microcontroller D2000 are memory mapped and exposed to the QMSI programmer through the QM_UART array, which contains a qm_uart_reg_t structure for each of the UARTs. The qm_uart_reg_t structure contains all the UART registers. For example, to access the modem control register (MCR) of UART_A and set the loopback bit, you can use the following code:

QM_UART[QM_UART_0].mcr |= BIT(4);

Please refer to qm_soc_regs.h for the definition of the qm_uart_reg_t structure, and to the Intel® Quark™ Microcontroller D2000 Datasheet for the information about UART registers.
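
The same register-level access can clear bits as well. For example, to turn the loopback bit back off (mirroring the uart_set_loopback_mode helper in the interrupt-driven example below):

QM_UART[QM_UART_0].mcr &= ~BIT(4);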

UART Programming Example: UART Setup and Blocking I/O

Description

The code below shows how to configure and use both UART interfaces. It configures UART_A for 115200 bps and UART_B for 9600 bps. Next, it scans both UART interfaces for available data. When a character becomes available on a UART, it is read, and written to the other UART.

Follow these steps to run this code on the Intel® Quark™ Microcontroller Developer Kit D2000 board:

  1. Open Intel® Systems Studio for Microcontrollers.
  2. Create a new QMSI BSP project using hello_world as the template.
  3. Replace main.c with the code below.
  4. Compile and flash the code to the microcontroller.
  5. Disconnect the board from the USB port, then move jumpers J9, J10, and J11 to CTS, TXD, and N/C positions respectively.
  6. Connect an FTDI USB TTL Serial Cable TTL-232R-3V3 cable to the Arduino breakout pins as follows:
    • FTDI cable pin 1 (GND) to GND pin on the Arduino breakout header
    • FTDI cable pin 3 (TX) to pin number 0 (RX) on the Arduino breakout header
    • FTDI cable pin 4 (RX) to pin number 1 (TX) on the Arduino breakout header
  7. Reconnect the board to the USB port. Connect the FTDI cable to the USB port as well.
    • On Windows OS, you must change the driver for the board to the FTDI CDM driver.
  8. Start terminal emulator software (for example PuTTY, screen, or minicom) on both USB virtual serial ports. Configure the port connected to the FTDI cable to 115200 bps, and the port connected to the developer kit board to 9600 bps.
  9. Reset the board. Observe the sign-in message on both terminals.
  10. Type some characters on one of the terminals. The typed characters should appear on the other terminal.
  11. Once done experimenting with this sample, reprogram the board using, for example, the hello_world code, as described in the Important Note in the I/O Pin Multiplexing section.
    • On Windows OS, you must change the driver for the board to the WinUSB driver.

Code

/* UART configuration example */

#include <qm_common.h>
#include <qm_pinmux.h>
#include <qm_uart.h>
#include <qm_scss.h>

int main(void)
{
	qm_uart_config_t uart0_cfg;
	qm_uart_config_t uart1_cfg;
	uint8_t uart0_message[] =
	    "This is UART_A. Characters typed here will appear on UART_B.\r\n";
	uint8_t uart1_message[] =
	    "This is UART_B. Characters typed here will appear on UART_A.\r\n";
	uint8_t data;

	/* UART_A I/O pins multiplexing setup
	 * QMSI BSP only configures UART_A_TXD signal.
	 * The following code configures both TXD and RXD signals
	 * and sets up RXD pin in input mode
	 */
	qm_pmux_select(QM_PIN_ID_12, QM_PMUX_FN_2); /* configure UART_A_TXD */
	qm_pmux_select(QM_PIN_ID_13, QM_PMUX_FN_2); /* configure UART_A_RXD */
	qm_pmux_input_en(QM_PIN_ID_13, true); /* UART_A_RXD is an input */

	/* UART_B I/O pins multiplexing setup
	 * By default UART_B pins are multiplexed to JTAG.
	 * The following code configures all relevant pins to UART_B signals
	 * and sets up RXD and RTS pins in input mode
	 */
	qm_pmux_select(QM_PIN_ID_20, QM_PMUX_FN_2); /* configure UART_B_TXD */
	qm_pmux_select(QM_PIN_ID_21, QM_PMUX_FN_2); /* configure UART_B_RXD */
	qm_pmux_select(QM_PIN_ID_22, QM_PMUX_FN_2); /* configure UART_B_RTS */
	qm_pmux_select(QM_PIN_ID_23, QM_PMUX_FN_2); /* configure UART_B_CTS */
	qm_pmux_input_en(QM_PIN_ID_21, true); /* UART_B_RXD is an input */
	qm_pmux_input_en(QM_PIN_ID_22, true); /* UART_B_RTS is an input */

	/* Stores current UART_A configuration to uart0_cfg */
	qm_uart_get_config(QM_UART_0, &uart0_cfg);

	/* Configures UART_A for 115200 bps */
	uart0_cfg.baud_divisor = QM_UART_CFG_BAUD_DL_PACK(0, 17, 6);

	/* Configures UART_B for 9600 bps,
	 * 8 data bits, no parity, 1 stop bit,
	 * hardware flow control enabled
	 */
	uart1_cfg.baud_divisor = QM_UART_CFG_BAUD_DL_PACK(0, 208, 5);
	uart1_cfg.line_control = QM_UART_LC_8N1;
	uart1_cfg.hw_fc = 1;

	qm_uart_set_config(QM_UART_0, &uart0_cfg);
	qm_uart_set_config(QM_UART_1, &uart1_cfg);

	/* enable clock for UART_A */
	clk_periph_enable(CLK_PERIPH_CLK | CLK_PERIPH_UARTA_REGISTER);
	/* enable clock for UART_B */
	clk_periph_enable(CLK_PERIPH_CLK | CLK_PERIPH_UARTB_REGISTER);

	/* this message will only be printed on the default UART */
	QM_PRINTF("Welcome to Intel Quark D2000 UART configuration demo.\r\n");

	qm_uart_write_buffer(QM_UART_0, uart0_message, sizeof(uart0_message));
	qm_uart_write_buffer(QM_UART_1, uart1_message, sizeof(uart1_message));

	while(1) {
		/* checks if received data is available on UART_A */
		if (qm_uart_get_status(QM_UART_0) & QM_UART_RX_BUSY) {
			/* reads byte from UART_A */
			if (qm_uart_read(QM_UART_0, &data) == QM_UART_OK) {
				/* and sends it to UART_B */
				qm_uart_write(QM_UART_1, data);
				/* adds line feed in case of carriage return */
				if (data == '\r') {
					qm_uart_write(QM_UART_1, '\n');
				}
			}
		}
		/* checks if received data is available on UART_B */
		if (qm_uart_get_status(QM_UART_1) & QM_UART_RX_BUSY) {
			/* reads byte from UART_B */
			if (qm_uart_read(QM_UART_1, &data) == QM_UART_OK) {
				/* and sends it to UART_A */
				qm_uart_write(QM_UART_0, data);
				/* adds line feed in case of carriage return */
				if (data == '\r') {
					qm_uart_write(QM_UART_0, '\n');
				}
			}
		}
	}

	return 0;
}

UART Programming Example: Interrupt-driven I/O

Description

The code below shows how to use interrupt-driven I/O. It configures UART_A for 115200 bps and prints a sign-in message. Next, it configures the UART for loopback mode, sets up and starts an interrupt-driven transmit operation, and then sets up and starts an interrupt-driven receive operation. When both operations are complete, it disables loopback mode and prints the received data.

Follow these steps to run this code on the Intel® Quark™ Microcontroller Developer Kit D2000 board:

  1. Open Intel® Systems Studio for Microcontrollers.
  2. Create a new QMSI BSP project using hello_world as the template.
  3. Replace main.c with the code below.
  4. Compile and flash the code to the microcontroller.
  5. Connect an FTDI USB TTL Serial Cable TTL-232R-3V3 cable to the Arduino breakout pins as follows:
    • FTDI cable pin 1 (GND) to GND pin on the Arduino breakout header
    • FTDI cable pin 3 (TX) to pin number 0 (RX) on the Arduino breakout header
    • FTDI cable pin 4 (RX) to pin number 1 (TX) on the Arduino breakout header
  6. Start terminal emulator software (for example PuTTY, screen, or minicom) on the FTDI cable serial port. Configure the port connected to the FTDI cable to 115200 bps.
  7. Reset the board. Observe the messages on the terminal.

Code

/* UART interrupt driven I/O example */

#include <qm_pinmux.h>
#include <qm_uart.h>
#include <qm_interrupt.h>
#include <qm_scss.h>
#include <qm_power.h>

#define TX_XFER_ID 1
#define RX_XFER_ID 2

static void uart0_tx_callback(uint32_t id, uint32_t len);
static void uart0_rx_callback(uint32_t id, uint32_t len);
static void uart0_error_callback(uint32_t id, qm_uart_status_t status);
void uart_set_loopback_mode(const qm_uart_t uart, bool enable);

static uint8_t tx_buffer[] =
		"This is the test data being sent and received by the UART.";
static uint8_t rx_buffer[64];

static volatile bool tx_xfer_complete = false;
static volatile bool rx_xfer_complete = false;
static volatile qm_uart_status_t rx_xfer_error = 0;

int main(void)
{
	qm_uart_config_t uart0_cfg;
	qm_uart_transfer_t tx_xfer, rx_xfer;

	/* Configures UART_A for 115200 bps */
	uart0_cfg.baud_divisor = QM_UART_CFG_BAUD_DL_PACK(0, 17, 6);
	uart0_cfg.line_control = QM_UART_LC_8N1;
	uart0_cfg.hw_fc = false;

	/* Multiplexes the UART_A TXD and RXD pins and enables input for RXD */
	qm_pmux_select(QM_PIN_ID_12, QM_PMUX_FN_2);
	qm_pmux_select(QM_PIN_ID_13, QM_PMUX_FN_2);
	qm_pmux_input_en(QM_PIN_ID_13, true);

	clk_periph_enable(CLK_PERIPH_CLK | CLK_PERIPH_UARTA_REGISTER);
	qm_uart_set_config(QM_UART_0, &uart0_cfg);

	QM_PRINTF("Welcome to Intel Quark D2000 UART interrupt driven I/O demo.\r\n");

	/* Enables UART loopback mode */
	uart_set_loopback_mode(QM_UART_0, true);

	/* Sets up interrupt service routine for UART_A */
	qm_irq_request(QM_IRQ_UART_0, qm_uart_0_isr);

	/* Sets up interrupt driven transmit request */
	tx_xfer.data = tx_buffer;
	tx_xfer.data_len = sizeof(tx_buffer);
	tx_xfer.fin_callback = uart0_tx_callback;
	tx_xfer.err_callback = uart0_error_callback;
	tx_xfer.id = TX_XFER_ID;

	/* Initiates interrupt driven transmit request */
	qm_uart_irq_write(QM_UART_0, &tx_xfer);

	/* Sets up interrupt driven receive request */
	rx_xfer.data = rx_buffer;
	rx_xfer.data_len = sizeof(tx_buffer);
	rx_xfer.fin_callback = uart0_rx_callback;
	rx_xfer.err_callback = uart0_error_callback;
	rx_xfer.id = RX_XFER_ID;

	/* Initiates interrupt driven receive request */
	qm_uart_irq_read(QM_UART_0, &rx_xfer);


	/* Waits for transfers to complete */
	while (!tx_xfer_complete || !rx_xfer_complete) {
		cpu_halt();
	}

	/* Disables UART loopback mode */
	uart_set_loopback_mode(QM_UART_0, false);

	if (0 == rx_xfer_error) {
		QM_PRINTF("UART interrupt driven transmit and receive complete.\r\n");
		QM_PRINTF("Buffer: %s\r\n", rx_buffer);
	} else {
		QM_PRINTF("UART interrupt driven I/O error.\r\n");
		QM_PRINTF("UART LSR error bits: %d\r\n", rx_xfer_error);
	}

	while(1) {
		cpu_halt();
	}

	return 0;
}

void uart0_tx_callback(uint32_t id, uint32_t len)
{
	switch (id) {

	case TX_XFER_ID:
		tx_xfer_complete = true;
		break;

	default:
		break;
	}
}

void uart0_rx_callback(uint32_t id, uint32_t len)
{
	switch (id) {

	case RX_XFER_ID:
		rx_xfer_complete = true;
		break;
	default:
		break;
	}
}

void uart0_error_callback(uint32_t id, qm_uart_status_t status)
{
	rx_xfer_error = status;
}

void uart_set_loopback_mode(const qm_uart_t uart, bool enable)
{
	if (enable) {
		QM_UART[uart].mcr |= BIT(4);
	} else {
		QM_UART[uart].mcr &= ~BIT(4);
	}
}

 

Recipe: Optimized Caffe* for Deep Learning on Intel® Xeon Phi™ processor x200


Overview

The deep learning framework Caffe* has been optimized for Intel® Xeon Phi™ processors. This article provides detailed instructions on how to compile and run this Caffe* optimized for Intel® architecture to obtain the best performance on Intel Xeon Phi processors.

Introduction

Caffe is a popular open source deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and community contributors. Together with AlexNet, a neural network topology for image recognition, and ImageNet, a database of labeled images, Caffe is often used as a benchmark in the domain of image classification. An Intel version of BVLC Caffe, referred to as Caffe optimized for Intel architecture in the rest of this article, has been created to optimize the framework performance for Intel architectures. These optimizations are available on Github for the broader deep learning user community.

Intel Xeon Phi processors x200 are the latest generation of the Intel® Many Integrated Core Architecture (Intel® MIC Architecture) family. Continuing the performance leadership demonstrated by previous generations of the Intel® Xeon® and Intel® Xeon Phi™ product families, Intel Xeon Phi processors x200 target high performance computing applications as well as the emerging machine learning and deep learning applications. Intel Xeon Phi processors x200 introduce several state-of-the-art features: a compute core with two 512-bit Vector Processing Units (VPUs) capable of a total of two Fused Multiply-Add (FMA) operations per clock cycle per core, and on-chip Multi-Channel DRAM (MCDRAM) memory, which provides significantly higher bandwidth than DDR4 memory.

Preliminaries

Download the latest version of Caffe optimized for Intel architecture by cloning the repository:

git clone https://github.com/intelcaffe/caffe

Caffe depends on several external libraries that can be installed from your Linux* distribution repositories. The required prerequisites are well documented and are posted here for user convenience.

  1. On RHEL*/CentOS*/Fedora* systems:

    sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel gflags-devel glog-devel lmdb-devel

  2. On Ubuntu* systems:
    1. sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev
    2. sudo apt-get install --no-install-recommends libboost-all-dev

Apart from the dependencies listed above, Caffe optimized for Intel architecture requires Intel® Math Kernel Library (Intel® MKL) 2017 Beta Update 1 or a later release and an OpenMP* run-time library to obtain optimal performance on the Intel Xeon Phi processor x200. These libraries are provided in the Intel® Parallel Studio XE 2017 Beta software suite and can be downloaded by filling in the registration form.

After the registration and download is complete, follow the instructions provided with the package to install Intel® C++ Compiler 17.0 Pre-Release (Beta) Update 1 and Intel Math Kernel Library 2017 Pre-Release (Beta) Update 1 for C/C++.

Build Caffe optimized for Intel architecture for Intel Xeon Phi processor

Setup the shell environment to use Intel C/C++ Compilers and Intel MKL by sourcing the corresponding shell script (assuming the installation directory is /opt/intel/), for example:

For sh/bash: source /opt/intel/bin/compilervars.sh intel64

For csh/tcsh: source /opt/intel/bin/compilervars.csh intel64

Change directory to the location where the Caffe optimized for Intel architecture repository is cloned and build the framework to use the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) APIs, which provide optimized implementations of convolution, pooling, normalization, and other key DNN operations on the Intel Xeon Phi processor x200. The commands below generate an executable binary, named caffe, at /opt/caffe/build/tools/ (assuming the repository is cloned to /opt/caffe):

cd /opt/caffe; mkdir build; cd build;
cmake -DCPU_ONLY=on -DBLAS=mkl -DUSE_MKL2017_AS_DEFAULT_ENGINE=on  /opt/caffe/
make -j 68 all

We will use the AlexNet network topology for image classification to benchmark the performance of Caffe optimized for Intel architecture on the Intel Xeon Phi processor x200. Caffe optimized for Intel architecture provides AlexNet topology files, located at /opt/caffe/models/mkl2017_alexnet/, which set the "engine" parameter for the different layers of the neural network (direct batched convolution, maximum pooling, local response normalization across channels (LRN), rectified linear unit (ReLU)) to "MKL2017", corresponding to the use of the Intel MKL-DNN APIs at run-time. The AlexNet input uses image data stored in Lightning Memory-mapped Database (lmdb) format files (data.mdb, lock.mdb), which are required for benchmarking. The ImageNet dataset files can be obtained from here.

Run Caffe optimized for Intel architecture on the Intel Xeon Phi processor

The Intel Xeon Phi processor x200 supports different memory modes; to obtain the best performance with Caffe optimized for Intel architecture, it is recommended to run out of MCDRAM memory in "Flat" mode. The standard Linux utility numactl is used to allocate memory buffers in MCDRAM. In MCDRAM Flat mode, DDR and MCDRAM memory are exposed as distinct, addressable NUMA nodes (numactl -H shows this information). More information about MCDRAM and the Flat, Cache, and Hybrid modes can be found here.

Before running the executable, set the OpenMP environment variables for numbers of threads and thread pinning to physical processor cores:

export OMP_NUM_THREADS=<number_of_cores> # 64 or 68, depending on the Intel Xeon Phi x200 SKU
export KMP_AFFINITY=granularity=fine,compact,1,0

Since the goal of this benchmark is to measure performance and not to train an end-to-end image classification model, we will use the Caffe “time” mode with the default of 50 iterations comprised of forward and backward passes:

numactl -m 1 /opt/caffe/build/tools/caffe time --model=/opt/caffe/models/mkl2017_alexnet/train_val.prototxt

The above step produces timing statistics (in milliseconds) for the average forward (FW) and backward (BW) passes across 50 iterations for processing a batch of images. Currently, the input files provided in the models/mkl2017_alexnet/ directory are set to use 256 images, which is the recommended batch size to obtain ideal performance (refer to the /opt/caffe/models/mkl2017_alexnet/train_val.prototxt file for future changes in the number of images). The time spent in the FW and BW passes is used to calculate the training rate as:

training rate (images per second) = batch size / average (FW + BW) time per iteration

For example, if a 256-image batch takes an average of 1000 ms per iteration, the training rate is 256 images per second.

More Details

For more details on various configuration and run parameters of Caffe framework, please refer to this in-depth article.

About the Author

Vamsi Sripathi has been a software engineer at Intel since 2010. He holds a Master's degree in Computer Science from North Carolina State University, USA. During his tenure at Intel, he has worked on the performance optimization of Basic Linear Algebra Subprograms (BLAS) in the Intel Math Kernel Library (Intel MKL) spanning multiple generations of Intel Xeon and Intel Xeon Phi architectures. Recently, he has been working on the optimization of deep learning algorithms and frameworks for Intel architectures.

Masked Software Occlusion Culling


By Jon Hasselgren, Magnus Andersson, and Tomas Akenine-Möller
Intel Corporation

Efficient occlusion culling in dynamic scenes is an important topic for the game and real-time graphics community as a means to accelerate rendering. We present a novel algorithm inspired by recent advances in depth culling for graphics hardware, but adapted and optimized for SIMD-capable CPUs. Our algorithm has very low memory overhead and is 3x faster than previous work, while culling 98% of all triangles culled by a full resolution depth buffer approach. It supports interleaving occluder rasterization and occlusion queries without penalty, making it easy to use in scene graph traversal or rendering code.

Source code
The source code for this paper is available online through Intel's Developer Zone GitHub. There is a lightweight Masked Occlusion Culling Library, which has also been integrated into Intel's Occlusion Culling Sample. Please note that the performance may not exactly match the results of the paper, as we have reworked the code to allow for a simple API.

Intel® Quark™ Microcontroller D2000 - Accelerometer Tutorial


Intel® System Studio for Microcontrollers includes multiple samples to help you get up to speed with its basic functionality and become familiar with the set of Intel® Quark™ Microcontroller Software Interface (QMSI) APIs that work with your board. This example reads and outputs accelerometer data to the serial port, and can optionally use the Intel® Integrated Performance Primitives (Intel IPP) to compute root mean square, variance, and mean for the last 15 Z axis readings.

Requirements

Instructions

1. Connect the FTDI cable for serial output as follows:

  • Connect GND (black) on the serial cable to the board's GND pin.
  • Connect TXD (orange) on the serial cable to the board's RX pin.
  • Connect RXD (yellow) on the serial cable to the board's TX pin.

2. If you haven’t already, launch the Intel® System Studio for Microcontrollers software.

3. Create a project with the "Accelerometer" sample project file, as follows:

a. From the File menu, select New, and then select Intel QMSI/BSP Project. The Create new Intel QMSI/BSP Project dialog box appears.

b. Specify the following values in the Intel QMSI/BSP Project dialog box:

    •  Project Name: Accelerometer

    • Template: hello_world

    •  Intel Quark target: D2000

    •  Create launch configuration: selected (checked)

    •  Connection: USB Onboard

c. Click Finish.

4. Set up your terminal to view sensor output as follows:

a. From the toolbar, click the Terminal button and choose Serial Terminal.

b. Configure the serial connection (defaults):

  • Port: The active port should be displayed.

    Tip: The port will vary depending on the serial hardware used, and there may be more than one listed.

    Linux*: Use the ‘dmesg’ command to view your port status.

    Windows*: Open Device Manager to view the Ports (COM & LPT) status.

  • Baud Rate: 115200
  • Data Bits: 8
  • Parity: None
  • Stop Bits: 1

5. Build and deploy your project.

a. Select the "Accelerometer" project in the Project Explorer.

b. Click the Build button to compile your project.

c. From the Run drop-down list, select "Accelerometer (flashing)".

Note: you can also deploy and debug. From the Debug drop-down list, select "Accelerometer (flashing)".

d. View X, Y, and Z values from the accelerometer in the terminal.

How It Works

The accelerometer sample uses the onboard Bosch BMC150 or Bosch BMI160 accelerometers connected to the microcontroller using the I2C interface, and the RTC (real time clock) integrated in Intel® Quark™ microcontroller. It also uses the integrated UART module for the data output over serial port.

The sample begins with setting up RTC parameters in an rtc configuration structure:

      rtc.init_val = 0;
      rtc.alarm_en = true;
      rtc.alarm_val = INTERVAL;
      rtc.callback = accel_callback;
      rtc.callback_data = NULL;

This configuration enables the RTC alarm and sets accel_callback as the callback function for the RTC alarm. It is used to periodically print accelerometer data.

Next, the code requests an interrupt for the RTC by using a QMSI API call, and also enables the RTC clock:

      qm_irq_request(QM_IRQ_RTC_0, qm_rtc_isr_0);
      /* Enable the RTC. */
      clk_periph_enable(CLK_PERIPH_RTC_REGISTER | CLK_PERIPH_CLK);

After that, it configures the accelerometer parameters depending on the accelerometer type (BMC150 or BMI160):

#if (QUARK_D2000)
       cfg.pos = BMC150_J14_POS_0;
#endif /* QUARK_D2000 */

       /* Initialise the sensor config and set the mode. */
       bmx1xx_init(cfg);
       bmx1xx_accel_set_mode(BMX1XX_MODE_2G);

#if (BMC150_SENSOR)
       bmx1xx_set_bandwidth(BMC150_BANDWIDTH_64MS); /* Set the bandwidth. */
#elif (BMI160_SENSOR)
       bmx1xx_set_bandwidth(BMI160_BANDWIDTH_10MS); /* Set the bandwidth. */
#endif /* BMC150_SENSOR */

The cfg.pos parameter is used to configure the accelerometer address for the BMC150 accelerometer. The Intel® Quark™ Microcontroller D2000 Developer Kit board has a jumper that allows changing the address. BMC150_J14_POS_0 is the default (no jumper, I2C address 0x10) configuration.

Next, the main() function sets up the RTC configuration, thus enabling the RTC alarm:

       /* Start the RTC. */
       qm_rtc_set_config(QM_RTC_0, &rtc);

A while loop is used to wait for the defined number of samples from the accelerometer to be read and printed to the serial console output:

       /* Wait for the correct number of samples to be read. */
       while (!complete)
              ;

Each time the 125-millisecond interval elapses and the RTC alarm triggers, the following accel_callback function is invoked. The accel data structure defined at the start of the function is passed into the bmx1xx_read_accel function, which populates it with the current accelerometer reading. If the read is successful, the accelerometer data is printed to the serial console output; otherwise, an error message is printed.

/* Accel callback will run every time the RTC alarm triggers. */
static void accel_callback(void *data)
{
       bmx1xx_accel_t accel = {0};

       if (0 == bmx1xx_read_accel(&accel)) {
              QM_PRINTF("x %d y %d z %d\n", accel.x, accel.y, accel.z);
       } else {
              QM_PUTS("Error: unable to read from sensor");
       }

Next, if IPP is enabled, it prints the statistics for the Z axis by calling the print_axis_stats function. (See the print_axis_stats function description, below.)

#if (__IPP_ENABLED__)
       print_axis_stats(accel.z);
#endif /* __IPP_ENABLE__ */

The callback function checks whether the defined number of samples have been read; if not, the RTC alarm is reset and the count is incremented, otherwise the complete variable is set to true.

       /* Reset the RTC alarm to fire again if necessary. */
       if (cb_count < NUM_SAMPLES) {
              qm_rtc_set_alarm(QM_RTC_0,
                            (QM_RTC[QM_RTC_0].rtc_ccvr + INTERVAL));
              cb_count++;
       } else {
              complete = true;
       }

Note that the application by default reads 500 samples (NUM_SAMPLES) before exiting.

Finally, when the complete variable is set to true, the while loop exits and the application prints a final statement to the serial console output, then exits.

       QM_PUTS("Finished: Accelerometer example app");

       return 0;
}

The print_axis_stats function uses the Intel IPP (Intel® Integrated Performance Primitives) library to print the statistics of the last 15 (set by SAMPLES_SIZE) Z axis readings.

First, it updates the samples array with the new Z axis sample, and if needed, updates the samples count:

static void print_axis_stats(int16_t value)
{
       static uint32_t index = 0;
       static uint32_t count = 0;
       float32_t mean, var, rms;

       /* Overwrite the oldest sample in the array. */
       samples[index] = value;
       /* Move the index on the next position, wrap around if necessary. */
       index = (index + 1) % SAMPLES_SIZE;

       /* Store number of samples until it reaches SAMPLES_SIZE. */
       count = count == SAMPLES_SIZE ? SAMPLES_SIZE : count + 1;

Next, it calculates and prints root mean square, variance, and mean values for the collected samples:

       /* Get the root mean square (RMS), variance and mean. */
       ippsq_rms_f32(samples, count, &rms);
       ippsq_var_f32(samples, count, &var);
       ippsq_mean_f32(samples, count, &mean);

       QM_PRINTF("rms %d var %d mean %d\n", (int)rms, (int)var, (int)mean);
}

Code

/*
* Copyright (c) 2016, Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice,
*     this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright notice,
*     this list of conditions and the following disclaimer in the documentation
*    and/or other materials provided with the distribution.
* 3. Neither the name of the Intel Corporation nor the names of its
*     contributors may be used to endorse or promote products derived from this
*     software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE INTEL CORPORATION OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/

/*
* QMSI Accelerometer app example.
*
* This app will read the accelerometer data from the onboard BMC150/160 sensor
* and print it to the console every 125 milliseconds. The app will complete
* once it has read 500 samples.
*
* If the app is compiled with the Intel(R) Integrated Performance Primitives
* (IPP) library enabled, it will also print the Root Mean Square (RMS),
* variance and mean of the last 15 samples each time.
*/

#include <qm_common.h>
#if (__IPP_ENABLED__)
#include <dsp.h> /* Intel IPP support header (exact name assumed) */
#endif
#include "clk.h"
#include "qm_interrupt.h"
#include "qm_isr.h"
#include "qm_rtc.h"
#include "qm_uart.h"
#include "bmx1xx/bmx1xx.h"

#define INTERVAL (QM_RTC_ALARM_SECOND >> 3) /* 125 milliseconds. */
#define NUM_SAMPLES (500)
#if (__IPP_ENABLED__)
/* Number of samples to use to generate the statistics from. */
#define SAMPLES_SIZE (15)
#endif /* __IPP_ENABLED__ */

static volatile uint32_t cb_count = 0;
static volatile bool complete = false;

#if (__IPP_ENABLED__)
static float32_t samples[SAMPLES_SIZE];

static void print_axis_stats(int16_t value)
{
       static uint32_t index = 0;
       static uint32_t count = 0;
       float32_t mean, var, rms;

       /* Overwrite the oldest sample in the array. */
       samples[index] = value;
       /* Move the index on the next position, wrap around if necessary. */
       index = (index + 1) % SAMPLES_SIZE;

       /* Store number of samples until it reaches SAMPLES_SIZE. */
       count = count == SAMPLES_SIZE ? SAMPLES_SIZE : count + 1;

       /* Get the root mean square (RMS), variance and mean. */
       ippsq_rms_f32(samples, count, &rms);
       ippsq_var_f32(samples, count, &var);
       ippsq_mean_f32(samples, count, &mean);

       QM_PRINTF("rms %d var %d mean %d\n", (int)rms, (int)var, (int)mean);
}
#endif /* __IPP_ENABLE__ */

/* Accel callback will run every time the RTC alarm triggers. */
static void accel_callback(void *data)
{
       bmx1xx_accel_t accel = {0};

       if (0 == bmx1xx_read_accel(&accel)) {
              QM_PRINTF("x %d y %d z %d\n", accel.x, accel.y, accel.z);
       } else {
              QM_PUTS("Error: unable to read from sensor");
       }

#if (__IPP_ENABLED__)
       print_axis_stats(accel.z);
#endif /* __IPP_ENABLE__ */

       /* Reset the RTC alarm to fire again if necessary. */
       if (cb_count < NUM_SAMPLES) {
              qm_rtc_set_alarm(QM_RTC_0,
                            (QM_RTC[QM_RTC_0].rtc_ccvr + INTERVAL));
              cb_count++;
       } else {
              complete = true;
       }
}

int main(void)
{
       qm_rtc_config_t rtc;
       bmx1xx_setup_config_t cfg;

       QM_PUTS("Starting: Accelerometer example app");

       /* Configure the RTC and request the IRQ. */
       rtc.init_val = 0;
       rtc.alarm_en = true;
       rtc.alarm_val = INTERVAL;
       rtc.callback = accel_callback;
       rtc.callback_data = NULL;

       qm_irq_request(QM_IRQ_RTC_0, qm_rtc_isr_0);

       /* Enable the RTC. */
       clk_periph_enable(CLK_PERIPH_RTC_REGISTER | CLK_PERIPH_CLK);

#if (QUARK_D2000)
       cfg.pos = BMC150_J14_POS_0;
#endif /* QUARK_D2000 */

       /* Initialise the sensor config and set the mode. */
       bmx1xx_init(cfg);
       bmx1xx_accel_set_mode(BMX1XX_MODE_2G);

#if (BMC150_SENSOR)
       bmx1xx_set_bandwidth(BMC150_BANDWIDTH_64MS); /* Set the bandwidth. */
#elif(BMI160_SENSOR)
       bmx1xx_set_bandwidth(BMI160_BANDWIDTH_10MS); /* Set the bandwidth. */
#endif /* BMC150_SENSOR */

       /* Start the RTC. */
       qm_rtc_set_config(QM_RTC_0, &rtc);

       /* Wait for the correct number of samples to be read. */
       while (!complete)
              ;

       QM_PUTS("Finished: Accelerometer example app");

       return 0;
}

 

Troubleshooting on FLEXlm License Manager Error: "Vendor daemon can't talk to lmgrd"


Problem

Starting the FLEXlm license manager by clicking the "Start" button several times still leaves the prompt "License ... can not start". After closing the license manager and re-opening it, the license manager somehow appears to be running, and the "Stop" button appears.

In Task Manager, the process LMGRD.exe (the license manager service) is running. After waiting a moment, INTEL.exe rapidly starts and stops (appears and disappears). A little later, LMGRD.exe disappears as well.

The license server log contains the following error messages after the license manager started:

11:35:36 (lmgrd) Starting vendor daemons ...
11:35:36 (lmgrd) License server manager (lmgrd) startup failed:
11:35:36 (lmgrd) File not found, 28000
11:35:36 (lmgrd) Started INTEL (pid 11396)
11:35:36 (INTEL) FlexNet Licensing version v11.12.0.0 build 136775 x64_n6
...
11:35:39 (INTEL) Vendor daemon can't talk to lmgrd (Cannot connect to license server system. (-15,10:10061 "WinSock: Connection refused"))
11:35:39 (INTEL) EXITING DUE TO SIGNAL 28 Exit reason 5
11:35:41 (lmgrd) INTEL exited with status 28 (Communications error)

The firewall is already turned off.

Version

Windows 7

FlexNet Licensing v11.12.0.0

Solution

1. Stop the license service using LMTOOLS.EXE or in Windows Services.
2. Stop any process in Task Manager like lmgrd.exe and adskflex.exe (or whatever the vendor daemon is named).
3. Open the Data Execution Prevention (DEP) settings from Control Panel > System > Advanced > Performance Settings > Data Execution Prevention.
4. Add an exception for LMGRD.exe, INTEL.exe, and possibly also LMUTIL.EXE and LMTOOLS.EXE from the directory where the license manager is installed. Alternatively, you can turn off DEP for all programs.

Hack the Senses - Hackathon


The Hack the Senses hackathon was themed around perception and the senses and took place 24-26 June in FabLab, the City of London’s first purpose-built digital fabrication and rapid prototyping workspace. The event was organized within the framework of a Wellcome Trust public engagement project that explores the ways in which sensory augmentation technology can expand the scope of human perception.

Participants were given the challenge of prototyping new devices and applications that augment an existing sense, allow us to acquire a new sense, or translate between sensory modalities. To accomplish this, they were provided with a broad range of equipment, such as microcontrollers of different types and sizes and a variety of sensors (heat, light, UV, IR, touch, force, etc.), as well as thermal imaging cameras, vibrating motors, heat pads, and other goodies. In addition, they also had access to virtual and augmented reality headsets and brain-computer interfaces.

The event turned out to be a great success, and participants brought a very interesting mix of backgrounds and skills, from neuroscience and psychology through coding and engineering to art and design. Close to 40% of attendees were female. The Code for Good prize was awarded to a team called Belka, whose project is capable of adapting an audiovisual VR environment on the basis of a person’s brain activity recorded via an OpenBCI EEG kit. The prototypes coming out of the event will be used in a string of public engagement activities in the United Kingdom to facilitate discussions about the possibilities and significance of sensory augmentation technologies. We are grateful to Intel for sponsoring a prize at our event!

 

Team Belka - Winners of the Code for Good prize working on their prototype

 

Hack the Senses participants immersed in their work
Help from Dr Giles Hamilton-Fletcher on prototyping sensory augmentation hardware
A new way to do vision to sound sensory substitution

Intel® XDK FAQs - App Designer


Which App Designer framework should I use? Which Intel XDK layout framework is best?

There is no "best" App Designer framework. Each framework has pros and cons. You should choose that framework which serves your application needs best. The list below provides a quick list of pros and cons for each of the frameworks that are available as part of App Designer.

The "non-deprecated" UI frameworks are shown in the recommended order of "easiest to apply" to "most difficult to apply" as part of an Intel XDK app.

  • Twitter Bootstrap 3 -- PRO: a very clean UI framework that relies primarily on CSS with very little JavaScript trickery. Thriving third-party ecosystem with many plugins and add-ons, including themes. Probably the best place to start, especially for UI beginners. CON: some advanced mobile UI mechanisms (like swipe delete) are not part of this framework.

  • Framework 7 -- PRO: provides pixel perfect layout with device-specific UI elements for Android and iOS platforms. CON: difficult to customize and modify; requires a strict adherence to the Framework 7 "rules of layout" for best results. You should have a good understanding of how Framework 7 works before using this framework to build your app!

  • Ionic -- PRO: a very sophisticated mobile framework with many features. If you are familiar with and comfortable with Angular this framework may be a good choice for you. CON: tightly coupled with Angular, many features can only be accessed by writing JavaScript Angular directives. If you are not familiar or comfortable with Angular this is not a good choice!

  • App Framework 3 -- This UI framework has been deprecated and will be retired from App Designer in a future release of the Intel XDK. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using App Framework please visit the App Framework project page and the App Framework GitHub repo for documentation and help.

  • Topcoat -- This UI framework has been deprecated and will be retired from App Designer in a future release of the Intel XDK. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using Topcoat please visit the Topcoat project page and the Topcoat GitHub repo for documentation and help.

  • Ratchet -- This UI framework has been deprecated and will be retired from App Designer in a future release of the Intel XDK. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using Ratchet please visit the Ratchet project page and the Ratchet GitHub repo for documentation and help.

  • jQuery Mobile -- This UI framework has been deprecated and will be retired from App Designer in a future release of the Intel XDK. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using jQuery Mobile please visit the jQuery Mobile API page and jQuery Mobile GitHub page for documentation and help.

What does the Google* Map widget’s "center type" attribute and its values "Auto calculate," "Address" and "Lat/Long" mean?

The "center type" parameter defines how the map view is centered in your div. It is used to initialize the map as follows:

  • Lat/Long: center the map on a specific latitude and longitude (that you provide on the properties page)
  • Address: center the map on a specific address (that you provide on the properties page)
  • Auto Calculate: center the map on a collection of markers

This is just for initialization of the map widget. Beyond that you must use the standard Google maps APIs to move and/or modify the map. See the "google_maps.js" code for initialization of the widget and some calls to the Google maps APIs. There is also a pointer to the Google maps API at the beginning of the JS file.

To get the current position, you have to use the Geo API, and then push that into the Maps API to display it. The Google Maps API will not give you any device data, it will only display information for you. Please refer to the Intel XDK "Hello, Cordova" sample app for some help with the Geo API. There are a lot of useful comments and console.log messages.

How do I size UI elements in my project?

Trying to implement "pixel perfect" user interfaces with HTML5 apps is not recommended as there is a wide array of device resolutions and aspect ratios and it is impossible to insure you are sized properly for every device. Instead, you should use "responsive web design" techniques to build your UI so that it adapts to different sizes automatically. You can also use the CSS media query directive to build CSS rules that are specific to different screen dimensions.

Note: The viewport is sized in CSS pixels (aka virtual pixels or device independent pixels), so the physical pixel dimensions are not what you will normally be designing for.

How do I create lists, buttons and other UI elements with the Intel XDK?

The Intel XDK provides you with a way to build HTML5 apps that are run in a webview on the target device. This is analogous to running in an embedded browser (refer to this blog for details). Thus, the programming techniques are the same as those you would use inside a browser, when writing a single-page client-side HTML5 app. You can use the Intel XDK App Designer tool to drag and drop UI elements.

Why is the user interface for Chrome on Android* unresponsive?

It could be that you are using an outdated version of the App Framework* files. You can find the recent versions here. You can safely replace any App Framework files that App Designer installed in your project with more recent copies as App Designer will not overwrite the new files.

How do I work with more recent versions of App Framework* since the latest Intel XDK release?

You can replace the App Framework* files that the Intel XDK automatically inserted with more recent versions that can be found here. App designer will not overwrite your replacement.

Is there a replacement to XPATH in App Framework* for selecting nodes from an XML document?

This FAQ applies only to App Framework 2. App Framework 3 no longer includes a replacement for the jQuery selector library, it expects that you are using standard jQuery.

App Framework is a UI library that implements a subset of the jQuery* selector library. If you wish to use jQuery for XPath manipulation, it is recommend that you use jQuery as your selector library and not App Framework. However, it is also possible to use jQuery with the UI components of App Framework. Please refer to this entry in the App Framework docs.

It would look similar to this:

<script src="lib/jq/jquery.js"></script>
<script src="lib/af/jq.appframework.js"></script>
<script src="lib/af/appframework.ui.js"></script>

Why does my App Framework* app that was previously working suddenly start having issues with Android* 4.4?

Ensure you have upgraded to the latest version of App Framework. If your app was built with the now retired Intel XDK "legacy" build system be sure to set the "Targeted Android Version" to 19 in the Android-Crosswalk build settings. The legacy build targeted Android 4.2.

How do I manually set a theme?

If you want to, for example, change the theme only on Android*, you can add the following lines of code:

  1. $.ui.autoLaunch = false; //Stop the App Framework* auto launch right after you load App Framework*
  2. Detect the underlying platform using either navigator.userAgent or intel.xdk.device.platform or window.device.platform. If the platform detected is Android*, set $.ui.useOSThemes=false to disable custom themes and set <div id="afui" class="android light">
  3. Otherwise, set $.ui.useOSThemes=true;
  4. When device ready and document ready have been detected, add $.ui.launch();

How does page background color work in App Framework?

In App Framework the BODY is in the background and the page is in the foreground. If you set the background color on the body, you will see the page's background color. If you set the theme to default App Framework uses a native-like theme based on the device at runtime. Otherwise, it uses the App Framework Theme. This is normally done using the following:

<script>
$(document).ready(function(){ $.ui.useOSThemes = false; });
</script>

Please see Customizing App Framework UI Skin for additional details.

What kind of templates can I use to create App Designer projects?

Currently, you can only create App Designer projects by selecting the blank 'HTML5+Cordova' template with app designer (select the app designer check box at the bottom of the template box) and the blank 'Standard HTML5' template with app designer. 

App Designer versions of the layout and user interface templates existed previously, but they were removed in the Intel XDK 3088 release.

My AJAX calls do not work on Android; I'm getting valid JSON data with an invalid return code.

The jQuery 1 library appears to be incompatible with the latest versions of the cordova-android framework. To fix this issue you can either upgrade your jQuery library to jQuery 2 or use a technique similar to that shown in the following test code fragment to check your AJAX return codes. See this forum thread for more details. 

The jQuery site only tests jQuery 2 against Cordova/PhoneGap apps (the Intel XDK builds Cordova apps). See the How to Use It section of this jQuery project blog > https://blog.jquery.com/2013/04/18/jquery-2-0-released/ for more information.

If you built your app using App Designer, it may still be using jQuery 1.x rather than jQuery 2.x, in which case you need to replace the version of jQuery in your project. Simply download and replace the existing copy of jQuery 1.x in your project with the equivalent copy of jQuery 2.x.

Note, in particular, the switch case that checks for zero and 200. This test fragment does not cover all possible AJAX return codes, but should help you if you wish to continue to use a jQuery 1 library as part of your Cordova application.

function jqueryAjaxTest() {

    /* button #botRunAjax */
    $(document).on("click", "#botRunAjax", function (evt) {
        console.log("function started");
        var wpost = "e=132&c=abcdef&s=demoBASICA";
        $.ajax({
            type: "POST",
            crossDomain: true, //;paf; see http://stackoverflow.com/a/25109061/2914328
            url: "http://your.server.url/address",
            data: wpost,
            dataType: "json",
            timeout: 10000
        })
        .always(function (retorno, textStatus, jqXHR) { //;paf; see http://stackoverflow.com/a/19498463/2914328
            console.log("jQuery version: " + $.fn.jquery);
            console.log("arg1:", retorno);
            console.log("arg2:", textStatus);
            console.log("arg3:", jqXHR);
            // jQuery 1.x: when the request is reported as failed, .always()
            // receives the jqXHR object first, so read the status from it.
            if (parseInt($.fn.jquery, 10) === 1) {
                switch (retorno.status) {
                    case 0:
                    case 200:
                        console.log("exit OK");
                        console.log(JSON.stringify(retorno.responseJSON));
                        break;
                    case 404:
                        console.log("exit by FAIL");
                        console.log(JSON.stringify(retorno.responseJSON));
                        break;
                    default:
                        console.log("default switch happened");
                        console.log(JSON.stringify(retorno.responseJSON));
                        break;
                }
            }
            // jQuery 2.x: on success the jqXHR object is the third argument.
            // Note the else-if, so the jQuery 1 path above does not fall
            // through to the "unknown" branch below.
            else if (parseInt($.fn.jquery, 10) === 2 && textStatus === "success") {
                switch (jqXHR.status) {
                    case 0:
                    case 200:
                        console.log("exit OK");
                        console.log(JSON.stringify(jqXHR.responseJSON));
                        break;
                    case 404:
                        console.log("exit by FAIL");
                        console.log(JSON.stringify(jqXHR.responseJSON));
                        break;
                    default:
                        console.log("default switch happened");
                        console.log(JSON.stringify(jqXHR.responseJSON));
                        break;
                }
            } else {
                console.log("unknown");
            }
        });
    });
}
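
In this fragment, jqueryAjaxTest() is intended to be called once from your app's initialization code (for example, after device ready); it only attaches the click handler to the #botRunAjax button, and the AJAX request then runs on each button click.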

What do the data-uib and data-ver properties do in an App Designer project?

App Designer adds the data-uib and data-ver properties to many of the UI elements it creates. These property names appear only in the index.html file, on various UI elements. There are other similar data properties, like data-sm, that are only required when you are using a service method.

The data-uib and data-ver properties are used only by App Designer. They are not needed by the UI frameworks supported by App Designer; they are used by App Designer to correctly display and apply widget properties when you are operating in the "design" view within App Designer. These properties are not critical to the functioning of your app; however, removing them will cause problems with the "design" view of App Designer.

The data-sm property is also inserted by App Designer, but it may be used by data_support.js and other support libraries; unlike data-uib and data-ver, the data-sm property is relevant to the proper functioning of your app.

Unable to select App Designer UI option when I create a new App Designer project.

If you previously created an App Designer project named, for example, 'ui-test', deleted it, and then created another App Designer project using the same name, you will not be given the option to select the UI framework for the new project. This is because the Intel XDK remembers a framework name for each project name that has been used and does not delete that entry from the global-settings.xdk file when you delete a project. For example, if you chose "Framework 7" the first time you created an App Designer project named 'ui-test', deleting 'ui-test' and creating a new 'ui-test' will silently result in another "Framework 7" project.

Because the UI framework name is not removed from the global-settings.xdk file when you delete the project, you must either use a new, unique project name or edit the global-settings.xdk file to delete the old UI framework association. This is a bug that has been reported but not yet fixed. The following workaround applies:

  • Exit the Intel XDK.
  • Open the global-settings.xdk file in a text editor and locate the entry for the project name you are reusing; it will look similar to:

"FILE-/C/Users/xxx/Downloads/pkg/ui-test/www/index.html": {
    "canvas_width": 320,
    "canvas_height": 480,
    "framework": "framework 7"
}

  • Remove the last line ("framework": "framework 7") from the JSON object, and remember to also remove the comma at the end of the preceding line; otherwise the file is no longer valid JSON and your global-settings.xdk file will be considered corrupt. (The corrected entry is shown below.)
  • Save and close the global-settings.xdk file.
  • Launch the Intel XDK.
  • Create a new project with the old name you are reusing.
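
After the edit, the entry should look similar to the following (the path is from the example above and will differ on your system):

"FILE-/C/Users/xxx/Downloads/pkg/ui-test/www/index.html": {
    "canvas_width": 320,
    "canvas_height": 480
}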

You should now see the list of App Designer framework UI selection options when you create the new project with a previously used project name that you have deleted.


Flylab: Drones as a Service - Success Story


FLYLAB, Drones as a Service


Challenge

When civilian drones first appeared, they were equipped with microcontrollers that could manage simple flight but lacked the computing power for the artificial intelligence necessary for autonomous flight. The first challenge was to find a way to put significant computing power on board the drones. The second was to create a simple-to-use software framework that lets developers build customized applications.

 

Solution

Collaborating with Intel® Software, Flylab took the best of Intel's IoT (Internet of Things) platform, not just for simple prototyping but also for rapid production. This enabled us to quickly develop a product with the innovation of a startup and the reliability of an industry leader.

Intel's hardware platforms and software are easily accessible, but it was the collaboration that gave us expert advice, assistance in rapid prototyping, and invitations to international trade fairs.

Learn more

software.intel.com/iot

www.flylab.io

STORY

Flylab is a young startup based in Paris. Our venture began three years ago with the creation of the first European drone fablab (digital fabrication laboratory). "We rapidly moved from the status of expert enthusiasts to building an elite team around the three pillars of the drone: the hardware, the software, and the aeronautics," explained Hakim AMRANI-MONTANELLI, the founder and CEO of Flylab. "We realised early on that the drone was not yet reliable; it was severely lacking in computing power and didn't allow us to develop apps as easily as on a computer. Years down the track, and hundreds of custom-made drones and prototypes later, we decided to develop our own flight controller, which rapidly evolved into a flight computer."

The whole team set about analysing the various technical options for enhancing flight controllers and drones in general, but at the time these options were rather limited. Based on what was available in the market, we chose Intel®, and in particular the Intel® Edison IoT platform.

This platform enabled Flylab to develop a controller capable of reacting quickly, with all the functionality of an ordinary computer, yet no larger than the size of a hand! Intel® Software was very welcoming and readily available, and it was from this constructive collaboration that Flylab designed and developed its own flight computer around the enormous potential of the Intel® Edison platform.

Having achieved this, we had finally found a way to increase our on-board computing power. Flylab drones were now quick and agile. However, we wanted to go further.

During our discussions with the Intel® teams, we soon learnt of Intel® RealSense™ technology. This endowed our future drones not only with brains but also with eyes. All of this fits into a neat, compact format with two Intel® RealSense™ cameras: the F200 for short-range vision and the R200 for long-range vision.

The creative synergy of our 8 team members produced an environment conducive to the development of drones. The project ran for a couple of months and comprised a number of phases, over which we developed multiple versions of our framework. Flylab then commenced the test phase, in which we further explored the range of Intel® technology. After adopting Intel® Edison, we also took on board the Intel® NUC (Next Unit of Computing), which offers the even greater computing power that certain cases require. Thanks to our software development platform for autonomous flying robots, we found our first clients and were able to create customized solutions with them to serve their requirements.

Taking the time to listen and understand a client enables us to tailor the hardware and software to meet their requirements. This is imperative with partners such as the SNCF (French National Railways), the SAMU (French Emergency Medical Aid Service) and the French Fire and Rescue Service.

In addition to technical and software development support, Intel also accompanied us to international trade fairs such as the 2015 IFA (Global Trade Fair for the Digital World) in Berlin and the 2015 Maker Faire in Rome. Being present and visible at these major events gave us the opportunity to promote our products and find new clients. Intel's backing, with its technical support and co-marketing opportunities, instantly accelerated the growth of our startup.

When Flylab started, drones were low-tech machines with a tendency to break down and were essentially piloted by humans. Flylab immediately foresaw the evolution of the drone into potentially autonomous flying robots.

"Drones were hardly reliable, so our team turned its vision to the future. To change how drones functioned, we decided to place our stakes on a large-scale developer outreach for new ideas and to share our platform as widely as possible," explains Hakim AMRANI-MONTANELLI, CEO of Flylab. This is how our company helped unite a community of expertise in drone technology. We had two objectives: "We wanted to transform these banal automatons into real autonomous robots. We also wanted to be able to load our technology onto everything from small drones to large aircraft by concentrating on innovation in machine learning, computer vision, and AI."

Flylab created a system capable of operating in a number of programming languages, currently either JavaScript or Python, to cater for the needs of our clients and the various uses of the system. This is only the first step, as our company will continue to add support for other languages. Our software architecture was built on Unix for security, flexibility, and transparent integration for both users and developers. We designed our OS to give it exceptional capabilities: it comprises a series of software layers, from low-level daemons to communication systems, right up to user code via the Flylab SDK.

The "flykit" layer facilitates communication between the layers of the system. The FlyKit API passes data and coordinates each process in the runtime environment, from system agents to user code, and FlyKit is responsible for all communication with the system hardware. Simply put, each running process is isolated from the others in a sandbox, protecting against interference from third-party programs; after all, making drones fly is all about telemetry and commands from the ground or the Internet. FlyKit is entirely written in C, but the application layer is also accessible via C++, Node.js, or Python.

With Intel® technology, we were able to start in maker mode and then make a smooth transition to professional solutions, with compatible technology and hardware throughout.

Flylab is in the process of tendering for a contract with the French national railway company. At present, the company uses drones that can only be operated by pilots with advanced theoretical expertise. As a solution, our company has created a number of drone prototypes that are simple to use and can be produced in series.

With our platform and Intel's technology, we were also able to rapidly devise a custom-made drone for the French Fire and Rescue Services: a drone that saves time when every second is critical to saving lives!

"With this superior technology, our drones are becoming increasingly intelligent. It will not be long before their senses are developed and they are able to interact with their environment. And it may well be our solution, loaded on small and medium-sized systems, that becomes the heart and brain of the flying cars of the future," projects Hakim AMRANI-MONTANELLI.

 

Intel® Software Guard Extensions Tutorial Series: Part 1, Intel® SGX Foundation


The first part in the Intel® Software Guard Extensions (Intel® SGX) tutorial series is a brief overview of the technology. For more detailed information, see the documentation provided in the Intel Software Guard Extensions SDK. Find the list of all the tutorials in this series in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

Understanding Intel® Software Guard Extensions Technology

Software applications frequently need to work with private information such as passwords, account numbers, financial information, encryption keys, and health records. This sensitive data is intended to be accessed only by the designated recipient. In Intel SGX terminology, this private information is referred to as an application’s secrets.

The operating system’s job is to enforce security policy on the computer system so that these secrets are not unintentionally exposed to other users and applications. The OS will prevent a user from accessing another user’s files (unless permission to do so has been explicitly granted), one application from accessing another application’s memory, and an unprivileged user from accessing OS resources except through tightly controlled interfaces. Applications often employ additional safeguards, such as data encryption, to ensure that data sent to storage or over a network connection cannot be accessed by third parties even if the OS and hardware are compromised.

Despite these protections, there is still a significant vulnerability present in most computer systems: while there are numerous guards in place that protect one application from another, and the OS from an unprivileged user, an application has virtually no protection from processes running with higher privileges, including the OS itself. Malware that obtains administrative privileges has unrestricted access to all system resources and all applications running on the system. Sophisticated malware can target an application’s protection schemes to extract encryption keys and even the secret data itself directly from memory.

To enable the high-level protection of secrets and help defend against these software attacks, Intel designed Intel SGX. Intel SGX is a set of CPU instructions that enable applications to create enclaves: protected areas in the application’s address space that provide confidentiality and integrity even in the presence of privileged malware. Enclave code is enabled by using special instructions, and it is built and loaded as a Windows* Dynamic Link Library (DLL) file.

Intel SGX can reduce the attack surface of an application. Figure 1 demonstrates the dramatic difference between attack surfaces with and without the help of Intel SGX enclaves.


Figure 1: Attack-surface areas with and without Intel® Software Guard Extensions enclaves.

How Intel Software Guard Extensions Technology Helps Secure Data

Intel SGX offers the following protections from known hardware and software attacks:

  • Enclave memory cannot be read or written from outside the enclave regardless of the current privilege level and CPU mode.
  • Production enclaves cannot be debugged by software or hardware debuggers. (An enclave can be created with a debug attribute that allows a special debugger—the Intel SGX debugger—to view its content like a standard debugger. This is intended to aid the software development cycle.)
  • The enclave environment cannot be entered through classic function calls, jumps, register manipulation, or stack manipulation. The only way to call an enclave function is through a new instruction that performs several protection checks.
  • Enclave memory is encrypted using industry-standard encryption algorithms with replay protection. Tapping the memory or connecting the DRAM modules to another system will yield only encrypted data (see Figure 2).
  • The memory encryption key randomly changes every power cycle (for example, at boot time, and when resuming from sleep and hibernation states). The key is stored within the CPU and is not accessible.
  • Data isolated within enclaves can only be accessed by code that shares the enclave.

There is a hard limit on the size of the protected memory, set by the system BIOS, and typical values are 64 MB and 128 MB. Some system providers may make this limit a configurable option within their BIOS setup. Depending on the footprint of each enclave, you can expect that between 5 and 20 enclaves can simultaneously reside in memory.


Figure 2: How Intel® Software Guard Extensions helps secure enclave data in protected applications.

Design Considerations

Application design with Intel SGX requires that the application be divided into two components (see Figure 3):

  • Trusted component. This is the enclave. The code that resides in the trusted component is the code that accesses an application’s secrets. An application can have more than one trusted component/enclave.
  • Untrusted component. This is the rest of the application and any of its modules. It is important to note that, from the standpoint of an enclave, the OS and the VMM are considered untrusted components.

The trusted component should be as small as possible, limited to the data that needs the most protection and those operations that must act directly on it. A large enclave with a complex interface doesn’t just consume more protected memory: it also creates a larger attack surface.

Enclaves should also have minimal trusted-untrusted component interaction. While enclaves can leave the protected memory region and call functions in the untrusted component (through the use of a special instruction), limiting these dependencies will strengthen the enclave against attack.


Figure 3: Intel® Software Guard Extensions application execution flow.

Attestation

In the Intel SGX architecture, attestation refers to the process of demonstrating that a specific enclave was established on a platform. There are two attestation mechanisms:

  • Local attestation occurs when two enclaves on the same platform authenticate to each other.
  • Remote attestation occurs when an enclave gains the trust of a remote provider.

Local Attestation

Local attestation is useful when applications have more than one enclave that need to work together to accomplish a task or when two separate applications must communicate data between enclaves. Each enclave must verify the other in order to confirm that they are both trustworthy. Once that is done, they establish a protected session and use an ECDH Key Exchange to share a session key. That session key can be used to encrypt the data that must be shared between the two enclaves.

Because one enclave cannot access another enclave’s protected memory space, even when running under the same application, all pointers must be dereferenced to their values and copied, and the complete data set must be marshaled from one enclave to the other.

Remote Attestation

With remote attestation, a combination of Intel SGX software and platform hardware is used to generate a quote that is sent to a third-party server to establish trust. The software includes the application’s enclave, and the Quoting Enclave (QE) and Provisioning Enclave (PvE), both of which are provided by Intel. The attestation hardware is the Intel SGX-enabled CPU. A digest of the software information is combined with a platform-unique asymmetric key from the hardware to generate the quote, which is sent to a remote server over an authenticated channel. If the remote server determines that the enclave was properly instantiated and is running on a genuine Intel SGX-capable processor, it can now trust the enclave and choose to provision secrets to it over the authenticated channel.

Sealing Data

Sealing data is the process of encrypting it so that it can be written to untrusted memory or storage without revealing its contents. The data can be read back in by the enclave at a later date and unsealed (decrypted). The encryption keys are derived internally on demand and are not exposed to the enclave.

There are two methods of sealing data:

  • Enclave Identity. This method produces a key that is unique to this exact enclave.
  • Sealing Identity. This method produces a key that is based on the identity of the enclave’s Sealing Authority. Multiple enclaves from the same signing authority can derive the same key.

Sealing to the Enclave Identity

When sealing to the Enclave Identity, the key is unique to the particular enclave that sealed the data and any change to the enclave that impacts its signature will result in a new key. With this method, data sealed by one version of an enclave is inaccessible by other versions of the enclave, so a side effect of this approach is that sealed data cannot be migrated to newer versions of the application and its enclave. This is intended for applications where old, sealed data should not be used by newer versions of the application.

Sealing to the Sealing Identity

When sealing to the sealing identity, multiple enclaves from the same authority can transparently seal and unseal each other’s data. This allows data from one version of an enclave to be migrated to another, or to be shared among applications from the same software vendor.

If older versions of the software and enclave need to be prevented from accessing data that is sealed by newer application versions, the authority can choose to include a Software Version Number (SVN) when signing the enclave. Enclave versions older than the specified SVN will not be able to derive the sealing key and thus will be prevented from unsealing the data.

How We’ll Use Intel Software Guard Extensions Technology in the Tutorial

We’ve described the three key components of Intel SGX: enclaves, attestation, and sealing. For this tutorial, we’ll focus on implementing enclaves since they are at the core of Intel SGX. You can’t do attestation or sealing without establishing an enclave in the first place. This will also keep the tutorial to a manageable size.

Coming Up Next

Part 2 of the tutorial will focus on the password manager application that we’ll be building and enabling for Intel SGX. We’ll cover the design requirements, constraints, and the user interface. Stay tuned!

 

Implementing User Experience Guidelines in Intel® RealSense™ Applications


Download sample application ›

Introduction

User experience (UX) guidelines exist for the implementation of Intel® RealSense™ technology in applications. However, these guidelines are hard to visualize for four main reasons: (a) You have to interpret end-user interaction in a non-tactile environment during the application design phase where you don’t yet have a prototype for end-user testing, (b) the application could be used on different form factors like laptops and All-In-Ones where the Field-of-View (FOV) and user placement for interaction are different, (c) you have to work with the different fidelities and FOVs of a color and depth camera, and (d) different Intel® RealSense™ SDK modalities have different UX requirements. Having a real-time feedback mechanism to gauge this impact is therefore critical. In this article, we cover an application that is developed for the use of Intel® RealSense™ application developers to help visualize the UX requirements and implement these guidelines in code. The source code for the application is available for download through this article.

The Application

The application works for the user-facing cameras only. Both the F200 and SR300 cameras are covered in the scope of the application. Provision is made to seamlessly switch between the two cameras within the application. If using the F200 camera, the application works on Windows* 8 or Windows® 10. However, if using the SR300 camera, the application requires Windows 10.

There are two windows within the application. One window provides the real-time camera feed where the user can interact. This section also provides visual indicators, which are analogous to visual feedback you will provide in your application. In each of the scenarios below, we call out the visual feedback that has been implemented. The other window provides the code snippets that are required to implement a specific UX scenario. In the sections below, I will walk you through the scenarios covered. WPF is the framework used for development.

Version of the Intel RealSense SDK: Build 8.0.24.6528

Version of Intel® RealSense™ Depth Camera Manager (DCM) (F200): Version 1.4.27.52404

Version of the Intel RealSense Depth Camera Manager (SR300): Version 3.1.25.2599

The application is expected to work on more recent versions of the SDK but has been validated on the version above.

Scenarios

General scenarios

Depth and RGB resolution

The RGB and the depth cameras support different resolutions and have different aspect ratios. Different modalities also have different resolution requirements for each of these cameras.

Below is a snapshot of the stream resolution and frame rate as indicated in the SDK documentation on working with multiple modalities:

The UX problem:

How do I know which areas of the screen real estate should be used for 3D interactions and which ones for UI placement? How can I indicate to the end user visually or through auditory feedback when they have moved out of the interaction zone?

Implementation:

The application uses the SDK API (mentioned below) to obtain the color and depth resolution data for each modality and weaves the depth map over the color map to show superimposing areas. Within the camera feed window, look for the yellow boundary that indicates the space that overlaps the color and depth map. This is your visual feedback. From a UX perspective, you can now visually identify areas of the screen that have to be used for FOV 3D interactions as opposed to UI element placements. Experiment by selecting the different modalities in the first column and choosing from available color and depth resolutions to understand the implications of RGB to depth mapping for your desired usage. The snapshots below show some of the examples of how this overlap changes with the change in inputs.

Example using both depth and color:

Experiment with how the mapping changes as the user switches between different color and depth resolutions. Also choose other modalities that use both depth and RGB to see how the supported color and depth resolution lists change.

Example using only depth:

An example where this is handy is when you are using the hand skeletal tracking. You do not need the color camera for this use case; however, you can switch between the available depth resolutions to see how the screen mapping changes.

Example using only color:

If your application is restricted to using only facial detection, 2D capability will suffice as all you need is the bounding box for the faces. However, if you need the 78 landmarks, you will need to switch to using the 3D example.

The sample application available for download from this article walks through the code required to implement this in your application. As a highlight, the two APIs you will need to build the iterative lists of depth and color resolutions for each modality are PXCMDevice.QueryCaptureProfile(int i) and PXCMVideoModule.QueryCaptureProfile(). However, for the visual representation of how the two maps overlap, you will have to use the Projection interface. We know that each pixel has a color and a depth value associated with it. To overlay the depth map on the color map, this example chooses just one depth value. To implement this, the application uses the blob module as a workaround: it takes the blob closest to the camera (say, your hand) and maps the center of this blob (observable as a cyan dot on the screen). The depth value of this pixel is then used as the single depth value for mapping the depth map to the color map.

Optimal Lighting

The Intel RealSense SDK does not provide any direct API to identify the lighting situation in the environment where the camera is operating. Bad lighting can result in a lot of noise within the color data.

The UX problem:

From within an application, it would be nice to provide visual feedback asking the user to move to a better-lit environment. Within the application, watch how the camera feed displays the current luminance value on the screen.

Implementation:

The application uses the RGB values and applies the log-average luminance to identify the lighting conditions. More information on the use of log-average luminance can be found here.

The formula used to identify the log average luminance value for each pixel is:

L = 0.27R + 0.67G + 0.06B;

The values range from 0 for pitch black to 1 for very bright light. We do not define a threshold in this sample because this is something developers will have to experiment with. Some factors that could affect luminance values are backlight, black clothing (many pixels rated close to 0, bringing down the average value), and outdoor versus indoor lighting conditions.

Since we have to perform this calculation for every pixel in each frame of data, it is a compute-intensive operation. The application shows how to implement this computation on the GPU for optimal performance.
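
The application itself does this on the GPU; purely to illustrate the math, here is a minimal CPU-side sketch in JavaScript. It assumes an RGBA pixel buffer such as canvas ImageData (the helper name logAverageLuminance is our own), and computes the log average as exp((1/N) * sum(log(delta + L))):

function logAverageLuminance(pixels) {
    // pixels: Uint8ClampedArray in RGBA order (e.g., from canvas getImageData).
    var delta = 0.0001;          // small offset so log(0) never occurs on black pixels
    var sum = 0;
    var n = pixels.length / 4;   // number of pixels (4 channels each)
    for (var i = 0; i < pixels.length; i += 4) {
        var r = pixels[i] / 255, g = pixels[i + 1] / 255, b = pixels[i + 2] / 255;
        var L = 0.27 * r + 0.67 * g + 0.06 * b;   // per-pixel luminance formula from above
        sum += Math.log(delta + L);
    }
    return Math.exp(sum / n);    // ~0 for pitch black, approaching 1 for very bright scenes
}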

Raw Streams

The Intel RealSense SDK provides APIs for capturing the color and depth streams. However, in some cases it may be necessary to capture the raw streams to perform low-level computation. The Intel RealSense SDK provides a C++ API with .NET wrappers, which means the memory containing the images lives in unmanaged memory. This is non-optimal when displaying images in WPF.

One way to work through this is using the PXCMImage.ToBitmap() API to create an unmanaged HBITMAP wrapped around the image data and use System.Windows.Interop.Imaging.CreateBitmapSourceFromHBitmap() to copy the data into the managed heap and then wrap a WPF BitmapSource object around it.

The UX problem:

The problem with the above-mentioned approach is that the YUY2-to-RGB conversion is done on the CPU, after which we have to handle an unmanaged-to-managed memory copy of the image data. This slows down the process considerably and can result in lost data and jittery displays.

The Implementation:

The application shows an alternate implementation using the Direct3D* image source (D3DImage) introduced in Service Pack 1 of the Microsoft .NET Framework 3.5, which allows arbitrary DirectX* 9 surfaces to be included in WPF. We implement an unmanaged DirectX library that performs the color conversion for display on the GPU. This approach also allows for GPU-accelerated image processing via pixel shaders for any custom manipulation needed (for example, processing depth image data). The snapshot below shows the raw color, IR, and depth streams, and the depth image as rendered by the custom shader.

Facial Recognition

One of the most commonly used modalities within the Intel RealSense SDK is the face module. This module allows recognizing up to four people in the FOV while also providing 78 landmark points for each face. Using these data points, it is possible to integrate a facial recognition implementation within applications. Windows Hello* in the Windows 10 OS uses these landmarks to identify templates that can be used to identify people at login. More information on how Windows Hello works can be found here. In this application, we focus on some of the UX issues around this module and how to provide visual feedback to correct end-user interaction for better UX.

The UX problem:

The most prominent UX challenge comes from the fact that your end users may not understand where the FOV of the camera is. They may be completely outside this frustum or too far away from the computer, and thus out of range. The Intel RealSense SDK provides many alerts to capture these scenarios, and implementing them to give the end user visual feedback when they are outside the FOV is critical. In the application, when the end user is in the FOV and within the allowed range, a green bounding box indicates that you are within the interaction zone. Experiment by moving your head toward the edges of your computer or by moving farther away; you will notice a red bounding box appear as soon as the camera loses face data.

The implementation:

The Intel RealSense SDK provides the following alerts for effectively handling user errors: ALERT_FACE_OUT_OF_FOV, ALERT_FACE_OCCLUDED, ALERT_FACE_LOST. For more information on alerts, refer to the PXCMFaceModule. The application uses a simple ViewModel architecture to capture the errors and act on them in the XAML code.

Immersive Collaboration

Imagine a photo booth setup where you are trying to obtain a background segmented image of yourself. As mentioned in the Depth and RGB scenario above, the range for each of the Intel RealSense modalities is different. So how do we indicate to the end user what the optimal range for the 3D camera is, so they can position themselves accordingly within the FOV?

The UX problem:

As with the facial detection scenario, providing a visual indicator to the end user when they move in and out of range is important. In this application, note that the slider is set to the optimal range for the camera FOV for 3D segmentation (indicated in green). To identify the minimal range, move the left slider toward the end with the picture of the camera and note how the pixels turn white. Conversely, to identify the maximum optimal range, move the right slider toward the right; beyond the optimal point, the pixels are tinted red. The range between the two sliders is the optimal range for segmentation.

Take a look at the last image for a second. You will notice another UX issue when using background segmentation (BGS). As I move closer to the background (in this case, the chair), the 3D segmentation module merges the foreground and the background object into a single blob. You will also notice this where you have a black background and are wearing a black shirt: identifying depth across uniform pixels is hard. We do not address that scenario in this application, but we mention it as a UX challenge to be aware of.

The implementation:

The 3D segmentation module provides alerts to handle UX scenarios. Some of the important alerts we implement here are: ALERT_USER_IN_RANGE, ALERT_USER_TOO_CLOSE, and ALERT_USER_TOO_FAR. The application implements these alerts to provide the pigmentation as well as textual feedback to indicate when the user is too close or too far.

3D Scanning

The 3D scanning module for front-facing cameras provides for scanning the face and small objects. In this application, we use the face-scanning example to demonstrate some of the UX challenges and how to provide visual and auditory feedback in code.

The UX problem:

One of the key challenges in getting a good scan is detecting the scan area, which is usually locked a few seconds after the scan begins. Here is a snapshot of the region the camera needs to detect for a good scan:

If the user cannot establish the correct scan area, the scan module fails. As an example of how things can go wrong: while scanning a face, the user is required to face the camera until the camera detects the face, then turn slowly to the left and back through the center to the right. Providing visual feedback in the form of a bounding box around the face when the user is within the camera FOV is therefore important while the user can still look at the screen. Note that this feedback is required before the scan can even start. Once the scan begins and the user turns to the left or the right, the user cannot see the screen, so visual feedback alone is useless. In the sample application, we build both visual and audio feedback to assist with this scenario.

The implementation:

The PXCM3DScan module incorporates the following alerts: ALERT_IN_RANGE, ALERT_TOO_CLOSE, ALERT_TOO_FAR, ALERT_TRACKING, and ALERT_TRACKING_LOST. Within the application, we capture these alerts asynchronously and provide both the visual and audio feedback as necessary. Here is a snapshot of the application capturing the alerts and providing feedback.

Visual feedback before starting the scan and while the scan is in progress:

Note that in this example, we are not demonstrating how you can save the mesh and render it. You can learn more about the specifics of implementing the 3D scan module in your apps through the SDK API documentation.

Summary

The use of Intel RealSense technology in applications poses many UX challenges, both from the perspective of understanding non-tactile feedback and how end users use and interpret the technology. Through a real-time demonstration of some of the UX challenges and code snippets showing potential ways to address those challenges, we hope this application will help developers and UI designers gain a better understanding of Intel RealSense technology.

Additional Resources

Designing Apps for Intel® RealSense™ Technology – User Experience Guidelines with Examples for Windows*

UX Best Practices for Intel® RealSense™ Camera (User Facing) - Technical Tips

Why Apps Fail? And What You Can Do to Succeed


It only takes a quick visit to any app store to see how many apps are out there—and to imagine how many of them will inevitably fail. With literally thousands of apps in the market, and new ones being produced every day, most of them won’t survive. Whether you’re just getting started, or your app is already well underway, considering why apps might fail can be critical to your own success—and can help you avoid making the same mistakes.

Patrick DeFreitas has been at Intel for nearly twelve years and is responsible for helping Intel understand what makes apps successful, as well as for app targeting for gaming and PC. That means he's kept his finger on the pulse of the publishers and titles that are doing it right. We talked to him to get his thoughts on the common mistakes that cause most apps to fail and what developers can do to prevent them. Read on to learn more about the types of apps that make it, the major trends to be aware of, and how to maintain success once you have it.

The Busy, Cluttered Marketplace

No app is created or marketed in a vacuum, so it’s important to first understand the overall marketplace and the trends that may have an impact on your success.

  • Way too many choices. Think back to that app store. There are only a handful of categories, but within each, thousands upon thousands of choices. It can be hard to make sure your app will even be found in the first place. How will you stand out? (Take a look at this article on how to maximize your app store listing for more insight.)

     
  • Hard to find quality. In a traditional store, customers know the products on the shelf have been selected to meet a certain standard of quality. That’s not true in an app store, and as a result, consumer behavior has changed. Because they don’t know what they’ll find, consumers tend to download widely, take a quick look, and then delete. Or they’ll download an app and never look at it again. With such a range of quality out there, even gems can remain undiscovered.

     
  • Lack of developer support. On the flip side, many of the apps out there aren’t treated like proper products by their own developers—which means crashes aren't fixed and no one responds to consumer concerns. Or they try to go for quantity over quality, making as many apps as possible to see what sticks, which only further erodes consumer trust and makes the problem worse.

Look to the Winners!

There are success stories of all types, but there are a few categories of apps that DeFreitas has seen succeed time and time again. If you’re still in the early idea-generation phase, consider directing your efforts to one of these categories.

  • Quick and Easy Games – Simple and addictive is key. Games that can easily be picked up and played by a wide variety of consumers—kids and grandparents alike.
     
  • Utilities and Security – Apps designed to make your device function better, like extending battery life or closing apps that aren’t in use. Phones and tablets are still in their infancy, and consumers are always looking for ways to use them better.
     
  • Functional Apps – Apps that satisfy a specific need, and do it well, such as flashlights, cameras, and content-aggregating apps. If you can identify a need, you can tap into a new market.

Clues that an App Is on the Right Track

When you look at an app store listing, there are a few simple clues that can tell you whether an app is headed for success.

  • Is the developer responding to reviews? Are they fixing problems and responding to customers? The more responsive they are, and the more they’re engaging with their target audience, the more aligned they can be with what their audience needs and wants from their app.
  • Are they expanding their service offering? As a developer, this is one of the key ways to support an app as a product and grow the business. If they’re introducing new characters, new levels, and new rules—or new services, such as communication, security, and privacy—chances are good that they’re on the right track.

Grow, Grow, Grow—And Maintain Your Success

Once your app has launched, and you’ve achieved a certain degree of success, the work isn’t over. There are some important steps you can take to keep growing your business and keep engaging your customers.

  • Introduce or reintroduce content. Reward your customers’ time and interest by giving them new compelling content, such as new levels to unlock or new features and capabilities that can improve or enhance the experience.
     
  • Stay true to your values. Build on the app you have, and expand your content or service offering, but remember to stay true to the values and core that drove you to build this particular app and attract your customer base in the first place. Don’t fall prey to a scattershot approach—there’s no longevity in that.
     
  • Listen to your customers. Pay attention to reviews and ratings. If your customers are taking the time to give you feedback, make sure you’re listening. Address crashes and consider new features that are being requested. That doesn’t mean you’ll do everything they ask, but it does mean that you’re paying attention and respecting their input. Simply responding to reviews (in a respectful way!) can go a long way.
     
  • Establish a true brand. This is incredibly important in such a large marketplace. Be thoughtful about the product you’re offering, and about the associated content as well, such as your logo and marketing materials. Does everything feel like it’s part of the same package? Does it represent the core values of your product?
     
  • Start simple. Don’t try to solve every problem at once.
     
  • Be original. Get ahead of the trends.
     
  • Treat what you’re creating as a product. That means you develop your app with a long-term goal in mind, you support it, and you have plans to continue and manage growth.
     
  • Focus on your customers. No one operates in a vacuum, and happy customers are your best asset.
     
  • Look for co-sponsorships. Are there opportunities to partner with a bigger brand? Look to vehicles outside of the app world, too. This can allow you to effectively reach new markets you may not otherwise have access to.

You’re Ready—But Is Your Product?

If you have a finished app and you’re ready to go, make sure your product is ready, too. You’ll need a web presence and social media plan.

  • Website
  • Facebook
  • Email
  • Twitter

Remember: You Have a Product, Not Just an Idea

Treating your app like a product is one of the most important things you can do. As DeFreitas says, “Having the mindset that you have a product—and you believe in the services, or the experience that your product brings into the marketplace—that’s the foundation.” Nurture what you’ve built, and listen to your customers. There are many apps out there, and many of them fail. But if you treat your app like the product it is, stay true to your core, and listen to the people who are buying and using it, yours can be one of the apps that thrives.
