
Set Up SaffronArchitect Staging Database Credentials


Prerequisite:
You must know the MySQL staging database account username and password for the application account intended to access the staging database table.

  1. From the machine running the SaffronArchitect container, run the following command, replacing <saffron-architect-container-name-or-id> with the actual name or ID of the container. Follow the prompts and enter the information requested.
     docker exec -it <saffron-architect-container-name-or-id> bin/setStagingDatabaseCredentials.sh
  2. Restart the container.
     docker restart <saffron-architect-container-name-or-id>

Example Success Output

INFO com.saffrontech.websdk.cli.DatabaseConnectivityTester [main] Using credentials to test connection to jdbc:mysql://mysqlhost.mycorp.com:3306/IDR

INFO com.saffrontech.websdk.cli.DatabaseConnectivityTester [main] Connected successfully!

INFO com.saffrontech.websdk.cli.StagingDatabaseCredentialsSetup [main] Credentials set

Next Step

Set up Saffron Admin.

Previous Step

Prime SaffronMemoryBase and the Thought Process Engine (TPE).

Related topics

Installing and Deploying SaffronArchitect


Load the SaffronArchitect Docker Image

Intel® RealSense™ SDK Web Component Uninstall Instructions


Intel has made the decision to discontinue marketing and development of Intel® RealSense™ Web Component and related samples and applications. Other components, specifically Intel® RealSense™ SDK versions 2016-R2 and 2016-R3, are not impacted by this decision.

Because this application contains a security vulnerability in one of its dependencies, it is recommended that all customers remove the components from their systems.

Intel PSIRT published security advisory INTEL-SA-00066, found at www.intel.com/security, with details regarding the impact to affected systems and announcing that the Intel® RealSense™ Web Component is no longer supported.

This notice and related actions are effective February 1st, 2017.

Removal instructions for Windows* 10:

  1. Use the Microsoft Windows Settings or Search to navigate to "System -> Apps & Features"
  2. Remove two components:
    1. Intel® Technology Access
    2. Intel® RealSense™ SDK Web Runtime

Removal instructions for Windows* 7:

  1. Go to Windows Start Menu -> Control Panel -> Programs and Features
  2. Remove two components from the installed Programs list:
    1. Intel® RealSense™ SDK Web Runtime
    2. Intel® Technology Access

Recipe: Building and Running MILC on Intel® Xeon® Processors and Intel® Xeon Phi™ Processors


Introduction

MILC software represents a set of codes written by the MIMD Lattice Computation (MILC) collaboration used to study quantum chromodynamics (QCD), the theory of the strong interactions of subatomic physics. It performs simulations of four-dimensional SU(3) lattice gauge theory on MIMD (Multiple Instruction, Multiple Data) parallel machines. “Strong interactions” are responsible for binding quarks into protons and neutrons and holding them all together in the atomic nucleus. MILC applications address fundamental questions in high energy and nuclear physics, and are directly related to major experimental programs in these fields. MILC is one of the largest compute cycle users at many U.S. and European supercomputing centers.

This article provides instructions for code access, build, and run directions for the “ks_imp_rhmc” application on Intel® Xeon® processors and Intel® Xeon Phi™ processors. The “ks_imp_rhmc” is a dynamical RHMC (rational hybrid Monte Carlo algorithm) code for staggered fermions. In addition to the naive and asqtad staggered actions, the highly improved staggered quark (HISQ) action is also supported.

Currently, the conjugate gradient (CG) solver in the code uses the QPhiX library. Efforts are ongoing to integrate other operations (gauge force (GF), fermion force (FF)) with the QPhiX library as well.

The QPhiX library provides sparse solvers and Dslash kernels for Lattice QCD simulations optimized for Intel® architectures.

Code Access

The MILC software and the QPhiX library are required. The MILC software can be downloaded from GitHub here: https://github.com/milc-qcd/milc_qcd. Download the master branch. QPhiX support is integrated into this branch for CG solvers.

The QPhiX library and code generator for use with Wilson-Clover fermions (for example, for use with chroma) are available from https://github.com/jeffersonlab/qphix.git and https://github.com/jeffersonlab/qphix-codegen.git, respectively. For the most up-to-date version, we suggest you use the devel branch of QPhiX. The MILC-specific QPhiX version is currently not open source; please contact the MILC collaboration group for access to the QPhiX (MILC) branch.

Build Directions

Compile the QPhiX Library

Users need to build QPhiX first before building the MILC package.

The QPhiX library comes as two tar files: mbench*.tar and qphix-codegen*.tar.

Untar the above.

Build qphix-codegen

The files with intrinsics for QPhiX are built in the qphix-codegen directory.

Enter the qphix-codegen directory.

Edit line #3 in “Makefile_xyzt” to enable the “milc=1” variable.

Compile as:

source /opt/intel/compiler/<version>/bin/compilervars.sh intel64
source /opt/intel/impi/<version>/mpi/intel64/bin/mpivars.sh
make -f Makefile_xyzt avx512 -- [for Intel® Xeon Phi™ processors]
make -f Makefile_xyzt avx2 -- [for Intel® Xeon® v3 / v4 processors]

Build mbench

Enter the mbench directory.

Edit line #3 in “Makefile_qphixlib”, set “mode=mic” to compile with Intel® AVX-512 for Intel® Xeon Phi™ Processor and “mode=avx” to compile with Intel® Advanced Vector Extensions 2 (Intel® AVX2) for Intel® Xeon® Processors.

Edit line #13 in “Makefile_qphixlib” to enable MPI. Set ENABLE_MPI = 1.

Compile as:

make -f Makefile_qphixlib mode=mic AVX512=1 -- [Intel® Xeon Phi™ Processor]
make -f Makefile_qphixlib mode=avx AVX2=1 -- [Intel® Xeon® Processors]

Compile MILC Code

Install/download the master branch from the above GitHub location.

Download the Makefile.qphix file from the following location:

http://denali.physics.indiana.edu/~sg/MILC_Performance_Recipe/.

Copy the Makefile.qphix to the corresponding application directory. In this case, copy the Makefile.qphix to the “ks_imp_rhmc” application directory and rename it as Makefile.

Make the following changes to the Makefile:

  • On line #17 - Add/uncomment the appropriate ARCH variable:
    • For example, ARCH = knl (compile with Intel AVX-512 for Intel® Xeon Phi™ Processor architecture).
    • For example, ARCH = bdw (compile with Intel AVX2 for Intel® Xeon® Processor architecture).
  • On line #28 - Change MPP variable to “true” if you want MPI.
  • On line #34 - Pick the PRECISION you want:
    • 1 = Single, 2 = Double. We use Double for our runs.
  • Starting at line #37 - The compiler setup should work as-is if the directions above were followed; if not, customize starting at line #40.
  • On line #124 - Setup of the Intel compiler starts:
    • Based on ARCH, it will use the appropriate flags.
  • On line #395 - QPhiX customizations start:
    • On line #399 – Set QPHIX_HOME to correct QPhiX path (Path to mbench directory).
    • The appropriate QPhiX FLAGS will be set if the above is defined correctly.

Compile as:

Enter the ks_imp_rhmc directory. The Makefile with the above changes should be in this directory. Source the latest Intel® compilers and Intel® MPI Library.

make su3_rhmd_hisq -- Build su3_rhmd_hisq binary
make su3_rhmc_hisq -- Build su3_rhmc_hisq binary

Compile the above binaries for Intel® Xeon Phi™ Processor and Intel® Xeon® Processor (edit Makefile accordingly).

Run Directions

Input Files

There are two required input files: params.rest and rat.m013m065m838.

They can be downloaded from here:

http://denali.physics.indiana.edu/~sg/MILC_Performance_Recipe/.

The file rat.m013m065m838 defines the residues and poles of the rational functions needed in the calculation. The file params.rest sets all the run time parameters, including the lattice size, the length of the calculation (number of trajectories), and the precision of the various conjugate-gradient solutions.

In addition, a params.<lattice-size> file with the required lattice size will be created during runtime. This file is essentially the lattice size (Nx * Ny * Nz * Nt) with params.rest appended to it.

The Lattice Sizes

The size of the four-dimensional space-time lattice is controlled by the “nx, ny, nz, nt” parameters.

As an example, consider a problem as (nx x ny x nz x nt) = 32 x 32 x 32 x 64 running on 64 MPI ranks. To weak scale this problem a user would begin by multiplying nt by 2, then nz by 2, then ny by 2, then nx by 2 and so on, such that all variables get sized accordingly in a round-robin fashion.

This is illustrated in the table below. The original problem size is 32 x 32 x 32 x 64. To keep the elements/rank constant (weak scaling) at a 128-rank count, first multiply nt by 2 (32 x 32 x 32 x 128). Similarly, for 512 ranks, multiply nt by 2, nz by 2, and ny by 2 from the original problem size to keep the same elements/rank.

Ranks             64        128       256       512
nx                32        32        32        32
ny                32        32        32        64
nz                32        32        64        64
nt                64        128       128       128
Total Elements    2097152   4194304   8388608   16777216
Multiplier        1         2         4         8
Elements/Rank     32768     32768     32768     32768

Table: Illustrates Weak Scaling of Lattice Sizes
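
The round-robin doubling rule described above is simple enough to express in a few lines. The following Python sketch (an illustration for this recipe, not part of the MILC package) reproduces the table:

# Round-robin weak scaling of an nx x ny x nz x nt lattice.
# Doubling order follows the text: nt first, then nz, ny, nx.
def weak_scale(dims, base_ranks, target_ranks):
    nx, ny, nz, nt = dims
    d = {"nx": nx, "ny": ny, "nz": nz, "nt": nt}
    order = ["nt", "nz", "ny", "nx"]
    ranks, i = base_ranks, 0
    while ranks < target_ranks:
        d[order[i % 4]] *= 2   # double the next dimension in line
        ranks *= 2             # each doubling also doubles the rank count
        i += 1
    return d["nx"], d["ny"], d["nz"], d["nt"]

for ranks in (64, 128, 256, 512):
    nx, ny, nz, nt = weak_scale((32, 32, 32, 64), 64, ranks)
    print(ranks, (nx, ny, nz, nt), nx * ny * nz * nt // ranks)  # elements/rank stays 32768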

Running with MPI x OpenMP*

The calculation takes place on a four-dimensional hypercubic lattice, representing three spatial dimensions and one time dimension. The quark fields have values on each of the lattice points and the gluon field has values on each of the links connecting nearest-neighbors of the lattice sites. 

The lattice is divided into equal subvolumes, one per MPI rank. The MPI ranks can be thought of as being organized into a four-dimensional grid of ranks. It is possible to control the grid dimensions with the params.rest file. Of course, the grid dimensions must be integer factors of the lattice coordinate dimensions.
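
Since the rank grid must evenly divide the lattice, a quick divisibility check (a hypothetical helper, not part of MILC) can validate a proposed decomposition before a run:

# Check that a 4-D rank grid evenly divides the lattice dimensions.
def grid_divides_lattice(grid, lattice):
    return all(dim % g == 0 for g, dim in zip(grid, lattice))

print(grid_divides_lattice((2, 2, 2, 8), (32, 32, 32, 64)))  # True; 2*2*2*8 = 64 ranks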

Each MPI rank executes the same code. The calculation requires frequent exchanges of quark and gluon values between MPI ranks with neighboring lattice sites. Within a single MPI rank, the site-by-site calculation is threaded using OpenMP* directives, which have been inserted throughout the code. The most time-consuming part of production calculations is the CG solver. In the QPhiX version of the CG solver, the data layout and the thread-level calculation are further organized to take advantage of the SIMD (single instruction, multiple data) lanes of Intel Xeon and Intel Xeon Phi processors.

Running the Test Cases

  1. Create a “run” directory in the top-level directory and add the input files obtained from above.
  2. cd <milc>/run
    Note: Run the appropriate binary for each architecture.
  3. Create the lattice volume:
    cat << EOF > params.${nx}x${ny}x${nz}x${nt}
    prompt 0
    nx $nx
    ny $ny
    nz $nz
    nt $nt
    EOF
    cat params.rest >> params.${nx}x${ny}x${nz}x${nt}

    For this performance recipe, we evaluate the single-node and multinode (16 nodes) performance with the following weak-scaled lattice volumes:

    Single Node (nx * ny * nz * nt): 24 x 24 x 24 x 60

    Multinode [16 nodes] (nx * ny * nz * nt): 48 x 48 x 48 x 120

  4. Run on Intel Xeon processor (E5-2697v4).
    Source the latest Intel compilers and Intel MPI Library
    • Intel® Parallel Studio 2017 and above recommended

    Single Node:

    mpiexec.hydra -n 12 -env OMP_NUM_THREADS 3 -env KMP_AFFINITY 'granularity=fine,scatter,verbose' <path-to>/ks_imp_rhmc/su3_rhmd_hisq.bdw < params.24x24x24x60

    Multinode (16 nodes, via Intel® Omni-Path Host Fabric Interface (Intel® OP HFI)):

    # Create a runScript (run-bdw) #
    <path-to>/ks_imp_rhmc/su3_rhmd_hisq.bdw < params.48x48x48x120
    #Intel® OPA fabric-related environment variables#
    export I_MPI_FABRICS=shm:tmi
    export I_MPI_TMI_PROVIDER=psm2
    export PSM2_IDENTIFY=1
    export I_MPI_FALLBACK=0
    #Create nodeconfig.txt with the following#
    -host <hostname1> -env OMP_NUM_THREADS 3 -env KMP_AFFINITY 'granularity=fine,scatter,verbose' -n 12 <path-to>/run-bdw
    …..
    …..
    …..
    -host <hostname16> -env OMP_NUM_THREADS 3 -env KMP_AFFINITY 'granularity=fine,scatter,verbose' -n 12 <path-to>/run-bdw
    #mpirun command#
    mpiexec.hydra -configfile nodeconfig.txt
  5. Run on Intel Xeon Phi processor (7250).
    Source Intel compilers and Intel MPI Library
    • Intel® Parallel Studio 2017 and above recommended

    Single Node:

    mpiexec.hydra -n 20 -env OMP_NUM_THREADS 3 -env KMP_AFFINITY 'granularity=fine,scatter,verbose' numactl -p 1 <path-to>/ks_imp_rhmc/su3_rhmd_hisq.knl < params.24x24x24x60

    Multinode (16 nodes, via Intel OP HFI):

    # Create a runScript (run-knl) #
    numactl -p 1 <path-to>/ks_imp_rhmc/su3_rhmd_hisq.knl < params.48x48x48x120
    #Intel OPA fabric-related environment variables#
    export I_MPI_FABRICS=shm:tmi
    export I_MPI_TMI_PROVIDER=psm2
    export PSM2_IDENTIFY=1
    export I_MPI_FALLBACK=0
    #Create nodeconfig.txt with the following#
    -host <hostname1> -env OMP_NUM_THREADS 3 -env KMP_AFFINITY 'granularity=fine,scatter,verbose' -n 20 <path-to>/run-knl
    …..
    …..
    …..
    -host <hostname16> -env OMP_NUM_THREADS 3 -env KMP_AFFINITY 'granularity=fine,scatter,verbose' -n 20 <path-to>/run-knl
    #mpirun command#
    mpiexec.hydra -configfile nodeconfig.txt

Performance Results and Optimizations

The output prints the total time to solution for the entire application, which takes into account the time for the different solvers and operators (for example, CG solver, fermion force, link fattening, gauge force, and so on).

The performance chart below shows the speedup with respect to a 2S Intel Xeon processor E5-2697 v4, based on the total run time.

Figure: Speedup with respect to a 2S Intel® Xeon® processor E5-2697 v4

The optimizations in the QPhiX library include data layout changes to target vectorization and the generation of packed aligned loads/stores, cache blocking, load balancing, and improved code generation for each architecture (Intel Xeon processor, Intel Xeon Phi processor) with corresponding intrinsics, where necessary. See the References and Resources section for details.

Testing Platform Configurations

The following hardware was used for the above recipe and performance testing.

Processor                    Intel® Xeon® Processor E5-2697 v4    Intel® Xeon Phi™ Processor 7250F
Sockets / TDP                2S / 290W                             1S / 215W
Frequency / Cores / Threads  2.3 GHz / 36 / 72                     1.4 GHz / 68 / 272
DDR4                         8x16 GB 2400 MHz                      6x16 GB 2400 MHz
MCDRAM                       N/A                                   16 GB Flat
Cluster/Snoop Mode           Home                                  Quadrant
Memory Mode                  N/A                                   Flat
Turbo                        OFF                                   OFF
BIOS                         SE5C610.86B.01.01.0016.033120161139   GVPRCRB1.86B.0010.R02.1606082342
Operating System             Oracle Linux* 7.2                     Oracle Linux* 7.2
                             (3.10.0-229.20.1.el6.x86_64)          (3.10.0-229.20.1.el6.x86_64)

MILC Build Configurations

The following configurations were used for the above recipe and performance testing.

MILC Version                Master version as of 28 January 2017
Intel® Compiler Version     2017.1.132
Intel® MPI Library Version  2017.0.098
MILC Makefiles Used         Makefile.qphix, Makefile_qphixlib, Makefile

References and Resources

  1. MIMD Lattice Computation (MILC) Collaboration: http://physics.indiana.edu/~sg/milc.html
  2. QPhiX Case Study: http://www.nersc.gov/users/computational-systems/cori/application-porting-and-performance/application-case-studies/qphix-case-study/
  3. MILC Staggered Conjugate Gradient Performance on Intel® Xeon Phi™ Processor: https://anl.app.box.com/v/IXPUG2016-presentation-10

Set Up and Develop Thought Processes


This page describes how to set up the Thought Process Engine (TPE) service (known as the THOP service) and how to develop THOPs using the THOPBuilder plug-in. To gain an overall understanding of Thought Processes and view examples, go to Thought Processes Overview and Examples.

Set up the Thought Process Engine Service

THOPs 1.0

To develop THOPs 1.0, you must install THOPBuilder 1.0 in IntelliJ and point it to the standard ws service in SMB. Follow the instructions under Develop THOPs using THOPBuilder.

THOPs 2.0

To develop THOPs 2.0 (for asynchronous and potentially forever-running THOPs), you or your system administrator must install saffron-ws. It has a REST API to store and run THOPs. You must also install IntelliJ and Java 8. Follow the instructions under Develop THOPs using THOPBuilder.

Install Saffron-WebServices

Prerequisites

In order for the TPE service to run, a standard Java 8 installation is required.

To install Java 8 (OpenJDK) on CentOS, do the following:

$ sudo yum install java-1.8.0-openjdk-headless.x86_64

To verify if Java 8 has been correctly installed, run:

$ java -version

Verify that the version is 1.8.0_*

If a different Java version is installed, run the alternatives command to select the correct version:

$ sudo alternatives --config java

(alternatives is part of the chkconfig package)

If you are installing Oracle Java 8 instead, follow the instructions here: http://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/#

Note: If you are installing on a system where Saffron is not already installed, create a 'saffron' user:

$ adduser saffron

Installation

Download the Saffron WS RPM: Saffron-WS--<version>-<build>.noarch.rpm

$ sudo service saffron-ws stop 

Note: The stop command will fail if you do not have an earlier version of Saffron-WS installed.

$ sudo yum install Saffron-WS--<version>-<build>.noarch.rpm
$ sudo service saffron-ws start

The service is available by default on port 8888.

To check if the initial installation was successful, use `curl`:

$ curl 'http://localhost:8888/ws/rest/proc'
{"stored_procedures" : [ ]
}

Note: If you have already configured user authentication as outlined in the Security section below, use curl -u user:pwd 'http://localhost:8888/ws/rest/proc'. The user must have the ROLE_ADMIN or the ROLE_API_USER role.

Configuration 

Configuration files for the TPE are available in /etc/saffron-ws.d

The standard configuration files can be found in the defaults sub-directory.

To make changes to a configuration file, copy the appropriate file from /etc/saffron-ws.d/defaults to /etc/saffron-ws.d and edit it there.

Check the configuration files for documentation about all of their settings.

It is recommended to adjust the following settings to tune the performance and security of the TPE service:

server.properties:

# Maximum number of concurrent HTTP connections to SMB instances - per host
max.connections.per.host=20
# Set the authentication/authorization provider.
# Possible values:
# none - no authentication for all REST resources
# smb10 - authenticate using the login services of a Saffron v10 instance. Needs a valid smbURL configured (see below)
security=none

# Optional URL to the SMB system used for authentication purposes.
# Important: URL should end in a slash, default value: http://localhost:8080/ws/
#
# NOTE:
# This setting is only used if security=smb10 (see above)
smbURL=

You can also change the IP interfaces the service binds to, as well as the port number if there are port conflicts.

Lastly, if THOPs need to make connections to HTTPS services requiring private certificates, change the client.keystore.file setting (this is often not required as THOPs are typically only making calls to a local SaffronMemoryBase instance using HTTP).

cache.properties:

# Settings for the query results cache.
# Caching is based on ETags received from SMB instances
# Size of the cache in bytes, k, kb, kib, m, mb, mib, g, gb, gib
# This is the amount of memory reserved for storing query results.
# The cache might allocate more than the configured amount if it can fit one more entry before filling up.
# Cache entries are removed using an LRU algorithm
cache.size = 10mb

If the TPE installation is expected to see a large number of queries or large query results from SaffronMemoryBase, increasing this value is recommended. The cache is an in-memory, compressing LRU cache and its contents are lost on service restart.

Note: For the config changes to take effect, issue:

$ service saffron-ws restart

Security

By default, a TPE installation is not secured and anyone can upload and run THOPs. 

To secure TPE and tie its user management to an existing SMB system, change the security and smbURL settings in server.properties to point to an existing SMB system.

Sample configuration:

# Maximum number of concurrent HTTP connections to SMB instances - per host
max.connections.per.host=20

# Set the authentication/authorization provider.
# Possible values:
# none - no authentication for all REST resources
# smb10 - authenticate using the login services of a Saffron v10 instance. Needs a valid smbURL configured (see below)
security=smb10

# Optional URL to the SMB system used for authentication purposes.
# Important: URL should end in a slash, default value: http://localhost:8080/ws/
#
# NOTE:
# This setting is only used if security=smb10 (see above)
smbURL=http://localhost:8080/ws/

This will secure the `/ws/rest/proc` paths using HTTP Basic Authentication.  

Note: In order for this config change to take effect, issue:

$ service saffron-ws restart

Exposing TPE service when part of a SaffronMemoryBase cluster (HTTP/S support)

The TPE service can be used stand-alone or as part of a SaffronMemoryBase cluster.

If part of a SaffronMemoryBase cluster (typically installed on the SaffronMemoryBase head node), the service can be exposed via HTTP/S using the SaffronMemoryBase cluster Apache proxy.

To do so, add proxy pass rules to the <VirtualHost> sections (one for port 80 and one for port 443) in /etc/httpd/conf.d/saffron_https_proxy.conf

ProxyPass        /thop            http://127.0.0.1:8888/ws
ProxyPassReverse /thop            http://127.0.0.1:8888/ws

Refer to the Apache documentation for more information about these settings.

Maintenance

Log files for this service can be found in /var/log/saffron-ws

Control over the service is accomplished through regular unix tools like service and chkconfig.

Thought Processes uploaded via the REST API are stored in /var/tmp/thops (subject to change in future versions of this package).

Backup/Restore

The THOPs stored in the current version of the TPE service are considered ephemeral (hence the /var/tmp/thops location).

Users of the TPE service are expected to re-upload missing THOPs during bootstrapping.

If a snapshot of the state of the TPE service is required, use any available tool to store the contents of /var/tmp/thops.

To guarantee consistency of the snapshot, stop the service first.

Troubleshooting

If service saffron-ws start fails, check the log files for a more detailed problem description.

If the error contains the following:

com/saffrontech/ws/vertx/Server : Unsupported major.minor version 52

make sure that the saffron user is using the correct version of Java 8.

Compare the paths from which Java is loaded for the root and saffron users:

$ which java
$ su saffron
$ java -version
$ which java

Adjust ~saffron/.bashrc if necessary.

Use NPM as a package manager with THOPs 2.0

THOPs 2.0 supports access to libraries that can be used from within THOPs. See Use NPM with THOPs.

Develop THOPs using THOPBuilder

Prerequisites

The following must be installed before you install THOPBuilder:

  • IntelliJ IDE

    Saffron offers an IntelliJ IDE plug-in to develop, test, and run Thought Processes (THOPs). This plug-in works for all editions of IntelliJ IDEA. The Community Edition is available for free.
  • Java 8 (THOPs 2.0 only)

    You must install Java 8 in order to use THOPS 2.0.

Install the THOP Builder plug-in

  1. Download the attached zip file below (do not unzip this file):
    THOPBuilder-2.0.2.zip (Make sure Java 8 is already installed.)
    THOPBuilder-1.3.4.zip (Java 8 is not necessary.)
  2. Add the THOP builder as a plugin to IntelliJ:
    Open Preferences / Plugins / Install plugin from disk

    Select the downloaded THOPBuilder zip file from Step 1. (Do not unzip the file.)

  3. Restart IntelliJ.
  4. Re-open the Preferences / Plugins window and confirm that it contains the THOP Builder plug-in.

Set up a connection to a SaffronMemoryBase

Connections to SaffronMemoryBase are stored per IDEA project.

  1. Create a new static web project:
    Open File / New / Project. You do not need to enter a Project SDK. Simply create an empty project.
  2. Open Tools and select Connect to SMB in the Project you just created:

    Enter the connection data, as shown in the example below.

    If you are not running SaffronMemoryBase on your local machine, point it to the URL used to log in to the Saffron Admin tool. Use /ws instead of /admin at the end of the URL.

    While the connection is made with a Username and Password, the Access Key is also required for THOPs to make queries to SaffronMemoryBase.


     Connect to SaffronMemoryBase
  3. Click Test to confirm a successful connection.

    Click OK and follow the instructions to download THOPs to test if the connection works as expected. Make certain your user has the ROLE_API_USER permissions.

    Note: You must have SaffronMemoryBase 10.2 for this to work properly. If you are running 10.1, the test result might be 'Can't log in' even if you provide the correct credentials.
  4. Click OK and continue.

THOP Builder Features

  • Download THOPs to IntelliJ
    1. Create a directory in the new project you just created (the Download THOPs action only works on directories).
    2. Right-click that directory and choose Download THOPs.

      The currently-installed Thought Processes for the configured SMB instance are downloaded as files and saved to the directory.
       
  • Upload THOPs to SMB
    1. Right-click one or more of the existing THOPs or create a new .js file.
    2. Choose Upload THOP in the context menu.
      This uploads the current contents of the selected file(s) and overwrites any existing Thought Processes by the same name. The name is taken from the filename without extension.
       
  • Delete THOPs from SMB
    1. Select one or more THOPs you want to delete.
    2. Right-click the selected THOPs and select Delete THOPs from the context menu. This remotely removes the THOPs by the same name.

      Note: Local files are not deleted. If you delete a THOP by accident, you can re-upload it.
  • Run THOPs

    Right-click a THOP (for example, HelloWorld.js) and perform one of the following:

    • Run HelloWorld ... Right-click HelloWorld and select Run HelloWorld. This creates a default run configuration and then uploads and runs the Hello World THOP. The console window shows the results of running the THOP.
    • Debug HelloWorld ... Right-click HelloWorld and select Debug HelloWorld. This creates a default run configuration and uploads and runs the Hello World THOP with debug information enabled. The console window shows the result of the THOP as well as any debug information collected using debug(...) calls within the THOPs code. Note: Debug output in THOPs is displayed only if you use the Debug action to run the THOP.
  • Edit the Run Configuration

    The Run/Debug Configurations window captures parameters and other available options to run a THOP.

    • To open, select Edit Configurations for a THOP.

      Edit the run and debug configurations for a THOP

      The Run/Debug Configurations window opens:

      THOPs Run/Debug Configurations window

      Parameters can be specified one per line using the format: parameter=value. Values are automatically URL-encoded and can be specified with blanks and other reserved characters.

      Upload THOP before running is selected by default. When the THOP is run, the file specified in Local THOP is uploaded first.

      Defaults: Set the default SMB (and default values) to which the THOP communicates. You can add arbitrary JSON objects here (THOP Builder will only support simple properties though). They can be accessed inside the THOP using the new defaults variable. If you upload via POST/PUT, add JSON object defaults to your existing JSON payload.

      Together with the main toolbar's Run feature, this process leads to a seamless coding and test cycle without having to upload the THOP manually. Note that you can also save a Run configuration and create multiple run configurations for the same THOP for testing.

Troubleshoot Issues

No THOPs are downloaded: Check the Event Log in IntelliJ. Check your connection settings in Tools / Connect SMB...

Test button doesn't work, i.e., an error message is displayed: If the error is "Can't log in", you might be connecting to an older version of SMB. Open your browser and enter the SMB's hostname and port number followed by /ws/rest/spaces. Then, log in with your credentials.

I still can't log in: Verify that your user has the correct roles. Prior to 10.2, only the ROLE_API_USER role is required. After 10.2, both the ROLE_API_USER and ROLE_DEVELOPER roles are required.

My user is an admin and I can't log in: Prior to 10.2, admin users need to be given the ROLE_API_USER role. Admin users don't require special roles when using SMB 10.2 or later.

Advanced Topics

Use NPM with THOPs

THOPs 2.0 supports access to libraries that can be used from within THOPs. NPM provides JavaScript libraries that can be used by THOPs 2.0. Install NPM as the package manager (https://www.npmjs.com/package/npm).

To use packages maintained by NPM in THOPs 2.0, do the following:

  1. In your shell/IDE, set the environment variable NODE_PATH to ./node_modules (or wherever you want npm modules to exist).
  2. Start Saffron-WS.

To install a package, go to the home directory of Saffron-WS (adjust if you changed NODE_PATH above) and run:

npm install <package>

This installs the named package in ./node_modules.

To use the package, use the require function in your THOP (more specific instructions are usually in the package documentation).

Example of using an NPM package:

cd where/my/ws-fat.jar/file/is
export NODE_PATH=./node_modules
npm install n-gram

THOP code:

function run(p) {
    var nGram = require('n-gram');
    return nGram.bigram('Saffron Rocks');
}

Note: There's currently no simple flag that tells you if a particular package runs on Nashorn or not. Check the dependencies of the package (listed on npmjs.com); preferably there are none. Also, if the docs mention that the code runs in the browser, there's a good chance it might run in Nashorn.

Can Technology Replace The Eye?


Data is Much More Than Numbers

Data is the tool we use to communicate a story effectively. Data is the tool that enables us to make informed decisions.

Galileo had the all-time best grasp of data analysis. He observed the stars of the Pleiades in 1610. Six stars are bright enough to be seen with the naked eye, or at most nine if it is very dark and a person’s eyesight is very good. Using a telescope, Galileo counted, scattered between these six bright stars, over forty fainter points of light. He recorded the positions of 36 stars in his sketch of the cluster and drew outlines around the stars that had been known since ancient times.

Galileo was essentially doing data pattern analysis using data points and data attributes to classify the stars into bright and dim stars, and eventually classifying those bright stars into constellations.

The same approach, now with many more data points, much more complex attributes, and machine-led classification, is what we study today as data science and machine learning.


Figure 1: Galileo’s sketch of the Pleiades. (Image credit: Octavo Corp./Warnock Library.)

The Data Science Craze

Data science, as the name suggests, is the study of data: data about the source of information, data about the meaning and relevance of this information, and an analysis of the value hidden within this information.

An organization can extract high business value by mining large amounts of structured and unstructured data points to identify patterns. Data analysis and data science seek to learn from experience. Data can not only help us understand what happened in the past but also why it happened. And, in many cases, this knowledge can guide us in predicting the future and, in effect, in managing the future. 

From cost reduction to increasing efficiency to unleashing new opportunities to expanding business and customer lifetime value, data science, done right, can be a powerful tool in increasing an organization's competitive advantage. It’s no wonder that we are witnessing a growing interest in data science within both technical and business organizations worldwide.

But better analysis requires better inferences. Better inference requires better thinking and that, in turn, needs better tools, better pattern recognition, and better algorithms.

In this paper, we will cover some of these core data science concepts followed by an application of data science principles—the image classifier, with a special zoom-in on a vehicle classifier, which you will be able to use to solve a pattern recognition problem of a similar complexity.

Core Concepts

Before taking a deeper dive into the image classifier, let’s first understand some core data science concepts and terminology. 

Classifier

A classifier is a tool in data mining that takes training data representing things we want to classify and attempts to predict which class new data belongs to.

Take, for example, a data set that contains a set of emails. We know certain attributes about each email like sender, date, subject, content, and so on. Using these attributes, we can identify two or more types (classes) of email a person receives. Furthermore, given these attributes about email, we can predict the class of an email received. The data science algorithm that does such a classification is called the classifier.

A classifier automatically classifies the email for a user using these attributes. Gmail* is a good example of an email system that offers auto-classification; for example, a promotion, junk mail, a social media update, or just regular email.
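
As a concrete illustration of the idea (a toy sketch using scikit-learn, not the system behind Gmail), a few labeled subject lines are enough to train a classifier that predicts the class of a new email:

# Toy email classifier: bag-of-words features plus naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

subjects = ["50% off sale today", "Team meeting at 3pm",
            "You won a free prize", "Quarterly report attached"]
labels = ["promotion", "regular", "junk", "regular"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(subjects, labels)                             # train on labeled examples
print(clf.predict(["Huge discount, free shipping"]))  # likely ['promotion']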

Supervised Learning

A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.

In this approach, data patterns are inferred from labeled training data.

To train the model, we feed the model with data sets and the corresponding correct answers. Once the model is trained, it can predict future values.

Multiple existing data points are used to define simple rules for predicting future values; this is also known as a heuristic or rules-based model.

Unsupervised Learning

An alternate approach for data modeling is to let the data decide what to do. In this approach, many data points are fed to the machines, which in turn suggest potential clusters of information based on complex analysis and pattern recognition techniques.
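
To make this concrete, here is a minimal sketch (toy 2-D points, scikit-learn k-means) in which the algorithm groups unlabeled data into clusters on its own:

# Unsupervised clustering: no labels are provided, only data points.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)  # e.g., [0 0 1 1]: two clusters suggested by the data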

This ability to programmatically make intelligent decisions by learning complex decision rules from big data is a primary selling point of machine learning.

With machine learning, predictive accuracy has improved multifold, especially with deep learning. These techniques, which learn without explicit human supervision, now yield classifiers that outperform human predictive accuracy on many tasks.

Such unsupervised models are now being used in many walks of life. They enable us to guess how an individual will rate a movie (Netflix*), classify images (Like.com*, Facebook*), recognize speech (Siri*), and more!

Neural Networks

A neural network is a programming paradigm inspired by the structure and functioning of our brain and nervous system (neurons), which enables a computer to learn from observational data.

The goal of the neural network is to make decisions, and hence solve problems like the human brain does. Modern neural network projects typically work with a few thousand to a few million neural units and millions of connections, which is still several orders of magnitude less complex than the human brain, and closer to the computing power of a worm.

Dr. Robert Hecht-Nielsen, the inventor of one of the first neurocomputers, in "Neural Network Primer: Part I" by Maureen Caudill, AI Expert, Feb. 1989 describes a neural network as "...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs.”

Neural networks are organized in layers made of numerous interconnected nodes that contain an activation function.

  • Patterns are presented to the network via the input layer.
  • The input layer communicates to one or more hidden layers, where the actual processing is done via a system of weighted connections.
  • The hidden layers then link to an output layer where the answer is output, as sketched below.
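
A minimal forward pass makes this flow concrete. The sketch below is illustrative only (random weights, sigmoid activations); it pushes a pattern from the input layer through one hidden layer to the output layer:

# One forward pass through a tiny fully connected network.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(4)              # pattern presented to the input layer
W1 = rng.random((3, 4))        # weighted connections into the hidden layer
W2 = rng.random((2, 3))        # weighted connections into the output layer

hidden = sigmoid(W1 @ x)       # processing in the hidden layer
output = sigmoid(W2 @ hidden)  # the answer appears at the output layer
print(output)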

 

Neurons

A neuron is the basic working unit of the human brain. It’s a specialized cell designed to transmit information to other nerves, muscle, or gland cells.

In the programming world of neural networks, these biological neurons are emulated in interconnected and interacting components called nodes or artificial neurons. They take input data, perform simple operations on the data, and selectively pass the results on to other neurons or nodes.

The type of artificial neuron may vary, depending on the type of model. The perceptron and the sigmoid neuron are two of the most commonly used.

Deep Learning

Neural networks can be shallow or deep, depending on the number of layers they have. The diagram above shows a single input layer, two hidden layers, and a single output layer. Networks could also have multiple hidden layers. Intermediate layers could be used to build multiple layers of abstraction, just like we do with Boolean circuits.

Deep learning is a powerful set of techniques for learning in neural networks when we create multiple layers of abstraction.

For example, to build a network for visual shape recognition, the neurons in the first layer could learn to recognize edges, the neurons in the second layer could learn to recognize angles, the next layer could further recognize complex shapes like a triangle, or a circle, or a rectangle, and another one could use this information to recognize the final object, such as a skateboard or a cycle.

Intuitively, we expect the deep networks with more hidden layers to be more powerful than shallow networks. However, training deep networks can be rather challenging, owing to the difference in speed at which every hidden layer learns. For example, in the phenomenon known as the vanishing gradient problem, the neurons in the earlier layers learn much more slowly than neurons in later layers. So, while it may be easier to differentiate between a triangle, square, or circle, learning the difference between edges and angles could be more difficult.

Case Study: An Image Classifier

Image Recognition

Human brains are powerful. And so is the human eye! Our visual system is possibly the least-bragged-about wonder of the world.

Consider these digits: 

Most of us can effortlessly recognize these digits as 8754308. This effortlessness is deceptive, though. Each hemisphere of our brain has a primary visual cortex known as V1, which contains 140 million neurons and tens of billions of connections between them. Furthermore, human vision involves not just one visual cortex but a series of visual cortices, which perform progressively more complex image processing. And this supercomputer in our visual system has been trained (evolved) over hundreds of millions of years to adapt to our visual world.

This powerful image processor called our eye can tell an apple apart from a bear, a lion from a jaguar, read numerous signs, perceive colors, and do all sorts of pattern recognition. While this seems easy for our brains, these are in fact hard problems to solve with a computer.

These advancements in data analysis, data science, and machine learning are made possible through complex processing models. A type of model that can achieve reasonable performance on hard visual recognition tasks is called a deep convolutional neural network.

Convolutional Neural Network

The convolutional neural network is a kind of neural network that uses multiple identical copies of the same neuron, allowing the network to keep the number of actual parameters small while enabling it to have large numbers of neurons required to express computationally large models.

This kind of architecture is particularly well-adapted to classifying images, as this architecture makes the neural network fast to train. A faster training speed helps in training deep (many) layers of the network, which are required for recognizing and classifying images.
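
Weight sharing is the key trick. In the one-dimensional sketch below (illustrative, not a full convolutional layer), a single 3-weight kernel, one "neuron", is applied at every position of the input, so six outputs are produced from only three parameters:

# Parameter sharing in a 1-D convolution.
import numpy as np

x = np.arange(8, dtype=float)     # input signal
w = np.array([0.25, 0.5, 0.25])   # one shared set of weights (the kernel)

out = np.array([x[i:i + 3] @ w for i in range(len(x) - 2)])
print(out)  # 6 outputs computed from just 3 parameters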

An Application: Vehicle Classifier

To build our model we used TensorFlow*, an open source software library for numerical computation using data flow graphs.

Training Set

The first step in building an image classifier is to build a training data set. Some things to consider when building the training data include the following:

  • The more images we have, the more accurate the trained model is likely to be. At a minimum, have at least 100 images of each object type.
  • Be aware of training bias. If one image has a tree in the garden and another a tree on the roadside, the training process could end up basing its prediction on the background instead of the specific object features that matter, as it will pick up anything that the images have in common. So, take images in as wide a variety of situations as you can: different locations, different times, different devices, and so on.
  • Split the training data into granular and visually distinct categories, instead of big categories that cover a lot of different physical forms; otherwise, you risk ending up with abstract objects of no meaning.
  • Finally, ensure that all images are labeled correctly.

Here are some examples from our data set:


Figure 2: Random images of a bus.


Figure 3: Random images of a car.


Figure 4: Random images of a bicycle.


Figure 5: Random images of a motorbike.

To build our vehicle classifier, we downloaded from ImageNet* about 500 images each of the four different types of vehicles: a car, a bicycle, a bus, and a motorbike.

Training The Model

Next, we pointed our script to pick these randomly selected images to train the model.

Object recognition models could sometimes take weeks to train as they can have thousands of parameters. A technique called transfer learning shortcuts this work considerably by taking a pre-existing, fully-trained model for a set of categories, like ImageNet, and retraining it for new classes using the existing weights.

We leveraged this technique to train our model.

Also in this case study, we retrained only the final layer from scratch; the others were left untouched. We removed the old top layer and trained a new one on our vehicle photos, none of which were in the original ImageNet classes that the full network was trained on. Using these transfer techniques, the lower layers that were previously trained to distinguish between a set of objects could now be leveraged for alternate recognition tasks with little alteration.

Validating The Model

As we were training (fitting) the model to the sample images, we also needed to test that the model worked on images not present in the sample set. That, after all, was the entire goal of our vehicle classifier!

So, we divided the images into three different sets.

  1. Training set—the images used to train the network, the results of which are used to update the model's weights. We normally dedicate 80 percent of the images for training.
  2. Validation set—these images are used to validate the model frequently while we are training the model. Usually, 10 percent of the data is used for this.
  3. Testing set—used less often, to predict the real-world performance of the classifier (a splitting sketch follows below).
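
A split along these lines is easy to script. The sketch below is a hypothetical helper, assuming the root_folder_name/<class>/<file> layout described under Training The Model below; it shuffles each class and carves out 80/10/10 portions:

# Split labeled image paths into train/validation/test sets.
import os, random

def split_dataset(root, seed=42):
    random.seed(seed)
    splits = {"train": [], "validation": [], "test": []}
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        if not os.path.isdir(class_dir):
            continue
        files = [os.path.join(class_dir, f) for f in os.listdir(class_dir)]
        random.shuffle(files)
        n_train = int(0.8 * len(files))   # 80 percent for training
        n_val = int(0.1 * len(files))     # 10 percent for validation
        splits["train"] += [(f, label) for f in files[:n_train]]
        splits["validation"] += [(f, label) for f in files[n_train:n_train + n_val]]
        splits["test"] += [(f, label) for f in files[n_train + n_val:]]
    return splits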

TensorFlow Commands

Following are the commands we used for the transfer learning functionality of TensorFlow.

Prerequisite

- Install Bazel (check TensorFlow's GitHub for more info)
    Ubuntu 14.04:
        - Requirements:
            sudo add-apt-repository ppa:webupd8team/java
            sudo apt-get update
            sudo apt-get install oracle-java8-installer
        - Download Bazel (https://github.com/bazelbuild/bazel/releases)
          tested on: https://github.com/bazelbuild/bazel/releases/download/0.2.0/bazel-0.2.0-jdk7-installer-linux-x86_64.sh
        - chmod +x PATH_TO_INSTALL.SH
        - ./PATH_TO_INSTALL.SH --user
        - Place bazel onto the path (the exact path to store is shown in the output)

Note: Bazel helped us run TensorFlow from the command line.

Training The Model

We prepared the folder structure as follows:

- root_folder_name
        - class 1
            - file1
            - file2
        - class 2
            - file1
            - file2
- Clone tensorflow
- Go to root of tensorflow
- bazel build tensorflow/examples/image_retraining:retrain
- bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir /path/to/root_folder_name --output_graph /path/output_graph.pb --output_labels /path/output_labels.txt --bottleneck_dir /path/bottleneck

Each class (Class1, Class2) represents a vehicle type like car, bus, and so on. Each file is an image of that type. We pointed the TensorFlow model to the image folder here.

Testing Through Bazel

bazel build tensorflow/examples/label_image:label_image && \
bazel-bin/tensorflow/examples/label_image/label_image \
--graph=/path/output_graph.pb --labels=/path/output_labels.txt \
--output_layer=final_result \
--image=/path/to/test/image

For this, we used another TensorFlow program, label_image. It is a C++ program in the TensorFlow directory.

Results

We tested the model on a few images. Attached is an example image of each type and the result we got from this model.


I tensorflow/examples/label_image/main.cc:205] car (0): 0.729847
I tensorflow/examples/label_image/main.cc:205] motor bike (1): 0.140029
I tensorflow/examples/label_image/main.cc:205] bicycle (2): 0.0864567
I tensorflow/examples/label_image/main.cc:205] bus (3): 0.0436665

This implies that the model predicts this image to be a car with 72 percent confidence.


I tensorflow/examples/label_image/main.cc:205] bus (3): 0.933695
I tensorflow/examples/label_image/main.cc:205] car (0): 0.0317426
I tensorflow/examples/label_image/main.cc:205] motor bike (1): 0.0192131
I tensorflow/examples/label_image/main.cc:205] bicycle (2): 0.0153493

As we can see, the model predicts this image to be a bus with 93 percent confidence; there is only 1.9 percent confidence that it could be a motorbike.


I tensorflow/examples/label_image/main.cc:205] bicycle (2): 0.999912
I tensorflow/examples/label_image/main.cc:205] car (0): 4.71345e-05
I tensorflow/examples/label_image/main.cc:205] bus (3): 2.30646e-05
I tensorflow/examples/label_image/main.cc:205] motor bike (1): 1.80958e-05


I tensorflow/examples/label_image/main.cc:205] motor bike (1): 0.979943
I tensorflow/examples/label_image/main.cc:205] bicycle (2): 0.019588
I tensorflow/examples/label_image/main.cc:205] bus (3): 0.000264289
I tensorflow/examples/label_image/main.cc:205] car (0): 0.000204627

A New Possible

The evolution of the Internet made knowledge accessible for everyone and anyone. Social media made it easy for us to stay in touch with each other anywhere, anytime. And now, with the evolution of the machine learning phenomenon, a new wave of possible is visible in many areas. 

Let’s see some applications of image classification applied to vehicles!

Toll Bridge

Automated tolls for cars are not uncommon anymore. But have you noticed the separate, often long, lines for trucks on a toll bridge?

The toll is often different for a car, SUV, truck, or other vehicle type. Technology has made it possible to auto-detect tolls using sensors. How about extending that convenience further by automatically classifying vehicles to determine the toll to be charged? USD 5 for cars and USD 25 for trucks…all automated.

Autonomous Cars

Google* and Uber* are actively working on autonomous cars. Imagine a future where the road is full of autonomous cars. While humans are still on the road, we can at least rely on our visual systems to drive carefully around driverless cars or recklessly driven cars.

But when the super-power computer residing in our eye is no longer watching for them, we will need a vehicle classifier to help the autonomous cars differentiate between the types of traffic on the road!

Our Security

The FBI, police departments, and many security companies keep us and our country safe today.

Even in day-to-day life, we see Amber alerts on the highway about vehicles carrying kidnapped children or involved in other emergencies.

Imagine this search being conducted in a fully automated manner, as suggested in science fiction movies, but now a real possibility with a mashup of Google Maps*, data science-based image recognition, and vehicle classification algorithms.

Parking

Parking is a growing problem, especially for cities. With a robust vehicle classifier, we can imagine fully automated and highly efficient parking, with certain sections and floors dedicated to certain types of vehicles. Sports cars need low heights, while sports trucks are huge. The gates at each floor could use vehicle-detecting sensors to allow only certain types of vehicles to enter the lot. And with the inches saved, many more cars can be parked in the same space!

Conclusion

These examples are just a beginning. We look forward to hearing from you on the enhancements you make and the complexities you deal with as you unleash the power hidden in the data.

In any case, based on the topics covered and the examples cited in this paper, we hope you are convinced, as we are, that technology advancements, especially those emulating the human brain and eye, are evolving at a fast pace and may soon replace the human eye. Automation will replace manual tasks, especially repetitive tasks involving visualization and classification.

The human brain structure and our complex visualization system, though, will remain a mystery for us to unfold as we continue to learn more about them.

Citations

"CHAPTER 1" Nielsen, Michael, Neural Networks and Deep Learning, Jan. 2016, accessed Oct. 23, 2016.

Visualizing and Understanding Convolutional Networks” Zeiler, Matthew, and Fergus, Rob, Nov. 2013, accessed Oct. 23, 2016.

"Image Recognition" TensorFlow Tutorials, Dec. 2016, accessed Oct. 23, 2016.

"Artificial Neuron" Wikipedia, Wikimedia Foundation, Oct. 2016, accessed Oct. 23, 2016.

"Neural Network Structures" Neural Networks for RF and Microwave Design, IEEE.CZ.

"How to Retrain Inception's Final Layer for New Categories" TensorFlow Tutorials, Dec. 2016, accessed Oct. 30, 2016.

How to evaluate Intel® Software Development Products


Before purchasing a license for one of the Intel® Software Development products, you can choose to first evaluate the compilers, libraries, and tools. You can evaluate an entire suite or a standalone tool. The free 30-day evaluation period lets you experience the products firsthand and determine which tool is right for you. To request a free evaluation license, visit the Intel® Software Development Tools page and choose from a wide selection of products. Note: You are not limited to one product; you can evaluate the entire range of tools.

The free 30-day evaluation period comes with Priority customer support. You can access this support by creating an Intel® Registration Center account after receiving the evaluation license. Once your account is created, you can submit a ticket at our Online Support Center web site, www.intel.com/supporttickets. You can also visit the Intel® Developer Zone Forums for support. The forums are a meeting place for developers like you to share information and ask questions.

Please note that the product will cease to function at the end of the evaluation period. An evaluation license is not renewable.

Have questions?

Check out the Licensing FAQ
Or ask* in our Intel® Software Development Products Download, Registration & Licensing forum

* If you have a question be sure to start a new forum thread.

 


I have registered my evaluation license but I see no login id listed for getting support


Creating an Account During Evaluation

Account creation is not automatic for evaluation licenses. If you would like to create a full account (recommended), you can find a link to account creation in the registration email below the Download button. Once your account is created, you will be entitled to receive Priority Support at the Online Service Center, www.intel.com/supporttickets, during the evaluation period. You can also get free support by visiting the Intel® Developer Zone Forums. Please note that the product will cease to function at the end of the evaluation period, and the evaluation license is not renewable.


Set Up and Develop Thought Processes

$
0
0

This page describes how to set up the Thought Process Engine (TPE) service (knowns as THOP service) and how to develop THOPs using the THOPBuilder plug-in. To gain an overall understanding of Thought Processes and view examples, go to Thought Processes Overview and Examples.

       THOPs 1.0
       THOPs 2.0
            Install Saffron-WebServices
                 Prerequisites
            Installation
            Configuration 
            Security
            Backup/Restore
            Troubleshooting
 
            Download THOPs to Intellij
            Run THOPs
            Edit the Run Configuration
 
       Use NPM with THOPs
 

Set up the Thought Process Engine Service

THOPs 1.0

To develop THOPs 1.0, you must install THOPBuilder 1.0 in IntelliJ and point it to the standard ws service in SMB. Follow the instructions under Develop THOPs using THOPBuilder.

THOPs 2.0

To develop THOPs 2.0 (for asynchronous and potentially forver-running THOPs), you or your system administrator must install saffron-ws. It has a REST API to store and run THOPs. You must also install IntelliJ® and Java 8. Follow the instructions under Develop THOPs using THOPBuilder.

Install Saffron-WebServices

Prerequisites

In order for the TPE service to run, a standard Java 8 installation is required.

To install Java 8 (OpenSDK) on CentOS, do the following:

$ sudo yum install java-1.8.0-openjdk-headless.x86_64

To verify if Java 8 has been correctly installed, run:

$ java -version

Verify that the version is 1.8.0_*

If a different java version is installed, run the alternatives command to select the correct version:

$ sudo alternatives --config java

(alternatives is part of the chkconfig package)

If you are installing Oracle 8 instead, follow the instructions here: http://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/#

Note: If you do not install this on a system where Saffron is already installed, create a 'saffron' user:

$ adduser saffron

Installation

Download the Saffron WS RPM: Saffron-WS--<version>-<build>.noarch.rpm

$ sudo service saffron-ws stop

Note: The stop command will fail if you do not have an earlier version of Saffron-WS installed.

$ sudo yum install Saffron-WS--<version>-<build>.noarch.rpm
$ sudo service saffron-ws start

The service is available by default on port 8888.

To check if the initial installation was successful, use `curl`:

$ curl 'http://localhost:8888/ws/rest/proc'
{"stored_procedures" : [ ]
}

Note: If you have already configured user authentication as outlined in the Security section below, use curl -u user:pwd 'http://localhost:8888/ws/rest/proc'. The user must have the ROLE_ADMIN or the ROLE_API_USER role.

Configuration 

Configuration files for the TPE are available in /etc/saffron-ws.d

The standard configuration files can be found in the defaults sub-directory.

To make changes to a configuration file, copy the appropriate file from /etc/saffron-ws.d/defaults to /etc/saffron-ws.d and edit it there.

Check the configuration files for documentation about all of its settings.

It is recommended to adjust the following settings to tune the performance and security of the TPE service:

server.properties:

# Maximum number of concurrent HTTP connections to SMB instances - per host
max.connections.per.host=20

# Set the authentication/authorization provider.
# Possible values:
# none - no authentication for all REST resources
# smb10 - authenticate using the login services of a Saffron v10 instance. Needs a valid smbURL configured (see below)
security=none

# Optional URL to the SMB system used for authentication purposes.
# Important: URL should end in a slash, default value: http://localhost:8080/ws/
#
# NOTE:
# This setting is only used if security=smb10 (see above)
smbURL=

You can also change the IP interfaces the service binds to and port number if there are port number conflicts.

Lastly, if THOPs need to make connections to HTTPS services requiring private certificates, change the client.keystore.file setting (this is often not required as THOPs are typically only making calls to a local SaffronMemoryBase instance using HTTP).

cache.properties:

# Settings for the query results cache.
# Caching is based on ETags received from SMB instances.
# Size of the cache in bytes, k, kb, kib, m, mb, mib, g, gb, gib.
# This is the amount of memory reserved for storing query results.
# The cache might allocate more than the configured amount if it can fit one more entry before filling up.
# Cache entries are removed using an LRU algorithm.

cache.size = 10mb

If the TPE installation is expected to see a large number of queries or large query results from SaffronMemoryBase, increasing this value is recommended. The cache is an in-memory, compressing LRU cache and its contents are lost on service restart.

Note: For the config changes to take effect, issue:

$ service saffron-ws restart

Security

By default, a TPE installation is not secured and anyone can upload and run THOPs.

To secure TPE and tie its user management to an existing SMB system, change the security and smbURL settings in server.properties to point to an existing SMB system.

Sample configuration:

# Maximum number of concurrent HTTP connections to SMB instances - per host
max.connections.per.host=20

# Set the authentication/authorization provider.
# Possible values:
# none - no authentication for all REST resources
# smb10 - authenticate using the login services of a Saffron v10 instance. Needs a valid smbURL configured (see below)
security=smb10

# Optional URL to the SMB system used for authentication purposes.
# Important: URL should end in a slash, default value: http://localhost:8080/ws/
#
# NOTE:
# This setting is only used if security=smb10 (see above)
smbURL=http://localhost:8080/ws/

This will secure the `/ws/rest/proc` paths using HTTP Basic Authentication.  

Note: In order for this config change to take effect, issue:

$ service saffron-ws restart

Exposing TPE service when part of a SaffronMemoryBase cluster (HTTP/S support)

The TPE service can be used stand-alone or as part of a SaffronMemoryBase cluster.

If part of a SaffronMemoryBase cluster (typically installed on the SaffronMemoryBase head node), the service can be exposed via HTTP/S using the SaffronMemoryBase cluster Apache proxy.

To do so, add proxy pass rules to the <VirtualHost> sections (one for port 80 and one for port 443) in /etc/httpd/conf.d/saffron_https_proxy.conf

ProxyPass /thop http://127.0.0.1:8888/ws
ProxyPassReverse /thop http://127.0.0.1:8888/ws

Refer to the Apache documentation for more information about these settings.
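
To verify the proxy rules (a hedged example; substitute your own host and credentials), request the THOP list through the proxied path:

$ curl -u user:pwd 'https://your-smb-host/thop/rest/proc'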

Maintenance

Log files for this service can be found in /var/log/saffron-ws

Control over the service is accomplished through regular Unix tools like service and chkconfig.

Thought Processes uploaded via the REST API are stored in /var/tmp/thops (subject to change in future versions of this package).

Backup/Restore

The THOPs stored in the current version of the TPE service are considered ephemeral (hence the /var/tmp/thops location).

Users of the TPE service are expected to re-upload missing THOPs during bootstrapping.

If a snapshot of the state of the TPE service is required, use any available tool to store the contents of /var/tmp/thops.

To guarantee consistency of the snapshot, stop the service first.

Troubleshooting

If service saffron-ws start fails, check the log files for a more detailed problem description.

If the error contains:
com/saffrontech/ws/vertx/Server : Unsupported major.minor version 52

make sure that the saffron user is using the correct version of Java 8.

Compare the paths from which Java is loaded for the root and saffron users:

$ which java
$ su saffron
$ java -version
$ which java

Adjust ~saffron/.bashrc if necessary.

Use NPM as a package manager with THOPs 2.0

THOPs 2.0 supports access to libraries that can be used from within THOPs. See Use NPM with THOPs.

Develop THOPs using THOPBuilder

Prerequisites

The following must be installed before you install THOPBuilder:

  • IntelliJ IDE

    Saffron offers an IntelliJ IDE plug-in to develop, test, and run Thought Processes (THOPs). This plug-in works for all editions of IntelliJ IDEA. The Community Edition is available for free.
  • Java 8 (THOPs 2.0 only)

    You must install Java 8 in order to use THOPs 2.0.

Install the THOP Builder plug-in

  1. Download one of the attached .zip files below (do not unzip this file):

    THOPBuilder-2.0.2.zip (Make sure Java 8 is already installed.)
    THOPBuilder-1.3.4.zip (Java 8 is not necessary.)
  2. Add the THOP builder as a plugin to IntelliJ:

    Open Preferences / Plugins / Install plugin from disk

    Select the downloaded THOPBuilder zip file from Step 1. (Do not unzip the file.)

  3. Restart IntelliJ.
  4. Re-open the Preferences / Plugins window and confirm that it contains the THOP Builder plug-in.

Set up a connection to a SaffronMemoryBase

Connections to SaffronMemoryBase are stored per IDEA project.

  1. To create a new static web project, click File / New / Project.

    You do not need to enter a Project SDK. Simply create an empty project.
  2. Open Tools and select Connect to SMB in the Project you just created.
  3. Enter the connection data, as shown in the example below.

    If you are not running SaffronMemoryBase on your local machine, point it to the URL used to log in to the Saffron Admin tool. Use /ws instead of /admin at the end of the URL.

    Although the connection is made with the Username and Password, the Access Key is also required for THOPs to make queries to SaffronMemoryBase.


     Connect to SaffronMemoryBase
  4. Click Test to confirm a successful connection.
  5. Click OK and follow the instructions to download THOPs to test if the connection works as expected. Make certain your user has the ROLE_API_USER permissions.

    Note: You must have SaffronMemoryBase 10.2 for this to work properly. If you are running 10.1, the test result might be 'Can't log in' even if you provide the correct credentials.
  6. Click OK and continue.

THOP Builder Features

Download THOPs to IntelliJ

  1. Create a directory in the new project you just created (the Download THOPs action only works on directories).
  2. Right-click that directory and choose Download THOPs.

    The currently-installed Thought Processes for the configured SaffronMemoryBase instance are downloaded as files and saved to the directory.

Upload THOPs to SaffronMemoryBase

  1. Right-click one or more of the existing THOPs or create a new .js file.
  2. Choose Upload THOP in the context menu.

    This uploads the current contents of the selected file(s) and overwrites any existing Thought Processes by the same name. The name is taken from the filename without extension.

Delete THOPs from SaffronMemoryBase

  1. Select one or more THOPs you want to delete.
  2. Right-click the selected THOPs and select Delete THOPs from the context menu. This remotely removes the THOPs by the same name.

    Note: Local files are not deleted. If you delete a THOP by accident, you can re-upload it.

Run THOPs

Right-click a THOP (for example, HelloWorld.js) and perform one of the following:

  • Run HelloWorld: Right-click HelloWorld and select Run HelloWorld. This creates a default run configuration and then uploads and runs the Hello World THOP. The console window shows the results of running the THOP.
  • Debug HelloWorld: Right-click HelloWorld and select Debug HelloWorld. This creates a default run configuration and uploads and runs the Hello World THOP with debug information enabled. The console window shows the result of the THOP as well as any debug information collected using debug(...) calls within the THOP's code. Note: Debug output in THOPs is displayed only if you use the Debug action to run the THOP.

Edit the Run Configuration

The Run/Debug Configurations window captures parameters and other available options to run a THOP.

To open, select Edit Configurations for a THOP.

Edit the run and debug configurations for a THOP

The Run/Debug Configurations window opens:

THOPs Run/Debug Configurations window

Parameters can be specified one per line using the format: parameter=value. Values are automatically URL-encoded and can be specified with blanks and other reserved characters.

Upload THOP before running is selected by default. When the THOP is run, the file specified in Local THOP is uploaded first.

Defaults: Set the default SMB (and default values) with which the THOP communicates. You can add arbitrary JSON objects here (THOP Builder only supports simple properties, though). They can be accessed inside the THOP using the defaults variable. If you upload via POST/PUT, add a JSON object defaults to your existing JSON payload.

Together with the main toolbar's Run feature, this process leads to a seamless coding and test cycle without having to upload the THOP manually. Note that you can also save a Run configuration and create multiple run configurations for the same THOP for testing.

Troubleshoot Issues

No THOPs are downloaded: Check the Event Log in IntelliJ. Check your connection settings in Tools / Connect SMB...

Test button doesn't work, i.e. an error message is displayed: If the error is Can't log in, you might be connecting to an older version of SMB. Open your browser, enter the SMB's hostname and port number followed by /ws/rest/spaces. Then, log in with your credentials.

I still can't log in: Verify that your user has the correct roles. Prior to 10.2, only the ROLE_API_USER is required. After 10.2, both the ROLE_API_USER and ROLE_DEVELOPER are required.

My user is an admin and I can't log in: Prior to 10.2 admin users need to be given the ROLE_API_USER role. Admin users don't require special roles when using SMB 10.2 or later.

Advanced Topics

Use NPM with THOPs

THOPs 2.0 supports access to libraries that can be used from within THOPs. NPM provides JavaScript libraries that can be used by THOPs 2.0. Install NPM as the package manager (https://www.npmjs.com/package/npm).

To use packages maintained by NPM in THOPs 2.0, do the following:

  1. In your shell/IDE, set the environment variable NODE_PATH to ./node_modules (or wherever you want npm modules to exist).
  2. Start Saffron-WS.

To install a package, go to the home directory of Saffron-WS (adjust if you changed NODE_PATH above) and run:

npm install <package>

This installs the named package in ./node_modules.

To use the package, use the require function in your THOP (more specific instructions are usually in the package documentation).

Example of using an NPM package

cd where/my/ws-fat.jar/file/is
export NODE_PATH=./node_modules
npm install n-gram

THOP code:

function run(p) {
    var nGram = require('n-gram');
    return nGram.bigram('Saffron Rocks');
 }

Note: There is currently no simple flag that tells you if a particular package runs on Nashorn or not. Check the dependencies of the package (listed on npmjs.com). Preferably there are none. Also, if the documentation mentions that the code runs in the browser, there's a good chance it will run in Nashorn.

Thought Processes Programming Guide


Parameters
     Pre-Defined Parameters
     Pre-Defined Objects
     Concise Syntax
     params Function
     Implicit Space Parameter
     Multiple Values for Parameters

JavaScript Environment

Debug
     Debugging THOPs
     What's my Query Again?
     Use Debug to Troubleshoot THOPs

Testing
     Test a THOP from the Command Line

Tips
     Dos and Don'ts
     THOP Editor Hidden Gems
     Unsupported REST Calls
     Using .get() on Connection and Other Query Objects

Asynchronous THOPs (2.0)
     Calling THOPs from THOPs
     Writing Non-Blocking THOP Code
     Why Async I/O?

JavaScript Help
 

Parameters

The run function of a Thought Process (THOP) takes a single parameter, which is a multi-map of the key/value pairs that were specified when calling the THOP.

function run(p) {...}

Given the following REST call to a THOP:

/proc/find_connections/result?name=Saffron&category=company&category=location

The parameter p will look like this:

   { name: "Saffron", category: [ "company", "location" ] }

'p' is never null, but it might have no properties.

Pre-defined Parameters

The following are reserved parameter names when calling THOPs:

space
When set, you can omit the space name when using queries. That is, instead of writing connections(p.space), you can just use connections().

access_key
The key used to run this THOP.
 
signature
The signature of the call (this is not useful per se; make sure your THOP does not rely on a parameter called signature).

thop.result=raw
Returns the result of a THOP directly, without wrapping it inside a { result: "..." } object.
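
For example, calling the helloworld THOP from the Debug section below with thop.result=raw:

curl 'http://admin:saffron@localhost:8080/ws/rest/proc/helloworld/result?name=Scaramanga&thop.result=raw'

This returns the THOP's return value directly instead of the usual {"result": "...", "duration": ..., "error": ..., "debug": ...} wrapper.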

Pre-defined Objects

local
THOPs 1.0 only.
This is an instance of SaffronMemoryBase pointing to the local SaffronMemoryBase system.

local.connections(…) -> connections(…)

query methods in REST API (using local)
THOPs 1.0 only. 
Analogies, attributes, classifications, connections, episodes, information, networks, trends, resources.
 
debug
Adds debug information that is returned as part of the result object (see the Debug section of this guide).
quote
This takes a string and returns a quoted string.
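
For example, a minimal sketch of using quote to safely embed a caller-supplied value in an AQL query (the category name name is illustrative):

function run(p) {
    // quote(...) wraps the value so blanks and reserved characters are safe in AQL
    return connections().q('name:' + quote(p.name)).get();
}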

All queries also have a .params(..) function that can be used to define query parameters as a JavaScript object:

connections('uksecurity').params({ q: 'city:london', c: 'country' }).get();

Concise Syntax

For example, var p = params.p ? params.p : 1; can be written more concisely as follows:

var p = params.p || 1;

params Function

You can specify all parameters for a query using the params({...}) function. Note that params({...}) overwrites any parameters set, so use with care.

Compare the following 2 lines of code:

connections(space).q('john').c('person').get()
connections(space).params({ q: 'john', c: 'person' }).get();

Implicit Space Parameter

You can write THOP code that works for any space by using the implicit space parameter.

Example:

return connections().q('person').get();

Note that no space was specified in the connections function. In that case, callers of the THOP are expected to pass the space name in the reserved parameter 'space'.
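
For example, a caller would supply the space via the reserved parameter (a hedged example; the THOP name is a placeholder):

curl 'http://localhost:8080/ws/rest/proc/mythop/result?space=uksecurity&access_key=demo'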

Multiple Values for Parameters

If a query parameter can be specified multiple times (like c() in a connections query), you have a few options for specifying them:

connections().c('cat1').c('cat2').c('cat3');

Or like this:

connections().c('cat1','cat2','cat3');

If you have the data in an array already, you can use the apply function.

var cats = ['cat1', 'cat2', 'cat3'];
var query = connections();
query.c.apply(query, cats); 

If you use the params function, you can use an array as well:

connections().params({ c: ['cat1', 'cat2', 'cat3']});

Multiple identical parameters are also supported.

When using the REST API to call a THOP:

c=city&c=country&c=person

it is translated into an array of values. The params argument looks like this:

{ c: [ 'city', 'country', 'person' ] }

JavaScript Environment

THOPs are written in JavaScript code.

The run-time environment is very different from the one available in browsers. There is no window object, for example. In order to make working with JavaScript easy, the following libraries are loaded and made available in the context in which a THOP runs:

Lo-Dash (http://lodash.com/)
Contains dozens of useful functions to manipulate arrays, objects, and collections. If a result returned from a Saffron Query API call has a result array by the name of r, it is automatically wrapped using the _(...) function. This makes all lo-dash functions available on r itself.
 
JSON.js
Adds the standard JSON.stringify and JSON.parse functions. Return values from Saffron Query API calls are automatically parsed using JSON.parse.
 
resty
A simple JSON-centered HTTP client which can be used with any Saffron REST service. It has a single function, getJSON(url, accessKey, secretCode), and returns the content of the URL as a string (see the sketch after this list).
 
ECMAScript 5 Shims (https://github.com/es-shims/es5-shim)
THOPs 1.0 Only.

THOPs 1.0 JS code is evaluated in a Rhino container, enhanced with ECMAScript 5 extensions. THOPs 2.0 does not require this because it runs on an ES5 JavaScript engine.

The Rhino environment also allows access to any Java class. This gives a THOP almost complete access to the web service layer in Saffron. Please use this capability wisely.

All global functions and classes are available in the API documentation (http://yourserver:8080/ws/doc)
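
As referenced above, a minimal sketch of using resty's getJSON from a THOP (the access key and secret are placeholders; /ws/rest/spaces is the spaces listing mentioned in the Troubleshooting section):

function run(p) {
    // getJSON returns the response body as a string, so parse it before use
    var body = getJSON('http://localhost:8080/ws/rest/spaces', 'demo', '12345');
    return JSON.parse(body);
}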

Debug

Debugging THOPs

THOPBuilder does not include a proper debugger that can step through each line in a THOP. Below are the currently available actions.

debug(...) can be used to add debug information that is returned as part of the result object.

Example:

function run(p) {
 debug(p);
 return [{hello: p.name || 'world!'}];
}

When run like this:

curl http://admin:saffron@localhost:8080/ws/rest/proc/helloworld/result?name=Scaramanga

The resulting JSON will look like this:

{"result":"[{\"hello\":\"Scaramanga\"}]","duration":1,"error":null,"debug":"{\"name\":\"Scaramanga\"}"}

Note the debug section!

If you use THOPBuilder in IntelliJ, run the THOP as Debug to see the Debug output:

-- Running helloworld
-- Debug output:
{"name":"Scaramanga","access_key":"demo"}
-- Result: helloworld/name=Scaramanga
[
 {"hello": "Scaramanga"
 }
] 

Tip

If you add the parameter log=http when calling a THOP, the URL called as well as the result will be added to the debug output (make sure you run your THOP using the Debug action in IntelliJ).

What's My Query Again?

If you are unsure what your current query looks like (before running .get() or .getRaw()), you can use .uri() to look at the actual URL generated. You can use that with debug() to add debug output that will be returned to you in the 'debug' property of the result object.

Here's an example:

function run(p) {
 debug(connections('jirahadoop').q('test').c('issue').uri());
 return {};
}

returns this:

{"result":"{}","duration":3,"error":null,"debug":"http://localhost:8080/ws/spaces/jirahadoop/connections?q=test&c=issue&"}

Use Debug to Troubleshoot THOPs

Since there is no ordinary debugger available for troubleshooting Thought Processes, you can use the debug(...) method to add debug output alongside the result of your THOP. debug(...) automatically converts any object it is given to JSON. As seen in the first example, the debug output is delivered alongside the result.

If an error occurs when calling a Saffron API, the error message might contain a URL. Open the URL to learn about any errors that occurred when making the call.

Testing

Test a THOP From the Command Line

Below is the simplest way to test a THOP from the command line. An access key is still needed for the moment, but no encoding of the URL is necessary.

curl 'http://admin:saffron@localhost:8080/ws/rest/proc/test/result?access_key=demo'

Tips

Dos and Don'ts

Do

  • Use debug(...) to help with troubleshooting.
  • debug(yourQuery.uri()) will show you the URL used.
  • Use the test harness to run your THOPs.
  • Guard against invalid parameters. Anyone with a THOP account can call your THOP with arbitrary parameters.

Do not

  • Use global variables

    The behavior of global variables is undefined. In the current THOPs 1.0 implementation, each THOP has its own instance of the Rhino JavaScript Engine. So, invoking the same THOP in parallel runs in that single engine. If your THOP has global variables, changes to the variables will be visible to all invocations.

  • Be careful when using Java classes, such as Packages.java.lang.System.exit(0);

THOPs Editor Hidden Gems

The editor that is used for THOPs contains some hidden gems:

Alt-F
Search for keywords (with regexp etc).
2xAlt-F
Search and replace.

Unsupported REST Calls

You can use the THOPs API to make any kind of call to the SaffronMemoryBase REST API if it starts with the ws/spaces/<spacename> prefix by using the space(name, 'path') function.

Example:

function run(p) {
 var resource = space('uksecurity','resources').params({rr: 'resourceref' }).get();
 return resource;
}

Using .get() on Connection and Other Query Objects

Please keep in mind that calling .get() will always make a call to SaffronMemoryBase. This means:

  1. Hold on to the result of a .get() operation if you want to make calls to it repeatedly.
  2. You can re-use connection objects.
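
A minimal sketch illustrating both points:

function run(p) {
    var query = connections().q('person'); // query objects can be re-used
    var result = query.get();              // one call to SaffronMemoryBase; hold on to the result
    debug(result.r.value().length);        // work with the stored result
    return result.r.value();               // instead of calling .get() again
}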

Asynchronous THOPs (2.0)

Calling THOPs from THOPs

You can call other THOPs from a THOP by using the thop.runAsync function. You can call other THOPs asynchronously and collect the results.

Example:

function runAsync(p, cb) {
 Promise.all([
 thop.runAsync('test-thop', {name:'Darth Vader'}),
 thop.runAsync('test-thop', {name:'Luke Skywalker'}),
 thop.runAsync('test-thop', {name:'Walter White'})
 ]).then(function(results) {
 debug(results);
 cb(results);
 });
}

The THOP test-thop can be a synchronous or asynchronous THOP. 

In this example, it is simply:

function run(p) {
 console.log("Console output from within THOP");
 debug('name:' + p.name);
 return "Hello " + (p.name || 'world!');
}

Here is the output when run from THOP builder: 

-- Running test-subcall
-- Debug output:
Output from running test-thop
name:Walter White
Output from running test-thop
name:Darth Vader
Output from running test-thop
name:Luke Skywalker
["Hello Darth Vader","Hello Luke Skywalker","Hello Walter White"]
-- Result: test-subcall/
["Hello Darth Vader","Hello Luke Skywalker","Hello Walter White”
]

Writing Non-Blocking THOP Code

Here is a simple example of a THOP that uses async I/O versus the current blocking I/O: 

function runAsync(p, result) {
 analogies('uksecurity').q('city:london').a('city:london').rtr(true).async()
 .then(function (data) { result(data) });
}

Three things are different here:

  • The main function needs to be called runAsync.
  • The second parameter of that function is a callback function that the THOP needs to call with the final result.
  • get() and getRaw() will return right away with a Promise object.

Important: get() will not give you the result of your query, but will return immediately with a promise that the actual result will be delivered (or fail) in the future.

A Promise object has two interesting functions, then(...) and catch(...). Any function you pass to then will be called with the result of the REST call.

In the example above, the result of the analogies call is then returned as the result of the THOP.

Why Async I/O?

Here is an example that shows how we can make better use of SaffronMemoryBase resources and do things in parallel without actually having to resort to multi-threaded programming.

Keep in mind that THOPs run single-threaded, much like JavaScript runs in the browser.

We can still run things concurrently as shown in this example:

function runAsync(p, result) {
 var p1 = connections('uksecurity').q('london').async();
 var p2 = connections('uksecurity').q('moscow').async();
 Promise.all([p1,p2]).then(function (results) {
 results[0].r.push(results[1].r);
 result(results[0].r.value());
 }).catch(function (error){ console.log("Oh my"); console.log(error.message); });
}

This example runs two connection queries, combines the results, and returns them, but each connection query is run asynchronously. The requests are run in parallel as the underlying engine makes two HTTP requests as fast as it can, then waits for I/O.

When the requests are complete, the promise objects p1 and p2 become fulfilled in unspecified order and at an unspecified time.

Using Promise.all, we register a function that is called when both promises have become fulfilled, returning their results in the results array in the same order that we used in the Promise.all call.

This way we can run any number of queries in parallel and wait for their 'fulfillment' before doing the next step in our processing. 

JavaScript Help

https://developer.mozilla.org/en-US/docs/Web/JavaScript

http://www.codecademy.com/en/tracks/javascript

Thought Processes Overview and Examples


Understanding Thought Processes
     THOPs 1.0
     THOPs 2.0

THOPs Examples
     THOPs 1.0
          Example 1: Hello World
          Example 2: Hello Stranger
          Example 3: Connections Query
          Example 4: Connections Query with Built-Ins
     THOPs 2.0
          Example 1: Asynchronous Connections
          Example 2: Long-Running Asynchronous Connections
 

Understanding Thought Processes

Saffron Technology™ Thought Processes (THOPs) are user-defined functions that allow you to tie together various SaffronMemoryBase™ and other capabilities using a scripting language. Thought Processes can help perform a wide variety of actions such as: simplifying the execution of complex sequential queries, calling outside applications to be used in conjunction with Saffron Technology reasoning functions, or converting query results into data suitable for UI widgets. THOPs are written in one of the supported script languages (currently, only JavaScript is available).

If you are familiar with other database products, think of Thought Processes as stored procedures. THOPs can be created via the REST API or by using the developer tools in Saffron Admin. After a Thought Process is defined, it becomes callable through the standard REST API.

Currently, SaffronMemoryBase includes two versions of Thought Processes: THOPs 1.0 and THOPs 2.0.

THOPs 1.0

These Thought Processes run synchronously: the calling client makes an HTTP GET call and waits for the result.

Synchronous THOP diagram

THOPs 1.0 are called using a GET operation and run on the Tomcat application server. This version is based on the Rhino JavaScript engine with extensions.

Use THOPs 1.0 for the following types of operations:

  • Simple single queries
  • Fast operations
  • Operations where asynchronous execution is not necessary

THOPs 2.0

The THOPs 2.0 layer enables Thought Processes to run asynchronously and to have bi-directional communication with the calling client. The processes can be long-running, which is why we refer to them as Deep Thoughts (DT). When initiated by a caller, DTs can run arbitrarily long and can send messages to the client before delivering a final result.

The following is an example of a Deep Thought that can deliver a partial result to the calling client as soon as data is available and deliver the remaining results later:

Deep Thought diagram sending messages to caller

Deep Thoughts can also listen to messages being sent by the caller and reply to those messages individually (as well as return partial or final results).

Here, THOPs can register for messages from the caller. THOPs can also reply to messages or send messages proactively:

Deep Thought message exchange with caller

Clients start an instance of a Deep Thought (DT) by POSTing to the thopname/result resource. The result of the POST does not contain a final result, but an Event Bus address to listen for messages from the DT as well as a start address. Clients then send a message to the start address to actually start the DT (it lies dormant to give the client a chance to register to the listen address).

Deep Thought POST

A DT can send a (partial) result to the client using the send(data) function. data is any JS object; it will be turned into JSON during transmission to the client.

A DT can register for messages sent by the client after the DT was started using the on(address, handler) function. Address is any string that the client uses to send messages to the DT. DT's handler will then be called with the message as well as a replyHandler. The DT can use the replyHandler to send a message back to the calling client. (Alternatively, it can use the send function.)

As part of default parameters for a DT, a timeout value can be specified after which a DT instance shuts down (and the client will be unable to send messages to the DT).

Use THOPs 2.0 for the following types of operations:

  • Complex queries that cannot be expressed in a single query
  • Business logic when writing apps based on SUIT and THOPs
  • Integrating SaffronMemoryBase APIs and foreign APIs
  • Operations where you would use a stored procedure in a relational database
  • Deploying code at run time

THOPs Examples

THOPs 1.0

Below is an example of the THOPs 1.0 layer. Note that this is primarily used for SaffronMemoryBase 10.x and runs only synchronous Thought Processes:

THOPs 1.0 layer

Example 1: Hello World

File: HelloWorld.js

function run() {
     return 'Hello World!';
}

Line 1: A THOP must have at least one function. In THOPs 1.0, the start function must be named run(). (The function is runAsync(…) for THOPs 2.0.)

Line 2: Return the result. This can be a string or an object (which will be turned into JSON).

To run:  

  curl -u username 'http://localhost:8080/ws/rest/proc/HelloWorld/result?access_key=demo'

Result format:

  var data = {
        result: "Hello World",
        debug: [{ … }],
        error: {…} or ""
    }

In order to turn the result into a JS object on the client side, run JSON.parse(data.result)!

Example 2: Hello Stranger

File: HelloStranger.js

function run(p) {
     debug (p);
     return 'Hello ' + p.name;
}

Line 1: The first argument, p, is a map of all parameters from the query part of the URL used to call the THOP. It contains the key-value pairs specified when calling the Thought Process.

Line 2: Thought Processes include a built-in debug(…) function. Debug info is delivered as part of the result.

Line 3: Return a result built from the caller-supplied parameter (here, p.name).

Example 3: Connections Query

File: connections.js

function run(p) {
     var smb = new Smb('http://mhtn01.saffrontech.org:8080/ws', '******', '12345');
     var result = smb.connections('jira_all_saffron').q('project_key:suit').get();
     return result.r.map(attr).value();
 }

function attr(r) {
     return r.a.c + ':' + r.a.v;
}

Line 2: Create a connection to a SaffronMemoryBase cluster using an access_key (******) and secret (12345).

Line 3: Run a connections query on space jira_all_saffron with an AQL query project_key:suit. (The function q(...) is used to specify an AQL query.) The call to get() makes the actual call to the REST service. The resulting parsed JSON object is stored in the result variable.

Line 4: Use lo-dash to map the result to a list of category:values (see http://lodash.com/):

  • result.r contains an array of attributes which are connected with project_key:suit in our example.
  • Behind the scenes, the call to get() wraps this array as a lo-dash object and adds a map(...) function. Map calls the attr(...) function for each element in the array and builds a new array out of the return values.
  • attr(...) is a function that turns an object like the one above into an AQL-compatible string and then returns it.
  • The call to value() unwraps the object and returns the raw array.

Example 4: Connections Query with Built-Ins

File: connections-2.js

function run(p) {
     var result = connections('jira_all_saffron').q('project_key:suit').get();
     return result.r.map(attr).value();
}

 function attr(r) {
     return r.a.c + ':' + r.a.v;
 }

Instead of creating your own SaffronMemoryBase connection object (as in Example 3), you can use the built-in connections function, which connects to your local SaffronMemoryBase instance.

  local.connections(…) is equivalent to connections(…)

THOPs 2.0

Below is an example of the THOPs 2.0 layer. This supports asynchronous Thought Processes:

THOPs 2.0 layer

Example 1: Asynchronous Connections

function runAsync(p, cb) {
      var smb = new Smb('http://mhtn01.saffrontech.org:8080/ws', 'demo', '12345');
      var result = smb.connections('jira_all_saffron').q('project_key:suit').then(
          function (result) {
              debug(result);
              cb(result.r.map(attr).value());
          }
       );
  }

function attr(r) {
     return r.a.c + ':' + r.a.v;
 }

Line 1: The first argument is a map of all parameters of the query part of the URL used in calling the THOP. In THOPs 2.0, the function is runAsync(…)

Line 2: Create a connection to a SaffronMemoryBase cluster using an access_key (demo) and secret (12345).

Line 3: Run a connections query on space jira_all_saffron with an AQL query project_key:suit. (The function q(...) is used to specify an AQL query.) Instead of calling get(), chain a then(...) handler onto the query.

Line 4: The result function is the callback (cb) handler. This function is called with the final result of the connection call at some point in the future.

Example 2: Long-Running Asynchronous Connections

 function runAsync(p, cb) {
     on('marco', function(data){
         send('polo');
     });
     on('quit', function(data){
         cb('thanks!');
     });
 }

Some THOPs can run forever. They can also register for events from the caller.

Line 2: The on function is used to register for an event that is private to this connection.

Line 3: The send function is used to send a message to the calling client.

Access the Status Page for Thought Processes


To inquire about the status of a currently running Thought Process (THOP), run the THOP via POST (or GET, although that is less useful); the response includes a Location header.

GET the URL in this header to access the status page for the THOP.
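
For example (a hedged sketch; thop-test is the THOP from the example output below, running on the TPE service's default port), the -i flag makes curl print the Location header:

curl -i -X POST 'http://localhost:8888/ws/rest/proc/thop-test/result'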

Note: These resources are only available with Thought Process Engine (TPE) 2.0.

GET

Returns status information about a THOP run as a JSON object.

THOPs run via POST go through the following lifecycle: initialized, running, and finished.

Each of these events corresponds to a field in the result; the value is a timestamp of the event.

Depending on the state of the THOP run, fields like 'finished' might not be present.

If a result is available, a corresponding result field will have the usual THOP result object.

If the THOP uses the `send()` function to send a partial result to the caller, a `partials` array is available (see second example) with result objects that contain timestamps on when the partial result was sent.

Partial results are only retained in a 30-second time window. If polling is used to request the result of a THOP, it must happen at least every 30 seconds to not miss a partial result.

Example output

{"running" : "2016-12-22T01:56:48.718Z","result" : {"result" : "Hello World!","duration" : 1,"debug" : "[]","error" : null

  },

  "name" : "thop-test","initialized" : "2016-12-22T01:56:48.716Z","finished" : "2016-12-22T01:56:48.719Z","id" : "3b7dee66-2621-4d14-b30e-13750e37b0e9","status" : "finished"

}

Example with partials

{"running":"2016-11-08T08:10:50.597Z","result":{"result":"done","duration":4229,"debug":""
   },"name":"send-example","finished":"2016-11-08T08:10:54.835Z","id":"ed427f34-98e1-47ca-b852-d009c68547d7","partials":[
      {"result":"Hello","ts":"2016-11-08T08:10:50.604Z"
      },
      {"result":"World","ts":"2016-11-08T08:10:50.605Z"
      },
      {"result":[
            1,
            2,
            3,
            4
         ],"ts":"2016-11-08T08:10:50.605Z"
      },
      {"result":{"a":{"v":"london","c":"city"
            }
         },"ts":"2016-11-08T08:10:50.606Z"
      }
   ],"status":"finished"
}
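
A hedged client-side polling sketch (Node.js; statusUrl stands in for the URL taken from the Location header) that polls well within the 30-second window so partial results are not missed:

var http = require('http');

function pollStatus(statusUrl) {
    http.get(statusUrl, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
            var status = JSON.parse(body);
            // Print any partial results sent via send() (de-duplication omitted for brevity)
            (status.partials || []).forEach(function (partial) {
                console.log('partial:', JSON.stringify(partial.result));
            });
            if (status.status === 'finished') {
                console.log('final result:', JSON.stringify(status.result));
            } else {
                // Poll again well within the 30-second partial-result window
                setTimeout(function () { pollStatus(statusUrl); }, 10000);
            }
        });
    });
}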

 

Combating VR Sickness with User Experience Design


By Matt Rebong

Introduction

Virtual reality (VR) gaming has become the new frontier of game development, exciting every creative developer. It is a space where the possibilities for unique gaming mechanics are seemingly endless. However, as more game developers transition from traditional game development to VR, one issue is becoming a limiter to that creative freedom. Nausea, dubbed in the community as “VR sickness,” (VRS) is widely considered the biggest hurdle for VR game developers to date and for good reason: it is one of the largest barriers to entry for new adopters of the technology.

As indie developers, we at Well Told Entertainment* have identified user experience design as the number one priority to creating an enjoyable experience in VR. Whether the task is to hold a steady frame rate or ease a user into a world, our mission is to make sure that everyone—from the VR novice to the seasoned veteran—has a great time. However, there are no experts in this space. Having spent a year as a VR game design company, we have come to know a little more about what helps users of all kinds build literacy in VR, and we'd like to share some code, shader techniques, and design principles with the community as a whole to get more people strapped in and jamming on well-made VR experiences.

Well Told Entertainment* and Learning VR

Our first steps into VR started with curiosity about movement schemes other than teleportation. Other than strict room-scale experiences, VR users were, and still are, looking for unique and clever ways to escape room scale without having to fall over or feel like they've been at sea. In early 2016 we were excited for the new wave of incoming VR games, but like most early adopters of VR we were skeptical after hearing the reports of VRS. After reading on the forums that people were tackling this issue through teleportation, we were eager to try out games like The Lab*, Rec Room*, and Budget Cuts*. Though the games themselves are fantastic experiences in VR, and the teleportation solution effective against VRS, we still felt that something was off. What it boiled down to was that the act of teleportation in VR is immersion breaking, especially when unrelated to the story. Knowing that true immersion is one of VR's greatest qualities, we were inspired to seek out what it would take to build a dynamic solution to movement.

Like any curious individuals, we started our search by perusing the Reddit* forums. (r/vive, r/virtualreality, and r/oculus have plenty of inspired devs sharing their progress on indie development. In our opinion these are the best places to survey and look for creative solutions in the medium.) With movement schemes being the hottest topic in summer 2016, we noticed that a number of devs were sharing their opinions and demos online. I read about techniques such as climbing schemes that involved using your controllers to throw your character, arm pumping to propel your character forward, and even putting one of the controllers in your belt loop to act as a pedometer that moves your character forward as you bounce. From our own research we were able to draw conclusions on unique ways to approach the problem and begin forming our own hypotheses. After a month of random testing by one of our trusted developers, Vincent Wing, we began to make progress. Based on our research, below are some of the best practices we found to combat VRS without compromising dynamic locomotion and interactive gameplay.

Considering User Experience in Virtual Reality

In late June my cofounder, Sam Warner, and I attended a VR event in Playa Vista hosted by the Interaction Design Association. Though the event was dedicated to VR, we seemed to be the only people in attendance who weren’t UX designers by trade, which made networking a little awkward. However, we were happy to find value in the presentation given by Andrew Cochrane, the guest speaker and a director working in new media. Cochrane focused on sharing his insights on VR content creation, insights we used that night to solidify our design approach for VR.

Most developers approaching VR come from either a games or film background, mediums that largely revolve around storytelling. Cochrane believes that there are no storytellers in VR. Instead, VR is an experiential medium and everyone in this space is an experience designer. In games or films, the creator has the ability to manipulate the story for the user. They follow story-act structures and use editing and effects to direct a user’s attention to enable an emotional response. However, in VR, the goal is to give the audience a feeling of complete immersion to enable an emotional response. Thus it’s important to make sure the user has a strong sense of presence within the virtual space. The more immersive a developer can make the experience feel the better, which places a lot of responsibility on the developer, compared to developing in traditional mediums. In our experience regarding locomotion, the best practice we recommend to reduce immersion breaking and VR sickness is to always provide the player with a strong visual reference.

Spatial Orientation

To create a great feeling of presence in VR, the first step is to introduce the player to their surroundings. Though locomotion tends to get much of the blame, VRS can kick in as soon as the headset is put on, because in effect the player is putting on a blindfold. Covering one's eyes has a big impact on balance. Try this: stand up and close your eyes. Though at first it seems like a simple task, most people find themselves starting to waver after only a short amount of time. Some may even start to fall over. The reason is that balance relies heavily on visual reference and depth perception. When the visual sense disappears, our bodies resort to proprioception for balance, which varies in ability from person to person. When someone enters a virtual world, it's important that they have a good sense of spatial awareness before things can get magical.

In developing Vapor Riders '99*, a type of skiing/gliding racing game (as seen below in the link to the video), we learned that giving the user a good reference to the floor beneath them had a huge impact on combating VRS. When a player first enters the game, we made sure to give them a good visual of the track beneath their feet, the distance ahead of them, and their surroundings. However, based on the racing nature of the game, we had to take ground reference in Vapor Riders '99 a bit further. When we first tested the game in public at Virtual Reality Los Angeles 2016, some players expressed discomfort related to their height relative to the track, particularly taller players. In order to solve this issue we put in a simple height calibration system before play, which worked well. Below is the script we used with Unity* software that you can put into your own games.

 

The two functions we use to calculate and set player height and wingspan are UpdateTPoseInfo() and ScaleAvatar(). The first function checks the position of the head against the position of the ground and does the same between the two controllers. Before the distances are calculated, the player has to assume a T-pose. Otherwise, it’s difficult to get accurate information on the player’s size. 

ScaleAvatar() is used to size the actual player prefab based on the scale information collected with UpdateTPoseInfo(). We divide the collected player sizes by the avatar's default sizes and set the prefab's local X and Y scale from the resulting Vector2. There is no need to scale the player in the z-axis in our game.

This system allows us to determine the difference between the player’s ‘default’ position and their current body position. Vapor Riders relies on head and hand position to determine velocity and movement direction, and this is a necessary step to achieve a consistent game-feel across people with different body sizes and types.

void UpdateTPoseInfo()
    {
        // Cache wrist transforms from the tracked controllers (player is in a T-pose)
        if (!leftWrist)
            leftWrist = manager.leftController.transform.Find("LControllerWrist");
        if (!rightWrist)
            rightWrist = manager.rightController.transform.Find("RControllerWrist");

        // Height: headset position relative to the playspace floor
        Manager.Instance.playerHeight = eyes.position.y - playspaceFloor.position.y;
        // Wingspan: distance between the two controller wrists
        Manager.Instance.playerWingspan = Vector3.Distance(leftWrist.position, rightWrist.position);
    }

    void ScaleAvatar()
    {
        // Measure the avatar prefab's default height (eyes to avatar foot) and wingspan
        float avatarHeight = eyes.position.y - avatarFoot.position.y;
        float avatarWidth = Vector3.Distance(avatarLeftWrist.position, avatarRightWrist.position);
        // Scale factors: collected player sizes divided by the avatar's default sizes
        Vector2 scaleFactor = new Vector2(Manager.Instance.playerHeight / avatarHeight, Manager.Instance.playerWingspan / avatarWidth);
        // Scale only in X and Y; the z-axis is left at the default scale
        avatar.localScale = new Vector3(initAvatarScale.x * scaleFactor.x, initAvatarScale.y * scaleFactor.y, initAvatarScale.z);
    }

We don’t fully understand why height in our game made such an impact on player nausea. Ultimately we believe it relied on a few factors relating to vehicle-less locomotion and the lack of a heads-up display (HUD).

In October we began development on a short horror escape room, which became Escape Bloody Mary*. In the game you play a child trying to escape a demon while locked in a bathroom. One of our goals for the game’s presence was to make the player feel like a child without running into the same height issues we faced with Vapor Riders ‘99. Rather than shortening the player to make them feel small in the bathroom, we kept the height-to-floor ratio and instead scaled all the assets in the room based on the headset’s distance to the floor. This way, a player always has to look over set pieces, making them feel smaller without compromising their reference point to the ground.

Level Design

The space a player moves around in is as important as the locomotion scheme itself, especially in VR. When testing Vapor Riders ‘99 we learned how a player interacts with different parts of our test track. When designing our track we wanted to see what worked best without the track being boring. When the player starts we put them through a series of left and right turns along the x-axis, starting with gradual curves and then transitioning to sharper turns. The track then transitions to mostly up and down z-axis movements. User feedback indicated that players were more comfortable with the end of the track as opposed to the beginning. After some troubleshooting we began to figure out why. It wasn’t so much that users preferred up and down movements over left and right (though flying is tons of fun). The cause for discomfort was our inability to ease users into disruptive elements of the level. To avoid nausea during accelerated movements, the player must always have a good reference point of where they are headed and how quickly they will get there. In the beginning of our test track, we quickly put the players into gradually sharper turns, which turned their attention away from the road ahead of them. Without being able to see what was ahead they were caught off guard and unable to adapt their sense of balance to the oncoming track—something to be aware of with fast gameplay.

Effective Common Movement Schemes

Aside from teleportation, there are a few different movement schemes that address VR locomotion and nausea. One of our favorites is dashing. Dashing is similar to teleportation except that instead of cutting from one spot to the other, the player quickly dashes forward to the designated target. This type of movement is executed well in the game Raw Data*. What’s great about dashing versus teleportation is that the user never loses their reference point, which helps make them feel like they do not need to reorient themselves to their new location, something we really enjoy.

For faster movement, such as in Vapor Riders ‘99, using vehicles as transportation devices seems to do the trick well for some of our favorite games. Our goal for Vapor Riders ‘99 was to achieve fast-paced “vehicle-less” movement; however, the addition of a vehicle has its benefits to VR game design. First and foremost, it gives the player a visual reference point for balance. Though the world is moving around them, players can always reorient themselves to the vehicle they are in, much like driving a car. Games such as Hover Junkers* pull this off really well, especially since the vehicle in Hover Junkers has a large platform with good visibility.

Optimization

It was established fairly early that a stable 60fps/120hz is the required baseline for non-nauseating VR content. Anything less and the user suffers a noticeable stutter in frame-to-head position offset. In our quest to complete Escape Bloody Mary, we quickly found that even room-scale games have a fairly hard cap on certain engine features used throughout the scene.

From the beginning of the project, we knew we wanted to protect a few tech-art-related goals from the violent swings of the feature-creep-axe. We wanted to have a large mirror spanning most of one of the walls that would reflect all the lights, characters, props, and static objects. We had to be able to use lights whenever and wherever we wanted in order to manage atmosphere and tension. We wanted to give Bloody Mary 4 pieces of cloth to make her feel submerged and floaty. We also needed to have the moment when she comes out of the mirror be as close to one-to-one as possible to preserve immersion, which meant having two copies of her in the scene at all times.

In order to get all of this running on our machines as smoothly as possible we had to set up a few small systems. For the light management, we created a rough check-in, check-out system where we could move lights around the scene and change their settings as needed. We found that toggling lights on and off gave us larger performance hits thanks to their being duplicated by our mirror script, so we had to shuffle around half as many lights as normally would be renderable in our bathroom scene.

We ended up lowering the resolution of many of the more distant textures as well so that the mirror could more easily handle the entire scene duplication. When we paired this with the light management system we were able to get a stable environment that let us conjure lightning, light four candles, toggle a dynamic flashlight, fire a handgun, and still have a couple of room atmosphere lights to manage the presence of the game.

Between our two Bloody Marys, only four active cloth simulations were ever running. When Mary started to move through the mirror, we began turning off the old Mary's simulations and activating the cloth on Mary coming out of the mirror. Their animations were synced by finite state machines so that they would always be doing the same movement. These two systems let us teleport and spawn her whenever we wanted with a fairly stable frame rate no matter where she was.

Sound Brings Everything Together

The finishing touch to any game is sound, which brings all the elements to life. In film everything on screen that moves needs its own sound effect, no matter how subtle. Games take this a step further in that players have sound effects to accompany their movements, such as footsteps or breathing. But for unknown reasons, this principle is often ignored in VR game development.

Adaptive soundtracks, sound effects triggered by user input which match the rhythm of the soundtrack, are an effective way to include sound in movement where it is seemingly lacking. Thumper* is a great example. One of our long-term goals is to add an adaptive soundtrack to Vapor Riders ’99, because it will give users an audio reference of how much they are affecting directional movement based on the position of their controllers.

Summary Checklist

  • Make decisions based on creating a better immersive experience for the user.
  • Make sure the player starts the experience with a good point of reference to the floor.
  • Ease users into the experience.
  • When designing levels, make sure players know what's coming if they need to make quick decisions, especially regarding movement.
  • Vehicles can help provide reference points in fast-paced games.
  • Optimize whenever possible to avoid frame lag.
  • Give your movement and interactions sound effects. Consider adaptive soundtracks when applicable.

Now Throw Our Advice Out the Window

Okay maybe not . . . But we want to encourage game developers to think outside the box and find the locomotion scheme that best fits the style of their game. Vapor Riders ‘99 was founded as a byproduct of random trial and error. We started testing movement schemes, and when we stumbled on something fun and non-nauseating we created a game around it. That’s the magic of VR and the pioneering into the unknown that excites us to become better developers in this space. More creative solutions and tools will come only by taking risks, and we can’t wait to see what developers come up with in these early years of VR!

References

Intel Developer Mesh – https://devmesh.intel.com/
Sam - https://devmesh.intel.com/users/14426
Vapor Riders ‘99 - https://devmesh.intel.com/projects/vapor-riders-99
https://motherboard.vice.com/read/how-vapor-riders-99-is-helping-cure-motion-sickness-in-vr-gaming
Escape Bloody Mary - http://store.steampowered.com/app/544530
The Lab - http://store.steampowered.com/app/450390/
Budget Cuts - http://store.steampowered.com/app/400940/
Thumper - http://store.steampowered.com/app/356400/
Rec Room - http://store.steampowered.com/app/471710/
Raw Data - http://store.steampowered.com/app/436320/
Hover Junkers - http://store.steampowered.com/app/380220/
Reddit - https://www.reddit.com/r/Vive/
https://www.reddit.com/r/virtualreality/
https://www.reddit.com/r/oculus/
IxDA event - http://www.andrew-cochrane.com/biography/
https://www.meetup.com/UXPALA/events/231750013/
http://ixda.org/
http://www.virtualrealityla.com/

Build a Product Recommender


Use these instructions to build a Product Recommender application that uses Saffron Technology™ APIs and thought processes (THOPs). 

Note: The Product Recommender is also referred to as SaffronOne.

Overview

The Saffron Technology Natural Intelligence Platform offers a personalization solution that enables multi-channel and multi-product businesses to anticipate individual consumer behavior and thus market to that consumer the right products or services at the right time and in the right channel.

Note: The term consumer can refer to a customer or member or group. 

Use Case

Predict or recommend a product for a consumer based on both current and previous purchases of other consumers with similar consumer characteristics.

Workflow

Individual data, including demographics and past product purchases, is collected and stored for all consumers. A consumer can represent one person or a group of people such as a group of members on an insurance plan. Product purchase information for all consumers is also collected and stored. In order to recommend a product for an inquiring consumer, the current situation of that consumer is retrieved. The situation for a consumer is the current snapshot of all characteristics that defines that consumer such as purchased products and demographics. The consumer's situation is then compared with other consumers who have similar situations. Purchases made by these similar consumers are recommended to the inquiring consumer, and in ranked order. 

Product Recommender Workflow

How it Works

1. Create two Spaces for consumer and product data

Create a consumer_space Space and a product_space Space. These Spaces are used for the following purposes:

ETL configuration
The Spaces store the ETL configuration for your specified data and the settings for your data ingestion.

Location of consumer and product data sources
consumer_space
This Space holds records of individual consumer data called situations. Data that makes up a situation can include such items as personal and financial information, and purchase history. This information is used to look for consumers who have similar situations and view their purchase histories. Note that a consumer can represent one person or a group of people, such as a group of members on an insurance plan.
product_space
This Space includes product purchase information such as the name of a product, purchase dates, and other associated information. This information is used to create product memories, which are used to make recommendations for consumers.

The locations of your data Sources (consumer and product) are stored in the respective Space along with definitions for parsing and mapping the data. Once all configurations are complete, both Spaces are published and the data can be ingested. The knowledge discovered in your data is stored in the SaffronMemoryBase™. This information is used to make product recommendations for consumers, as described in the following steps.

2. Find consumers

Saffron makes a connections call to return a list of consumers from the consumer_space Space. These are customers for whom you wish to make product recommendations:

Input:
//url/consumer_space/connections/c=consumer

Output:
{
   c:consumer
   v: 12380

   c:consumer
   v: 76498
}
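
As a concrete REST call, the input above corresponds to a URL of the form used elsewhere in this documentation (a hedged example; host and access key are placeholders):

curl 'http://localhost:8080/ws/spaces/consumer_space/connections?c=consumer&access_key=demo'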

3. Get the current "situation" for the consumer

Saffron makes a connections call to query each consumer or a specific consumer (q=consumer:ID) from the consumer_space Space. It outputs the current attributes for this consumer (or consumers in the Space) at this particular moment in time. In the example below, consumer 12380 is queried:

Input:
For each consumer
//url/consumer_space/connections/q=consumer:12380

Output:
{
   c:consumer
   v: 12380

   c:name
   v: John_Smith

   c:age
   v: 30

   c:Destination
   v: Italy

   c:Airline
   v: Delta
}

4. Build an AQL

Saffron builds an AQL query from the current attribute information for that consumer (the buildCurrentMemberStateQuery function in the THOP below builds this string):

q=("consumer":"12380", "name":"John_Smith", "age":"30", "Destination":"Italy", "Airline":"Delta")

5. Get a list of recommended products for the consumer

Saffron performs a new connections call from the product_space Space to find products that consumers with similar attributes have purchased. This call returns a list of the closest recommended products as well as the metric score, which shows the strength of the product recommendation.

Note that a metric score of 1.0 does not mean that the recommended product (Gold, in our example below) is an exact match. It means that this product recommendation is the closest match to the consumer's situation.

Input:
For each consumer
//url/product_space/connections/q=aql & c=product

Output:
{
   c:product
   v: Gold
   m: 1

   c:product
   v: Platinum
   m: 0.99

   c:product
   v: Silver
   m: 0.94

} 

6. (Optional) Filter current products or other categories

Saffron can make a connections call from the product_space Space to filter out products that a consumer already has, to avoid including them as recommendations. Categories that might not apply to a specific group of consumers can also be filtered out:

Input:
//url/product_space/connections/f=current_products

Output:
{
   c:product
   v: Silver
   m: 0.94

}

7. View the CSV report that lists recommended products for the consumer

Consumer, Product, Rank, Evidence
12380, Gold, 1, age: 30 (0.8), Destination: Italy (0.2)
12380, Platinum, 0.99, age: 30 (0.4), Destination: Italy (0.4),  Airline: Delta (0.2)

Thought Processes

Use (or customize) the Product Recommender THOP to perform the steps outlined above.

function run(p, log) {
    // Member Space
    var memberSpace = 'trip_insurance_consumer';
    // Product Space
    var productSpace = 'trip_insurance_product';
    // Member Category
    var memberCategory = 'enrlid';
    // Product Category
    var productCategory = 'productname';

    // Other variables
    var page = 1;
    var currentstatepageSize = 25;
    var numberOfMembers = 2;
    var results = [];

// Get list of members from Member Space

    var members = getMembers(memberSpace, memberCategory, numberOfMembers);

// For each member in the Member Space, get their current situation vector and
// Recommend products with evidence

    for (var i = 0; i < members.length; i++) {
        // Get the current state vector
        var currentStateAttrs = getCurrentState(memberSpace, memberCategory,
            members[i], page, currentstatepageSize);

        // Build the AQL query
        var aql = buildCurrentMemberStateQuery(currentStateAttrs);

        // Get the list of recommended products
        var recommendedProducts = getProductRecommendations(productSpace, aql,
            memberCategory, productCategory);

        // For each recommended product, output the evidence for the recommendation
        for (var k = 0; k < recommendedProducts.length; k++) {
            var result = {
                'Member': members[i],
                'Product': recommendedProducts[k].product,
                'RScore': recommendedProducts[k].rollup_score,
                'Score': recommendedProducts[k].score,
                'Evidence': recommendedProducts[k].evidence
            };
            results.push(result);
        }
    }

// Return results in JSON format
    return JSON.stringify(results);
}
// Function to get list of members
function getMembers(space, memberCategory, numberOfMembers) {
    var members = [];
    var membersUrl = new Smb('localhost:8080','demo','12345')
                        .connections(space).c(memberCategory)
                        .p(1).ps(numberOfMembers);
    membersUrl.get().r.forEach(function (r) {
        members.push(r.a.v);
    });
    return members;
}

// Function to get current state vector
function getCurrentState(space, memberCategory, member, page, pageSize) {
    var currentStateAttrs = [{category: memberCategory, value: member}];
    var currentStateUrl = new Smb('localhost:8080','demo','12345')
                             .connections(space).q(cat(memberCategory, member))
                             .p(page).ps(pageSize);
    currentStateUrl.get().r.forEach(function (r) {
        currentStateAttrs.push({category: r.a.c, value: r.a.v});
    });
    return currentStateAttrs;
}

// function to build AQL
function buildCurrentMemberStateQuery(currentStateAttrs) {
    // Join the non-empty terms with ', ' so the result matches the AQL format
    // shown in step 4, e.g. ("consumer":"12380", "name":"John_Smith", ...)
    var terms = [];
    for (var i = 0; i < currentStateAttrs.length; i++) {
        var term = cat(currentStateAttrs[i].category, currentStateAttrs[i].value);
        if (term) {
            terms.push(term);
        }
    }
    return '(' + terms.join(', ') + ')';
}

// function to get list of recommended products
function getProductRecommendations(space, q, memberCategory, productCategory) {
// Filter these categories from the evidence
    var filter_categories = ['age', 'count_of_sessions','days_since_last_session', 'fname', 'lname','miscproviders', 'session_duration', 'source','totalcost', 'totaldays'];
    var recommendedProducts = [];
    // The commented-out line below is for using Cognitive Distance and auto-target
    // "separate" (me=as), meaning that association counts are voted between
    // multiple memories
    //var recommendedProductsUrl = new Smb('localhost:8080','demo','12345')
    //.connections(space).q(q).category(productCategory).me('as').ml(2).sm(2)

    // For this use case we use Frequency and Directory Memory
    var recommendedProductsUrl = new Smb('localhost:8080','demo','12345')
                                        .connections(space).q(q)
                                        .category(productCategory).ml(2).ps("5");
    recommendedProductsUrl.get().r.forEach(function (r) {
        var expAttrs = [];
        r.exp.forEach(function (exp) {
            if (filter_categories.indexOf(exp.a.c) == -1) {
                expAttrs.push(exp.a.c + ":" + exp.a.v + " S:" + exp.am[0].s);
            }
        });
        recommendedProducts.push({product: r.a.v, rollup_score: r.m,
            score: r.am[0].s, evidence: expAttrs.join(" | ")});
    });

    return recommendedProducts;
}

// Other Essential Functions
// function to prepare results
function prepareResult(r) {
    var o = {};
    o[r.a.c] = r.a.v;
    o.score = r.m;
    return o;
}

// Concatenate function
function cat(c,v) {
    return v ? '"' + c + '":"' + v + '"' : '';
}

 

Creating Immersive Virtual Worlds Within Reach of Current-Generation CPUs


By Justin Link
Chronosapien

For a little over three years I’ve had the opportunity to run a studio called Chronosapien, which specializes in creating interactive content with emerging technology components. We work with a lot of different tech, but VR has captured us the most, so much so that we’ve started working on our own project for it, called Shapesong*.

Working on Shapesong has been unique from a number of perspectives. One of the most interesting aspects though has been learning how to maximize performance on systems while constantly adapting to an evolving and maturing medium. This article is a sort of mashup of our learnings in what makes a believable and compelling VR environment, along with what it takes to enable that from a hardware perspective, specifically focusing on the CPU.

New Expectations

People are just beginning to have their initial VR experiences with 2016’s wave of first-generation devices. Because VR is really a souped-up natural user interface, people approach it differently than they do traditional media devices. They expect the content and experiences inside VR to behave naturally. For example, put a VR device on anyone who has never tried it before and instead of first asking, “What button do I press?” they ask, “Where are my hands?” Put them inside a virtual environment and instead of asking what they should do they immediately start touching things, picking things up, throwing them, and other interactions that you might not expect from someone who is using a computer program.

When expectations fall short, the suspension of disbelief is broken and the illusion of VR disappears. No longer is the user inside a virtual world, but rather looking through lenses at a digital facsimile with undisguised design elements and scripted scenarios.

There are many use cases for VR that don’t involve constructing virtual environments. However, if the goal of a VR application is to immerse and transport, we as developers and designers must create living, breathing worlds that respond to users just like our world does. That means creating environments that can shift and bend, objects that can be grabbed and thrown, and tools that can shape and change.

This is what the next generation of interactive experiences is: living, breathing virtual worlds. Users naturally expect that they can interact with them the way they do with our world, but what they don't see are all the calculations behind that immersion and interactivity. Developers have the job of bringing these worlds to life with existing tools and technology, but they can only do so much. At some point they need to leverage hardware with greater performance capabilities to enable these experiences.

This is a challenge facing myself and my team. When working on our own VR experience, Shapesong, we learned what we need to create in order to immerse, and we know what it takes to enable it. However, the breadth of interactivity and immersion is so great and computing resources so limited on traditional systems that we’re forced to pick and choose which areas we breathe life into, or to get creative in how we do that. It feels a lot like trying to squeeze a mountain through a straw.

In this article, I want to talk about some of the ways that Shapesong eats up CPU performance, how that impacts users, and how more-powerful CPUs enable us to scale our immersion. My goal is to help others better understand the benefits that high-end VR systems can have in enabling these immersive virtual experiences.

What is Shapesong?

First, let me give some context about Shapesong. Shapesong is our solution for a next-generation interactive experience for music. Users can explore musical environments, discover sounds that they can use in virtual instruments, create songs inside locations that dance and play along with them, and play music with clones of themselves or with others. I like to describe it simply as an experience where Fantasia meets Willy Wonka and the Chocolate Factory in a shared virtual world. Here is a video of our proof-of-concept demo:

Shapesong’s Teaser Video.

Our goal with Shapesong is to create an entire world that can be played musically, and to give users tools that let them make something with the environment and instruments that they find. We’re also trying to create a kind of synesthetic experience that melts visual and musical performances together, so that both performers and spectators can be completely immersed.

There are many aspects of the experience that we need to design for and control in real time, and this is where the capability of the system running Shapesong becomes so critical.

Enabling a Musical World

VR imposes a strict 90 frames per second rendering time, or about 11 milliseconds per frame. In comparison, traditional experiences render at 30 frames per second, and even then dipping below that number in certain areas isn’t a deal breaker, the way it is for VR. VR actually requires that you render two versions of the scene, one for each eye. That means the rendering load for VR is twice that of flat media devices. There are some exceptions to these rules, and techniques that help to bend them, but the bottom line is this—the requirements for computing in VR are much more strict, and also much more expensive.

With Shapesong, we have some unique features that require even more power from VR systems. From a technical perspective, Shapesong is a cross between a digital audio workstation (DAW) and a video game inside a virtual environment. All three love to eat cycles on a CPU. Let’s look at some of the areas in Shapesong that really rely on CPU horsepower.

Audio Processing

It’s probably no surprise that a music game like Shapesong does a lot of audio processing. In addition to the baseline rendering of ambient, player, and UI sounds, we also have the sounds associated with instruments being played at various times. In fact, the audio processing load for these instruments is 20 times greater on average than the baseline for the experience, and that’s when only a single instrument is played.


Figure 1: Playing the keyboard in Shapesong.

This is the way that instruments work behind the scenes. To play a single sound, or note, on an instrument requires playing an audio clip of that note. For some perspective, a full-size piano has 88 different keys, or unique notes, that can be played at a given time. Playing a similar virtual instrument inside Shapesong could have up to 88 unique audio clips playing at once. However, this assumes each note only has a single active clip, or voice, playing at the same time, which isn’t always true in Shapesong.

There is a way around this clip-based approach to instruments—sound synthesis. However, sound synthesis isn’t a replacement for samples, and it comes with its own unique processing overhead. We want Shapesong to have both methods to allow for the greatest flexibility in music playing.

Environmental Effects

As I said, one of the things we’re trying to do with the music experience in Shapesong is to melt visual and musical performances together. Music that’s played needs to be in lockstep with visuals in the environment.

Most people tend to think that any graphics rendered in a game or experience are handled by the graphics card, or GPU. In fact, the CPU plays a large role in the graphics rendering pipeline by performing draw calls. Draw calls are essentially the CPU identifying a graphics job and passing it along to the GPU. In general, they happen each time there is something unique to be drawn to the screen.

In Unity*, the Shapesong engine, draw calls are optimized in a process called batching. Batching takes similar draw calls and groups them into a single call to be sent to the GPU, thus saving computation time. However, calls can only be batched in Unity under specific conditions, one of which is that the objects all share the same material. Another condition is that the batched objects must all be stationary and not change position or animate in any way. This works great for static environments where there are, say, 200 trees sharing the same material. However, it doesn’t work when you want each of these trees to respond uniquely to player input or musical performance.

This is a huge challenge in creating a living, breathing, virtual world, regardless of whether that world needs to respond to music. How can you make a place come to life if the things inside it cannot move or change in any way? The reality, which has always been the case for games, is that you have to be selective with what you bring to life, and creative in how you do it. As I said earlier, the difference between traditional experiences and next-generation ones is the user’s expectation.

Physics Objects

Bringing a virtual world to life isn’t only about making things in it animated. You also need to imbue it with the laws of physics that we’re used to in our world. In fact, you could even make the argument that current-generation VR systems are not true VR systems, but augmented virtuality (see the Mixed Reality Spectrum) with a virtual world overlaid. Why? Because even though when we’re in VR we’re seeing a virtual environment, we’re still standing in a physical one, with all of the laws of nature that govern it. The point is that if you want to create a seamless, natural experience without any illusion-breaking tells, you probably want to match the physics of virtual reality with physical reality.


Figure 2: Throwing a sound cartridge.

In Shapesong, we want to create a natural experience of exploring environments musically by playing with the things inside of it. For example, we want users to be able to pick up a rock and skip it across a pond to play musical tones as it crosses; or to drop a ball and listen to the sound it makes change in pitch as it falls. The idea is to encourage musical exploration in a way that isn't intimidating for non-musicians.

While physics in a game engine isn’t incredibly difficult to enable, it is rather expensive and taxing on the CPU. Aside from calculating new positions for objects that are bound by physics in every frame, the physics system also has to check for collisions between those objects and the rest of the environment. The cost of this scales with the number of physics-enabled objects and objects those things can collide with.

Performance Recording

Part of what makes Shapesong unique is the way that users can record themselves. We wanted to approach performance recording in a way that takes advantage of the capabilities of VR and the systems that drive it. Traditionally, when you record music in something like a DAW, only the notes you play are captured, not the motion of your hand as it sweeps the keys, or the bobbing of your head as you lock into a groove. But music isn’t only about the notes that you play. It’s very much about the way that you play it.


Figure 3: Playing alongside a clone of yourself.

Our approach is to record all of a user’s input and to bake it into an animation that can be played back through a virtual avatar. Essentially, what users do when they record a performance is clone themselves doing something over a period of time. On an instrument, that means cloning yourself playing a piece of music. Elsewhere, it could mean interacting with the environment, dancing, or just saying hello.

While recording an individual performance isn't an incredibly taxing operation, playing back a performance can be, especially as the size of the performance scales. For example, in a given song section, there may be four or five instruments being played at once: rhythm, bass, melody, strings, and some texture. There is also likely some visual performance involved like dancing, drawing glowing trails, or triggering things in the environment. So, at any time in a typical performance a user will likely have around 10 or more recordings playing. Each of these characters contains three objects that have their positions recorded: the left hand, right hand, and head. We also keep track of what objects characters are holding and the states of those objects. In total there are a hundred or more objects or properties being played back for a typical performance, and all of the processing for them happens every frame.

What can Greater Hardware Performance Enable?

It’s clear that VR imposes some strict performance requirements. It’s also understandable that simulating immersive environments and enabling abilities in them can be expensive. But what does this mean? How do these CPU performance requirements affect the end experience for VR?

One of the main aspects that is generally scaled with processing power is the size of virtual environments. If you look at the experiences that have been released, almost all exist inside small rooms or bubbles, with limited interactivity. Tilt Brush*, for example, limits the size of the environment canvas and has only recently allowed users to move outside of their room-scale space. The Lab* is built inside of a, well, lab, which is really only a few room-scale spaces long. Even seemingly more open environments like those from Lucky’s Tale* are shrunken down compared to their modern platformer counterparts like Super Mario Galaxy*. With greater performance from CPUs we could see these environments grow, creating more seamless and varied worlds to explore.

Another way processing power shapes the VR experience is by limiting interactivity. The majority of experiences released focus on a single type of interactivity, and then scale that. Job Simulator*, for example, is a physics sandbox that lets users pick objects up, throw them around, or use them together in unique and interesting ways. Raw Data* is one of many wave shooters that spawns hordes of enemies for users to shoot at. Audio Shield* dynamically generates spheres that players block with shields in sync with a song's beat. Even though these games are great and are tons of fun to play, the depth of the experiences is relatively thin, and as a result they don't really have the stickiness that other popular, non-VR games have. Greater processing power can help to enable more breadth and depth in an experience's interactivity by putting less stress on the hardware with each interactive system. Arizona Sunshine* is an example of a game that enables lots of physics objects and zombies in the environment when using high-performing CPUs, on top of its already existing wave shooter experience.

These kinds of effects are exactly what we’re experiencing with Shapesong. As we enable more features, we must pull in the edges of our environment. As we add more characters, we must limit the total number of active audio voices. When we enable more visual effects with music, we must lower graphic fidelity elsewhere. Again, these compromises are not unique to VR—they have always existed for any game or experience. The differences are the expectation of reality, which for us as humans has always brimmed with nuance and detail, and the requirements of VR systems, which are at least twice as demanding as traditional systems. Having more performant CPUs in VR systems can help us step closer to the goal of creating these immersive and truly transparent virtual worlds.

Looking Forward

Right now with VR we’re at the edge of a new paradigm shift in media. We’re driving directly into our imaginations instead of just watching from the outside, and interacting with worlds and characters as if we were there. It’s a new type of destination, and as we begin to define what these next-generation virtual experiences are, we need to also reconsider the technology that gets us there. Astronauts didn’t fly to the moon in a gas-chugging beater, and we’re not diving into VR with our web-surfing PCs.

But enabling VR experiences isn’t only about having the latest computing tech. It’s also about creating designs that mimic reality enough to immerse, while ironically breaking reality enough to escape. For us developing Shapesong, that means creating environments with familiar yet new laws of physics, and instruments with intuitive yet unique methods of interaction. Of course, each new experience in VR will have its own style and ways of pushing limitations; however, they will all have to leverage technology to do it.


The Ultimate Question of Programming, Refactoring, and Everything


Yes, you've guessed correctly - the answer is "42". In this article you will find 42 recommendations about coding in C++ that can help a programmer avoid a lot of errors, and save time and effort. The author is Andrey Karpov - technical director of "Program Verification Systems", the team developing the PVS-Studio static code analyzer. Having checked a large number of open source projects, we have seen a large variety of ways to shoot yourself in the foot; there is definitely much to share with the readers. Every recommendation is given with a practical example, which demonstrates the relevance of the issue. These tips are intended for C/C++ programmers, but usually they are universal and may be of interest to developers using other languages.

Preface

About the author. My name is Andrey Karpov. The scope of my interests is the C/C++ language and the promotion of code analysis methodology. I have been a Microsoft MVP in Visual C++ for 5 years. The main aim of my articles and work in general is to make the code of programs safer and more secure. I'll be really glad if these recommendations help you write better code and avoid typical errors. Those who write code standards for companies may also find some helpful information here.

A little bit of history. Not so long ago I created a resource where I shared useful tips and tricks about programming in C++. But this resource didn't get the expected number of subscribers, so I don't see the point in giving a link to it here. It will be on the web for some time, but eventually it will be deleted. Still, these tips are worth keeping. That's why I've updated them, added several more, and combined them in a single text. Enjoy reading!

1. Don't do the compiler's job

Consider the code fragment, taken from MySQL project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V525 The code containing the collection of similar blocks. Check items '0', '1', '2', '3', '4', '1', '6' in lines 680, 682, 684, 689, 691, 693, 695.

static int rr_cmp(uchar *a,uchar *b)
{
  if (a[0] != b[0])
    return (int) a[0] - (int) b[0];
  if (a[1] != b[1])
    return (int) a[1] - (int) b[1];
  if (a[2] != b[2])
    return (int) a[2] - (int) b[2];
  if (a[3] != b[3])
    return (int) a[3] - (int) b[3];
  if (a[4] != b[4])
    return (int) a[4] - (int) b[4];
  if (a[5] != b[5])
    return (int) a[1] - (int) b[5];     <<<<====
  if (a[6] != b[6])
    return (int) a[6] - (int) b[6];
  return (int) a[7] - (int) b[7];
}

Explanation

This is a classic error, related to copying fragments of code (Copy-Paste). Apparently, the programmer copied the block of code "if (a[1] != b[1]) return (int) a[1] - (int) b[1];". Then he started changing the indices and forgot to replace "1" with "5". This resulted in the comparison function occasionally returning an incorrect value; the kind of issue that is very difficult to notice. It is really hard to detect: none of the tests had revealed it before we scanned MySQL with PVS-Studio.

Correct code

if (a[5] != b[5])
  return (int) a[5] - (int) b[5];

Recommendation

Although the code is neat and easy-to-read, it didn't prevent the developers from overlooking the error. You can't stay focused when reading code like this because all you see is just similar looking blocks, and it's hard to concentrate the whole time.

These similar blocks are most likely a result of the programmer's desire to optimize the code as much as possible. He "unrolled the loop" manually. I don't think it was a good idea in this case.

Firstly, I doubt that the programmer has really achieved anything with it. Modern compilers are pretty smart, and are very good at automatic loop unrolling if it can help improve program performance.

Secondly, the bug appeared in the code because of this attempt to optimize the code. If you write a simpler loop, there will be less chance of making a mistake.

I'd recommend rewriting this function in the following way:

static int rr_cmp(uchar *a,uchar *b)
{
  for (size_t i = 0; i < 7; ++i)
  {
    if (a[i] != b[i])
      return a[i] - b[i];
  }
  return a[7] - b[7];
}

Advantages:

  • The function is easier to read and comprehend.
  • You are much less likely to make a mistake writing it.

I am quite sure that this function will work no slower than its longer version.

So, my advice would be - write simple and understandable code. As a rule, simple code is usually correct code. Don't try to do the compiler's job - unroll loops, for example. The compiler will most definitely do it well without your help. Doing such fine manual optimization work would only make sense in some particularly critical code fragments, and only after the profiler has already estimated those fragments as problematic (slow).

2. Larger than 0 does not mean 1

The following code fragment is taken from CoreCLR project. The code has an error that PVS-Studio analyzer diagnoses in the following way: V698 Expression 'memcmp(....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'memcmp(....) < 0' instead.

bool operator()(const GUID& _Key1, const GUID& _Key2) const
  { return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1; }

Explanation

Let's have a look at the description of memcmp() function:

int memcmp ( const void * ptr1, const void * ptr2, size_t num );

Compares the first num bytes of the block of memory pointed by ptr1 to the first num bytes pointed by ptr2, returning zero if they all match, or a value different from zero representing which is greater, if they do not.

Return value:

  • < 0 - the first byte that does not match in both memory blocks has a lower value in ptr1 than in ptr2 (if evaluated as unsigned char values).
  • == 0 - the contents of both memory blocks are equal.
  • > 0 - the first byte that does not match in both memory blocks has a greater value in ptr1 than in ptr2 (if evaluated as unsigned char values).

Note that if blocks aren't the same, then the function returns values greater than or less than zero. Greater or less. This is important! You cannot compare the results of such functions as memcmp(), strcmp(), strncmp(), and so on with the constants 1 and -1.

Interestingly, wrong code where the result is compared with 1 or -1 can work as the programmer expects for many years. But this is sheer luck, nothing more. The behavior of the function can unexpectedly change: for example, you may change the compiler, or the developers may optimize memcmp() in a new way, and your code will cease working.

Correct code

bool operator()(const GUID& _Key1, const GUID& _Key2) const
  { return memcmp(&_Key1, &_Key2, sizeof(GUID)) < 0; }

Recommendation

Don't rely on the way the function works now. If the documentation says that a function can return values less than or greater than 0, it does mean it. It means that the function can return -10, 2, or 1024. The fact that you always see it return -1, 0, or 1 doesn't prove anything.

By the way, the fact that the function can return such numbers as 1024 indicates that the result of memcmp() execution cannot be stored in a variable of the char type. This is one more widespread error whose consequences can be really serious. Such a mistake was the root of a serious vulnerability in MySQL/MariaDB in versions earlier than 5.1.61, 5.2.11, 5.3.5, 5.5.22. The thing is that when a user connects to MySQL/MariaDB, the code evaluates a token (SHA from the password and hash) that is then compared with the expected value of the memcmp() function. But on some platforms the return value can go beyond the range [-128..127]. As a result, in 1 out of 256 cases the procedure of comparing the hash with an expected value always returns true, regardless of the hash. Therefore, a simple bash command gives a hacker root access to the vulnerable MySQL server, even if the person doesn't know the password. The reason for this was the following code in the file 'sql/password.c':

typedef char my_bool;
...
my_bool check(...) {
  return memcmp(...);
}

A more detailed description of this issue can be found here: Security vulnerability in MySQL/MariaDB.
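
To feel the danger, here is a minimal C++ sketch of what truncating a memcmp()-style result to char does (the 1024 value and the variable names are illustrative, not taken from MySQL):

#include <cstdio>

int main()
{
  int result = 1024;            // a perfectly legal memcmp() return value
  char stored = (char)result;   // only the low byte survives: 1024 % 256 == 0
  // 'stored' is now 0, so "the blocks differ" silently turns into "they match"
  printf("result = %d, stored = %d\n", result, stored);
  return 0;
}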

3. Copy once, check twice

The fragment is taken from Audacity project. The error is detected by the following PVS-Studio diagnostic: V501 There are identical sub-expressions to the left and to the right of the '-' operator.

sampleCount VoiceKey::OnBackward (....) {
  ...
  int atrend = sgn(buffer[samplesleft - 2]-
                   buffer[samplesleft - 1]);
  int ztrend = sgn(buffer[samplesleft - WindowSizeInt-2]-
                   buffer[samplesleft - WindowSizeInt-2]);
  ...
}

Explanation

The "buffer[samplesleft - WindowSizeInt-2]" expression is subtracted from itself. This error appeared because of copying a code fragment (Copy-Paste): the programmer copied a code string but forgot to replace 2 with 1.

This is a really banal error, but still it is a mistake. Errors like this are a harsh reality for programmers, and that's why we will speak about them several times here. I am declaring war on them.

Correct code

int ztrend = sgn(buffer[samplesleft - WindowSizeInt-2]-
                 buffer[samplesleft - WindowSizeInt-1]);

Recommendation

Be very careful when duplicating code fragments.

It wouldn't make sense to recommend rejecting the copy-paste method altogether. It's too convenient, and too useful, to get rid of such editor functionality.

Instead, just be careful, and don't hurry - forewarned is forearmed.

Remember that copying code may cause many errors. Here, take a look at some examples of bugs detected with the V501 diagnostic. Half of these errors are caused by using Copy-Paste.

If you copy the code and then edit it - check what you've got! Don't be lazy!

We'll talk more about Copy-Paste later. The problem actually goes deeper than it may seem, and I won't let you forget about it.

4. Beware of the ?: operator and enclose it in parentheses

Fragment taken from the Haiku project (inheritor of BeOS). The error is detected by the following PVS-Studio diagnostic: V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '-' operator.

bool IsVisible(bool ancestorsVisible) const
{
  int16 showLevel = BView::Private(view).ShowLevel();
  return (showLevel - (ancestorsVisible) ? 0 : 1) <= 0;
}

Explanation

Let's check the C/C++ operation precedence. The ternary operator ?: has a very low precedence, lower than that of operations /, +, <, etc; it is also lower than the precedence of the minus operator. As a result, the program doesn't work in the way the programmer expected.

The programmer thinks that the operations will execute in the following order:

(showLevel - (ancestorsVisible ? 0 : 1) ) <= 0

But it will actually be like this:

((showLevel - ancestorsVisible) ? 0 : 1) <= 0

The error is made in very simple code. This illustrates how hazardous the ?: operator is. It's very easy to make a mistake when using it; in more complex conditions, the ternary operator does real damage to the code. It's not only that you are very likely to make and miss a mistake; such expressions are also very difficult to read.

Really, beware of the ?: operator. I've seen a lot of bugs where this operator was used.

Correct code

return showLevel - (ancestorsVisible ? 0 : 1) <= 0;

Recommendation

In previous articles, we've already discussed the problem of a ternary operator, but since then I've become even more paranoid. The example given above shows how easy it is to make an error, even in a short and simple expression, that's why I'll modify my previous tips.

I don't suggest rejecting the ?: operator completely. It may be useful, and even necessary sometimes. Nevertheless, please do not overuse it, and if you have decided to use it, here is my recommendation:

ALWAYS enclose the ternary operator in parentheses.

Suppose you have an expression:

A = B ? 10 : 20;

Then you should write it like this:

A = (B ? 10 : 20);

Yes, the parentheses are excessive here...

But, it will protect your code later when you or your colleagues add an X variable to 10 or 20 while doing code refactoring:

A = X + (B ? 10 : 20);

Without the parentheses, you could forget that the ?: operator has low precedence, and accidentally break the program.

Of course, someone could still write "X +" inside the parentheses by mistake and end up with the same error, but the parentheses are an additional layer of protection that shouldn't be rejected.

5. Use available tools to analyze your code

The fragment is taken from LibreOffice project. The error is detected by the following PVS-Studio diagnostic: V718 The 'CreateThread' function should not be called from 'DllMain' function.

BOOL WINAPI DllMain( HINSTANCE hinstDLL,
                     DWORD fdwReason, LPVOID lpvReserved )
{
  ....
  CreateThread( NULL, 0, ParentMonitorThreadProc,
                (LPVOID)dwParentProcessId, 0, &dwThreadId );
  ....
}

Explanation

I used to have a side job as a freelancer a long time ago. Once I was given a task I failed to accomplish. The task itself was formulated incorrectly, but I didn't realise that at the time. Moreover, it seemed clear and simple at first.

Under a certain condition in the DllMain I had to do some actions, using Windows API functions; I don't remember which actions exactly, but it wasn't anything difficult.

So I spent loads of time on that, but the code just wouldn't work. More than that, when I made a new standard application, it worked; but it didn't when I tried it in the DllMain function. Some magic, isn't it? I didn't manage to figure out the root of the problem at the time.

It's only now that I work on PVS-Studio development, so many years later, that I have suddenly realized the reason behind that old failure. In the DllMain function, you can perform only a very limited set of actions. The thing is that some DLLs may not be loaded yet, and you cannot call functions from them.

Now we have a diagnostic to warn programmers when dangerous operations are detected in DllMain functions - and this was exactly the case with that old task I was working on.

Details

More details about the usage of DllMain can be found on the MSDN site in this article: Dynamic-Link Library Best Practices. I'll give some excerpts from it here:

DllMain is called while the loader-lock is held. Therefore, significant restrictions are imposed on the functions that can be called within DllMain. As such, DllMain is designed to perform minimal initialization tasks, by using a small subset of the Microsoft Windows API. You cannot call any function in DllMain which directly, or indirectly, tries to acquire the loader lock. Otherwise, you will introduce the possibility that your application deadlocks or crashes. An error in a DllMain implementation can jeopardize the entire process and all of its threads.

The ideal DllMain would be just an empty stub. However, given the complexity of many applications, this is generally too restrictive. A good rule of thumb for DllMain is to postpone the initialization for as long as possible. Slower initialization increases how robust the application is, because this initialization is not performed while the loader lock is held. Also, slower initialization enables you to safely use much more of the Windows API.

Some initialization tasks cannot be postponed. For example, a DLL that depends on a configuration file will fail to load if the file is malformed or contains garbage. For this type of initialization, the DLLs should attempt to perform the action, and in the case of a failure, exit immediately rather than waste resources by doing some other work.

You should never perform the following tasks from within DllMain:

  • Call LoadLibrary or LoadLibraryEx (either directly or indirectly). This can cause a deadlock or a crash.
  • Call GetStringTypeA, GetStringTypeEx, or GetStringTypeW (either directly or indirectly). This can cause a deadlock or a crash.
  • Synchronize with other threads. This can cause a deadlock.
  • Acquire a synchronization object that is owned by code that is waiting to acquire the loader lock. This can cause a deadlock.
  • Initialize COM threads by using CoInitializeEx. Under certain conditions, this function can call LoadLibraryEx.
  • Call the registry functions. These functions are implemented in Advapi32.dll. If Advapi32.dll is not initialized before your DLL, the DLL can access uninitialized memory and cause the process to crash.
  • Call CreateProcess. Creating a process can load another DLL.
  • Call ExitThread. Exiting a thread during DLL detach can cause the loader lock to be acquired again, causing a deadlock or a crash.
  • Call CreateThread. Creating a thread can work if you do not synchronize with other threads, but it is risky.
  • Create a named pipe or other named object (Windows 2000 only). In Windows 2000, named objects are provided by the Terminal Services DLL. If this DLL is not initialized, calls to the DLL can cause the process to crash.
  • Use the memory management function from the dynamic C Run-Time (CRT). If the CRT DLL is not initialized, calls to these functions can cause the process to crash.
  • Call functions in User32.dll or Gdi32.dll. Some functions load another DLL, which may not be initialized.
  • Use managed code.

Correct code

The code fragment from the LibreOffice project cited above may or may not work - it is all a matter of chance.

It's not easy to fix an error like this. You need to refactor your code in order to make the DllMain function as simple and short as possible.
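
A minimal sketch of that approach, assuming the host application can call an exported initialization function after LoadLibrary() returns (the StartParentMonitor name is made up for illustration):

#include <windows.h>

static DWORD WINAPI ParentMonitorThreadProc(LPVOID param)
{
  // ... monitor the parent process here ...
  return 0;
}

// Keep DllMain trivial: no CreateThread, no LoadLibrary, no synchronization.
BOOL WINAPI DllMain(HINSTANCE, DWORD, LPVOID)
{
  return TRUE;
}

// Exported initialization that the host calls after LoadLibrary() returns,
// when the loader lock is no longer held.
extern "C" __declspec(dllexport) BOOL StartParentMonitor(DWORD dwParentProcessId)
{
  DWORD dwThreadId = 0;
  HANDLE hThread = CreateThread(NULL, 0, ParentMonitorThreadProc,
                                (LPVOID)(ULONG_PTR)dwParentProcessId,
                                0, &dwThreadId);
  if (hThread == NULL)
    return FALSE;
  CloseHandle(hThread); // the thread keeps running without the handle
  return TRUE;
}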

Recommendation

It's hard to give recommendations. You can't know everything; everyone may encounter a mysterious error like this. A formal recommendation would sound like this: you should carefully read all the documentation for every program entity you work with. But you surely understand that one can't foresee every possible issue. You'd only spend all your time reading documentation then, and have no time for programming. And even having read N pages, you couldn't be sure you haven't missed some article that could warn you against some trouble.

I wish I could give you somewhat more practical tips, but there is unfortunately only one thing I can think of: use static analyzers. No, they don't guarantee you will have zero bugs. Had there been an analyzer all those years ago that could have told me I couldn't call the Foo function in DllMain, I would have saved a lot of time, and even more nerves: I really was angry and going crazy because of not being able to solve the task.

6. Check all the fragments where a pointer is explicitly cast to integer types

The fragment is taken from IPP Samples project. The error is detected by the following PVS-Studio diagnostic: V205 Explicit conversion of pointer type to 32-bit integer type: (unsigned long)(img)

void write_output_image(...., const Ipp32f *img,
                        ...., const Ipp32s iStep) {
  ...
  img = (Ipp32f*)((unsigned long)(img) + iStep);
  ...
}

Note. Some may say that this code isn't the best example for several reasons. We are not concerned about why a programmer would need to move along a data buffer in such a strange way. What matters to us is the fact that the pointer is explicitly cast to the "unsigned long" type. And only this. I chose this example purely because it is brief.

Explanation

A programmer wants to shift a pointer at a certain number of bytes. This code will execute correctly in Win32 mode because the pointer size is the same as that of the long type. But if we compile a 64-bit version of the program, the pointer will become 64-bit, and casting it to long will cause the loss of the higher bits.

Note. Linux uses a different data model. In 64-bit Linux programs, the 'long' type is 64-bit too, but it's still a bad idea to use 'long' to store pointers there. First, such code tends to get into Windows applications quite often, where it becomes incorrect. Second, there are special types whose very names suggest that they can store pointers - for example, intptr_t. Using such types makes the program clearer.

In the example above, we can see a classic error which occurs in 64-bit programs. It should be said right off that there are lots of other errors, too, awaiting programmers on their way to 64-bit software development. But it is the writing of a pointer into a 32-bit integer variable that's the most widespread and insidious issue.

This error can be illustrated in the following way:

Figure 1. A) 32-bit program. B) 64-bit pointer refers to an object that is located in the lower addresses. C) 64-bit pointer is damaged.

Speaking about its insidiousness, this error is sometimes very difficult to notice. The program just "almost works". Errors causing the loss of the most significant bits in pointers may only show up in a few hours of intense use of the program. First, the memory is allocated in the lower memory addresses, that's why all the objects and arrays are stored in the first 4 GB of memory. Everything works fine.

As the program keeps running, the memory gets fragmented, and even if the program doesn't use much of it, new objects may be created outside those first 4 GB. This is where the troubles start. It's extremely difficult to purposely reproduce such issues.
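
The effect is easy to reproduce in isolation; a minimal sketch (the address value is made up to simulate an object above the first 4 GB):

#include <cstdint>
#include <cstdio>

int main()
{
  uintptr_t address = 0x100000000ull;     // bit 32 is set
  uint32_t truncated = (uint32_t)address; // what storing a pointer in a 32-bit type does
  printf("original:  0x%llx\n", (unsigned long long)address); // 0x100000000
  printf("truncated: 0x%x\n", (unsigned)truncated);           // 0x0 - the object is lost
  return 0;
}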

Correct code

You can use such types as size_t, INT_PTR, DWORD_PTR, intptr_t, etc. to store pointers.

img = (Ipp32f*)((uintptr_t)(img) + iStep);

Actually, we can do this without any explicit casting. It is not mentioned anywhere that the alignment is different from the standard one, so there is no magic in using __declspec(align( # )) and the like. The pointers are thus shifted by a number of bytes divisible by sizeof(Ipp32f); otherwise we would have undefined behavior (see EXP36-C).

So, we can write it like this:

img += iStep / sizeof(*img);

Recommendation

Use special types to store pointers - forget about int and long. The most universal types for this purpose are intptr_t and uintptr_t. In Visual C++, the following types are available: INT_PTR, UINT_PTR, LONG_PTR, ULONG_PTR, DWORD_PTR. Their very names indicate that you can safely store pointers in them.

A pointer can fit into the types size_t and ptrdiff_t too, but I still wouldn't recommend using them for that, for they are originally intended for storing sizes and indices.

You cannot store a pointer to a member function of a class in uintptr_t. Member function pointers are slightly different from ordinary function pointers: besides the address itself, they keep hidden information related to the object of the class. However, it does not matter - in a 32-bit program you cannot assign such a pointer to unsigned int either. Such pointers are always handled in a special way, which is why there aren't many problems with them in 64-bit programs. At least I haven't seen such errors.

If you are going to compile your program into a 64-bit version, first, you need to review and fix all the code fragments where pointers are cast into 32-bit integer types. Reminder - there will be more troublesome fragments in the program, but you should start with the pointers.

For those who are creating or planning to create 64-bit applications, I suggest studying the following resource: Lessons on development of 64-bit C/C++ applications.

7. Do not call the alloca() function inside loops

This bug was found in Pixie project. The error is detected by the following PVS-Studio diagnostic: V505 The 'alloca' function is used inside the loop. This can quickly overflow stack.

inline  void  triangulatePolygon(....) {
  ...
  for (i=1;i<nloops;i++) {
    ...
    do {
      ...
      do {
        ...
        CTriVertex *snVertex =
          (CTriVertex *) alloca(2*sizeof(CTriVertex));
        ...
      } while(dVertex != loops[0]);
      ...
    } while(sVertex != loops[i]);
    ...
  }
  ...
}

Explanation

The alloca(size_t) function allocates memory by using the stack. Memory allocated by alloca() is freed when leaving the function.

There's usually not much stack memory allocated for programs. When you create a project in Visual C++, you may see that the default setting is just 1 megabyte for the stack size. This is why the alloca() function can very quickly use up all the available stack memory if called inside a loop.

In the example above, there are 3 nested loops at once. Therefore, triangulating a large polygon will cause a stack overflow.

It is also unsafe to use such macros as A2W in loops as they also contain a call of the alloca() function.

As we have already said, by default, Windows programs use a stack of 1 megabyte. This value can be changed: in the project settings, find and change the parameters 'Stack Reserve Size' and 'Stack Commit Size'. Details: "/STACK (Stack Allocations)". However, we should understand that making the stack size bigger isn't the solution to the problem - you just postpone the moment when the program's stack will overflow.

Recommendation

Do not call the alloca() function inside loops. If you have a loop and need to allocate a temporary buffer, use one of the following 3 methods to do so:

  1. Allocate memory in advance, and then use one buffer for all the operations. If you need buffers of different sizes every time, allocate memory for the biggest one. If that's impossible (you don't know exactly how much memory it will require), use method 2.
  2. Make the loop body a separate function. In this case, the buffer will be created and destroyed right off at each iteration. If that's difficult too, there's only method N3 left.
  3. Replace alloca() with the malloc() function or the new operator, or use a class such as std::vector; a sketch follows this list. Take into account that memory allocation will take more time in this case. If you use malloc/new, you will have to think about freeing the memory. On the other hand, you won't get a stack overflow when demonstrating the program on large data to the customer.
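
A minimal sketch of method 3, with a made-up Vertex type standing in for CTriVertex:

#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

void processLoops(size_t nloops)
{
  std::vector<Vertex> scratch;              // heap-backed buffer, reused across iterations
  for (size_t i = 0; i < nloops; ++i)
  {
    scratch.resize(2);                      // was: alloca(2 * sizeof(CTriVertex))
    Vertex *snVertex = scratch.data();
    snVertex[0] = Vertex{0.0f, 0.0f, 0.0f}; // ... use the buffer as before ...
    snVertex[1] = Vertex{1.0f, 1.0f, 1.0f};
  }
}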

8. Remember that an exception in the destructor is dangerous

This issue was found in LibreOffice project. The error is detected by the following PVS-Studio diagnostic: V509 The 'dynamic_cast<T&>' operator should be located inside the try..catch block, as it could potentially generate an exception. Raising exception inside the destructor is illegal.

virtual ~LazyFieldmarkDeleter()
{
  dynamic_cast<Fieldmark&>
    (*m_pFieldmark.get()).ReleaseDoc(m_pDoc);
}

Explanation

When an exception is thrown in a program, the stack begins to unroll, and objects get destroyed by calling their destructors. If the destructor of an object being destroyed during stack unrolling throws another exception which leaves the destructor, the C++ library will immediately terminate the program by calling the terminate() function. What follows from this is the rule that destructors should never let exceptions out. An exception thrown inside a destructor must be handled inside the same destructor.

The code cited above is rather dangerous. The dynamic_cast operator will generate a std::bad_cast exception if it fails to cast an object reference to the required type.

Likewise, any other construct that can throw an exception is dangerous. For example, it's not safe to use the new operator to allocate memory in the destructor. If it fails, it will throw a std::bad_alloc exception.

Correct code:

The code can be fixed by using dynamic_cast not with a reference, but with a pointer. In this case, if it's impossible to convert the type of the object, it won't generate an exception but will return nullptr.

virtual ~LazyFieldmarkDeleter()
{
  auto p = dynamic_cast<Fieldmark*>(m_pFieldmark.get());
  if (p)
    p->ReleaseDoc(m_pDoc);
}

Recommendation

Make your destructors as simple as possible. Destructors aren't meant for memory allocation and file reading.

Of course, it's not always possible to make destructors simple, but I believe we should try to reach that. Besides that, a destructor being complex is generally a sign of a poor class design, and ill-conceived solutions.

The more code you have in your destructor, the harder it is to provide for all possible issues. It makes it harder to tell which code fragment can or cannot throw an exception.

If there is some chance that an exception may occur, a good solution is usually to suppress it by using the catch(...):

virtual ~LazyFieldmarkDeleter()
{
  try
  {
    dynamic_cast<Fieldmark&>
      (*m_pFieldmark.get()).ReleaseDoc(m_pDoc);
  }
  catch (...)
  {
    assert(false);
  }
}

True, using it may conceal some error in the destructor, but it may also help the application to run more stably in general.

I'm not insisting on designing destructors to never throw exceptions - it all depends on the particular situation. Sometimes it's rather useful to generate an exception in the destructor. I have seen that in specialized classes, but these were rare cases. These classes are designed in such a way that the objects generate an exception upon destruction; but if it is a usual class like "string", "dot", "brush", "triangle", or "document", in these cases exceptions shouldn't be thrown from the destructor.

Just remember that two exceptions at once cause a program termination, so it's up to you to decide if you want this to happen in your project or not.

9. Use the '\0' literal for the terminal null character

The fragment is taken from Notepad++ project. The error is detected by the following PVS-Studio diagnostic: V528 It is odd that pointer to 'char' type is compared with the '\0' value. Probably meant: *headerM != '\0'.

TCHAR headerM[headerSize] = TEXT("");
...
size_t Printer::doPrint(bool justDoIt)
{
  ...
  if (headerM != '\0')
  ...
}

Explanation

Thanks to this code's author using the '\0' literal to denote the terminal null character, we can easily spot and fix the error. The author did a good job choosing the literal, but not such a good job with the check itself.

Imagine this code were written in the following way:

if (headerM != 0)

The array address is verified against 0. The comparison doesn't make sense, as it is always true. What's that - an error or just a redundant check? It's hard to say, especially if it is someone else's code or code written a long time ago.

But since the programmer used the '\0' literal in this code, we can assume that the programmer wanted to check the value of one character. Besides, we know that comparing the headerM pointer with NULL doesn't make sense. All of that taken into account, we figure that the programmer wanted to find out if the string is empty or not but made a mistake when writing the check. To fix the code, we need to add a pointer dereferencing operation.

Correct code

TCHAR headerM[headerSize] = TEXT("");
...
size_t Printer::doPrint(bool justDoIt)
{
  ...
  if (*headerM != _T('\0'))
  ...
}

Recommendation

The number 0 may denote NULL, false, the null character '\0', or simply the value 0. So please don't be lazy - avoid using 0 for shorter notations in every single case. It only makes the code less comprehensible, and errors harder to find.

Use the following notations (a short illustration follows the list):

  • 0 - for integer zero;
  • nullptr - for null pointers in C++;
  • NULL - for null pointers in C;
  • '\0', L'\0', _T('\0') - for the terminal null;
  • 0.0, 0.0f - for zero in expressions with floating-point types;
  • false, FALSE - for the value 'false'.
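
For illustration, here is how these notations look in declarations:

int count = 0;                // integer zero
const char *name = nullptr;   // null pointer in C++
char terminator = '\0';       // terminal null character
double ratio = 0.0;           // floating-point zero
float scale = 0.0f;
bool done = false;            // the value 'false'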

Sticking to this rule will make your code clearer, and make it easier for you and other programmers to spot bugs during code reviews.

10. Avoid using multiple small #ifdef blocks

The fragment is taken from CoreCLR project. The error is detected by the following PVS-Studio diagnostic: V522 Dereferencing of the null pointer 'hp' might take place.

heap_segment* gc_heap::get_segment_for_loh (size_t size
#ifdef MULTIPLE_HEAPS
                                           , gc_heap* hp
#endif //MULTIPLE_HEAPS
                                           )
{
#ifndef MULTIPLE_HEAPS
    gc_heap* hp = 0;
#endif //MULTIPLE_HEAPS
    heap_segment* res = hp->get_segment (size, TRUE);
    if (res != 0)
    {
#ifdef MULTIPLE_HEAPS
        heap_segment_heap (res) = hp;
#endif //MULTIPLE_HEAPS
  ....
}

Explanation

I believe that #ifdef/#endif constructs are evil - an unavoidable evil, unfortunately. They are necessary and we have to use them. So I won't urge you to stop using #ifdef, there's no point in that. But I do want to ask you to be careful to not "overuse" it.

I guess many of you have seen code literally stuffed with #ifdefs. It's especially painful to deal with code where #ifdef is repeated every ten lines, or even more often. Such code is usually system-dependent, and you can't do without using #ifdef in it. That doesn't make you any happier, though.

See how difficult it is to read the code sample above! And it is code reading which programmers have to do as their basic activity. Yes, I do mean it. We spend much more time reviewing and studying existing code than writing new code. That's why code which is hard to read reduces our efficiency so much, and leaves more chance for new errors to sneak in.

Getting back to our code fragment, the error is found in the null pointer dereferencing operation, and occurs when the MULTIPLE_HEAPS macro is not declared. To make it easier for you, let's expand the macros:

heap_segment* gc_heap::get_segment_for_loh (size_t size)
{
  gc_heap* hp = 0;
  heap_segment* res = hp->get_segment (size, TRUE);
  ....

The programmer declared the hp variable, initialized it to NULL, and dereferenced it right off. If MULTIPLE_HEAPS hasn't been defined, we'll get into trouble.

Correct code

This error is still living in CoreCLR (12.04.2016) despite a colleague of mine having reported it in the article "25 Suspicious Code Fragments in CoreCLR", so I'm not sure how best to fix this error.

As I see it, since hp is null here, the 'res' variable should probably be initialized to some other value too - but I don't know which value exactly. So we'll have to do without a fix this time.

Recommendations

Eliminate small #ifdef/#endif blocks from your code - they make it really hard to read and understand! Code with "woods" of #ifdefs is harder to maintain and more prone to mistakes.

There is no recommendation to suit every possible case - it all depends on the particular situation. Anyway, just remember that #ifdef is a source of trouble, so you must always strive to keep your code as clear as possible.

Tip N1. Try refusing #ifdef.

#ifdef can sometimes be replaced with constants and the usual if operator. Compare the following two code fragments. A variant with macros:

#define DO 1

#ifdef DO
static void foo1()
{
  zzz();
}
#endif //DO

void F()
{
#ifdef DO
  foo1();
#endif // DO
  foo2();
}

This code is hard to read; you don't even feel like doing it. Bet you've skipped it, haven't you? Now compare it to the following:

const bool DO = true;

static void foo1()
{
  if (!DO)
    return;
  zzz();
}

void F()
{
  foo1();
  foo2();
}

It's much easier to read now. Some may argue the code has become less efficient since there is now a function call and a check in it. But I don't agree with that. First, modern compilers are pretty smart and you are very likely to get the same code without any extra checks and function calls in the release version. Second, the potential performance losses are too small to be bothered about. Neat and clear code is more important.

Tip N2. Make your #ifdef blocks larger.

If I were to write the get_segment_for_loh() function, I wouldn't use a number of #ifdefs there; I'd make two versions of the function instead. True, there'd be a bit more text then, but the functions would be easier to read, and edit too.

Again, some may argue that it's duplicated code, and since they have lots of lengthy functions with #ifdef in each, having two versions of each function may cause them to forget about one of the versions when fixing something in the other.

Hey, wait! And why are your functions lengthy? Single out the general logic into separate auxiliary functions - then both of your function versions will become shorter, ensuring that you will easily spot any differences between them.
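
Here's a minimal sketch of what Tip N2 could look like (the names are hypothetical, not CoreCLR's real ones): one large #ifdef per function instead of many small blocks, with the shared logic singled out into a helper.

#include <cstddef>
#include <cstdio>

static void logRequest(std::size_t size)   // shared logic lives here once
{
  std::printf("segment request: %zu\n", size);
}

#ifdef MULTIPLE_HEAPS
static void* getSegment(std::size_t size, void* heap)
{
  logRequest(size);
  return heap;                   // placeholder for the per-heap branch
}
#else
static void* getSegment(std::size_t size)
{
  logRequest(size);
  return nullptr;                // placeholder for the single-heap branch
}
#endif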

I know this tip is not a cure-all. But do think about it.

Tip N3. Consider using templates - they might help.
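
For instance, a bool template parameter can sometimes select behavior that would otherwise call for an #ifdef. A minimal sketch with hypothetical names - note that, unlike with #ifdef, both branches are always compiled, so a typo in the "disabled" branch won't go unnoticed:

#include <cstdio>

template <bool Verbose>
void process(int value)
{
  if (Verbose)                          // a compile-time constant; the dead
    std::printf("value: %d\n", value);  // branch is optimized away
  // ... the actual work ...
}

int main()
{
  process<true>(42);    // the "verbose" configuration
  process<false>(42);   // the "silent" configuration
}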

Tip N4. Take your time and think it over before using #ifdef. Maybe you can do without it? Or maybe you can do with fewer #ifdefs, and keep this "evil" in one place?

11. Don't try to squeeze as many operations as possible in one line

The fragment is taken from Godot Engine project. The error is detected by the following PVS-Studio diagnostic: V567 Undefined behavior. The 't' variable is modified while being used twice between sequence points.

static real_t out(real_t t, real_t b, real_t c, real_t d)
{
  return c * ((t = t / d - 1) * t * t + 1) + b;
}

Explanation

Sometimes, you can come across code fragments where the authors try to squeeze as much logic as possible into a small volume of code, by means of complex constructs. This practice hardly helps the compiler, but it does make the code harder to read and understand for other programmers (or even the authors themselves). Moreover, the risk of making mistakes in such code is much higher, too.

It is in such fragments, where programmers try to put lots of code in just a few lines, that errors related to undefined behavior are generally found. They usually have to do with writing to and reading from the same variable without an intervening sequence point. For a better understanding of the issue, we need to discuss in more detail the notions of "undefined behavior" and "sequence point".

Undefined behavior is the property of some programming languages to produce a result that depends on the compiler implementation or on optimization switches. Some cases of undefined behavior (including the one discussed here) are closely related to the notion of a "sequence point".

A sequence point defines any point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations have been performed, and no side effects from subsequent evaluations have yet been revealed. In the C/C++ programming languages, there are the following sequence points:

  • sequence points for operators "&&", "||", ",". When not overloaded, these operators guarantee left-to-right execution order;
  • sequence point for ternary operator "?:";
  • sequence point at the end of each full expression (usually marked with ';');
  • sequence point in place of the function call, but after evaluating the arguments;
  • sequence point when returning from the function.

Note. The new C++ standard has discarded the notion of a "sequence point", but we'll be using the above given explanation to let those of you unfamiliar with the subject, grasp the general idea easier and faster. This explanation is simpler than the new one, and is sufficient for us to understand why one shouldn't squeeze lots of operations into one "pile".

In the example we have started with, there are none of the above-mentioned sequence points, while the '=' operator, as well as the parentheses, can't be treated as such. Therefore, we cannot know which value of the t variable will be used when evaluating the return value.

In other words, the whole expression lies between two sequence points, so it is unknown in what order the t variable will be accessed. For instance, the "t * t" subexpression may be evaluated before or after the write performed by "t = t / d - 1".

Correct code

static real_t out(real_t t, real_t b, real_t c, real_t d)
{
  t = t / d - 1;
  return c * (t * t * t + 1) + b;
}

Recommendation

It obviously wasn't a good idea to try to fit the whole expression in one line. Besides it being difficult to read, it also made it easier for an error to sneak in.

Having fixed the defect and split the expression into two parts, we have solved 2 issues at once - made the code more readable, and gotten rid of undefined behavior by adding a sequence point.

The code discussed above is not the only example, of course. Here's another:

*(mem+addr++) =
   (opcode >= BENCHOPCODES) ? 0x00 : ((addr >> 4)+1) << 4;

Just as in the previous case, the error in this code has been caused by unreasonably complicated code. The programmer's attempt to increment the addr variable within one expression has led to undefined behavior as it is unknown which value the addr variable will have in the right part of the expression - the original or the incremented one.

The best solution to this problem is the same as before - do not complicate matters without reason; arrange operations in several expressions instead of putting them all in one:

*(mem+addr) = (opcode >= BENCHOPCODES) ? 0x00 : ((addr >> 4)+1) << 4;
addr++;

There is a simple yet useful conclusion to draw from all of this - do not try to fit a set of operations into as few lines as possible. It is preferable to split the code into several expressions, making it more comprehensible and reducing the chance of errors occurring.

Next time you're about to write complex constructs, pause for a while and think what using them will cost you, and if you are ready to pay that price.

12. When using Copy-Paste, be especially careful with the last lines

This bug was found in Source SDK library. The error is detected by the following PVS-Studio diagnostic: V525 The code containing the collection of similar blocks. Check items 'SetX', 'SetY', 'SetZ', 'SetZ'.

inline void SetX( float val );
inline void SetY( float val );
inline void SetZ( float val );
inline void SetW( float val );

inline void Init( float ix=0, float iy=0,
                  float iz=0, float iw = 0 )
{
  SetX( ix );
  SetY( iy );
  SetZ( iz );
  SetZ( iw );
}

Explanation

I'm 100% sure this code was written with the help of Copy-Paste. One of the first lines was copied several times, with certain letters changed in its duplicates. At the very end, this technique failed the programmer: his attention weakened, and he forgot to change the letter 'Z' to 'W' in the last line.

In this example, we are not concerned about the fact of a programmer making a mistake; what matters is that it was made at the end of a sequence of monotonous actions.

I do recommend reading the article "The Last Line Effect". Due to public interest a scientific version of it also got published.

Put briefly, when copying code fragments through the Copy-Paste method, it is highly probable that you will make a mistake at the very end of the sequence of copied lines. It's not my guess, it's statistical data.

Correct code

{
  SetX( ix );
  SetY( iy );
  SetZ( iz );
  SetW( iw );
}

Recommendation

I hope you have already read the article I've mentioned above. So, once again, we are dealing with the following phenomenon. When writing similarly looking code blocks, programmers copy and paste code fragments with slight changes. While doing so, they tend to forget to change certain words or characters, and it most often happens at the end of a sequence of monotonous actions because their attention weakens.

To reduce the number of such mistakes, here are a few tips for you:

  1. Arrange your similar looking code blocks in "tables": it should make mistakes more prominent. We will discuss the "table" code layout in the next section. Perhaps in this case the table layout wasn't of much help, but still it's a very useful thing in programming.
  2. Be very careful and attentive when using Copy-Paste. Stay focused, and double-check the code you have written - especially the last few lines.
  3. You have now learned about the last line effect; try to keep this in mind, and tell your colleagues about it. The very fact of you knowing how such errors occur, should help you avoid them.
  4. Share the link to the "The Last Line Effect" article with your colleagues.

13. Table-style formatting

Fragment taken from the ReactOS project (open-source operating system compatible with Windows). The error is detected by the following PVS-Studio diagnostic: V560 A part of conditional expression is always true: 10035L.

void adns__querysend_tcp(adns_query qu, struct timeval now) {
  ...
  if (!(errno == EAGAIN || EWOULDBLOCK ||
        errno == EINTR || errno == ENOSPC ||
        errno == ENOBUFS || errno == ENOMEM)) {
  ...
}

Explanation

The code sample given above is small and you can easily spot the error in it. But when dealing with real-life code, bugs are often very hard to notice. When reading code like that, you tend to unconsciously skip blocks of similar comparisons and go on to the next fragment.

The reason is that poorly formatted conditions don't invite close attention: reading them carefully requires effort, so we assume that since the checks are similar, there are hardly any mistakes in the condition and everything must be fine.

One of the ways out is formatting the code as a table.

If you felt too lazy to search for the error in the code above, I'll tell you: "errno ==" is missing in one of the checks. As a result, the condition is always true, since EWOULDBLOCK is not equal to zero.

Correct code

if (!(errno == EAGAIN || errno == EWOULDBLOCK ||
      errno == EINTR || errno == ENOSPC ||
      errno == ENOBUFS || errno == ENOMEM)) {

Recommendation

For a start, here's a version of this code formatted in the simplest "table" style. I don't like it actually.

if (!(errno == EAGAIN  || EWOULDBLOCK     ||
      errno == EINTR   || errno == ENOSPC ||
      errno == ENOBUFS || errno == ENOMEM)) {

It's better now, but not quite.

There are two reasons why I don't like this layout. First, the error is still not very visible; second, you have to insert too many spaces to align the code.

That's why we need to make two improvements in this formatting style. The first one is we need to use no more than one comparison per line: it makes errors easy to notice. For example:

a == 1 &&
b == 2 &&
c      &&
d == 3 &&

The second improvement is to write operators &&, ||, etc., in a more rational way, i.e. on the left instead of on the right.

See how tedious it is to align code by means of spaces:

x == a          &&
y == bbbbb      &&
z == cccccccccc &&

Writing operators on the left makes it much faster and easier:

   x == a
&& y == bbbbb
&& z == cccccccccc

The code looks a bit odd, but you'll get used to it very soon.

Let's combine these two improvements to write our code sample in the new style:

if (!(   errno == EAGAIN
      || EWOULDBLOCK
      || errno == EINTR
      || errno == ENOSPC
      || errno == ENOBUFS
      || errno == ENOMEM)) {

Yes, it's longer now - yet the error has become clearly visible, too.

I agree that it looks strange, but nevertheless I do recommend this technique. I've been using it myself for half a year now and enjoy it very much, so I'm confident about this recommendation.

I don't find it a problem at all that the code has become longer. I'd even write it in a way like this:

const bool error =    errno == EAGAIN
                   || errno == EWOULDBLOCK
                   || errno == EINTR
                   || errno == ENOSPC
                   || errno == ENOBUFS
                   || errno == ENOMEM;
if (!error) {

Feel disappointed with the code being too lengthy and cluttered? I agree. So let's make it a function!

// Note: 'errno' is a macro, so the parameter needs a different name.
static bool IsInterestingError(int err)
{
  return    err == EAGAIN
         || err == EWOULDBLOCK
         || err == EINTR
         || err == ENOSPC
         || err == ENOBUFS
         || err == ENOMEM;
}
....
if (!IsInterestingError(errno)) {

You may think that I'm dramatizing things, being too much of a perfectionist. But I assure you that errors are very common in complex expressions, and I wouldn't ever bring them up were they not so frequent. They are everywhere. And they are very difficult to notice.

Here's another example from WinDjView project:

inline bool IsValidChar(int c)
{
  return c == 0x9 || 0xA || c == 0xD ||
         c >= 0x20 && c <= 0xD7FF ||
         c >= 0xE000 && c <= 0xFFFD ||
         c >= 0x10000 && c <= 0x10FFFF;
}

The function consists of just a few lines, but it still has an error. The function always returns true. The reason, in the long run, has to do with poor formatting and programmers maintaining the code for many years being unwilling to read it carefully.

Let's refactor this code in the "table" style; I'd also add some parentheses:

inline bool IsValidChar(int c)
{
  return
       c == 0x9
    || 0xA
    || c == 0xD
    || (c >= 0x20    && c <= 0xD7FF)
    || (c >= 0xE000  && c <= 0xFFFD)
    || (c >= 0x10000 && c <= 0x10FFFF);
}

You don't have to format your code exactly the way I suggest. The aim of this post is to draw your attention to typos in "chaotically" written code. By arranging it in the "table" style, you can avoid lots of silly typos, and that's already great. So I hope this post will help you.

Note

Being completely honest, I have to warn you that "table" formatting may sometimes cause harm. Check this example:

inline
void elxLuminocity(const PixelRGBi& iPixel,
                   LuminanceCell< PixelRGBi >& oCell)
{
  oCell._luminance = 2220*iPixel._red +
                     7067*iPixel._blue +
                     0713*iPixel._green;
  oCell._pixel = iPixel;
}

It's taken from the eLynx SDK project. The programmer wanted to align the code, so he added a 0 before the value 713. Unfortunately, he forgot that a leading 0 in a number means that the number is octal.

An array of strings

I hope that the idea of table formatting of the code is clear, but I feel like giving a couple more examples. Let's have a look at one more case. By bringing it here, I am showing that table formatting should be used not only with conditions, but with other language constructs as well.

The fragment is taken from Asterisk project. The error is detected by the following PVS-Studio diagnostic: V653 A suspicious string consisting of two parts is used for array initialization. It is possible that a comma is missing. Consider inspecting this literal: "KW_INCLUDES""KW_JUMP".

static char *token_equivs1[] =
{
  ...."KW_IF","KW_IGNOREPAT","KW_INCLUDES""KW_JUMP","KW_MACRO","KW_PATTERN",
  ....
};

There is a typo here - one comma is forgotten. As a result, two strings with completely different meanings are merged into one, i.e. we actually have:

  ...."KW_INCLUDESKW_JUMP",
  ....

The error could have been avoided if the programmer had used table formatting - a missing comma would then be easy to spot.

static char *token_equivs1[] =
{
  ...."KW_IF"        ,"KW_IGNOREPAT" ,"KW_INCLUDES"  ,"KW_JUMP"      ,"KW_MACRO"     ,"KW_PATTERN"   ,
  ....
};

Just like last time, notice that if we put the delimiter on the right (a comma in this case), we have to add a lot of spaces, which is inconvenient. It is especially inconvenient if a new long line/phrase is added: we would have to reformat the entire table.

That's why I would again recommend formatting the table in the following way:

static char *token_equivs1[] =
{
  ....
  , "KW_IF"
  , "KW_IGNOREPAT"
  , "KW_INCLUDES"
  , "KW_JUMP"
  , "KW_MACRO"
  , "KW_PATTERN"
  ....
};

Now it's very easy to spot a missing comma and there is no need to use a lot of spaces - the code is beautiful and intuitive. Perhaps this way of formatting may seem unusual, but you quickly get used to it - try it yourself.

Finally, here is my short motto. As a rule, beautiful code is usually correct code.

14. A good compiler and coding style aren't always enough

We have already spoken about good styles of coding, but this time we'll have a look at an anti-example. It's not enough to write good code: there can be various errors and a good programming style isn't always a cure-all.

The fragment is taken from PostgreSQL. The error is detected by the following PVS-Studio diagnostic: V575 The 'memcmp' function processes '0' elements. Inspect the third argument.

Cppcheck analyzer can also detect such errors. It issues a warning: Invalid memcmp() argument nr 3. A non-boolean value is required.

Datum pg_stat_get_activity(PG_FUNCTION_ARGS)
{
  ....
  if (memcmp(&(beentry->st_clientaddr), &zero_clientaddr,
             sizeof(zero_clientaddr) == 0))
  ....
}

Explanation

A closing parenthesis is put in a wrong place. It's just a typo, but unfortunately it completely alters the meaning of the code.

The sizeof(zero_clientaddr) == 0 expression always evaluates to 'false' as the size of any object is always larger than 0. The false value turns to 0, which results in the memcmp() function comparing 0 bytes. Having done so, the function assumes that the arrays are equal and returns 0. It means that the condition in this code sample can be reduced to if (false).

Correct code

if (memcmp(&(beentry->st_clientaddr), &zero_clientaddr,
           sizeof(zero_clientaddr)) == 0)

Recommendation

This is one case where I can't suggest any safe coding technique to avoid typos. The only thing I can think of is "Yoda conditions", when constants are written to the left of the comparison operator:

if (0 == memcmp(&(beentry->st_clientaddr), &zero_clientaddr,
                sizeof(zero_clientaddr)))

But I won't recommend this style. I don't like and don't use it for two reasons:

First, it makes conditions less readable. I don't know how to put it exactly, but it's not without reason that this style is named after Yoda.

Second, they don't help anyway if we deal with parentheses put in a wrong place. There are lots of ways you can make a mistake. Here's an example of code where using the Yoda conditions didn't prevent the incorrect arrangement of parentheses:

if (0 == LoadStringW(hDllInstance, IDS_UNKNOWN_ERROR,
        UnknownError,
        sizeof(UnknownError) / sizeof(UnknownError[0] -
        20)))

This fragment is taken from the ReactOS project. The error is difficult to notice, so let me point it out for you: sizeof(UnknownError[0] - 20).

So Yoda conditions are useless here.

We could invent some artificial style to ensure that every closing parenthesis stands under the opening one. But it would make the code too bulky and ugly, and no one would be willing to write it that way.

So, again, there is no coding style I could recommend to avoid writing closing parentheses in wrong places.

And here's where the compiler should come in handy and warn us about such a strange construct, shouldn't it? Well, it should but it doesn't. I run Visual Studio 2015, specify the /Wall switch... and don't get any warning. But we can't blame the compiler for that, it has enough work to do as it is.

The most important conclusion to draw from today's post is that a good coding style and a good compiler (and I do like the compiler in VS2015) are not always enough. I sometimes hear statements like, "You only need to set the compiler warnings to the highest level and use good style, and everything's going to be OK." No, it's not like that. I don't mean to say some programmers are bad at coding; it's just that every programmer makes mistakes. Everyone, no exceptions. Many of your typos are going to sneak past both the compiler and good coding style.

So the combo of good style + compiler warnings is important but not sufficient. That's why we need to use a variety of bug-searching methods. There's no silver bullet; high code quality can only be achieved through a combination of several techniques.

The error we are discussing here can be found by means of the following methods:

  • code review;
  • unit-tests;
  • manual testing;
  • static code analysis;
  • etc.

I suppose you have already guessed that I am personally interested in the static code analysis methodology most of all. By the way, it is most appropriate for solving this particular issue because it can detect errors at the earliest stage, i.e. right after the code has been written.

Indeed, this error can be easily found by such tools as Cppcheck or PVS-Studio.

Conclusion. Some people don't get that having skill isn't enough to avoid mistakes. Everyone makes them - it's inevitable. Even super-gurus make silly typos every now and then. And since it's inevitable, it doesn't make sense to blame programmers, bad compilers, or bad style. It's just not going to help. Instead, we should use a combination of various software quality improvement techniques.

15. Start using enum class in your code, if possible

All the examples of this error I have are large. I've picked one of the smallest, but it's still quite lengthy. Sorry for that.

This bug was found in Source SDK library. The error is detected by the following PVS-Studio diagnostic: V556 The values of different enum types are compared: Reason == PUNTED_BY_CANNON.

enum PhysGunPickup_t
{
  PICKED_UP_BY_CANNON,
  PUNTED_BY_CANNON,
  PICKED_UP_BY_PLAYER,
};

enum PhysGunDrop_t
{
  DROPPED_BY_PLAYER,
  THROWN_BY_PLAYER,
  DROPPED_BY_CANNON,
  LAUNCHED_BY_CANNON,
};

void CBreakableProp::OnPhysGunDrop(...., PhysGunDrop_t Reason)
{
  ....
  if( Reason == PUNTED_BY_CANNON )
  {
    PlayPuntSound();
  }
  ....
}

Explanation

The Reason variable is an enumeration of the PhysGunDrop_t type. This variable is compared to the named constant PUNTED_BY_CANNON belonging to another enumeration, this comparison being obviously a logical error.

This bug pattern is quite widespread. I came across it even in such projects as Clang, TortoiseGit, and the Linux Kernel.

The reason why it is so frequent is that enumerations are not type safe in standard C++; you can easily get confused about what should be compared with what.

Correct code

I don't know for sure what the correct version of this code should look like. My guess is that PUNTED_BY_CANNON should be replaced with DROPPED_BY_CANNON or LAUNCHED_BY_CANNON. Let it be LAUNCHED_BY_CANNON.

if( Reason == LAUNCHED_BY_CANNON )
{
  PlayPuntSound();
}

Recommendation

Consider yourself lucky if you write in C++: I recommend that you start using enum class right now - the compiler won't let you compare values that refer to different enumerations. You won't be comparing pounds with inches anymore.

There are certain innovations in C++ I don't have much confidence in. Take, for instance, the auto keyword. I believe it may be harmful when used too often. Here's how I see it: programmers spend more time reading code than writing it, so we must ensure that the program text is easy to read. In the C language, variables are declared at the beginning of the function, so when you edit the code in the middle or at the end of it, it's not always easy to figure out what some Alice variable actually means. That's why there exists a variety of variable naming notations. For instance, there is a prefix notation, where pfAlice may stand for a "pointer to float".

In C++, you can declare variables whenever you need, and it is considered a good style. Using prefixes and suffixes in variable names is no longer popular. And here the auto keyword emerges, resulting in programmers starting to use multiple mysterious constructs of the "auto Alice = Foo();" kind again. Alice, who the fuck is Alice?!

Sorry for digressing from our subject. I wanted to show you that some of the new features may do both good and bad. But it's not the case with enum class: I do believe it does only good.

When using enum class, you must explicitly specify which enumeration a named constant belongs to. This protects the code from new errors. That is, the code will look like this:

enum class PhysGunDrop_t
{
  DROPPED_BY_PLAYER,
  THROWN_BY_PLAYER,
  DROPPED_BY_CANNON,
  LAUNCHED_BY_CANNON,
};

void CBreakableProp::OnPhysGunDrop(...., PhysGunDrop_t Reason)
{
  ....
  if( Reason == PhysGunDrop_t::LAUNCHED_BY_CANNON )
  {
    PlayPuntSound();
  }
  ....
}

True, fixing old code may involve certain difficulties. But I do urge you to start using enum class in new code right from this day on. Your project will only benefit from it.

I don't see much point in retelling all the details of enum class here. Here are a few links for you to learn more about this new wonderful feature of the C++11 language:

  1. Wikipedia. C++11. Strongly typed enumerations.
  2. Cppreference. Enumeration declaration.
  3. StackOverflow. Why is enum class preferred over plain enum?

16. "Look what I can do!" - Unacceptable in programming

This section will be slightly similar to "Don't try to squeeze as many operations as possible in one line", but this time I want to focus on a different thing. Sometimes it feels like programmers are competing against somebody, trying to write the shortest code possible.

I am not speaking about complicated templates. This is a different topic for discussion, as it is very hard to draw a line between where these templates do harm, and where they do good. Now I am going to touch upon a simpler situation which is relevant for both C and C++ programmers. They tend to make the constructions more complicated, thinking, "I do it because I can".

The fragment is taken from KDE4 project. The error is detected by the following PVS-Studio diagnostic: V593 Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)'.

void LDAPProtocol::del( const KUrl &_url, bool )
{
  ....
  if ( (id = mOp.del( usrc.dn() ) == -1) ) {
    LDAPErr();
    return;
  }
  ret = mOp.waitForResult( id, -1 );
  ....
}

Explanation

After looking at this code, I always have questions such as: What was the point of doing it? Did you want to save a line? Did you want to show that you can combine several actions in one expression?

As a result we have a typical error pattern - using expressions of the if (A = Foo() == Error) kind.

The precedence of the comparison operation is higher than that of the assignment operation. That's why the "mOp.del( usrc.dn() ) == -1" comparison is executed first, and only then the true (1) or false (0) value is assigned to the id variable.

If mOp.del() returns '-1', the function will terminate; otherwise, it will keep running and the 'id' variable will be assigned an incorrect value. So it will always equal 0.

Correct code

I want to emphasize: adding extra parentheses is not a solution to the problem. Yes, the error can be eliminated. But it's the wrong way.

There were additional parentheses in the code - have a closer look. It's difficult to say what they were meant for; perhaps the programmer wanted to get rid of compiler warnings. Perhaps he suspected that the operator precedence might not be right, and wanted to fix the issue, but failed to do so. Anyway, those extra brackets don't help.

There is a deeper problem here. If it is possible not to make the code more complicated, don't. It is better to write:

id = mOp.del(usrc.dn());
if ( id == -1 ) {

Recommendation

Don't be so lazy as not to write an extra code line: complex expressions are hard to read, after all. Do the assignment first, and only then, the comparison. Thus you will make it easier for programmers who will be maintaining your code later, and also it will reduce the chances of making a mistake.

So my conclusion is - don't try to show off.

This tip sounds trivial, but I hope it will help you. It's always better to write clear, neat code than code in a "see how cool I am" style.

17. Use dedicated functions to clear private data

The fragment is taken from the Apache HTTP Server project. The error is detected by the following PVS-Studio diagnostic: V597 The compiler could delete the 'memset' function call, which is used to flush 'x' buffer. The RtlSecureZeroMemory() function should be used to erase the private data.

static void MD4Transform(
  apr_uint32_t state[4], const unsigned char block[64])
{
  apr_uint32_t a = state[0], b = state[1],
               c = state[2], d = state[3],
               x[APR_MD4_DIGESTSIZE];
  ....
  /* Zeroize sensitive information. */
  memset(x, 0, sizeof(x));
}

Explanation

In this code the programmer uses a call of the memset() function to erase private data. But it's not the best way to do that, because the data won't actually be erased. To be more exact, whether or not it will be erased depends on the compiler, its settings, and the Moon phase.

Try to look at this code from the compiler's viewpoint. It does its best to make your code work as fast as possible, so it carries out a number of optimizations. One of them is to remove the calls of functions which don't affect the program's behavior, and are therefore excessive from the viewpoint of the C/C++ language. This is exactly the case with the memset() function in the code sample above. True, this function changes the 'x' buffer, but this buffer is not used anywhere after that, which means the call of the memset() function can - and ought to - be deleted.

Important! What I'm telling you now is not a theoretical model of the compiler's behavior - it's a real-life one. In such cases, the compiler does remove the calls of the memset() function. You can do a few experiments to check it for yourself. For more details and examples on this issue, please see the following articles:

  1. Security, security! But do you test it?
  2. Safe Clearing of Private Data.
  3. V597. The compiler could delete the 'memset' function call, which is used to flush 'Foo' buffer. The RtlSecureZeroMemory() function should be used to erase the private data
  4. Zero and forget -- caveats of zeroing memory in C (see also the discussion of this article).
  5. MSC06-C. Beware of compiler optimizations.

What makes this error with removed memset() calls especially tricky is that it is very hard to track down. When working in the debugger, you will most likely be dealing with un-optimized code, with the function call still there. You can only find the error when studying the assembler listing generated when building the optimized version of the application.

Some programmers believe that it has to do with a bug in the compiler, and that it has no right to throw away the calls of such an important function as memset(). But this is not the case. This function is by no means more, or less, important than any other, so the compiler has full right to optimize the code where it is called. After all, such code may turn out to be excessive indeed.
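
Here's a minimal sketch of the pattern that optimizers eliminate (the function is hypothetical; whether the call is actually removed depends on the compiler and its flags - check the generated assembly to see for yourself):

#include <string.h>

static void handleSecret(void)
{
  char password[64] = "qwerty";
  /* ... use the password ... */
  memset(password, 0, sizeof(password)); /* a dead store: the buffer is never
                                            read again, so the optimizer may
                                            legally drop this call */
}

int main(void)
{
  handleSecret();
  return 0;
}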

Correct code

memset_s(x, sizeof(x), 0, sizeof(x));

or

RtlSecureZeroMemory(x, sizeof(x));

Recommendation

You should use special memory clearing functions that the compiler is not allowed to remove for its optimization purposes.

Visual Studio, for instance, offers the RtlSecureZeroMemory function; and starting with C11, you can use the memset_s function. If necessary, you can even create a safe function of your own - there are lots of examples on the Internet. Here are a couple of them.

Version No.1.

errno_t memset_s(void *v, rsize_t smax, int c, rsize_t n) {
  if (v == NULL) return EINVAL;
  if (smax > RSIZE_MAX) return EINVAL;
  if (n > smax) return EINVAL;
  volatile unsigned char *p = v;
  while (smax-- && n--) {
    *p++ = c;
  }
  return 0;
}

Version No.2.

void secure_zero(void *s, size_t n)
{
    volatile char *p = s;
    while (n--) *p++ = 0;
}

Some programmers even go further, and implement functions to fill the array with pseudo-random values, these functions running at different times to ensure better protection from time-measuring attacks. You can find the implementations of such functions on the internet, too.

18. The knowledge you have, working with one language isn't always applicable to another language

The fragment is taken from Putty project. Ineffective code is detected by the following PVS-Studio diagnostic: V814 Decreased performance. Calls to the 'strlen' function have being made multiple times when a condition for the loop's continuation was calculated.

static void tell_str(FILE * stream, char *str)
{
  unsigned int i;
  for (i = 0; i < strlen(str); ++i)
    tell_char(stream, str[i]);
}

Explanation

There's no actual error here, but such code can be extremely inefficient when we deal with long strings, as the strlen() function is called in every loop iteration. So the error, if there is one here, is one of inefficiency.

As a rule, this kind of thing is typically found in code written by those who have previously worked with the Pascal language (or Delphi). In Pascal, the terminating condition of a loop is evaluated just once, so this code is suitable there and quite commonly used.

Let's have a look at an example of code written in Pascal. The word 'called' will be printed only once, because pstrlen() is called only once.

program test;
var
  i   : integer;
  str : string;

function pstrlen(str : string): integer;
begin
  writeln('called');
  pstrlen := Length(str);
end;

begin
  str := 'a pascal string';
  for i:= 1 to pstrlen(str) do
    writeln(str[i]);
end.

Effective code:

static void tell_str(FILE * stream, char *str)
{
  size_t i;
  const size_t len = strlen(str);
  for (i = 0; i < len; ++i)
    tell_char(stream, str[i]);
}

Recommendation

Don't forget that in C/C++, loop termination conditions are re-computed before each and every iteration. Therefore it's not a good idea to call inefficient, slow functions as part of this evaluation, especially if you can compute the value just once, before the loop is entered.

In some cases the compiler might be able to optimize the code with strlen() - for instance, if the pointer always refers to the same string literal - but we shouldn't rely on that in any way.

19. How to properly call one constructor from another

This issue was found in LibreOffice project. The error is detected by the following PVS-Studio diagnostic: V603 The object was created but it is not being used. If you wish to call constructor, 'this->Guess::Guess(....)' should be used.

Guess::Guess()
{
  language_str = DEFAULT_LANGUAGE;
  country_str = DEFAULT_COUNTRY;
  encoding_str = DEFAULT_ENCODING;
}

Guess::Guess(const char * guess_str)
{
  Guess();
  ....
}

Explanation

Good programmers hate writing duplicate code. And that's great. But when dealing with constructors, many shoot themselves in the foot, trying to make their code short and neat.

You see, a constructor can't simply be called like an ordinary function. If we write "A::A(int x) { A(); }", it will lead to creating a temporary unnamed object of the A type, instead of calling a constructor without arguments.

This is exactly what happens in the code sample above: a temporary unnamed object Guess() is created and gets immediately destroyed, while the class member language_str and others remain uninitialized.

Correct code:

There used to be 3 ways to avoid duplicate code in constructors. Let's see what they were.

The first way is to implement a separate initialization function, and call it from both constructors. I'll spare you the examples - it should be obvious as it is.

That's a fine, reliable, clear, and safe technique. However, some bad programmers want to make their code even shorter. So I have to mention two other methods.

They are pretty dangerous, and require you to have a good understanding of how they work, and what consequences you may have to face.

The second way:

Guess::Guess(const char * guess_str)
{
  new (this) Guess();
  ....
}

Third way:

Guess::Guess(const char * guess_str)
{
  this->Guess();
  ....
}

The second and third variants are rather dangerous because the base classes are initialized twice. Such code can cause subtle bugs, and do more harm than good. Consider examples where such a constructor call is appropriate, and where it's not.

Here is a case where everything is fine:

class SomeClass
{
  int x, y;
public:
  SomeClass() { new (this) SomeClass(0,0); }
  SomeClass(int xx, int yy) : x(xx), y(yy) {}
};

The code is safe and works well since the class only contains simple data types, and is not derived from other classes. A double constructor call won't pose any danger.

And here's another example where explicitly calling a constructor will cause an error:

class Base
{
public:
 char *ptr;
 std::vector<int> vect;
 Base() { ptr = new char[1000]; }
 ~Base() { delete [] ptr; }
};

class Derived : Base
{
  Derived(Foo foo) { }
  Derived(Bar bar) {
     new (this) Derived(bar.foo);
  }
  Derived(Bar bar, int) {
     this->Derived(bar.foo);
  }
};

So we call the constructor using the expressions "new (this) Derived(bar.foo);" or "this->Derived(bar.foo)".

The Base object is already created, and the fields are initialized. Calling the constructor once again will cause double initialization. As a result, a pointer to a newly allocated memory chunk will be written into ptr, and the previously allocated chunk is leaked. As for double initialization of an object of the std::vector type, the consequences are even harder to predict. One thing is clear: code like that is not permissible.

Do you need all that headache, after all? If you can't utilize C++11's features, then use method No. 1 (create an initialization function). An explicit constructor call may only be needed on very rare occasions.

Recommendation

And now we have a feature to help us with the constructors, at last!

C++11 allows constructors to call other peer constructors (known as delegation). This allows constructors to utilize another constructor's behavior with a minimum of added code.

For example:

Guess::Guess(const char * guess_str) : Guess()
{
  ....
}

To learn more about delegating constructors, see the following links:

  1. Wikipedia. C++11. Object construction improvement.
  2. C++11 FAQ. Delegating constructors.
  3. MSDN. Uniform Initialization and Delegating Constructors.

20. The End-of-file (EOF) check may not be enough

The fragment is taken from SETI@home project. The error is detected by the following PVS-Studio diagnostic: V663 Infinite loop is possible. The 'cin.eof()' condition is insufficient to break from the loop. Consider adding the 'cin.fail()' function call to the conditional expression.

template <typename T>
std::istream &operator >>(std::istream &i, sqlblob<T> &b)
{
  ....
  while (!i.eof())
  {
    i >> tmp;
    buf+=(tmp+' ');
  }
  ....
}

Explanation

The operation of reading data from a stream object is not as trivial as it may seem at first. When reading data from streams, programmers usually call the eof() method to check if the end of the stream has been reached. This check, however, is not adequate on its own: it doesn't allow you to find out whether any data reading errors or stream integrity failures have occurred, which may cause certain issues.

Note. The information provided in this article concerns both input and output streams. To avoid repetition, we'll only discuss one type of stream here.

This is exactly the mistake the programmer made in the code sample above: if any data reading error occurs, an infinite loop may result, as the eof() method will always return false. On top of that, incorrect data will be processed in the loop, as unknown values will end up in the tmp variable.

To avoid issues like that, we need to use additional methods to check the stream status: bad(), fail().

Correct code

Let's take advantage of the fact that the stream object can be checked as if it were a bool value: the true value indicates that the read was successful. More details about the way this code works can be found on StackOverflow.

template <typename T>
std::istream &operator >>(std::istream &i, sqlblob<T> &b)
{
  ....
  while (i >> tmp)
  {
    buf+=(tmp+' ');
  }
  ....
}

Recommendation

When reading data from a stream, don't use the eof() method only; check for any failures, too.

Use the methods bad() and fail() to check the stream status. The first method is used to check stream integrity failures, while the second is for checking data reading errors.

However, it's much more convenient to use the bool() operator, as shown in the example of the correct code above.
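
Here's a minimal sketch (reading words from std::cin; the scenario is hypothetical) of how to tell a clean end-of-file apart from the failures described above once the loop is over:

#include <iostream>
#include <string>

int main()
{
  std::string word;
  while (std::cin >> word)    // true only while reads succeed
    std::cout << word << '\n';

  if (std::cin.bad())
    std::cerr << "stream integrity failure\n";
  else if (std::cin.fail() && !std::cin.eof())
    std::cerr << "data reading error\n";
  // otherwise: the stream was consumed to the end - a clean EOF
}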

21. Check that the end-of-file character is reached correctly (EOF)

Let's continue the topic of working with files. And again we'll have a look at EOF. But this time we'll speak about a bug of a completely different type. It usually reveals itself in localized versions of software.

The fragment is taken from Computational Network Toolkit. The error is detected by the following PVS-Studio diagnostic: V739 EOF should not be compared with a value of the 'char' type. The 'c' should be of the 'int' type.

string fgetstring(FILE* f)
{
  string res;
  for (;;)
  {
    char c = (char) fgetc(f);
    if (c == EOF)
      RuntimeError("error reading .... 0: %s", strerror(errno));
    if (c == 0)
      break;
    res.push_back(c);
  }
  return res;
}

Explanation

Let's look at the way EOF is declared:

#define EOF (-1)

As you can see, EOF is nothing more than '-1' of the int type. The fgetc() function returns a value of the int type: namely, a number from 0 to 255, or -1 (EOF). The values read are placed into a variable of the char type. Because of this, a symbol with the value 0xFF (255) turns into -1, and is then handled in the same way as the end of file (EOF).

Users who use extended ASCII codes may encounter an error when one of the symbols of their alphabet is handled incorrectly by the program.

For example, in the Windows 1251 code page, the last letter of the Russian alphabet has the code 0xFF, and so is interpreted by the program as the end-of-file character.
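
A minimal sketch of the mix-up (it assumes a platform where char is signed, as with the Visual C++ and typical x86 GCC defaults):

#include <stdio.h>

int main(void)
{
  char c = (char)0xFF;  /* e.g. the Windows-1251 code of the letter 'я' */
  if (c == EOF)         /* c promotes to int -1, and EOF is -1 as well */
    puts("a valid character is mistaken for the end of file");
  return 0;
}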

Correct code

for (;;)
{
  int c = fgetc(f);
  if (c == EOF)
    RuntimeError("error reading .... 0: %s", strerror(errno));
  if (c == 0)
    break;
  res.push_back(static_cast<char>(c));
}

Recommendation

There is probably no particular recommendation here, but since we are speaking about EOF, I wanted to show an interesting variant of an error that some people aren't aware of.

Just remember: if a function returns a value of the int type, don't hasten to store it in a char. Stop and check that everything is fine. By the way, we already had a similar case when discussing the memcmp() function in Chapter N2 - "Larger than 0 does not mean 1" (see the fragment about a vulnerability in MySQL).

22. Do not use #pragma warning(default:X)

The fragment is taken from TortoiseGIT project. The error is detected by the following PVS-Studio diagnostic: V665 Possibly, the usage of '#pragma warning(default: X)' is incorrect in this context. The '#pragma warning(push/pop)' should be used instead.

#pragma warning(disable:4996)
LONG result = regKey.QueryValue(buf, _T(""), &buf_size);
#pragma warning(default:4996)

Explanation

Programmers often assume that warnings disabled with the "pragma warning(disable: X)" directive earlier will start working again after the "pragma warning(default: X)" directive. This is not so. The 'pragma warning(default: X)' directive sets the 'X' warning to its DEFAULT state, which is not quite the same thing.

Suppose that a file is compiled with the /Wall switch used. The C4061 warning must be generated in this case. If you add the "#pragma warning(default : 4061)" directive, this warning will not be displayed, as it is turned off by default.

Correct code

#pragma warning(push)
#pragma warning(disable:4996)
LONG result = regKey.QueryValue(buf, _T(""), &buf_size);
#pragma warning(pop)

Recommendation

The correct way to return the previous state of a warning is to use directives "#pragma warning(push[ ,n ])" and "#pragma warning(pop)". See the Visual C++ documentation for descriptions of these directives: Pragma Directives. Warnings.

Library developers should pay special attention to the V665 warning. Careless warning customization may cause a whole lot of trouble on the library users' side.

A good article on this topic: So, You Want to Suppress This Warning in Visual C++

23. Evaluate the string literal length automatically

The fragment is taken from the OpenSSL library. The error is detected by the following PVS-Studio diagnostic: V666 Consider inspecting the third argument of the function 'strncmp'. It is possible that the value does not correspond with the length of a string which was passed with the second argument.

if (!strncmp(vstart, "ASCII", 5))
  arg->format = ASN1_GEN_FORMAT_ASCII;
else if (!strncmp(vstart, "UTF8", 4))
  arg->format = ASN1_GEN_FORMAT_UTF8;
else if (!strncmp(vstart, "HEX", 3))
  arg->format = ASN1_GEN_FORMAT_HEX;
else if (!strncmp(vstart, "BITLIST", 3))
  arg->format = ASN1_GEN_FORMAT_BITLIST;
else
  ....

Explanation

It's very hard to stop using magic numbers. It would also be very unreasonable to get rid of such constants as 0, 1, -1, and 10. Coming up with names for such constants is rather difficult, and often they would make reading the code more complicated.

However, it's very useful to reduce the number of magic numbers. For example, it would be helpful to get rid of magic numbers which define the length of string literals.

Let's have a look at the code given earlier. The code was most likely written using the Copy-Paste method. A programmer copied the line:

else if (!strncmp(vstart, "HEX", 3))

After that "HEX" was replaced by "BITLIST", but the programmer forgot to change 3 to 7. As a result, the string is not compared with "BITLIST", only with "BIT". This error might not be a crucial one, but still it is an error.

It's really bad that the code was written using Copy-Paste. What's worse is that the string length was defined by a magic constant. From time to time we come across such errors, where the string length does not correspond with the indicated number of symbols because of a typo or carelessness of a programmer. So it's quite a typical error, and we have to do something about it. Let's look closely at the question of how to avoid such errors.

Correct code

At first it may seem that it's enough to replace the strncmp() call with strcmp(). Then the magic constant will disappear.

else if (!strcmp(vstart, "HEX"))

Too bad - we have changed the logic of the code. The strncmp() function checks if the string starts with "HEX", while strcmp() checks if the strings are equal. These are different checks.

The easiest way to fix this is to change the constant:

else if (!strncmp(vstart, "BITLIST", 7))
  arg->format = ASN1_GEN_FORMAT_BITLIST;

This code is correct, but it is very bad because the magic 7 is still there. That's why I would recommend a different method.

Recommendation

Such an error can be prevented if we explicitly evaluate the string length in the code. The easiest option is to use the strlen() function.

else if (!strncmp(vstart, "BITLIST", strlen("BITLIST")))

In this case it will be much easier to detect a mismatch if you forget to fix one of the strings:

else if (!strncmp(vstart, "BITLIST", strlen("HEX")))

But the suggested variant has two disadvantages:

  1. There is no guarantee that the compiler will optimize the strlen() call and replace it with a constant.
  2. You have to duplicate the string literal. It does not look graceful, and can be the source of a possible error.

The first issue can be dealt with by using special structures for literal length evaluation during the compilation phase. For instance, you can use a macro such as:

#define StrLiteralLen(arg) ((sizeof(arg) / sizeof(arg[0])) - 1)
....
else if (!strncmp(vstart, "BITLIST", StrLiteralLen("BITLIST")))

But this macro can be dangerous. The following code can appear during the refactoring process:

const char *StringA = "BITLIST";
if (!strncmp(vstart, StringA, StrLiteralLen(StringA)))

In this case the StrLiteralLen macro will return nonsense: depending on the pointer size (4 or 8 bytes), we will get the value 3 or 7. But we can protect ourselves from this unpleasant case in the C++ language by using a more complicated trick:

template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define StrLiteralLen(str) (sizeof(ArraySizeHelper(str)) - 1)

Now, if the argument of the StrLiteralLen macro is a simple pointer, we won't be able to compile the code.

Let's have a look at the second issue (duplication of the string literal). I have no idea what to say to C programmers. You could write a special macro for it, but personally I don't like this variant; I am not a fan of macros. That's why I don't know what to suggest.

In C++ everything is fabulously awesome. Moreover, we solve the first problem in a really smart way. A template function will be of great help to us. You can write it in different ways, but in general it will look like this:

template<typename T, size_t N>
int mystrncmp(const T *a, const T (&b)[N])
{
  return _tcsnccmp(a, b, N - 1);
}

Now the string literal is used only once. The string literal length is evaluated during the compilation phase. You cannot accidentally pass a simple pointer to the function and incorrectly evaluate the string length. Presto!
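
A hypothetical usage sketch of the mystrncmp() function above; the literal's length is deduced from the array type, so there's simply no number to get wrong:

// N is deduced as 8 for "BITLIST" (7 characters plus the terminal null),
// so exactly 7 characters are compared.
else if (!mystrncmp(vstart, "BITLIST"))
  arg->format = ASN1_GEN_FORMAT_BITLIST;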

Summary: try to avoid magic numbers when working with strings. Use macros or template functions; the code will become not only safer, but more beautiful and shorter.

As an example, you can look at the declaration of a function strcpy_s ():

errno_t strcpy_s(
   char *strDestination,
   size_t numberOfElements,
   const char *strSource
);
template <size_t size>
errno_t strcpy_s(
   char (&strDestination)[size],
   const char *strSource
); // C++ only

The first variant is intended for the C language, or for cases where the buffer size is not known in advance. If we work with a buffer created on the stack, we can use the second variant in C++:

char str[BUF_SIZE];
strcpy_s(str, "foo");

There are no magic numbers, there is no evaluation of the buffer size at all. It's short and sweet.

24. Override and final specifiers should become your new friends

The fragment is taken from the MFC library. The error is detected by the following PVS-Studio diagnostic: V301 Unexpected function overloading behavior. See first argument of function 'WinHelpW' in derived class 'CFrameWndEx' and base class 'CWnd'.

class CWnd : public CCmdTarget {
  ....
  virtual void WinHelp(DWORD_PTR dwData,
                       UINT nCmd = HELP_CONTEXT);
  ....
};
class CFrameWnd : public CWnd {
  ....
};
class CFrameWndEx : public CFrameWnd {
  ....
  virtual void WinHelp(DWORD dwData,
                       UINT nCmd = HELP_CONTEXT);
  ....
};

Explanation

When you override a virtual function, it's quite easy to make an error in the signature and end up defining a new function that isn't connected in any way with the function in the base class. There can be various errors in this case.

  1. Another type is used in the parameter of the overridden function.
  2. The overridden function has a different number of parameters, this can be especially crucial when there are many parameters.
  3. The overridden function differs in the const modifier.
  4. The base class function is not a virtual one. It was assumed that the function in the derived class would override it in the base class, but in reality it hides it.

The same error can occur when changing the types or the number of parameters in existing code: the programmer changes the virtual function signature in almost the entire hierarchy, but forgets to do it in some derived class.

This error can appear particularly often during the porting process to the 64-bit platform when replacing the DWORD type with DWORD_PTR, LONG with LONG_PTR and so on. Details. This is exactly our case.

Even with such an error, the 32-bit build will work correctly, as both DWORD and DWORD_PTR are synonyms of unsigned long; but in the 64-bit build there will be an error, because there DWORD_PTR is a synonym of unsigned __int64.

Correct code

class CFrameWndEx : public CFrameWnd {
  ....
  virtual void WinHelp(DWORD_PTR dwData,
                       UINT nCmd = HELP_CONTEXT) override;
  ....
};

Recommendation

Now we have a way to protect ourselves from the error we described above. Two new specifiers were added in C++11:

  • override - to indicate that a method is overriding a virtual method of a base class (see the sketch below);
  • final - to forbid derived classes from overriding this virtual method.
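
Here's a minimal sketch (the classes are hypothetical) showing both specifiers at work:

class Base
{
public:
  virtual void WinHelp(unsigned long long data) {}
};

class Derived : public Base
{
public:
  // Compile-time error: the parameter type differs, so nothing is overridden.
  // virtual void WinHelp(unsigned long data) override {}

  virtual void WinHelp(unsigned long long data) override final {}
};

class MostDerived : public Derived
{
public:
  // Compile-time error: WinHelp is declared 'final' in Derived.
  // virtual void WinHelp(unsigned long long data) override {}
};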

We are interested in the override specifier. This is an indication for the compiler to check if the virtual function is really overriding the base class function, and to issue an error if it isn't.

If override had been used when declaring the WinHelp function in the CFrameWndEx class, we would have gotten a compilation error in the 64-bit version of the application. Thus the error could have been prevented at an early stage.

Always use the override specifier (or final) when overriding virtual functions.

25. Do not compare 'this' to nullptr anymore

The fragment is taken from CoreCLR project. This dangerous code is detected by the following PVS-Studio diagnostic: V704 'this == nullptr' expression should be avoided - this expression is always false on newer compilers, because 'this' pointer can never be NULL.

bool FieldSeqNode::IsFirstElemFieldSeq()
{
  if (this == nullptr)
    return false;
  return m_fieldHnd == FieldSeqStore::FirstElemPseudoField;
}

Explanation

People used to compare the this pointer with 0 / NULL / nullptr. It was a common situation when C++ was only at the beginning of its development. We found such fragments while doing some "archaeological" research; I suggest reading about them in an article about checking Cfront. Moreover, in those days the value of the this pointer could be changed, but that was so long ago it has been forgotten.

Let's go back to the comparison of this with nullptr.

Now it is illegal. According to modern C++ standards, this can NEVER be equal to nullptr.

Formally, calling the IsFirstElemFieldSeq() method through a null this pointer leads, according to the C++ standard, to undefined behavior.

It may seem that if this == 0, then there is no access to the fields of the class while the method executes, so nothing bad should happen. But in reality there are two possible unfavorable scenarios. Since, according to the C++ standard, the this pointer can never be null, the compiler can optimize the method call by simplifying it to:

bool FieldSeqNode::IsFirstElemFieldSeq()
{
  return m_fieldHnd == FieldSeqStore::FirstElemPseudoField;
}

There is one more pitfall, by the way. Suppose there is the following inheritance hierarchy.

class X: public Y, public FieldSeqNode { .... };
....
X * nullX = NULL;
nullX->IsFirstElemFieldSeq();

Suppose that the size of the Y class is 8 bytes. Then the source pointer NULL (0x00000000) will be adjusted so that it points to the beginning of the FieldSeqNode subobject, i.e. offset by sizeof(Y) bytes. So this in the IsFirstElemFieldSeq() function will be 0x00000008, and the "this == 0" check completely loses its sense.

Correct code

It's really hard to give an example of correct code here. It won't be enough to just remove the condition from the function; you have to refactor the code so that the function is never called through a null pointer.
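To illustrate, here is one possible direction for such refactoring (a sketch based on my assumptions, not the actual CoreCLR fix; it assumes the member is accessible to the helper): make the check a function that takes the pointer explicitly, so the null check becomes legal and obvious.

static bool IsFirstElemFieldSeq(const FieldSeqNode *node)
{
  // The caller may legitimately pass nullptr; no 'this' is involved.
  if (node == nullptr)
    return false;
  return node->m_fieldHnd == FieldSeqStore::FirstElemPseudoField;
}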

Recommendation

So, now the "if (this == nullptr)" is outlawed. However, you can see this code in many applications and libraries quite often (MFC library for instance). That's why Visual C++ is still diligently comparing this to 0. I guess the compiler developers are not so crazy as to remove code that has been working properly for a dozen years.

But the law was enacted. So for a start let's avoid comparing this to null. And once you have some free time, it will be really useful to check out all the illegal comparisons, and rewrite the code.

Most likely the compilers will act in the following way: first they will give comparison warnings (perhaps they already do; I haven't studied this question), and then at some point they'll fully support the new standard, and your code will stop working altogether. So I strongly recommend that you start obeying the law; it will be helpful later on.

P.S. When refactoring you may need the Null object pattern.

Additional links on the topic:

  1. Still Comparing "this" Pointer to Null?
  2. Diagnostic V704.

26. Insidious VARIANT_BOOL

The fragment is taken from NAME project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V721 The VARIANT_BOOL type is utilized incorrectly. The true value (VARIANT_TRUE) is defined as -1. Inspect the first argument.

virtual HRESULT __stdcall
  put_HandleKeyboard (VARIANT_BOOL pVal) = 0;
....
pController->put_HandleKeyboard(true);

Explanation:

There is quite a witty quote:

We all truck around a kind of original sin from having learned Basic at an impressionable age. (C) P.J. Plauger

This tip is exactly about that kind of evil. The VARIANT_BOOL type came to us from Visual Basic, and some of our present-day programming troubles are connected with it. The thing is that "true" is encoded as -1 in it.

Let's see the declaration of the type and the constants denoting true/false:

typedef short VARIANT_BOOL;

#define VARIANT_TRUE ((VARIANT_BOOL)-1)

#define VARIANT_FALSE ((VARIANT_BOOL)0)

It seems like there is nothing terrible in it: false is 0, and truth is anything that is not 0, so -1 is quite a suitable constant. But it's very easy to make an error by using true or TRUE instead of VARIANT_TRUE: true converts to 1, so code that compares the resulting value against VARIANT_TRUE (-1) will treat it as false.

Correct code

pController->put_HandleKeyboard(VARIANT_TRUE);

Recommendation

If you see an unknown type, it's better not to hurry, and to look it up in the documentation. Even if the type name contains the word BOOL, it doesn't mean that you can place 1 into a variable of this type.

In the same way, programmers sometimes make mistakes when they use the HRESULT type, trying to compare it with FALSE or TRUE and forgetting that:

#define S_OK     ((HRESULT)0L)
#define S_FALSE  ((HRESULT)1L)

So I really ask you to be very careful with any types which are new to you, and not to hasten when programming.
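Here is a small self-contained illustration of the trap (the type and constants are repeated from the declarations above):

#include <cstdio>

typedef short VARIANT_BOOL;
#define VARIANT_TRUE ((VARIANT_BOOL)-1)
#define VARIANT_FALSE ((VARIANT_BOOL)0)

int main()
{
  VARIANT_BOOL handled = VARIANT_TRUE;  // stored as -1
  if (handled == true)                  // -1 compared with 1: never true!
    puts("never printed");
  if (handled != VARIANT_FALSE)         // the safe idiom: test against FALSE
    puts("printed");
}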

27. Guileful BSTR strings

Let's talk about one more nasty data type - BSTR (Basic string or binary string).

The fragment is taken from VirtualBox project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V745 A 'wchar_t *' type string is incorrectly converted to 'BSTR' type string. Consider using 'SysAllocString' function.

....
HRESULT EventClassID(BSTR bstrEventClassID);
....
hr = pIEventSubscription->put_EventClassID(
                    L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}");

Explanation

Here's how a BSTR type is declared:

typedef wchar_t OLECHAR;
typedef OLECHAR * BSTR;

At first glance it seems that "wchar_t *" and BSTR are one and the same thing. But this is not so, and this brings a lot of confusion and errors.

Let's talk about BSTR type to get a better idea of this case.

Here is the information from MSDN site. Reading MSDN documentation isn't much fun, but we have to do it.

A BSTR (Basic string or binary string) is a string data type that is used by COM, Automation, and Interop functions. Use the BSTR data type in all interfaces that will be accessed from script. BSTR description:

  1. Length prefix. A four-byte integer that contains the number of bytes in the following data string. It appears immediately before the first character of the data string. This value does not include the terminating null character.
  2. Data string. A string of Unicode characters. May contain multiple embedded null characters.
  3. Terminator. Two null characters.

A BSTR is a pointer. The pointer points to the first character of the data string, not to the length prefix. BSTRs are allocated using COM memory allocation functions, so they can be returned from methods without concern for memory allocation. The following code is incorrect:

BSTR MyBstr = L"I am a happy BSTR";

This code builds (compiles and links) correctly, but it will not function properly because the string does not have a length prefix. If you use a debugger to examine the memory location of this variable, you will not see a four-byte length prefix preceding the data string. Instead, use the following code:

BSTR MyBstr = SysAllocString(L"I am a happy BSTR");

A debugger that examines the memory location of this variable will now reveal a length prefix containing the value 34. This is the expected value for a 17-byte single-character string that is converted to a wide-character string through the inclusion of the "L" string modifier. The debugger will also show a two-byte terminating null character (0x0000) that appears after the data string.

If you pass a simple Unicode string as an argument to a COM function that is expecting a BSTR, the COM function will fail.

I hope this is enough to understand why we should separate the BSTR and simple strings of "wchar_t *" type.

Additional links:

  1. MSDN. BSTR.
  2. StackOverflow. Static code analysis for detecting passing a wchar_t* to BSTR.
  3. StackOverflow. BSTR to std::string (std::wstring) and vice versa.
  4. Robert Pittenger. Guide to BSTR and CString Conversions.
  5. Eric Lippert. Eric's Complete Guide To BSTR Semantics.

Correct code

hr = pIEventSubscription->put_EventClassID(
       SysAllocString(L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}"));

Recommendation

The tip resembles the previous one: if you see an unknown type, it's better not to hurry, and to look it up in the documentation. This is important to remember, so it's no big deal that the tip is repeated once again.
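Note also that the corrected call above allocates a BSTR that somebody must later release with SysFreeString. A sketch of how a RAII wrapper can take care of this, assuming an ATL-based project (the helper function name is hypothetical):

#include <atlbase.h>
#include <eventsys.h>

void SetClassId(IEventSubscription *pIEventSubscription)
{
  // CComBSTR calls SysAllocString in its constructor and SysFreeString
  // in its destructor, so the buffer is not leaked.
  CComBSTR classId(L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}");
  pIEventSubscription->put_EventClassID(classId);  // implicit BSTR conversion
}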

28. Avoid using a macro if you can use a simple function

The fragment is taken from ReactOS project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V640 The code's operational logic does not correspond with its formatting. The second statement will always be executed. It is possible that curly brackets are missing.

#define stat64_to_stat(buf64, buf)   \
    buf->st_dev   = (buf64)->st_dev;   \
    buf->st_ino   = (buf64)->st_ino;   \
    buf->st_mode  = (buf64)->st_mode;  \
    buf->st_nlink = (buf64)->st_nlink; \
    buf->st_uid   = (buf64)->st_uid;   \
    buf->st_gid   = (buf64)->st_gid;   \
    buf->st_rdev  = (buf64)->st_rdev;  \
    buf->st_size  = (_off_t)(buf64)->st_size;  \
    buf->st_atime = (time_t)(buf64)->st_atime; \
    buf->st_mtime = (time_t)(buf64)->st_mtime; \
    buf->st_ctime = (time_t)(buf64)->st_ctime; \

int CDECL _tstat(const _TCHAR* path, struct _stat * buf)
{
  int ret;
  struct __stat64 buf64;

  ret = _tstat64(path, &buf64);
  if (!ret)
    stat64_to_stat(&buf64, buf);
  return ret;
}

Explanation

This time the code example will be quite lengthy. Fortunately it's rather easy, so it shouldn't be hard to understand.

The idea was the following: if you manage to get file information by means of the _tstat64() function, put this data into a structure of the _stat type. The stat64_to_stat macro is used to save the data.

The macro is incorrectly implemented. The operations it executes are not grouped into a block with curly brackets { }. As a result, the body of the conditional operator is only the first line of the macro. If you expand the macro, you'll get the following:

if (!ret)
  buf->st_dev   = (&buf64)->st_dev;
buf->st_ino   = (&buf64)->st_ino;
buf->st_mode  = (&buf64)->st_mode;

Consequently, the majority of the structure members are copied regardless of whether the information was successfully received or not.

This is certainly an error, but in practice it's not a fatal one. The uninitialized memory cells are just copied in vain. We had a bit of luck here. But I've come across more serious errors, connected with such poorly written macros.

Correct code

The easiest fix is just to add curly brackets to the macro. Wrapping the body in do { .... } while (0) is a slightly better variant: then the macro call can be followed by a semicolon ';', just like an ordinary function call.

#define stat64_to_stat(buf64, buf)   \
  do { \
    buf->st_dev   = (buf64)->st_dev;   \
    buf->st_ino   = (buf64)->st_ino;   \
    buf->st_mode  = (buf64)->st_mode;  \
    buf->st_nlink = (buf64)->st_nlink; \
    buf->st_uid   = (buf64)->st_uid;   \
    buf->st_gid   = (buf64)->st_gid;   \
    buf->st_rdev  = (buf64)->st_rdev;  \
    buf->st_size  = (_off_t)(buf64)->st_size;  \
    buf->st_atime = (time_t)(buf64)->st_atime; \
    buf->st_mtime = (time_t)(buf64)->st_mtime; \
    buf->st_ctime = (time_t)(buf64)->st_ctime; \
  } while (0)

Recommendation

I cannot say that macros are my favorite. I know there is no way to code without them, especially in C. Nevertheless I try to avoid them if possible, and would like to appeal to you not to overuse them. My macro hostility has three reasons:

  • It's hard to debug the code.
  • It's much easier to make an error.
  • The code gets hard to understand especially when some macros use another macros.

A lot of other errors are connected with macros. The one I've given as an example shows very clearly that sometimes we don't need macros at all; I really cannot grasp why the authors didn't use a simple function instead. Advantages of a function over a macro:

  • The code is simpler. You don't have to spend additional time writing it and aligning the wacky trailing '\' symbols.
  • The code is more reliable (the error given as an example won't be possible at all).

Concerning the disadvantages, I can only think of optimization: yes, a function call is involved, but it's not that serious at all.

However, let's suppose that it's crucial to us, and meditate on the topic of optimization. First of all, there is the nice keyword inline which you can use. Secondly, it would be appropriate to declare the function as static. I reckon that can be enough for the compiler to inline the function and not generate a separate body for it.

In fact, you don't have to worry about it at all, as compilers have become really smart. Even if you write a function without any inline/static, the compiler will inline it if it considers that worthwhile. But don't bother going into such details; it's much better to write simple and understandable code, which will bring more benefit.

To my mind, the code should be written like this:

static void stat64_to_stat(const struct __stat64 *buf64,
                           struct _stat *buf)
{
  buf->st_dev   = buf64->st_dev;
  buf->st_ino   = buf64->st_ino;
  buf->st_mode  = buf64->st_mode;
  buf->st_nlink = buf64->st_nlink;
  buf->st_uid   = buf64->st_uid;
  buf->st_gid   = buf64->st_gid;
  buf->st_rdev  = buf64->st_rdev;
  buf->st_size  = (_off_t)buf64->st_size;
  buf->st_atime = (time_t)buf64->st_atime;
  buf->st_mtime = (time_t)buf64->st_mtime;
  buf->st_ctime = (time_t)buf64->st_ctime;
}

Actually, we can make even more improvements here. In C++, for example, it's better to pass a reference rather than a pointer; using pointers without a preliminary check doesn't really look graceful. But that is a different story, and I won't go into it in a section about macros; a possible reference variant is sketched below.
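A sketch of how the reference variant might look (my assumption, not code from ReactOS):

static void stat64_to_stat(const struct __stat64 &buf64, struct _stat &buf)
{
  buf.st_dev  = buf64.st_dev;
  buf.st_ino  = buf64.st_ino;
  buf.st_mode = buf64.st_mode;
  // ... the remaining members are copied exactly as in the version above ...
}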

29. Use a prefix increment operator (++i) in iterators instead of a postfix (i++) operator

The fragment is taken from the Unreal Engine 4 project. Ineffective code is detected by the following PVS-Studio diagnostic: V803 Decreased performance. In case 'itr' is iterator it's more effective to use prefix form of increment. Replace iterator++ with ++iterator.

void FSlateNotificationManager::GetWindows(....) const
{
  for( auto Iter(NotificationLists.CreateConstIterator());
       Iter; Iter++ )
  {
    TSharedPtr<SNotificationList> NotificationList = *Iter;
    ....
  }
}

Explanation

If you hadn't read the title of this tip, I think it would've been quite hard to notice an issue in the code. At first sight it looks quite correct, but it's not perfect. Yes, I am talking about the postfix increment - 'Iter++'. Instead of the postfix form of the increment, you should use the prefix analogue, i.e. substitute 'Iter++' with '++Iter'. Why should we do it, and what's the practical value? Here is the story.

Effective code:

for( auto Iter(NotificationLists.CreateConstIterator());
     Iter; ++Iter)

Recommendation

The difference between the prefix and postfix forms is well known to everybody, and I hope the internal distinctions (which show the operational principles) are not a secret either. If you have ever implemented operator overloading, you must be aware of them. If not, I'll give a brief explanation. (Everyone else can skip this paragraph and the operator-overloading code examples that follow.)

The prefix increment operator changes the object's state and returns itself in the changed form. No temporary objects are required, so the prefix increment operator may look like this:

MyOwnClass& operator++()
{
  ++meOwnField;
  return (*this);
}

A postfix operator also changes the object's state, but returns the previous state of the object, which it does by creating a temporary object. The postfix increment operator overloading code will look like this:

MyOwnClass operator++(int)
{
  MyOwnClass tmp = *this;
  ++(*this);
  return tmp;
}

Looking at these code fragments, you can see that an additional operation of creating a temporary object is used. How crucial is it in practice?

Today's compilers are smart enough to do the optimization, and not create temporary objects if they are of no use. That's why in the Release version it's really hard to see any difference between 'it++' and '++it'.

But it is a completely different story in the Debug mode, where the difference in performance can be really significant.

For example, in this article there are some measurements of the running time of the prefix and postfix increment forms in the Debug version. We see that the postfix form takes almost 4 times longer.

Those who say, "So what? It's all the same in the Release version!" are right and wrong at the same time. As a rule, we spend a lot of time working with the Debug version while running unit tests and debugging the program, so quite a good deal of time is spent on the Debug version of the software, which means we don't want to waste it waiting.

In general, I think we've managed to answer the question "Should we use the prefix increment operator (++i) instead of the postfix operator (i++) for iterators?" Yes, you really should. You'll get a nice speed-up in the Debug version; and if the iterators are quite "heavy", the benefit will be even more appreciable. A benchmark sketch follows below.
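For those who want to see it for themselves, here is a minimal benchmark sketch (my own illustration, not the measurements referenced above); compile it without optimizations and compare the two timings:

#include <chrono>
#include <cstdio>
#include <map>

int main()
{
  std::map<int, int> m;
  for (int i = 0; i < 100000; ++i)
    m[i] = i;

  long long sum = 0;
  auto t0 = std::chrono::steady_clock::now();
  for (auto it = m.begin(); it != m.end(); it++)  // postfix: temporary copy
    sum += it->second;
  auto t1 = std::chrono::steady_clock::now();
  for (auto it = m.begin(); it != m.end(); ++it)  // prefix: no temporary
    sum += it->second;
  auto t2 = std::chrono::steady_clock::now();

  using us = std::chrono::microseconds;
  printf("postfix: %lld us, prefix: %lld us (sum = %lld)\n",
         (long long)std::chrono::duration_cast<us>(t1 - t0).count(),
         (long long)std::chrono::duration_cast<us>(t2 - t1).count(),
         sum);
}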


30. Visual C++ and wprintf() function

The fragment is taken from Energy Checker SDK. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V576 Incorrect format. Consider checking the second actual argument of the 'wprintf' function. The pointer to string of wchar_t type symbols is expected.

int main(void) {
  ...
  char *p = NULL;
  ...
  wprintf(
    _T("Using power link directory: %s\n"),
    p
  );
  ...
}

Explanation

Note: The first error is the usage of _T to specify a wide-character string; the correct variant here would be the L prefix. However this mistake is not crucial and is not of big interest to us: in a build where _T expands to nothing, the literal stays narrow, and the code simply won't compile.

If you want a wprintf() function to print a char* type string, you should use "%S" in the format string.

Many Linux programmers don't see where the pitfall is. The thing is that Microsoft implemented such functions rather strangely: in Visual C++, wprintf uses "%s" to print wide-character strings, while "%S" is needed to print char* strings. It's just a weird case, and those who develop cross-platform applications quite often fall into this trap.

Correct code

The code I give here as a way to correct the issue is really not the most graceful one, but I still want to show the main point of corrections to make.

char *p = NULL;
...
#if defined(_WIN32)
wprintf(L"Using power link directory: %S\n", p);
#else
wprintf(L"Using power link directory: %s\n", p);
#endif

Recommendation

I don't have any particular recommendation here. I just wanted to warn you about some surprises you may get if you use functions such as wprintf().

Starting from Visual Studio 2015, a solution for writing portable code was suggested: for compatibility with ISO C (C99), you should define the _CRT_STDIO_ISO_WIDE_SPECIFIERS macro for the preprocessor.

In this case the code:

const wchar_t *p = L"abcdef";
const char *x = "xyz";
wprintf(L"%S %s", p, x);

is correct.

The analyzer knows about _CRT_STDIO_ISO_WIDE_SPECIFIERS and takes it into account when doing the analysis.

By the way, if you turn on the compatibility mode with ISO C (the _CRT_STDIO_ISO_WIDE_SPECIFIERS macro is declared), you can get the old behavior back by using the "%Ts" format specifier.

In general, the story of wide-character strings is quite intricate and goes beyond the scope of one short article; to investigate the topic thoroughly, I recommend doing some additional reading on it.

31. In C and C++ arrays are not passed by value

The fragment is taken from the game 'Wolf'. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (src)' expression.

ID_INLINE mat3_t::mat3_t( float src[ 3 ][ 3 ] ) {
  memcpy( mat, src, sizeof( src ) );
}

Explanation

Sometimes programmers forget that in C/C++ you cannot pass an array to a function by value: a pointer to the array is passed instead. The numbers in square brackets mean nothing; they only serve as a hint to the programmer about which array size is supposed to be passed. In fact, you can pass an array of a completely different size. For example, the following code will compile successfully:

void F(int p[10]) { }
void G()
{
  int p[3];
  F(p);
}

Correspondingly, the sizeof(src) operator evaluates not the size of the array, but the size of the pointer. As a result, memcpy() will only copy part of the array - namely 4 or 8 bytes, depending on the pointer size (exotic architectures don't count).

Correct code

The simplest variant of such code can be like this:

ID_INLINE mat3_t::mat3_t( float src[ 3 ][ 3 ] ) {
  memcpy(mat, src, sizeof(float) * 3 * 3);
}

Recommendation

There are several ways of making your code more secure.

The array size is known. You can make the function take a reference to an array. Not everyone knows that you can do this, and even fewer people are aware of how to write it, so I hope this example will be interesting and useful:

ID_INLINE mat3_t::mat3_t( float (&src)[3][3] )
{
  memcpy( mat, src, sizeof( src ) );
}

Now it will be possible to pass only an array of the right size to the function. And most importantly, the sizeof() operator will evaluate the size of the array, not of the pointer.

Yet another way of solving this problem is to start using the std::array class, as sketched below.
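A sketch of the std::array variant (my own illustration; the wrapper name is hypothetical): the size becomes part of the type, so a wrong-sized argument cannot be passed, and plain copying replaces memcpy.

#include <array>

using Mat3 = std::array<std::array<float, 3>, 3>;

struct Mat3Wrapper
{
  Mat3 mat;
  // The copy is element-wise; no sizeof() pitfalls are possible.
  explicit Mat3Wrapper(const Mat3 &src) : mat(src) {}
};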

The array size is not known. Some authors of books on programming advise using the std::vector class or other similar classes, but in practice that's not always convenient.

Sometimes you want to work with a simple pointer. In this case you should pass two arguments to the function: a pointer, and the number of elements. However, in general this is bad practice, and it can lead to a lot of bugs.

In such cases, some thoughts given in "C++ Core Guidelines" can be useful to read. I suggest reading "Do not pass an array as a single pointer". All in all it would be a good thing to read the "C++ Core Guidelines" whenever you have free time. It contains a lot of useful ideas.

32. Dangerous printf

The fragment is taken from TortoiseSVN project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V618 It's dangerous to call the 'printf' function in such a manner, as the line being passed could contain format specification. The example of the safe code: printf("%s", str);

BOOL CPOFile::ParseFile(....)
{
  ....
  printf(File.getloc().name().c_str());
  ....
}

Explanation

When you want to print or, for example, to write a string to the file, many programmers write code that resembles the following:

printf(str);
fprintf(file, str);

A good programmer should always remember that these are extremely unsafe constructions: if a format specifier somehow gets inside the string, it will lead to unpredictable consequences.

Let's go back to the original example. If the file name is "file%s%i%s.txt", the program may crash or print rubbish. But that's only half the trouble: such a function call is in fact a real vulnerability. One can attack programs with its help; having prepared strings in a special way, one can print private data stored in memory.

More information about these vulnerabilities can be found in this article. Take some time to look through it; I'm sure it will be interesting. You'll find not only theoretical basis, but practical examples as well.

Correct code

printf("%s", File.getloc().name().c_str());

Recommendation

Printf()-like functions can cause a lot of security-related issues. It is better not to use them at all, but to switch to something more modern. For example, you may find boost::format or std::stringstream quite useful; a small sketch follows below.

In general, sloppy usage of the functions printf(), sprintf(), fprintf(), and so on can not only lead to incorrect behavior of the program, but also cause potential vulnerabilities that someone can take advantage of.
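A minimal sketch of the stream-based alternative mentioned above (the helper name is hypothetical):

#include <iostream>
#include <string>

void PrintName(const std::string &name)
{
  // operator<< never interprets '%' specifiers, so a hostile string
  // cannot change the formatting behavior.
  std::cout << name << '\n';
}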

33. Never dereference null pointers

This bug was found in GIT's source code. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V595 The 'tree' pointer was utilized before it was verified against nullptr. Check lines: 134, 136.

void mark_tree_uninteresting(struct tree *tree)
{
  struct object *obj = &tree->object;
  if (!tree)
    return;
  ....
}

Explanation

There is no doubt that it's bad practice to dereference a null pointer, because the result of such dereferencing is undefined behavior. We all agree on the theoretical basis behind this.

But when it comes to practice, programmers start debating. There are always people who claim that this particular code will work correctly; they even bet their life on it - it has always worked for them! And then I have to give more reasons to prove my point. That's why this tip is another attempt to change their mind.

I have deliberately chosen an example that will provoke discussion. After the tree pointer is dereferenced, the class member isn't actually read; only the address of this member is evaluated. Then, if (tree == nullptr), the address isn't used in any way, and the function exits. Many consider this code to be correct.

But it is not so. You shouldn't code in such a way. Undefined behavior is not necessarily a program crash when the value is written at a null address, and things like that. Undefined behavior can be anything. As soon as you have dereferenced a pointer which is equal to null, you get an undefined behavior. There is no point in further discussion about the way the program will operate. It can do whatever it wants.

One of the consequences of undefined behavior is that the compiler can remove the "if (!tree) return;" check entirely: the compiler sees that the pointer has already been dereferenced, concludes that therefore the pointer isn't null, and decides that the check can be removed. This is just one of a great many scenarios that can cause the program to crash.

I recommend having a look at the article where everything is explained in more details: http://www.viva64.com/en/b/0306/

Correct code

void mark_tree_uninteresting(struct tree *tree)
{
  if (!tree)
    return;
  struct object *obj = &tree->object;
  ....
}

Recommendation

Beware of undefined behavior, even if it seems as if everything is working fine. There is no need to take such a risk; as I have already written, it's hard to predict how UB may manifest itself.

One may think that he knows exactly how undefined behavior works, that this allows him to do something others can't, and that everything will work. But it is not so. The next section underlines the fact that undefined behavior is really dangerous.

34. Undefined behavior is closer than you think

This time it's hard to give an example from a real application. Nevertheless, I quite often see suspicious code fragments which can lead to the problems described below. This error is possible when working with large arrays, and I don't know exactly which projects have arrays of such size; we don't really collect 64-bit errors, so today's example is simply contrived.

Let's have a look at a synthetic code example:

size_t Count = 1024*1024*1024; // 1 Gb
if (is64bit)
  Count *= 5; // 5 Gb
char *array = (char *)malloc(Count);
memset(array, 0, Count);

int index = 0;
for (size_t i = 0; i != Count; i++)
  array[index++] = char(i) | 1;

if (array[Count - 1] == 0)
  printf("The last array element contains 0.\n");

free(array);

Explanation

This code works correctly if you build a 32-bit version of the program; if we compile the 64-bit version, the situation will be more complicated.

A 64-bit program allocates a 5 GB buffer and initially fills it with zeros. The loop then modifies it, filling it with non-zero values: we use "| 1" to ensure this.

And now try to guess how the code will run if it is compiled in x64 mode using Visual Studio 2015? Have you got the answer? If yes, then let's continue.

If you run the debug version of this program, it'll crash because of indexing out of bounds: at some point the index variable will overflow, and its value will become -2147483648 (INT_MIN).

Sounds logical, right? Nothing of the kind! This is an undefined behavior, and anything can happen.


An interesting thing: when I or somebody else says that this is an example of undefined behavior, people start grumbling. I don't know why, but it feels like they assume that they know absolutely everything about C++ and how compilers work.

But in fact they aren't really aware of it. If they knew, they wouldn't say something like this (group opinion):

This is some theoretical nonsense. Well, yes, formally the 'int' overflow leads to undefined behavior. But it's nothing more than jabbering. In practice, we can always tell what we will get. If you add 1 to INT_MAX, we'll have INT_MIN. Maybe somewhere in the universe there are some exotic architectures, but my Visual C++ / GCC compiler gives a predictable (if formally incorrect) result.

And now without any magic, I will give a demonstration of UB using a simple example, and not on some fairy architecture either, but a Win64-program.

It is enough to build the example given above in Release mode and run it. The program will no longer crash, and the message "the last array element contains 0" won't be issued.

The undefined behavior reveals itself in the following way: the array gets completely filled, in spite of the fact that the index variable of int type isn't wide enough to index all the array elements. Those who still don't believe me should have a look at the assembly code:

  int index = 0;
  for (size_t i = 0; i != Count; i++)
000000013F6D102D  xor         ecx,ecx
000000013F6D102F  nop
    array[index++] = char(i) | 1;
000000013F6D1030  movzx       edx,cl
000000013F6D1033  or          dl,1
000000013F6D1036  mov         byte ptr [rcx+rbx],dl
000000013F6D1039  inc         rcx
000000013F6D103C  cmp         rcx,rdi
000000013F6D103F  jne         main+30h (013F6D1030h)

Here is the UB! And no exotic compilers were used, it's just VS2015.

If you replace int with unsigned, the undefined behavior will disappear. The array will only be partially filled, and at the end we will have a message - "the last array element contains 0".

Assembly code with the unsigned:

  unsigned index = 0;
000000013F07102D  xor         r9d,r9d
  for (size_t i = 0; i != Count; i++)
000000013F071030  mov         ecx,r9d
000000013F071033  nop         dword ptr [rax]
000000013F071037  nop         word ptr [rax+rax]
    array[index++] = char(i) | 1;
000000013F071040  movzx       r8d,cl
000000013F071044  mov         edx,r9d
000000013F071047  or          r8b,1
000000013F07104B  inc         r9d
000000013F07104E  inc         rcx
000000013F071051  mov         byte ptr [rdx+rbx],r8b
000000013F071055  cmp         rcx,rdi
000000013F071058  jne         main+40h (013F071040h)

Correct code

You must use proper data types for your programs to run properly. If you are going to work with large arrays, forget about int and unsigned. The proper types are ptrdiff_t, intptr_t, size_t, DWORD_PTR, std::vector::size_type, and so on. In this case it is size_t:

size_t index = 0;
for (size_t i = 0; i != Count; i++)
  array[index++] = char(i) | 1;

Recommendation

If the C/C++ language rules result in undefined behavior, don't argue with them or try to predict the way they'll behave in the future. Just don't write such dangerous code.

There are a whole lot of stubborn programmers who don't want to see anything suspicious in shifting negative numbers, comparing this with null or signed types overflowing.

Don't be like that. The fact that the program is working now doesn't mean that everything is fine. The way UB will reveal itself is impossible to predict. Expected program behavior is one of the variants of UB.

35. When adding a new constant to an enum, don't forget to correct switch operators

The fragment is taken from the Appleseed project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V719 The switch statement does not cover all values of the 'InputFormat' enum: InputFormatEntity.

enum InputFormat
{
    InputFormatScalar,
    InputFormatSpectralReflectance,
    InputFormatSpectralIlluminance,
    InputFormatSpectralReflectanceWithAlpha,
    InputFormatSpectralIlluminanceWithAlpha,
    InputFormatEntity
};

switch (m_format)
{
  case InputFormatScalar:
    ....
  case InputFormatSpectralReflectance:
  case InputFormatSpectralIlluminance:
    ....
  case InputFormatSpectralReflectanceWithAlpha:
  case InputFormatSpectralIlluminanceWithAlpha:
    ....
}

Explanation

Sometimes we need to add a new item to an existing enumeration (enum), and when we do, we also need to proceed with caution - as we will have to check where we have referenced the enum throughout all of our code, e.g., in every switch statement and if chain. A situation like this can be seen in the code given above.

InputFormatEntity was added to the InputFormat - I'm making that assumption based on the fact that the constant has been added to the end. Often, programmers add new constants to the end of enum, but then forget to check their code to make sure that they've dealt with the new constant properly throughout, and corrected the switch operator.

As a result we have a case when "m_format==InputFormatEntity" isn't handled in any way.

Correct code

switch (m_format)
{
  case InputFormatScalar:
  ....
  case InputFormatSpectralReflectance:
  case InputFormatSpectralIlluminance:
  ....
  case InputFormatSpectralReflectanceWithAlpha:
  case InputFormatSpectralIlluminanceWithAlpha:
  ....
  case InputFormatEntity:
  ....
}

Recommendation

Let's think about how we can reduce the chance of such errors through code refactoring. The easiest, but not very effective, solution is to add a "default:" that causes a message to appear, e.g.:

switch (m_format)
{
  case InputFormatScalar:
  ....
  ....
  default:
    assert(false);
    throw "Not all variants are considered"
}

Now if the m_format variable is InputFormatEntity, we'll see an exception. Such an approach has two big faults:

1. As there is a chance that this error won't show up during testing (if m_format never equals InputFormatEntity during the test runs), the error will make its way into the Release build and will only show up later, at runtime on a customer's site. It's bad if customers have to report such problems!

2. If we consider getting into default as an error, then you have to write a case for all of the enum's possible values. This is very inconvenient, especially if there are a lot of these constants in the enumeration. Sometimes it's very convenient to handle different cases in the default section.

I suggest solving this problem in the following way; I can't say that it's perfect, but at least it's something.

When you define an enum, make sure you also add a special comment, using a keyword plus the enumeration name, so that every place that needs updating is easy to find.

Example:

enum InputFormat
{
  InputFormatScalar,
  ....
  InputFormatEntity
  //If you want to add a new constant, find all ENUM:InputFormat.
};

switch (m_format) //ENUM:InputFormat
{
  ....
}

In the code above, when you change the InputFormat enum, you are directed to look for "ENUM:InputFormat" in the source code of the project.

If you are in a team of developers, you would make this convention known to everybody, and also add it to your coding standards and style guide. If somebody fails to follow this rule, it will be very sad.
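One more option worth mentioning here (an observation of mine, not part of the original tip): if you can do without the default branch, modern compilers will do the check for you. GCC and Clang warn via -Wswitch (enabled by -Wall) when a switch over an enum doesn't handle every enumerator; Visual C++ has a similar warning, C4062, which is off by default.

// With no 'default' branch, -Wswitch reports that InputFormatEntity
// is not handled in this switch:
switch (m_format)
{
  case InputFormatScalar:                       /* .... */ break;
  case InputFormatSpectralReflectance:
  case InputFormatSpectralIlluminance:          /* .... */ break;
  case InputFormatSpectralReflectanceWithAlpha:
  case InputFormatSpectralIlluminanceWithAlpha: /* .... */ break;
}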

36. If something strange is happening to your PC, check its memory

I think you've gotten pretty tired of looking at numerous error patterns, so this time let's take a break from the code.

A typical situation: your program is not working properly, but you have no idea what's going on. In such situations I recommend not rushing to blame someone, but focusing on your code. In 99.99% of cases, the root of the evil is a bug introduced by someone on your development team. Very often this bug is really stupid and banal. So go ahead and spend some time looking for it!

The fact that the bug occurs from time to time means nothing. You may just have a Heisenbug.

Blaming the compiler would be an even worse idea. It may do something wrong, of course, but very rarely. It will be very awkward if you find out that it was an incorrect use of sizeof(), for example. I have a post about that in my blog: The compiler is to blame for everything

But to set the record straight, I should say that there are exceptions. Very seldom, the bug has nothing to do with the code; still, we should be aware that such a possibility exists - this will help us stay sane.

I'll demonstrate this using an example of a case that once happened with me. Fortunately, I have the necessary screenshots.

I was making a simple test project that was intended to demonstrate the abilities of the Viva64 analyzer (the predecessor of PVS-Studio), and this project was refusing to work correctly.

After long and tiresome investigations, I saw that one memory slot was causing all the trouble. One bit, to be exact. You can see in the screenshot below that I am in debug mode, writing the value "3" into this memory cell.

[Screenshot: the debugger memory window while the value "3" is being written]

After the memory is changed, the debugger reads the values to display in its window and shows the number 2: see, there is 0x02, although I've set the value "3". The low-order bit is always zero.

[Screenshot: the memory window showing 0x02 after "3" was written]

A memory test program confirmed the problem. It's strange that the computer was working normally without any problems. Replacement of the memory bank finally let my program work correctly.

I was very lucky. I had to deal with a simple test program. And still I spent a lot of time trying to understand what was happening. I was reviewing the assembler listing for more than two hours, trying to find the cause of the strange behavior. Yes, I was blaming the compiler for it.

I can't imagine how much more effort it would take, if it were a real program. Thank God I didn't have to debug anything else at that moment.

Recommendation

Always look for the error in your code. Do not try to shift responsibility.

However, if the bug reoccurs only on your computer, and for more than a week, it may be a sign that it's not because of your code.

Keep looking for the bug. But before going home, run an overnight RAM test. Perhaps, this simple step will save your nerves.

37. Beware of the 'continue' operator inside do {...} while (...)

Fragment taken from the Haiku project (inheritor of BeOS). The code contains an error that PVS-Studio analyzer diagnoses in the following way: V696 The 'continue' operator will terminate 'do { ... } while (FALSE)' loop because the condition is always false.

do {
  ....
  if (appType.InitCheck() == B_OK
      && appType.GetAppHint(&hintRef) == B_OK
      && appRef == hintRef)
  {
    appType.SetAppHint(NULL);
    // try again
    continue;
  }
  ....
} while (false);

Explanation

The way continue works inside a do-while loop is not the way some programmers expect it to work: when continue is encountered, the loop termination condition will always be checked. I'll try to explain this in more detail. Suppose the programmer writes code like this:

for (int i = 0; i < n; i++)
{
  if (blabla(i))
    continue;
  foo();
}

Or like this:

while (i < n)
{
  if (blabla(i++))
    continue;
  foo();
}

Most programmers by intuition understand that when continue is encountered, the controlling condition (i < n) will be (re)evaluated, and that the next loop iteration will only start if the evaluation is true. But when a programmer writes code:

do
{
  if (blabla(i++))
    continue;
  foo();
} while (i < n);

the intuition often fails: they don't see a condition above the continue, and it seems to them that the continue will immediately trigger another loop iteration. This is not the case; continue does what it always does and causes the controlling condition to be re-evaluated.

Whether this misunderstanding of continue leads to an error is down to sheer luck. However, the error will definitely occur if the loop condition is always false, as in the code snippet given above, where the programmer planned to carry out certain actions in subsequent iterations. The comment "// try again" clearly shows that intention. There will of course be no "again": the condition is always false, so once continue is encountered, the loop terminates.

In other words, in this do {...} while (false) construction, continue is equivalent to break. A small demo follows below.
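A tiny self-contained demo of the behavior (my own illustration):

#include <cstdio>

int main()
{
  int attempts = 0;
  do {
    ++attempts;
    if (attempts < 3)
      continue;        // jumps to 'while (false)', which ends the loop
    puts("third attempt reached");  // never printed
  } while (false);
  printf("attempts = %d\n", attempts);  // prints: attempts = 1
}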

Correct code

There are many ways to write the code correctly. For example, create an infinite loop, use continue to iterate, and break to exit:

for (;;) {
  ....
  if (appType.InitCheck() == B_OK
      && appType.GetAppHint(&hintRef) == B_OK
      && appRef == hintRef)
  {
    appType.SetAppHint(NULL);
    // try again
    continue;
  }
  ....
  break;
}

Recommendation

Try to avoid continue inside do { ... } while (...), even if you really know how it all works. You could slip and make this error, and/or your colleagues might read the code incorrectly and then modify it incorrectly. I will never stop saying it: a good programmer is not the one who knows and uses various language tricks, but the one who writes clear, understandable code that even a newbie can comprehend.

38. Use nullptr instead of NULL from now on

New C++ standards brought quite a lot of useful changes. There are things which I would not rush into using straight away, but there are some changes which need to be applied immediately, as they bring significant benefits.

One such modernization is the keyword nullptr, which is intended to replace the NULL macro.

Let me remind you that in C++ the definition of NULL is 0, nothing more.

Of course, it may seem that this is just some syntactic sugar. And what's the difference, if we write nullptr or NULL? But there is a difference! Using nullptr helps to avoid a large variety of errors. I'll show this using examples.

Suppose there are two overloaded functions:

void Foo(int x, int y, const char *name);
void Foo(int x, int y, int ResourceID);

A programmer might write the following call:

Foo(1, 2, NULL);

And that same programmer might be sure that he is in fact calling the first function by doing this. It is not so. As NULL is nothing more than 0, and zero is known to have int type, the second function will be called instead of the first.
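Here is a self-contained sketch of this pitfall (assuming a compiler where NULL is plain 0, e.g. Visual C++):

#include <cstddef>
#include <cstdio>

void Foo(int, int, const char *) { puts("const char * overload"); }
void Foo(int, int, int)          { puts("int overload"); }

int main()
{
  Foo(1, 2, NULL);     // prints "int overload": NULL is just the integer 0
  Foo(1, 2, nullptr);  // prints "const char * overload", as intended
}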

However, if the programmer had used nullptr no such error would occur and the first function would have been called. Another common enough use of NULL is to write code like this:

if (unknownError)
  throw NULL;

To my mind, it is suspicious to throw an exception that is a pointer. Nevertheless, sometimes people do so; apparently, the developer needed to write the code this way. Discussions on whether this is good or bad practice go beyond the scope of this note.

What is important, is that the programmer decided to generate an exception in the case of an unknown error and "send" a null pointer into the outer world.

In fact it is not a pointer but int. As a result the exception handling will happen in a way that the programmer didn't expect.

"throw nullptr;" code saves us from misfortune, but this does not mean that I believe this code to be totally acceptable.

In some cases, if you use nullptr, the incorrect code will not compile.

Suppose that some WinApi function returns a HRESULT type. The HRESULT type has nothing to do with the pointer. However, it is quite possible to write nonsensical code like this:

if (WinApiFoo(a, b, c) != NULL)

This code will compile, because NULL is 0 and of int type, and HRESULT is a long type. It is quite possible to compare values of int and long type. If you use nullptr, then the following code will not compile:

if (WinApiFoo(a, b, c) != nullptr)

Because of the compiler error, the programmer will notice and fix the code.

I think you get the idea; there are plenty of such examples. But these are mostly synthetic examples, which is never very convincing. So are there any real examples? Yes, there are. Here is one of them; the only thing is that it's not very graceful or short.

This code is taken from the MTASA project.

So, there exists RtlFillMemory(). It can be a real function or a macro; it doesn't matter. It is similar to the memset() function, but with the 2nd and 3rd arguments swapped. Here's how the macro can be declared:

#define RtlFillMemory(Destination,Length,Fill) \
  memset((Destination),(Fill),(Length))

There is also FillMemory(), which is nothing more than RtlFillMemory():

#define FillMemory RtlFillMemory

Yes, everything is long and complicated. But at least it is an example of real erroneous code.

And here's the code that uses the FillMemory macro.

LPCTSTR __stdcall GetFaultReason ( EXCEPTION_POINTERS * pExPtrs )
{
  ....
  PIMAGEHLP_SYMBOL pSym = (PIMAGEHLP_SYMBOL)&g_stSymbol ;
  FillMemory ( pSym , NULL , SYM_BUFF_SIZE ) ;
  ....
}

This code fragment has even more bugs: we can clearly see that at least the 2nd and 3rd arguments are mixed up. That's why the analyzer issues two V575 warnings:

  • V575 The 'memset' function processes value '512'. Inspect the second argument. crashhandler.cpp 499
  • V575 The 'memset' function processes '0' elements. Inspect the third argument. crashhandler.cpp 499

The code compiled because NULL is 0. As a result, 0 array elements get filled. But in fact the error is not only about this. NULL is in general not appropriate here. The memset() function works with bytes, so there's no point in trying to make it fill the memory with NULL values. This is absurd. Correct code should look like this:

FillMemory(pSym, SYM_BUFF_SIZE, 0);

Or like this:

ZeroMemory(pSym, SYM_BUFF_SIZE);

But that's not the main point, which is that this meaningless code compiles successfully. However, if the programmer had gotten into the habit of using nullptr instead of NULL and had written this instead:

FillMemory(pSym, nullptr, SYM_BUFF_SIZE);

the compiler would have emitted an error message, and the programmer would have realized they did something wrong, and would have paid more attention to the way they code.

Note. I understand that in this case NULL is not to blame. However, it is because of NULL that the incorrect code compiles without any warnings.

Recommendation

Start using nullptr. Right now. And make necessary changes in the coding standard of your company.

Using nullptr will help to avoid stupid errors, and thus will slightly speed up the development process.

39. Why incorrect code works

This bug was found in Miranda NG's project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V502 Perhaps the '?:' operator works in a different way than was expected. The '?:' operator has a lower priority than the '|' operator..

#define MF_BYCOMMAND 0x00000000L
void CMenuBar::updateState(const HMENU hMenu) const
{
  ....
  ::CheckMenuItem(hMenu, ID_VIEW_SHOWAVATAR,
    MF_BYCOMMAND | dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED);
  ....
}

Explanation

We have seen a lot of cases that lead to incorrect operation of the program; this time I would like to raise a different, thought-provoking topic for discussion. Sometimes we see that totally incorrect code happens, against all odds, to work just fine! For experienced programmers this really comes as no surprise, but for those who have recently started learning C/C++ it might be a little baffling. So today we'll have a look at just such an example.

In the code shown above, we need to call CheckMenuItem() with certain flags set; and, on first glance we see that if bShowAvatar is true, then we need to bitwise OR MF_BYCOMMAND with MF_CHECKED - and conversely, with MF_UNCHECKED if it's false. Simple!

In the code above the programmers have chosen the very natural ternary operator to express this (the operator is a convenient short version of if-then-else):

MF_BYCOMMAND | dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED

The thing is that the priority of the | operator is higher than that of the ?: operator (see Operation priorities in C/C++). As a result, there are two errors at once.

The first error is that the condition has changed: it is no longer "dat->bShowAvatar", as one might read it, but "MF_BYCOMMAND | dat->bShowAvatar".

The second error: only one flag gets chosen, either MF_CHECKED or MF_UNCHECKED; the MF_BYCOMMAND flag is lost.

But despite these errors the code works correctly! The reason is a sheer stroke of luck: the programmer was just lucky that the MF_BYCOMMAND flag is equal to 0x00000000L. As MF_BYCOMMAND is 0, it doesn't affect the code in any way. Probably some experienced programmers have already gotten the idea, but I'll still give some comments in case there are beginners here.

First let's have a look at a correct expression with additional parenthesis:

MF_BYCOMMAND | (dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED)

Replace macros with numeric values:

0x00000000L | (dat->bShowAvatar ? 0x00000008L : 0x00000000L)

If one of the operands of the | operator is 0, then we can simplify the expression:

dat->bShowAvatar ? 0x00000008L : 0x00000000L

Now let's have a closer look at an incorrect code variant:

MF_BYCOMMAND | dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED

Replace macros with numeric values:

0x00000000L | dat->bShowAvatar ? 0x00000008L : 0x00000000L

In the subexpression "0x00000000L | dat->bShowAvatar", one of the operands of the | operator is 0. Let's simplify the expression:

dat->bShowAvatar ? 0x00000008L : 0x00000000L

As a result we have the same expression, this is why the erroneous code works correctly; another programming miracle has occurred.

Correct code

There are various ways to correct the code. One of them is to add parentheses, another - to add an intermediate variable. A good old if operator could also be of help here:

if (dat->bShowAvatar)
  ::CheckMenuItem(hMenu, ID_VIEW_SHOWAVATAR,
                  MF_BYCOMMAND | MF_CHECKED);
else
  ::CheckMenuItem(hMenu, ID_VIEW_SHOWAVATAR,
                  MF_BYCOMMAND | MF_UNCHECKED);

I really don't insist on this exact way of correcting the code. It might be easier to read, but it's slightly lengthy, so it's more a matter of preference.

Recommendation

My recommendation is simple: try to avoid complex expressions, especially with ternary operators, and don't forget about parentheses.

As stated before in chapter N4, the ?: operator is very dangerous: it sometimes just slips your mind that it has a very low priority, and it's easy to write an incorrect expression. People tend to use it to cram everything into one line; try not to do that.

40. Start using static code analysis

It would be strange to read such long texts written by a developer of a static code analyzer, and not find a recommendation to use one. So here it is.

Fragment taken from the Haiku project (inheritor of BeOS). The code contains an error that PVS-Studio analyzer diagnoses in the following way: V501 There are identical sub-expressions to the left and to the right of the '<' operator: lJack->m_jackType < lJack->m_jackType

int compareTypeAndID(....)
{
  ....
  if (lJack && rJack)
  {
    if (lJack->m_jackType < lJack->m_jackType)
    {
      return -1;
    }
    ....
}

Explanation

It's just a usual typo: instead of rJack, lJack was accidentally written in the right part of the expression.

This typo is a simple one indeed, but the situation is quite complicated: neither programming style nor any other method is of help here. People just make mistakes while typing, and there is nothing you can do about it.

It's important to emphasize that this is not a problem of particular people or projects. No doubt, all people make mistakes, even professionals involved in serious projects. Here is the proof: you can see the simplest misprints like A == A in such projects as Notepad++, WinMerge, Chromium, Qt, Clang, OpenCV, TortoiseSVN, LibreOffice, CoreCLR, Unreal Engine 4 and so on.

So the problem really exists, and it's not about students' lab work. When somebody tells me that experienced programmers don't make such mistakes, I usually send them this link.

Correct code

if (lJack->m_jackType < rJack->m_jackType)

Recommendation

First of all, let's speak about some useless tips.

  • Be careful while programming, and don't let errors sneak into your code (nice words, but nothing more)
  • Use a good coding style (there isn't a programming style that can help avoid errors in a variable name)

What can really be effective?

  • Code review
  • Unit tests (TDD)
  • Static code analysis

I should say right away that every strategy has its strong and weak sides. That's why the best way to get efficient and reliable code is to use all of them together.

Code reviews can help us find a great deal of different errors, and on top of this, they help us improve the readability of the code. Unfortunately, shared reading of the text is quite expensive and tiresome, and doesn't give a full guarantee of validity. It's quite hard to remain alert and find a typo looking at this kind of code:

qreal l = (orig->x1 - orig->x2)*(orig->x1 - orig->x2) +
          (orig->y1 - orig->y2)*(orig->y1 - orig->y1) *
          (orig->x3 - orig->x4)*(orig->x3 - orig->x4) +
          (orig->y3 - orig->y4)*(orig->y3 - orig->y4);

Theoretically, unit tests can save us - but only in theory. In practice, it's unrealistic to check all possible execution paths; besides that, a test itself can have errors too :)

Static code analyzers are mere programs, and not artificial intelligence. An analyzer can skip some errors and, on the contrary, display an error message for code which in actuality, is correct. But despite all these faults, it is a really useful tool. It can detect a whole lot of errors at an early stage.

A static code analyzer can be used as a cheaper version of Code Review. The program examines the code instead of a programmer doing it, and suggests checking certain code fragments more thoroughly.

Of course I would recommend using PVS-Studio code analyzer, which we are developing. But it's not the only one in the world; there are plenty of other free and paid tools to use. For example you can start with having a look at a free open Cppcheck analyzer. A good number of tools is given on Wikipedia: List of tools for static code analysis.

Attention:

  • A static analyzer can hurt your brain if not used correctly. One of the typical mistakes is to turn on all the check modes at maximum, and then drown in the stream of warning messages.
  • A static analyzer should be used on a regular basis, not just from time to time, or when everything gets really bad.

Really, try using static code analyzers, you'll like them. It's a very nice sanitary tool.

Finally I would recommend reading an article by John Carmack: Static Code Analysis.

41. Avoid adding a new library to the project

Suppose you need to implement an X functionality in your project. Theorists of software development will say that you have to take the already existing library Y, and use it to implement the things you need. In fact, it is a classic approach in software development - reusing your own or others' previously created libraries (third-party libraries). And most programmers use this way.

However, those theorists in various articles and books, forget to mention what hell it will become to support several dozen third-party libraries in about 10 years.

I strongly recommend avoiding adding a new library to a project. Please don't get me wrong: I am not saying that you shouldn't use libraries at all and should write everything yourself - that would be inefficient, of course. But sometimes a new library is added to a project at the whim of some developer who wants to add a cool little "feature". It's not hard to add a new library, but then the whole team will have to carry the load of supporting it for many years.

Tracking the evolution of several large projects, I have seen quite a lot of problems caused by a large number of third-party libraries. I will probably enumerate only some of the issues, but this list should already provoke some thoughts:

  1. Adding new libraries promptly increases the project size. In our era of fast Internet and large SSD drives, this is not a big problem, of course. But, it's rather unpleasant when the download time from the version control system turns into 10 minutes instead of 1.
  2. Even if you use just 1% of the library capabilities, it is usually included in the project as a whole. As a result, if the libraries are used in the form of compiled modules (for example, DLL), the distribution size grows very fast. If you use the library as source code, then the compile time significantly increases.
  3. Infrastructure connected with the compilation of the project becomes more complicated. Some libraries require additional components. A simple example: we need Python for building. As a result, in some time you'll need to have a lot of additional programs to build a project. So the probability that something will fail increases. It's hard to explain, you need to experience it. In big projects something fails all the time, and you have to put a lot of effort into making everything work and compile.
  4. If you care about vulnerabilities, you must regularly update third-party libraries. Attackers are interested in studying library code in search of vulnerabilities: firstly, many libraries are open-source, and secondly, having found a weak point in one library, they can use it as a master exploit against many applications where the library is used.
  5. One the libraries may suddenly change the license type. Firstly, you have to keep that in mind, and track the changes. Secondly, it's unclear what to do if that happens. For example, once, a very widely used library softfloat moved to BSD from a personal agreement.
  6. You will have troubles upgrading to a new version of the compiler. There will definitely be a few libraries that won't be ready to adapt for a new compiler, you'll have to wait, or make your own corrections in the library.
  7. You will have problems when moving to a different compiler. For example, you are using Visual C++, and want to use Intel C++. There will surely be a couple of libraries where something is wrong.
  8. You will have problems moving to a different platform. Not necessarily even a totally different platform. Let's say, you'll decide to port a Win32 application to Win64. You will have the same problems. Most likely, several libraries won't be ready for this, and you'll wonder what to do with them. It is especially unpleasant when the library is lying dormant somewhere, and is no longer developing.
  9. Sooner or later, if you use lots of C libraries, where the types aren't stored in namespace, you'll start having name clashes. This causes compilation errors, or hidden errors. For example, a wrong enum constant can be used instead of the one you've intended to use.
  10. If your project uses a lot of libraries, adding another one won't seem harmful. We can draw an analogy with the broken windows theory. But consequently, the growth of the project turns into uncontrolled chaos.
  11. And there could be a lot of other downsides in adding new libraries, which I'm probably not aware of. But in any case, additional libraries increase the complexity of project support. Some issues can occur in a fragment where they were least expected to.

Again, I should emphasize: I am not saying that we should stop using third-party libraries at all. If we have to work with images in the PNG format in the program, we'll take the LibPNG library and not reinvent the wheel.

But even working with PNG we need to stop and think. Do we really need a library? What do we want to do with the images? If the task is just to save an image in a *.png file, you can get by with system functions. For example, if you have a Windows application, you could use WIC. And if you're already using the MFC library, there is no need to make the code more sophisticated, because there's a CImage class (see the discussion on StackOverflow). Minus one library - great!
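
For instance, here is a minimal sketch (assuming an MFC/ATL project on Windows; the function name is hypothetical) of saving a bitmap as a *.png file with CImage, with no extra library involved:

#include <atlimage.h>

// Saves an existing HBITMAP to a *.png file using only ATL/MFC facilities.
bool SaveBitmapAsPng(HBITMAP hBitmap, LPCTSTR fileName)
{
  CImage image;
  image.Attach(hBitmap);
  // Passing the PNG format GUID explicitly selects the encoder.
  HRESULT hr = image.Save(fileName, Gdiplus::ImageFormatPNG);
  image.Detach();   // the caller keeps ownership of the HBITMAP
  return SUCCEEDED(hr);
}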

Let me give you an example from my own practice. In the process of developing the PVS-Studio analyzer, we needed to use simple regular expressions in a couple of diagnostics. In general, I am convinced that static analysis isn't the right place for regular expressions. This is an extremely inefficient approach. I even wrote an article regarding this topic. But sometimes you just need to find something in a string with the help of a regular expression.

It was possible to add one of the existing libraries, but it was clear that all of them would be excessive. At the same time, we still needed regular expressions, and we had to come up with something.

Absolutely coincidentally, exactly at that moment I was reading the book "Beautiful Code" (ISBN 9780596510046). This book is about simple and elegant solutions. And there I came across an extremely simple implementation of regular expressions. Just a few dozen lines of code. And that's it!

I decided to use that implementation in PVS-Studio. And you know what? The abilities of this implementation are still enough for us; complex regular expressions are just not necessary for us.

Conclusion: Instead of adding a new library, we spent half an hour writing the needed functionality. We suppressed the desire to use one more library. And it turned out to be a great decision; time showed that we really didn't need that library. And I am not talking about several months; we have happily used it for more than five years.

This case really convinced me that the simpler the solution, the better. By avoiding adding new libraries (if possible), you make your project simpler.

Readers may be interested to know what the code for searching regular expressions was. We'll type it here from the book. See how graceful it is. This code was slightly changed when integrated into PVS-Studio, but its main idea remains unchanged. So, the code from the book:

// Regular expression format:
// c    matches any literal character "c"
// .    (dot) matches any single character
// ^    matches the beginning of the input string
// $    matches the end of the input string
// *    matches zero or more occurrences of the preceding character

int matchhere(char *regexp, char *text);
int matchstar(int c, char *regexp, char *text);

// match: search for regular expression anywhere in text
int match(char *regexp, char *text)
{
  if (regexp[0] == '^')
    return matchhere(regexp+1, text);
  do {  /* must look even if string is empty */
    if (matchhere(regexp, text))
      return 1;
  } while (*text++ != '\0');
  return 0;
}

// matchhere: search for regexp at beginning of text
int matchhere(char *regexp, char *text)
{
  if (regexp[0] == '\0')
    return 1;
  if (regexp[1] == '*')
    return matchstar(regexp[0], regexp+2, text);

  if (regexp[0] == '$' && regexp[1] == '\0')
    return *text == '\0';
  if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
    return matchhere(regexp+1, text+1);
  return 0;
}

// matchstar: search for c*regexp at beginning of text
int matchstar(int c, char *regexp, char *text)
{
  do {  /* a * matches zero or more instances */
    if (matchhere(regexp, text))
      return 1;
  } while (*text != '\0' && (*text++ == c || c == '.'));
  return 0;
}
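
For completeness, here is a small hypothetical driver showing how the matcher is called:

#include <stdio.h>

int match(char *regexp, char *text);

int main()
{
  char re1[]  = "sub.*ion$";
  char re2[]  = "^zzz";
  char text[] = "some subexpression";
  printf("%d\n", match(re1, text));  /* prints 1: ".*" bridges "sub" and "ion" */
  printf("%d\n", match(re2, text));  /* prints 0: text does not start with "zzz" */
  return 0;
}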

Yes, this version is extremely simple, but for several years there has been no need to use more complex solutions. It really has limited functionality, but there has been no need to add anything more complicated, and I don't think there will be. This is a good example of where a simple solution turned out to be better than a complex one.

Recommendation

Don't hurry to add new libraries to the project; add one only when there is no other way to manage without a library.

Here are the possible workarounds:

  1. Have a look at whether the API of your system, or one of the libraries already used, has the required functionality. It's a good idea to investigate this question.
  2. If you plan to use a small piece of functionality from the library, then it makes sense to implement it yourself. The argument for adding a library "just in case" is no good. Almost certainly, this library won't be used much in the future. Programmers sometimes want universality that is actually not needed.
  3. If there are several libraries to resolve your task, choose the simplest one that meets your needs. As I have stated before, get rid of the idea "it's a cool library - let's take it just in case".
  4. Before adding a new library, sit back and think. Maybe even take a break, get some coffee, discuss it with your colleagues. Perhaps you'll realize that you can solve the problem in a completely different way, without using third-party libraries.

P.S. The things I speak about here may not be completely acceptable to everyone. For example, the fact that I'm recommending the use of WinAPI instead of a universal portable library. Objections may arise based on the idea that going this way "binds" the project to one operating system, and then it will be very difficult to make the program portable. But I do not agree with this. Quite often the idea "and then we'll port it to a different operating system" exists only in the programmer's mind. Such a task may even be unnecessary for managers. Another option: the project will kick the bucket due to its complexity and universality before gaining popularity and needing to be ported. Also don't forget about point (8) in the list of problems given above.

42. Don't use function names with "empty"

This fragment is taken from the WinMerge project. The code contains an error that the PVS-Studio analyzer diagnoses in the following way: V530 The return value of function 'empty' is required to be utilized.

void CDirView::GetItemFileNames(
  int sel, String& strLeft, String& strRight) const
{
  UINT_PTR diffpos = GetItemKey(sel);
  if (diffpos == (UINT_PTR)SPECIAL_ITEM_POS)
  {
    strLeft.empty();
    strRight.empty();
  }
  ....
}

Explanation

A programmer wanted to clean the strLeft and strRight strings. They have the String type, which is nothing other than std::wstring.

For this purpose he called the empty() function. And this is not correct. The empty() function doesn't change the object; it returns information about whether the string is empty or not.
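
A tiny illustration of the difference, using the standard std::wstring:

std::wstring s = L"text";
bool isIt = s.empty();  // query: isIt == false, s is left unchanged
s.clear();              // action: now s really becomes empty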

Correct code

To correct this error you should replace the empty() function with clear() or erase(). WinMerge developers preferred erase(), and now the code looks like this:

if (diffpos == (UINT_PTR)SPECIAL_ITEM_POS)
{
  strLeft.erase();
  strRight.erase();
}

Recommendation

In this case the name "empty()" is really inappropriate. The thing is that in different libraries, this function can mean two different actions.

In some libraries the empty() function clears the object. In others, it returns information about whether the object is empty or not.

I would say that the word "empty" is lame in general, because everybody understands it differently. Some think it's an "action", others that it's an "information inquiry". That's the reason for the mess we can see.

There is just one way out. Do not use the word "empty" alone in function names.

  • Name the function for cleaning "erase" or "clear". I would rather use "erase", because "clear" can be quite ambiguous.
  • Choose another name for the function which gets information, "isEmpty" for instance (see the sketch after this list).
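
A minimal sketch of such unambiguous naming (a hypothetical class):

class MessageQueue
{
public:
  void erase();            // action: removes all elements
  bool isEmpty() const;    // query: reports whether there are any elements
  ....
};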

If you for some reason think that it's not a big deal, then have a look here. It's quite a widespread error pattern. Of course, it's a bit too late to change classes such as std::string, but at least let's try not to spread the evil any longer.

Conclusion

I hope you enjoyed this collection of tips. Of course, it is impossible to write about all the ways to write a program incorrectly, and there is probably no point in doing so. My aim was to warn programmers and to develop a sense of danger. Perhaps, next time when a programmer encounters something odd, he will remember my tips and won't rush. Sometimes several minutes of studying the documentation or writing simple/clear code can help to avoid a hidden error that would make the life of your colleagues and users miserable for several years.

I also invite everybody to follow me on Twitter @Code_Analysis

Bugless coding!

Sincerely, Andrey Karpov.

Intel Media Server Studio 2017 generic install

What's New? - Intel® VTune™ Amplifier XE 2017 Update 2


Intel® VTune™ Amplifier XE 2017 performance profiler

A performance profiler for serial and parallel performance analysis. Overview, training, support.

New for the 2017 Update 2! (Optional update unless you need...)

As compared to 2017 Update 1:

  • Support for Intel® Xeon Phi™ coprocessor targets codenamed Knights Landing from Linux* OS host
  • Support for cross-OS analysis extended to all license types. Download installation packages for additional operating systems from registrationcenter.intel.com.
  • Support for the Intel® Atom™ processors codenamed Apollo Lake and Denverton, and the Intel processors codenamed Kaby Lake
  • Support for the mixed Python* and native code in the Locks and Waits analysis including call stack collection
  • HPC Performance Characterization analysis improvements:
    • Increased detail and structure for vector efficiency metrics based on FLOP counters in the FPU Utilization section
    • MPI imbalance metric based on MPI busy wait time and parallel efficiency for a most awaited rank in the CPU Utilization section
    • New section presenting the data on the hottest loops and functions with arithmetic operations, which enables you to identify which loops/functions with FPU Usage took the most CPU Time
  • DRAM Bandwidth Bound metric based on uncore events used in the Memory Usage viewpoint for the Memory Access and HPC Performance Characterization analyses
  • GPU Hotspots Summary view extended to provide the Packet Queue Depth and Packet Duration histograms for the analysis of DMA packet execution
  • Support for performance analysis of a guest Linux* operating system via Kernel-based Virtual Machine (KVM) from a Linux host system with the KVM Guest OS option
  • Support for the Ubuntu* 16.10 and Fedora* 25

Resources

  • Learn (“How to” videos, technical articles, documentation, …)
  • Support (forum, knowledgebase articles, how to contact Intel® Premier Support)
  • Release Notes (pre-requisites, software compatibility, installation instructions, and known issues)

Contents

File: vtune_amplifier_xe_2017_update2.tar.gz

Installer for Intel® VTune™ Amplifier XE 2017 for Linux* Update 2 

File: VTune_Amplifier_XE_2017_update2_setup.exe

Installer for Intel® VTune™ Amplifier XE 2017 for Windows* Update 2 

File: vtune_amplifier_xe_2017_update2.dmg

Installer for Intel® VTune™ Amplifier XE 2017 - OS X* host only Update 2 

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, Visual Studio, Visual C++, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Intel® University Games Showcase 2017


Game Developers Conference (GDC) 2017 is fast approaching along with one of the year’s most anticipated events: the Intel® University Games Showcase.

The Intel University Games Showcase provides an opportunity for students from top university programs in computer entertainment, design, and engineering all across the country to show their very best work and compete for $40,000 in prizes. In its fourth year, the Intel University Games Showcase has become one of the most exciting showcases for new and innovative gaming ideas. This year is no exception.

Following are overviews of this year’s entries. They will all be competing for first-, second-, and third-place prizes in the following categories:

  • Best Gameplay
  • Best Visual Quality
  • And new this year, a special prize for innovation

Enjoy this preview of what you will see at the Intel University Games Showcase 2017, and then be sure to check them out yourself at this year’s GDC. More details for the entry of each university will be online shortly.

 

Carnegie Mellon University

Beatstep Cowboys

The faculty selected this game from last year's student projects. An important selection criterion was innovative gameplay.

 

The Game:

You are in the duel of your life. You have four beats to decide what moves to make. Your opponent is making their move. What do you do?

This is the life of the Beatstep Cowboys.

Beatstep Cowboys is a quirky Wild West duel that combines quick real-time strategy and rhythm-based gameplay. Players each control a cowboy, choose a sequence of moves to make, and try to shoot the other player before they are shot. But unlike other strategy games, players select their moves at the same time, to the beat of the soundtrack. Beatstep Cowboys is all about creating close calls and the element of surprise. The result is a competitive game that is easy enough for a little kid to pick up, but provides advanced players the opportunity to stretch the limits of their logical and predictive thinking.

 

DigiPen Institute of Technology

Magnolia

To select this game for the showcase, a panel of DigiPen instructors nominated different projects, and then they voted to determine the top pick. This year’s winner is "Magnolia."

 

The Game:

"Magnolia" is an interactive story dedicated to the memory of a 3-year-old girl of the same name. The player experiences things the girl loved in her real life as she follows a path to reach a Magnolia tree planted by her father. This experience comes to life with dialogue narrated by Magnolia's father and her sister. The team worked to personalizing the game as much as possible. The environment in the game revolves around places, object, and colors Magnolia loved in real life. The team recreated her bedroom and other art assets with the help of pictures provided by her family. To develop this game, the team had to get to know the character. They had several meetings with Magnolia's father because it was a very delicate subject and the team wanted to be true to Magnolia’s memory. Making this game come to life was an emotional rollercoaster. Every small detail in the game has a story behind it.

 

Drexel University

Sole

Drexel is proud to have been invited to the Intel University Games Showcase since the beginning. It’s a great opportunity for Drexel students to have their work recognized on a national stage. Holding it at the premier game development industry conference makes it a great practical exercise in promoting their projects, as well as a strong networking opportunity, regardless of the competition outcome.

Drexel’s Digital Media program produces many student projects every year, so when they open their internal competition, they get student teams applying from different years (sophomores through PhD candidates), courses, and programs under the DIGM umbrella. They hold their own competition using a format similar to the actual event, modeling their judging process on the Intel University Games Showcase rules, and adding the overall quality of the presentation to the gameplay and visual quality categories. With energetic discussions among the faculty, they choose from their 6-8 participating teams the one team that will represent Drexel.

It’s an exciting process that becomes a goal for the students, especially after Drexel’s first-place win for gameplay last year. Opportunities like the Intel University Games Showcase inspire students to stay focused on their projects as goals beyond just grades and portfolios.

 

The Game:

Sole is an abstract, aesthetic-driven adventure game where you play as a tiny ball of light in a world shrouded in darkness. The game is a quiet journey through desolate environments where you’ll explore the remnants of great cities to uncover the history of an ancient civilization. Paint the land with light as you explore an abandoned world with a mysterious past.

Sole is a game about exploring a world without light. As you move through the environment and discover your surroundings, you’ll leave a permanent trail of light wherever you go. Free from combat or death, Sole invites players of all skill levels to explore at their own pace. With no explicit instructions, players are left to discover their objective as they soak up the somber ambiance.

 

New York University

Consume Me

This game was an easy choice for the NYU Game Center. It’s a thoughtful, fun, and aesthetically unique game that is personal, smart, and clearly an artist's passion project.

 

The Game:

"Consume Me" puts the player into the mind of the dieter. These prototypes explore a three-way dynamic between the player, the character, and the fact that this character is based on Jenny, the author. What does it mean to push and prod the character into certain eating behaviors when the player doesn’t get full control of the character’s thoughts and internal state? The player is put in an awkward position of performing as the character, but only in a limited sense. These prototypes embrace an intimate and confessional mood and present a goal-oriented relationship with food using simple, but distressing mechanics. Cramming Tetris-shaped pieces of food on a plate to hit a calorie target, putting a flopping avatar through a fat-burning workout, and showing the protagonist’s distress as she tries on a crop top are mechanics which place the powerful feelings of self-consciousness and anxiety front and center, with a discomforting undercurrent of humor. Is it okay to "play"– or have fun – with someone else’s pain? By giving you permission to poke fun at her suffering, the mechanics of Jenny’s game attempt to bring humor and vulnerability to this serious and uncomfortable subject matter.

 

Rochester Institute of Technology

Gibraltar

MAGIC Spell Studios LLC at Rochester Institute of Technology (RIT) looks forward to the Intel University Games Showcase each year. This is an incredible opportunity to experience what the most talented students in nationally ranked programs (by the Princeton Review) are creating. The future of our industry is in great hands with the talented, passionate visionaries who showcase their work at this event. Choosing John Miller to represent RIT was an easy decision to make. We were first introduced to John last spring following his participation in the Imagine Cup finals. John has all of the skills necessary to succeed and be a leader in this industry. His work ethic, passion, and talent are impressive and he is a terrific example of the caliber of students that we can offer to game development companies.

 

The Game:

"Gibraltar" is a quick, turn-based strategy game in which two players send armies of adorable robots into battle for control of the game board. The more territory the player controls on the board, the more they can move their robots on their turn. This means that players are free to craft their own strategies and game play is very fluid. A match of Gibraltar usually lasts between 5 and 10 minutes, so you can fit a match in anywhere. Gibraltar is meant to be played between two players, sitting at the same screen. That kind of head-to-head competitive experience is something John always enjoyed. The player has four cute robot types to choose from when setting up their army, each with its own strengths and weaknesses. You only get to spawn your army once, so choose its composition carefully! The synergy between the different pieces allows unique gameplay and endless strategies and is simple enough for everyone to pick up. Players can also use special abilities that can change the course of the game but they are expendable and cost action points to play. Gibraltar features a story mode to help introduce players to the game and includes a fun cast of characters. The game ships a built-in map editor for players to design their own maps and play them with their friends.

 

Savannah College of Art and Design

Kyon

The Savannah College of Art and Design participated in the Intel University Games Showcase 2016 competition and found that it was a great experience for the students, giving them an opportunity to present their work in front of judges. This year, SCAD sent out a department-wide call for entries. Faculty members evaluated entries based on a balance of game design, aesthetics, and overall product polish. "Kyon" was chosen as the best among its peers after the students produced an interesting playable version in their first 10 weeks of development.

 

The Game:

Kyon is a top-down third-person adventure game where the player assumes the role of a sheepdog named Kyon in mythological Ancient Greece. Kyon is sent by his master, Polyphemus, to find lost sheep and bring the herd home. The player must guide the herd with physical movement and special bark commands through dangerous environments filled with AI threats. All art assets are made using a PBR workflow, and the art team utilized advanced software for realistic effects such as Neofur and Speedtree. Level streaming allows an entire playthrough with no loading screens to interrupt gameplay.

 

SMU Guildhall

Mouse Playhouse

SMU Guildhall was asked to participate in the inaugural University Games Showcase in 2014 and proudly participated with "Kraven Manor." The 2014 event was a great experience for both the students and the university, resulting in an invitation they look forward to annually. The team's selection process is the same every year. There is a small panel of three that reviews the capstone games developed over the school year. The panel members are: Gary Brubaker – Director of Guildhall; Mark Nausha – Deputy Director Game Lab; and Steve Stringer – Capstone Faculty. This panel uses three very high but simple measures: 1) quality in game play and visuals; 2) does the game demonstrate the team game pillars of the program?; and 3) are the students excellent ambassadors of their game and the university? Guildhall has quite a few games and students that exceed the panel’s expectations, making their job very difficult in choosing only one team.

 

The Game:

"Mouse Playhouse" is a light-hearted VR puzzle game in which you manipulate objects to solve puzzles and guide your pet mice towards the cheese. In Mouse Playhouse, you can also throw objects around, play basketball, darts, and even play the xylophone. There are a total of 15 levels in the game and each one presents a different challenge. Players must use the blue objects to guide the mice away from trouble and towards the cheese. During development, the level designers created clever solutions that enabled them to record mixed reality using Unreal Engine. During development, Unreal Engine did not have support for more than two Vive controllers and mixed reality recording. So the level designers used various tools such as the Unreal Sequencer to "fake" mixed reality in the engine. This allowed the team to record gameplay and live footage on a green screen for their trailer.

 

University of California Santa Cruz

Project Perfect Citizen

The "Project Perfect Citizen" ("PPC") team was a part of the undergraduate Computer Science: Computer Game Design program. This year about 90 students produced a total of 19 games as a part of that program. At the end of the school year, the university held a ceremony called "The Sammy Awards" (the UCSC mascot is a banana slug named Sammy Slug) to celebrate the games from both the undergrad and masters programs. PPC won the grand prize, and UCSC has been helping show the game at various events.

 

The Game:

"PPC" is a surveillance story-telling game. It puts the player in the shoes of a pseudo-NSA surveillance officer. Their job is to investigate suspected cyber criminals by accessing their computers, reading through their files and emails, and using that information to determine if the suspect has committed a crime. Each game level is a series of puzzles immersed in a narrative, and players will have to understand the character's story if they want to complete the level. At the same time, players will be encouraged to question if their actions are justified, if the government should have the power to conduct such intensive surveillance, and whether or not the people they are investigating deserve to be punished, regardless of their guilt. All of this is presented in a simulated Windows* 95-era operating system to achieve a "hacker" aesthetic and retro feel.

 

University of Central Florida

The Channeler

FIEA's decision to participate was an easy one. In their view, the Intel University Games Showcase has turned into a great celebration and showcase of student games at GDC and they love competing with peer programs. "The Channeler" was a great game for them to pick because it has a mix of innovation, gameplay, and beautiful art. Also, because it uses eye-tracking (through a partnership with the Tobii Eye Tracker) as its main controller, they believe it will really stand out from the rest of the field.

 

The Game:

"The Channeler" takes place in a kooky city of spirits, where the denizens are plagued by mysterious disappearances. Fortunately, you are a Channeler. Gifted with the "Third Eye," you possess a supernatural ability to affect the world around you with merely your sight. Explore the spooky night market and solve innovative puzzles to find the missing spirits! Innovation is what really sets The Channeler apart from other games; not many games out there use eye-tracking as a main mechanic. Whether it’s trying to beat a seedy ghost in a shuffling shell game, tracing an ancient rune with your gaze, or confronting possessed statues that rush toward you with every blink—our game utilizes eye movement, blinking, and winking mechanics that provide only a sample of the vast possibilities for eye-tracking games.

 

University of Southern California

Arkology

The USC GamePipe Laboratory has participated in the Intel University Games Showcase since it was created.

The GamePipe Laboratory faculty looks at the games being built in its Advanced Games course and other games shown at its Showcase, and then they agree on which game is the best. That game is the one that goes to the Intel University Games Showcase, and this year it is "Arkology."

 

The Game:

In "Arkology," the player has been chosen as the commander of Ark, a massive space-faring arcology designed to preserve humanity's continued prosperity and survival. The player can control the game using simple and intuitive motion control. From the Operations Room in the heart of the Ark, the player must strategize, command, and lead his forces to preserve what may be the last of humanity. The game can be described as a real-time tabletop war game where players need to control their miniature game pieces to fight the opposing force. A player's goal is to achieve the mission objective ranging from defending a valuable target to annihilating the enemy force.

Thematically, we want our players to feel like military commanders making strategic decisions and seeing their plans come to life. We want the players to feel like generals in World War II movies drafting their invasion plans over the map of Europe. We want to let the players live the scenes from the movie "Ender's Game" where the commander's will and orders get carried out by his officers.

Our focus for this project is in exploring novel virtual reality interactions to best utilize the fact that players have access to a three-dimensional world. We are developing a series of virtual gears that will help a player better command an army and survey the battlefield. Some examples of what we have or are working on:

  • Adaptable controller for the player to quickly change the functionality for the situation at hand.
  • Augmented vision goggle to let the player see or hide additional game stats and information.
  • A utility-belt for players to store and access game elements.
  • Customizable battlefield camera and screen for players to monitor the battlefield.

 

University of Utah

Wrecked: Get Your Ship Together

In UU’s Entertainment Arts and Engineering program, all student capstone projects and all masters student projects are automatically entered into a university event where faculty not involved in the actual game projects review all entries. They select four finalists, and a subcommittee of three faculty members chooses the winner among them. This year, "Wrecked" was chosen.

 

The Game:

"Wrecked: Get Your Ship Together" is the living-room party game for VR! One player is on the Vive while everyone else in the room plays on their mobile phones. Together, they must repair their ship to escape the planet on time. The Vive player must navigate a foreign planet, both on foot and by hover ship, to scavenge parts and repair the team’s mothership. The mobile players command helpful drones which follow the player and give aid. Specifically, the mobile players can give directional guidance, or they can obtain speed boosts for their captain by successfully executing orders."

Another problem specific to VR is that of traveling world-scale environments in a room-scale experience; a living room is generally a bit smaller than the world of Skyrim. The development team’s solution is to give the player a hover ship. This means their actual physical chair is part of the play space. When they sit, they can fly around world-scale. When they stand up, they can experience the full joys of room-scale.

The development team feels both the mobile integration with VR and the physical augmentation of the game are compelling, and they are excited to be exploring this new space.

 

Implementation of Classic Gram-Schmidt in a Reservoir Simulator


Introduction

Reservoir simulators typically use Krylov methods for solving the systems of linear equations that appear at every Newton iteration. One key step in most Krylov methods, executed at every linear iteration, involves orthogonalizing a given vector against a set of (already) orthogonal vectors. The linear solver used in the reservoir simulator we worked on implements the Orthomin method and utilizes the Modified Gram-Schmidt algorithm to execute this operation. This process has, for some simulations, a high contribution to the total computation time of the linear solver. Therefore, performance optimizations on this portion of the code may provide performance improvements for the overall simulation.

Figure 1 shows the percentage of total time spent on the Linear Solver and in the Gram-Schmidt kernel for several parallel simulations. The percentage of time spent on the Modified Gram-Schmidt method inside the linear solver makes it a hotspot in the simulator, ranging from 6% in Case 1 up to 51% in Case 2. This result demonstrates the importance of trying to optimize this computation kernel.


Figure 1. Gram-Schmidt and Linear solver total time percentage for parallel simulations. The number of MPI processes used in the run is indicated between parentheses.

This work describes an implementation of the Classic Gram-Schmidt method on the linear solver of a reservoir simulator under development. The implementation is based on the work developed by João Zanardi, a Graduate Student from Rio de Janeiro State University (UERJ) during his internship at Intel Labs (Santa Clara, USA). The proposed implementation of the Classic Gram-Schmidt method provides performance benefits by improving data locality in cache during the linear algebra operations and by reducing the number of collective MPI calls.

We achieved up to a 1.7x performance speedup in total simulation time with our optimized Classic Gram-Schmidt when compared to the current implementation of the Modified Gram-Schmidt. However, the Classic Gram-Schmidt implementation does not outperform the current implementation in all cases. It seems that the part of the vectors associated with each thread needs to have a minimum size in order for the Classic Gram-Schmidt to be advantageous.

In Section 1 we describe the two versions of the Gram-Schmidt method, leaving for Section 2 a more detailed explanation of certain aspects of how the Classic version was implemented in our reservoir simulator. Section 3 presents the results of the tests we made, starting with a microbenchmark program specifically written to test the Gram-Schmidt kernel in isolation, then comparing the performance of the two approaches for the Linear Solver alone on matrices dumped from the reservoir simulator and finally testing the methods on actual simulation runs. Section 4 provides the conclusions of this study.

1. Description of the Classic and Modified Gram-Schmidt Methods

Listing 1 shows the current implementation of the Modified Gram-Schmidt method used in the reservoir simulator linear solver. Vector qj is orthogonalized with respect to vectors qi, i = 0 to j − 1, while vector dj is updated as a linear combination of vectors di, i = 0 to j − 1. For applications of interest in this work, qj and dj are vectors whose size can reach several million, while j, the number of vectors in the base, is a small number, typically a few dozen. For each pair of vectors qi and qj it is necessary to compute an inner product (line 9) through the brackets operator of the object ip. Then the scalar alpha is stored and vectors qj and dj are updated in lines 10 and 11.

V* q = &qv[0];
V* d = &dv[0];
V& qj = q[j];
V& dj = d[j];
for( int i = 0; i < j; i++ )
{
        const V& qi = q[i];
        const V& di = d[i];
        const double alpha = ip( qj, qi );
        qj -= alpha * qi;
        dj -= alpha * di;
}    

Listing 1. Source code for the Modified Gram-Schmidt utilized in the current linear solver.

We can see in Listing 1 that the inner product and update (axpy) on lines 9, 10 and 11 are BLAS level one operations (vector operations) and, therefore, they have low arithmetic intensity, with their performance limited by the memory bandwidth. The Modified Gram-Schmidt method has a loop dependence, since every pass of the loop updates the qj vector (line 10) to be used in the inner product of the next pass (line 9), which leaves no room for data reuse to improve data locality.

In order to overcome this limitation, one possibility is to use the Classic Gram-Schmidt method, as discussed in several references in the literature (e.g. [1] and [2]). The Classic GS method is also the default option in the well-known PETSc library, as well as in other solver libraries, and it is implemented in other reservoir simulators. Listing 2 presents the algorithm. The inner products needed to calculate the alpha values are performed before the qj vector is updated, removing the recurrence observed in the Modified version. All three loops in the algorithm can be recast as matrix-vector multiplications and, therefore, BLAS2 operations can be used, with better memory access. More specifically, let Q be the matrix whose columns are the vectors qi, i = 0 to j − 1; then the loop in lines 5 to 9 calculates

alpha_vec = Q^T qj,    (1.1)

where alpha_vec is a j × 1 vector containing the alpha values, while the loop in lines 11 to 15 calculates

qj = qj − Q alpha_vec.    (1.2)

Similarly, if D denotes the matrix whose columns are the di vectors, the loop in lines 17 to 21 calculates

dj = dj − D alpha_vec.     (1.3)
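
To illustrate the point, here is a sketch of (1.1), (1.2) and (1.3) expressed directly as BLAS2 calls. It assumes the qi and di vectors are stored contiguously as rows of buffers Qrow and Drow (j rows of length n each), and that a CBLAS implementation such as the one in Intel® MKL is available; this is not the implementation used in the simulator, which is shown in the next section.

#include <mkl_cblas.h>   // or <cblas.h> for a generic CBLAS

// alpha_vec = Q^T qj, as in (1.1): Qrow holds the qi's as rows, so no transpose.
cblas_dgemv(CblasRowMajor, CblasNoTrans, j, n, 1.0, Qrow, n, qj, 1, 0.0, alpha, 1);
// qj = qj - Q alpha_vec, as in (1.2): now the transpose of Qrow is needed.
cblas_dgemv(CblasRowMajor, CblasTrans, j, n, -1.0, Qrow, n, alpha, 1, 1.0, qj, 1);
// dj = dj - D alpha_vec, as in (1.3).
cblas_dgemv(CblasRowMajor, CblasTrans, j, n, -1.0, Drow, n, alpha, 1, 1.0, dj, 1);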

In order to realize the potential performance benefits of interpreting the orthogonalization and updating as BLAS2 operations, as given by (1.1), (1.2) and (1.3), blocking has to be used. The objective of blocking techniques is to organize data memory accesses. The idea is to load a small subset of a large dataset into the cache and then work on this block of data without the need to bring it from memory again. By using and reusing the data already in cache, we reduce the number of trips to memory, thus reducing memory bandwidth pressure [6].

Our implementation of the blocking technique will be shown in the next section, where our implementation in the simulator is detailed. It will also be clear that switching to BLAS2 results in less communication when running in parallel.

V* q = &qv[0];
V* d = &dv[0];
V& qj = q[j];
V& dj = d[j];
for( int i = 0; i < j; i++ )
{
   const V& qi = q[i];
   alpha[i] = ip( qj, qi );
}

for( int i = 0; i < j; i++ )
{
   const V& qi = q[i];
   qj -= alpha[i] * qi;
}

for( int i = 0; i < j; i++ )
{
   const V& di = d[i];
   dj -= alpha[i] * di;
}  

Listing 2. Source code for the Classic Gram-Schmidt method without blocking.

Note that the Modified version could be rewritten in such a way that the update of dj is recast as a matrix-vector calculation. In fact, line 11 of Listing 1 is independent of the other calculations in the same loop and could be isolated in a separate loop equal to the loop in lines 17 to 21 of Listing 2. We tested this alternative implementation of the Modified version, but our preliminary results indicated that the speedup obtained is always very close to, or lower than, what can be obtained with the Classic version, and we decided not to pursue any further investigation along those lines.

It is very important to note that the Classic and Modified versions are not equivalent, and it is a well-known fact that, in the presence of round-off errors, Classic GS is less stable than Modified GS [5], with the consequence that it is more prone to loss of orthogonality in the resulting vector basis. To what extent this is an issue for its application to Krylov methods has been discussed in the literature [1, 2, 3, 4], but apparently it does not seem to be particularly serious, considering that, as alluded to above, it is successfully applied in several solvers. This seems to be corroborated by our experience with the implementation in the simulator, as will be shown in Section 3.

2. Implementation of Classic Gram-Schmidt on the Simulator

In order to explore the potential for data reuse introduced by recasting the calculations as BLAS2 operations, it is necessary to block the matrix-vector multiplications. Figure 2 depicts the blocking strategy for the matrix-vector multiplication (1.1). The contribution of each chunk of qj is calculated for all qi’s, allowing reuse of qj and improving memory access. Listing 3 shows the corresponding code for calculating the alpha vector using this blocking strategy. The Intel Compiler provides a set of pragmas to ensure vectorization [7]. The reservoir simulator already made use of pragmas on several computational kernels, and we also added pragmas to ensure vectorization.


Figure 2. Representation of the blocking strategy used to improve data traffic for (1.1).

const int chunkSize = 2048;
// alpha[] must be zero-initialized before this loop: each chunk of qj
// contributes to every alpha[i], so partial sums are accumulated.
for(int k = 0; k < size; k += chunkSize)
{
    for( int i = 0; i < j; i++ )
    {
        double v = 0.0;
#pragma simd reduction(+:v)
        for(int kk = 0; kk < chunkSize; kk++)
        {
            v += qj[k + kk] * Q[i][k + kk];
        }
        alpha[i] += v;
    }
}

Listing 3. Source code for local computation of the alpha factor using a matrix-vector blocked operation.

Careful examination of Figure 2 and Listing 3 reveals that the implementation of the Classic method has another opportunity to improve performance, in addition to blocking. The alpha factors are calculated without using the inner product operator, reducing MPI communication calls, as only a single MPI_Allreduce call on the entire alpha vector is required. Note that in the Modified version an inner product has to be done for each alpha due to the loop recurrence and, consequently, a reduction is triggered for each i in the loop in line 5 of Listing 1.
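
As a sketch of what this means in practice (assuming alpha holds the local partial sums computed as in Listing 3), the whole vector is reduced with one collective call:

#include <mpi.h>

// One in-place reduction over the entire alpha vector; every rank receives
// the globally summed alpha values. In the Modified version, one reduction
// per basis vector would be triggered instead.
MPI_Allreduce(MPI_IN_PLACE, alpha, j, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);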

Similarly, blocking is also required to reduce data traffic for calculations (1.2) and (1.3). Figure 3 and Listing 4 are the counterparts of Figure 2 and Listing 3 for the updating operations, showing how the blocking strategy is implemented in that case. The update of each chunk of qj is calculated for all qi’s, allowing reuse of qj and improving memory access.


Figure 3. Representation of the blocking strategy used to improve data traffic for (1.2). The same strategy can also be applied to (1.3).

const int chunkSize = 2048;
for(int k = 0; k < size; k += chunkSize)
{
    double temp[chunkSize] = { 0.0 };   // the accumulator must start at zero
    for( int i = 0; i < j; i++ )        // i starts at 0, as in the other listings
    {
#pragma simd vectorlength(chunkSize)
        for(int kk = 0; kk < chunkSize; kk++)
        {
            temp[kk] += alpha[i] * Q[i][k + kk];
        }
    }
#pragma simd vectorlength(chunkSize)
    for(int kk = 0; kk < chunkSize; kk++)
    {
        qj[k + kk] -= temp[kk];
    }
}

Listing 4. Source code for the update of the vectors qj and dj using a modified matrix-vector blocked operation.

The optimizations in this work are focused on the hardware we use in production (Intel® Xeon® processors codenamed Sandy Bridge). This kernel may show better performance on newer Intel® processors (codenamed Broadwell) that support FMA (fused multiply-add) instructions and have improved support for vectorization.

3. Tests

3.1 System Configuration

We performed all experiments on a workstation with two Intel® Xeon® processors and 128GB of DDR3 1600MHz memory. Table 1 shows the configurations of the processors used. All experiments were executed using Linux* RedHat 6 with kernel 2.6.32. Intel® MPI Library 5.0.1 and Intel® C++ Compiler XE 14.0.4 with compilation flags -O3, -fp-model fast and -vec were used.

Table 1. Description of the hardware used in the experiments.

3.2 Performance Comparison with a Microbenchmark

To evaluate the relation between performance, the size of the vectors, and the number of processes, we developed a microbenchmark that initializes a set of vectors in parallel and executes only the orthogonalization process, so that we can look directly at the Gram-Schmidt performance without any influence from the linear solver.
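
The structure of the microbenchmark is roughly as follows (a simplified sketch; the names are hypothetical, and the MPI setup and result checks of the real program are omitted):

#include <cstdio>
#include <chrono>
#include <random>
#include <vector>

// orthogonalize() stands for the kernel under test (Classic or Modified).
void orthogonalize(std::vector<std::vector<double>>& q, int j);

int main()
{
  const int numVectors = 32;            // size of the vector basis
  const std::size_t n = 256 * 1024;     // local vector size
  std::vector<std::vector<double>> q(numVectors, std::vector<double>(n));

  // Fill the basis with reproducible pseudo-random data.
  std::mt19937 gen(42);
  std::uniform_real_distribution<double> dist(-1.0, 1.0);
  for (auto& v : q)
    for (auto& x : v)
      x = dist(gen);

  // Time only the orthogonalization of the last vector against the others.
  const auto t0 = std::chrono::steady_clock::now();
  orthogonalize(q, numVectors - 1);
  const auto t1 = std::chrono::steady_clock::now();
  std::printf("orthogonalization time: %.3f ms\n",
              std::chrono::duration<double, std::milli>(t1 - t0).count());
  return 0;
}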

Table 2. Performance improvement of the Classic Gram-Schmidt over the Modified version in the microbenchmark for several vector sizes and numbers of processes. Greater than one means Classic is faster than Modified; the higher the better.

Table 2 shows the performance improvement of the Classic method over the Modified version of the Gram-Schmidt method. In the table rows we vary the qj and dj vector sizes, and in the table columns we vary the number of MPI processes used for parallel processing. Vector sizes were generated based on the number of cells of a regular N x N x N grid. The total number of vectors was 32 and the chunk size was 2048. The value for the chunk size was obtained by performing a series of experiments with the microbenchmark program. Figure 4 shows the results of one such experiment, performed in an early phase of this study, showing the performance benefits of increasing the chunk size up to 1024. Further experiments determined that 2048 was slightly better. This was the chunk size used in all results presented in the next sections. It is expected that the ideal chunk size will depend on machine features, such as cache size.


Figure 4. Ratio between the times for the Modified and Classic Gram-Schmidt implementations as a function of chunk size. Greater than one means Classic is faster than Modified; the higher the better. Vector size is 256 x 1024 and 32 is the number of vectors.

From Table 2 one can notice that Classic can be more than twice as fast as Modified for the largest vectors. On the other hand, for vectors of intermediate size, Modified is faster when eight or more processes are used. For all numbers of processes, Classic loses performance, relative to Modified, for the intermediate-size vectors. So far, we have not reached a conclusion about the reason for this performance loss. Apparently, there is a minimum size for the part of the vector associated with a process in order for Classic Gram-Schmidt to be advantageous.

3.3 Microbenchmark Profiling with Intel® VTune™ Amplifier and Intel® Trace Analyzer

To evaluate the reduction in communication time between the current implementation of the Modified method and our blocked Classic method, we used the microbenchmark and the Intel Trace Analyzer tool on the Linux operating system. To do so, we recompiled our microbenchmark, linking it with the appropriate Intel Trace Analyzer libraries.

We ran the microbenchmark by varying the size of the vectors from 4,096 to 2,097,152 in the same execution with 16 MPI processes. Figure 5 shows the percentage of time spent on MPI communication in relation to the total time for the current implementation of the Modified method. Figure 6 shows the same ratio for our implementation of the Classic method. Comparing the two figures, we notice a drop in the percentage of time spent on MPI from 15.3% to 1.7%, which implies a reduction in communication time on the order of 15x for the Classic method.

Figure 5. Ratio of all MPI calls to the rest of the code in the application for the Modified method.

 

 Figure 6. Ratio of all MPI calls to the rest of the code in the application for the Classic method.

In addition, we also used the Intel VTune Amplifier tool to check the vectorization of the Modified method and of our implementation of the Classic method. For this, we executed the microbenchmark with a single process and with vectors of size 2,097,152, using the General Exploration analysis type in Intel VTune Amplifier.

In Figure 7 and Figure 8, we have images from the Intel VTune Amplifier focusing on the code generated for the vector update operations (AXPY) for the Modified and Classic methods, respectively. These figures show that for both versions of the method the compiler was able to generate vectorized code.


Figure 7. Update vector section generated code from the Intel VTune Amplifier running the Modified method.


Figure 8.  Update vector section generated code from the Intel VTune Amplifier running Classic method.

In Figure 9 and Figure 10, we have the initial section of the Bottom-Up view of the General Exploration analysis for the Modified and Classic methods, respectively. In Figure 9, the code section responsible for updating the vectors (AXPY) is marked with a high incidence of LLC Hits. According to the VTune documentation: "The LLC (last level cache) is the last, and highest latency, level in the memory hierarchy before the main memory (DRAM). While LLC hits are met much faster than DRAM hits, they can still incur a significant performance penalty. This metric also includes consistency penalties for shared data." Figure 10 shows that our Classic method implementation does not present a high incidence of LLC Hits, showing that the blocking method implemented was effective in keeping the data at the L1 and L2 cache levels.


Figure 9. Initial section of the General Exploration Analysis Bottom-Up view from VTune Amplifier executing the Modified method.


Figure 10. Initial section of the General Exploration Analysis Bottom-Up view from VTune Amplifier executing the Classic method.

3.4 Experiments with Extracted Matrices

In order to understand how the two methods compare within the overall linear solver, we used two matrices extracted from simulations and compared the linear solver performance of the Classic and Modified versions with 1, 2, 4, 8 and 16 processes. In the first case (Case 3) the vector size is 2,479,544 and in the second (Case 2) it is 4,385,381. The number of iterations obtained by both methods was the same in all cases.


Figure 11. Time ratio of the Classic method over the Modified for matrices extracted from Case 3. Greater than one means Classic was faster, the higher the better.

Figure 11 and Figure 12 show the performance improvements for the two cases for the different numbers of processes. In all configurations, the Classic version yields substantial gains in the Gram-Schmidt kernel, ranging from 1.5x to 2.5x when compared to the Modified one. The corresponding benefit in the overall linear solution is also very significant, ranging from 1.2x to 1.5x for Case 2 and from 1.1x to 1.3x for Case 3.


Figure 12. Time ratio of the Classic method over the Modified for matrices extracted from Case 2. Greater than one means Classic was faster, the higher the better.

For the Case 3 matrix with 16 MPI processes, we used the Intel® VTune™ Amplifier XE 2015 tool in Hotspot analysis mode to evaluate the communication reduction. In Figure 13 and Figure 14 we show the profile of the Modified and the Classic methods, respectively. The MPI_Allreduce calls within the inner product method for the Modified version take 36 seconds of CPU time (6% of the orthogonalization time). The Classic method profile shows 6.28 seconds spent on MPI_Allreduce calls (2% of the orthogonalization time), showing a large reduction in communication time. However, in this scenario, with a small number of processes, the communication does not represent a major portion of the orthogonalization time and, therefore, it does not have a big impact on the overall performance. This is likely to change when increasing the number of processes and running in cluster environments, where communication takes place over the network.


Figure 13. Modified Gram-Schmidt benchmark profile from the Intel® VTune™ Amplifier XE 2015 for the Ostra matrix with 16 MPI processes.


Figure 14. Classic Gram-Schmidt benchmark profile from the Intel VTune Amplifier XE 2015 for the Ostra matrix with 16 MPI processes.

3.5 Performance Comparison with Full Simulations

In order to assess the impact of replacing the Modified Gram-Schmidt method with the Classic one on the performance of actual simulation runs, seven test cases were executed. Table 3 contains the main features of the cases. Note that vector sizes for most of the cases are in the intermediate range where Table 2 shows the least performance for the Classic when compared with the Modified, the exceptions being Case 2, which is beyond the largest sizes in Table 2, and Case 1, which is in the range of the smallest sizes.

Table 3. Main features for the seven test cases.

The number of time steps, cuts, and linear and nonlinear iterations taken for each case with the two Gram-Schmidt implementations is shown in Table 4. In five out of the seven cases, the behavior of the Modified and Classic versions is very close. For Case 2 and Case 4, Classic performs clearly better, particularly in Case 2, where the number of linear iterations decreases by 16%.

Table 4. Numerical data for the seven test cases. Inc is the relative increment from Modified to Classic (negative when Classic took fewer time steps, cuts, and iterations).

Figure 15 shows the performance gains provided by the Classic Gram-Schmidt method for the three serial runs. The performance of both methods is very close for Case 5 and Case 4, while there is around a 10% improvement in the Gram-Schmidt kernel for Case 1. Those results seem to be in line with the findings from the microbenchmark program, as the Case 1 vector size is in the small range where Table 2 shows benefits for the Classic version. The improvement in Gram-Schmidt does not extend to the Linear Solver, whose performance is almost the same with both methods.


Figure 15. Time ratio of Classic Gram-Schmidt over Modified for the serial runs.

Figure 16 is similar to Figure 15, but for the parallel runs. Case 7 shows a slight improvement with the Classic, while Case 4 and Case 1 show degradation in performance, particularly the latter, where the time for the Modified is almost 20% smaller. The impact of those differences on Linear Solver and Asset time is minor. For Case 2, there is a substantial improvement in the performance of the orthogonalization, with Classic being 2.8x faster. For this case, the benefits in Gram-Schmidt translate into a noteworthy improvement in both Linear Solver and Asset time, making the full simulation almost 1.7x faster. This is due both to the fact that the vectors are very large and, therefore, Classic is expected to outperform Modified by a large amount (see Table 2), and to the improvement in linear and nonlinear iterations resulting from changing the orthogonalization algorithm. It is also important to note that Gram-Schmidt contributes around half of the total simulation time for Case 2 (see Figure 1), which makes any benefit in this kernel result in a much clearer improvement in total simulation time.


Figure 16. Time ratio of Classic Gram-Schmidt over Modified for the parallel runs.

4. Conclusions

The Classic Gram-Schmidt method was implemented in the reservoir simulator linear solver, using a blocking strategy to achieve better memory access. From the tests we made, the following conclusions can be drawn:

  • The new implementation provides a substantial performance improvement over the current one, based on the Modified Gram-Schmidt algorithm, for large problems where the orthogonalization step takes a considerable share of total simulation time. Typically, this will be the case when the number of linear iterations to solve each system is big, making the Krylov basis large. Outside of this class of problems, the performance is either close to or slightly worse than the current implementation.
  • Using a performance analysis tool, we could observe a substantial reduction in communication time when using the Classic version. For the hardware configuration we used, it does not translate into a large benefit for the overall procedure, as the tests were executed in a workstation and parallelization was limited to at most 16 processes. It is expected that, for parallel runs in cluster environments with a large number of processes, the reduction in communication costs will become important to ensure good performance and parallel scalability.
  • Despite being known to be less stable than the Modified version, we have not noticed any degradation in convergence of the Krylov method when switching to the Classic version in our test cases. In fact, convergence for the Classic was even better than for the Modified in two out of the seven actual simulation models we ran.
  • The blocking strategy adopted in the implementation depends on a parameter, the chunk size, which is hardware dependent. The study does not allow us to say to what extent tuning this parameter to specific hardware is crucial for obtaining adequate levels of performance, as a single machine configuration was used in all tests.
  • Experiments with a microbenchmark program focused on the Gram-Schmidt kernel showed a decrease in the performance of the Classic relative to the Modified for intermediate vector sizes. The results obtained for full simulations seem to corroborate those findings. At the moment, we have not found any consistent explanation for this phenomenon, although it seems to be related to the division of work per thread or process. It is also still unclear whether it is possible to avoid the performance downgrade of Classic Gram-Schmidt (relative to the Modified) by tuning the implementation.

References

  1. Frank, J. & Vuik, C., Parallel Implementation of a Multiblock Method with Approximate Subdomain Solution, Applied Numerical Mathematics, 30, pages 403-423, 1999.
  2. Frayssé, V., Giraud, L., Gratton, S. & Langou, J., Algorithm 842: A Set of GMRES Routines for Real and Complex Arithmetics on High Performance Computers, ACM Transactions on Mathematical Software, 31, pages 228-238, 2005.
  3. Giraud, L., Langou, J. & Rozloznik, M., The Loss of Orthogonality in the Gram-Schmidt Orthogonalization Process, Computers and Mathematics with Applications, 50, pages 1069-1075, 2005.
  4. Greenbaum, A., Rozloznik, M. & Strakos, Z., Numerical Behaviour of the Modified Gram-Schmidt GMRES Implementation, BIT, 37, pages 706-719, 1997.
  5. Golub, G.H. & Van Loan, C.F., Matrix Computations, Third Edition, The Johns Hopkins University Press, Baltimore and London, 1996.
  6. Cache Blocking Techniques, https://software.intel.com/en-us/articles/cache-blocking-techniques accessed on 16/12/2016.
  7. Improve Performance with Vectorization, https://software.intel.com/en-us/articles/improve-performance-with-vectorization.

