Implementing a Fault-Tolerant Algorithm for Persistent Memory Using NVML - A MapReduce Example



Overview

In this article, I present a sample implementation of the famous MapReduce (MR) algorithm for persistent memory (PMEM), using the C++ bindings of libpmemobj, which is a core library of the Non-Volatile Memory Library (NVML) collection. The goal of this example is to show how NVML facilitates implementation of a persistent memory aware MR with an emphasis on data consistency through transactions as well as concurrency using multiple threads and PMEM aware synchronization. In addition, I show the natural fault-tolerance capabilities of PMEM by killing the program halfway through and restarting it from where it left off, without the need for any checkpoint/restart mechanism. Finally, a sensitivity performance analysis is shown by varying the number of threads for both the map and reduce workers.

What is MapReduce?

MR is a programming model introduced by Google* in 2004 that uses functional programming concepts—inspired by the map and reduce primitives from languages such as Lisp*—to make it easier for programmers to run massively parallel computations on clusters composed of thousands of machines.

Since all functions are data-independent from one another, that is, all input data is passed by value, this programming paradigm has presented itself as an elegant solution for dealing with data consistency and synchronization issues. Parallelization can be achieved naturally by running multiple instances of functions in parallel. The MR model can be described as a subset of the functional programming model where all computations are coded using only two functions: map and reduce.

Figure 1: Overview of MapReduce. This figure is a modified version of Figure 1 of the Google article referenced above.

A high-level overview of how MR works can be found in Figure 1. The input is composed of a set of files, which are split into chunks with a predefined size (usually between 16–64 MB). Each chunk is fed to a map task which creates key-value pairs that are grouped, sorted, and fed to a reduce task. Reduce tasks can then write their results directly to an output file, or pass them on to other reduce tasks for further reduction.

The typical example of a MR computation is word counting. Input chunks are split into lines. Each line then is fed to a map task which outputs a new key-value pair for each word found as follows: {key : 'word', value : '1'}. Reduce tasks then add all the values for the same key and create a new key-value pair with an updated value. If we have just one reduce task at the end, the output file will contain exactly one key-value entry per word with the value being the total count.

MapReduce Using PMEM

The way in which the MR model achieves fault tolerance (FT) is by storing its intermediate results in files residing in a local file system. This file system normally sits on top of either a traditional hard disk drive (HDD) or a solid state drive (SSD) attached to the node where the task that generates the data runs.

The first problem with this approach is of course the multiple orders of magnitude difference in bandwidth between these drives and volatile RAM (VRAM) memory. PMEM technology can narrow that gap significantly by running very close to VRAM speeds. Given this, one can consider as a first solution to simply switch from HDD or SSD to PMEM by mounting the local file system on top of a PMEM device. Although this surely helps, software still needs to be designed with a volatile-versus-persistent memory mentality. An example of this is when data has a different representation for VRAM (binary tree, heap, and so on) as it has for persistent storage (such as comma-separated values (CSV) files and structured query language (SQL) tables). This is where programming directly against PMEM using the library libpmemobj can greatly simplify development!

By programming directly against PMEM, the only thing needed to achieve FT is to specify what data structures should be permanent. Traditional VRAM may still be used for the sake of performance, but either in a transparent fashion (the same way processor caches are used relative to VRAM) or as a temporary buffer. In addition, a mechanism of transactions is put in place to make sure that permanent data structures do not get corrupted if a failure occurs in the middle of a write operation.

Design Decisions

This section describes the design decisions taken to make the sample PMEM aware.

Data Structures

This particular sample is designed to be run on one computer node only with one PMEM device attached to it. Workers are implemented as threads.

The first thing we need is a data structure that allows us to assign work to the map and reduce workers. This can be achieved with tasks. Tasks can be assigned to workers in either a (1) push fashion (from master to workers), or a (2) pull fashion (workers fetch from the master). In this example, the second option was chosen to simplify the implementation using a persistent list for tasks and a PMEM mutex for coordination between workers.

Figure 2: Root data structure.

The first object in a PMEM pool is always the root. This object serves as the main anchor to link all the other objects created in the program. In my case, I have four objects. The first two are the nvml::obj versions of the C++ standard mutex and condition variable. We cannot use the standard ones because libpmemobj needs to be able to reset them in the event of a crash (otherwise a permanent deadlock could occur). For more information, see the article about synchronization with libpmemobj listed in the Resources section. The third object is the input data, which is stored as a one-dimensional persistent string. The fourth object is our list of tasks. You may have noticed that the variable tlist is not declared as a persistent pointer. The reason for this is that tlist is never modified (that is, overwritten) after it is first created, so there is no need to keep track of that memory range during transactions. The head variables of tlist for map and reduce tasks, on the other hand, are declared as persistent pointers because their values do in fact change during program execution (by adding new tasks).
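To make this concrete, the root object might be declared roughly as follows. This is only a sketch: pmutex, cond, and tlist appear in the code snippets later in the article, while the input member name and the persistent_string and task_list type declarations are illustrative rather than taken from the sample.

struct root {
	nvml::obj::mutex pmutex;             /* PMEM-aware mutex (resettable after a crash) */
	nvml::obj::condition_variable cond;  /* PMEM-aware condition variable */
	nvml::obj::persistent_ptr<persistent_string> input; /* input data as one persistent string */
	task_list tlist; /* task list; not a persistent_ptr because it is never overwritten
	                  * after creation (only its head pointers change) */
};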

Now, let’s take a look at the list_entry class:

Figure 3: The list_entry class.
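A rough sketch of the class, reconstructed from the field descriptions below, could look like this. The member names follow the article, but the flag types and the use of the nvml::obj::p<> wrapper are assumptions; refer to the sample code on GitHub for the actual declaration.

class list_entry
{
	/* ... methods such as set_status(), allocate_kv(), and add_to_kv() ... */
	private:
	nvml::obj::persistent_ptr<list_entry> next; /* next entry in the linked list */
	nvml::obj::p<int> status;        /* TASK_ST_NEW, TASK_ST_BUSY, TASK_ST_REDUCED or TASK_ST_DONE */
	nvml::obj::p<int> task_type;     /* TASK_TYPE_NOTYPE, TASK_TYPE_MAP or TASK_TYPE_REDUCE */
	nvml::obj::p<size_t> start_byte; /* chunk's start byte in the input string (map tasks only) */
	nvml::obj::p<size_t> n_lines;    /* number of lines in the chunk (map tasks only) */
	nvml::obj::persistent_ptr<char[]> kv; /* key-value pairs (reduce tasks only) */
	nvml::obj::p<size_t> kv_size;    /* number of elements in kv */
	nvml::obj::p<size_t> alloc_bytes; /* size of kv in bytes */
};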

  • The variable next is a persistent pointer to the next entry in the linked list.
  • The status flag can take the values TASK_ST_NEW (the task is new and a worker thread can start working on it right away), TASK_ST_BUSY (some thread is currently working on this task), TASK_ST_REDUCED (this task has the results of a reduction but it has not been combined with other reduced tasks yet) or TASK_ST_DONE (the task is finally done).
  • The task_type flag can take the values TASK_TYPE_NOTYPE, TASK_TYPE_MAP or TASK_TYPE_REDUCE.
  • start_byte holds the chunk’s start byte in the input data string. Only relevant for map tasks.
  • n_lines holds the number of lines in the chunk. Only relevant for map tasks.
  • kv is a pointer for the list of key-value pairs. This list is only relevant for reduce tasks.
  • kv_size is the size, in elements, of the kv list.
  • Finally, alloc_bytes is the size, in bytes, of the kv list.

kv is a persistent pointer to char[] for performance reasons. Originally, I implemented this list as a linked list of kv_tuple pairs. However, due to the large number of allocations (sometimes hundreds of thousands per thread per task) of very small objects (between 30–40 bytes on average), and given that allocations are synchronized by libpmemobj to protect the integrity of its metadata, my code was not able to scale beyond eight threads. The change allows each thread to do only a single large allocation when storing all the key-value pairs for a single task.

You may have also noticed that I am not using the persistent_string class mentioned before for the key in kv_tuple. The reason is persistent_string is designed for persistent string variables that can change over time, so for each new string two persistent pointers are created: one for the object itself and one for the raw string. For this particular sample, the functionality of persistent_string is not needed. Key-value tuples are allocated in bulk and set during construction, and never changed until they are destroyed. This reduces the number of PMEM objects that the library needs to be aware of during transactions, effectively reducing overhead.

Nevertheless, allocating key-value tuples this way is a little bit tricky.

struct kv_tuple {
	size_t value;
	char key[];
};

Before creating the persistent list of key-value tuples, we need to calculate what its size will be in bytes. We can do that since the principal computation and sorting steps are completed first on VRAM, allowing us to know the total size in advance. Once we have done that, we can allocate all the PMEM needed with one call:

void
list_entry::allocate_kv (nvml::obj::pool_base &pop, size_t bytes)
{
	nvml::obj::transaction::exec_tx (pop, [&] {
		kv = nvml::obj::make_persistent<char[]> (bytes);
		alloc_bytes = bytes;
	});
}
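The bytes argument passed to allocate_kv() can be computed from the volatile keys vector along these lines. This is only a sketch; the required_kv_bytes helper is a hypothetical name and not part of the sample.

/* Total bytes needed to store all key-value tuples contiguously: each tuple
 * occupies the fixed part of kv_tuple plus the key characters and a null
 * terminator, matching the offset arithmetic used in add_to_kv below. */
size_t
required_kv_bytes (const std::vector<std::string> &keys)
{
	size_t bytes = 0;
	for (const auto &k : keys)
		bytes += sizeof (struct kv_tuple) + k.length () + 1;
	return bytes;
}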

Then, we copy the data to our newly created PMEM object:

void
list_entry::add_to_kv (nvml::obj::pool_base &pop, std::vector<std::string> &keys,
                       std::vector<size_t> &values)
{
	nvml::obj::transaction::exec_tx (pop, [&] {
		struct kv_tuple *kvt;
		size_t offset = 0;
		for (size_t i = 0; i < keys.size (); i++) {
			kvt = (struct kv_tuple *)&(kv[offset]);
			kvt->value = values[i];
			strcpy (kvt->key, keys[i].c_str ());
			offset += sizeof (struct kv_tuple) + strlen (kvt->key) + 1;
		}
		kv_size = keys.size ();
	});
}

The inputs to this function—apart from the mandatory pool object pop (which cannot be stored in persistent memory because it is newly created on each program invocation)—are two volatile vectors containing the key-value pairs generated by either a map or a reduce task. kv is iterated by means of a byte offset (the offset variable in the loop) because the size of each kv_tuple is not constant (it depends on the length of its key).

Synchronization

The following pseudocode represents the high-level logic of the worker threads:

  1. Wait until there are new tasks available.
    • map workers work only on one task at a time.
    • reduce workers try to work on two tasks (and combine them) if possible. If not, then work on one task.
  2. Work on task(s) and set to TASK_ST_DONE (or TASK_ST_REDUCED if it is a reduce worker working on a single task).
  3. Store results in a newly created task with status TASK_ST_NEW (the last task has the results for the whole computation and it is created directly as TASK_ST_DONE).
  4. If computation is done (all tasks are TASK_ST_DONE), then exit.
  5. Go to (1).


Let’s take a look at step (1) for the map worker:

void
pm_mapreduce::ret_available_map_task (nvml::obj::persistent_ptr<list_entry> &tsk,
                                      bool &all_done)
{
	auto proot = pop.get_root ();
	auto task_list = &(proot->tlist);
        /* LOCKED TRANSACTION */
	nvml::obj::transaction::exec_tx (
	pop,
	[&] {
		all_done = false;
		if ((task_list->ret_map (tsk)) != 0) {
			tsk = nullptr;
			all_done = task_list->all_map_done ();
		} else
			tsk->set_status (pop, TASK_ST_BUSY);
	},
	proot->pmutex);
}

The most important part of this snippet is the locked transaction. It needs to be locked because each task should be executed by only one worker (note that the persistent mutex from the root object, proot->pmutex, is passed as the last argument to exec_tx). Inside the transaction we check whether a new map task is available by calling ret_map() and, if so, we set its status to TASK_ST_BUSY, preventing it from being fetched by other workers. If no task is available, all_map_done() checks whether all map tasks are done (in which case the thread will exit).

Another important take-home lesson from this snippet is that whenever a data structure is modified inside a locked region, the locked region should end at the same time as the transaction. If a thread changes a data structure inside a locked region, releases the lock, and then fails before the transaction commits, all the changes made while holding the lock are rolled back. Meanwhile, another thread may have acquired the lock and made additional changes on top of the failed thread's changes (which no longer exist after the rollback), ultimately corrupting the data structure.

One way to avoid this is to lock the whole transaction by passing the persistent mutex to the transaction (as shown in the previous snippet, where proot->pmutex is passed to exec_tx). There are cases, however, where this is not feasible (because the whole transaction would effectively be serialized). In those cases, we can leave synchronized writes to the end of the transaction by putting them inside a nested locked transaction. Although nested transactions are flattened by default – which means that what we have at the end is just the outermost transaction – the lock from the nested transaction only locks the outermost one from the point where the nested transaction starts. This can be seen in the following snippet:

. . . . .
auto proot = pop.get_root ();
auto task_list = &(proot->tlist);
nvml::obj::transaction::exec_tx (pop, [&] {
	. . . . . .
	/* This part of the transaction can be executed concurrently
	 * by all the threads. */
	. . . . . .
	nvml::obj::transaction::exec_tx (
	pop,
	[&] {
		/* this nested transaction adds the lock to the outer one.
		 * This part of the transaction is executed by only one
		 * thread at a time */
		task_list->insert (pop, new_red_tsk);
		proot->cond.notify_one ();
		tsk->set_status (pop, TASK_ST_DONE);
	},
	proot->pmutex); /* end of nested transaction */
}); /* end of outer transaction */

The case (1) for the reduce worker is more complex, so I will not reproduce it here in its entirety. Nevertheless, there is one part that is worth some discussion:

void pm_mapreduce::ret_available_red_task (
nvml::obj::persistent_ptr<list_entry> (&tsk)[2], bool &only_one_left,
bool &all_done)
{
	auto proot = pop.get_root ();
	auto task_list = &(proot->tlist);
	/* locked region */
	std::unique_lock<nvml::obj::mutex> guard (proot->pmutex);
	proot->cond.wait (
	proot->pmutex,
	[&] { /* conditional wait */
		. . . . .
	});
	. . . . .
	guard.unlock ();

The main difference between my map and reduce workers is that reduce workers perform a conditional wait. Map tasks are all created before computation starts, so map workers never need to wait for new tasks to appear. Reduce workers, on the other hand, conditionally wait until other workers create new reduce tasks. When a reduce worker thread is woken up (another worker calls proot->cond.notify_one() after a new task is created and inserted into the list), the predicate passed to cond.wait() runs to check whether the worker should continue. A reduce worker continues when either (a) at least one task is available or (b) all tasks are finally done (in which case the thread exits).
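The predicate itself could look roughly like the sketch below. The helper names has_available_red_task() and all_done() are hypothetical stand-ins for the actual checks performed in the sample.

proot->cond.wait (proot->pmutex, [&] {
	/* wake up when a reduce task can be fetched or when the whole
	 * computation is finished (hypothetical helper names) */
	return task_list->has_available_red_task () || task_list->all_done ();
});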

Fault Tolerance

The sample code described in this article can be downloaded from GitHub. This code implements a PMEM version of the wordcount program by inheriting from the general PMEM MapReduce class and implementing the virtual functions map() and reduce():

class pm_wordcount : public pm_mapreduce
{
	public:
	/* constructor */
	pm_wordcount (int argc, char *argv[]) : pm_mapreduce (argc, argv) {}
	/* map */
	virtual void
	map (const string line, vector<string> &keys, vector<size_t> &values)
	{
		size_t i = 0;
		while (true) {
			string buf;
			while (i < line.length () && (isalpha (line[i]) || isdigit (line[i]))) {
				buf += line[i++];
			}
			if (buf.length () > 0) {
				keys.push_back (buf);
				values.push_back (1);
			}
			if (i == line.length ())
				break;
			i++;
		}
	}
	/* reduce */
	virtual void
	reduce (const string key, const vector<size_t> &valuesin,
	        vector<size_t> &valuesout)
	{
		size_t total = 0;
		for (vector<size_t>::const_iterator it = valuesin.begin ();
		     it != valuesin.end (); ++it) {
			total += *it;
		}
		valuesout.push_back (total);
	}
};

Build Instructions

A Makefile is provided with the sample. To compile the sample, just type make; libpmemobj needs to be properly installed on your system, as well as a C++ compiler.

Instructions to Run the Sample

After compilation, you can run the program without parameters to get usage help:

$ ./wordcount
USE: ./wordcount pmem-file <print | run | write -o=output_file | load -d=input_dir> [-nm=num_map_workers] [-nr=num_reduce_workers]
command help:
    print    ->  Prints mapreduce job progress
    run      ->  Runs mapreduce job
    load     ->  Loads input data for a new mapreduce job
    write    ->  Write job solution to output file
command  not valid

To see how FT works, run the code with some sample data. In my case, I use all of the Wikipedia abstracts (the size of the file is 5GB so it may take a long time to load in your browser; you can download it by doing right-click > save as). The first step before running MR is loading the input data to PMEM:

$ ./wordcount /mnt/mem/PMEMFILE load -d=/home/.../INPUT_WIKIABSTRACT/
Loading input data
$

Now we can run the program (in this case I use two threads for map and two threads for reduce workers). After some progress has been made, let’s kill the job by pressing Ctrl-C:

$ ./wordcount /mnt/mem/PMEMFILE run -nm=2 -nr=2
Running job
^C% map  15% reduce
$

We can check the progress with the command print:

$ ./wordcount /mnt/mem/PMEMFILE print
Printing job progress
16% map  15% reduce
$

So far, our progress has been saved! If we use the command run again, computation will start where we left off (16% map and 15% reduce):

$ ./wordcount /mnt/mem/PMEMFILE run -nm=2 -nr=2
Running job
16% map  15% reduce

When computation is done, we can dump the results (command write) to a regular file and read the results:

$ ./wordcount /mnt/mem/PMEMFILE write -o=outputfile.txt
Writing results of finished job
$ tail -n 10 outputfile.txt
zzeddin	1
zzet	14
zzeti	1
zzettin	4
zzi	2
zziya	2
zzuli	1
zzy	1
zzz	2
zzzz	1
$

Performance

The system used has an Intel® Xeon® Platinum 8180 processor with 28 cores (224 threads) and 768 GB of Intel® Double Data Rate 4 (Intel® DDR 4) RAM. To emulate a PMEM device mounted at /mnt/mem, 512 GB of RAM is used. The operating system used is CentOS Linux* 7.3 with kernel version 4.9.49. The input data used is again all of the Wikipedia abstracts (5 GB). In the experiments, I allocate half of the threads for map and half for reduce tasks.

Figure 4: Time taken to count the words in all of the Wikipedia abstracts (5 GB) using our PMEM-MR sample.

As the chart shows, our sample scales well (approximately halving completion time with each doubling of threads) all the way to 16 threads. An improvement is still observed with 32 threads, but only of about 25 percent. With 64 threads we reach the scalability limit for this particular example, because the synchronized parts become a larger share of the total execution time as more threads work on the same data.

Summary

In this article, I have presented a sample implementation of the famous MR algorithm using the C++ bindings of the PMEM library libpmemobj. I showed how to achieve data consistency through transactions and concurrency using a PMEM mutex and condition variable. I have also shown how NVML—by allowing the programmer to code directly against PMEM (that is, by defining what data structures should be persisted)—facilitates the creation of programs that are naturally fault tolerant. Finally, I closed the article with a sensitivity performance analysis showing how the sample scales as more threads are added to the execution.

About the Author

Eduardo Berrocal joined Intel as a Cloud Software Engineer in July 2017 after receiving his PhD in Computer Science from Illinois Institute of Technology (IIT) in Chicago, Illinois. His doctoral research interests were focused on (but not limited to) data analytics and fault tolerance for high performance computing. In the past he worked as a summer intern at Bell Labs (Nokia), as a research aide at Argonne National Laboratory, as a scientific programmer and web developer at the University of Chicago, and as an intern in the CESVIMA laboratory in Spain.

Resources

  1. MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
  2. Parallel Programming Using Skeleton Functions, J. Darlington, et al., Dept. of Computer Science, University of Western Australia, http://pubs.doc.ic.ac.uk/parallel-skeleton/parallel-skeleton.pdf
  3. Persistent Memory Programming, pmemobjfs - The simple FUSE based on libpmemobj, September 29, 2015, http://pmem.io/2015/09/29/pmemobjfs.html
  4. Persistent Memory Programming, C++ bindings for libpmemobj (part 7) - synchronization primitives, May 31, 2016, http://pmem.io/2016/05/31/cpp-08.html
  5. Persistent Memory Programming, Modeling strings with libpmemobj C++ bindings, January 23, 2017, http://pmem.io/2017/01/23/cpp-strings.html
  6. link to sample code in GitHub*
  7. https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-abstract.xml
  8. pmem.io Persistent Memory Programming, How to emulate Persistent Memory, February 22, 2016, http://pmem.io/2016/02/22/pm-emulation.html

Inconsistent program behavior on Red Hat Enterprise Linux* 7.4 if compiled with Intel compilers


Reference Number : CMPLRS-41993, CMPLRS-45873, CMPLRS-4605

Version : Intel® C++ and Fortran Compilers 17.0, 18.0, older versions affected as well

Operating System : Red Hat Enterprise Linux* 7.4, Fedora* 25

Problem Description : There is an issue with calls via the Procedure Linkage Table (PLT) to functions with custom calling conventions on Red Hat Enterprise Linux* 7.4. Such functions are widely used in Intel compiler libraries, such as LIBIRC, LIBSVML, etc. The Intel compiler generates the usual call sequence for such functions, and they may be called via the PLT in the final executable. During resolution of the PLT relocation, the runtime may overwrite registers that do not need to be preserved according to the ABI. The code enclosing such a call can no longer assume that the call follows the defined custom calling convention. The issue may cause inconsistent program behavior, such as FP exceptions (e.g., unexpected NaN generation) or crashes (SIGSEGVs).

Note that the problem may be observed on any system containing glibc version 2.24-9 and newer (not just RHEL 7.4).

Resolution Status : A workaround is to perform all required relocations at binary startup by setting the LD_BIND_NOW=1 environment variable. 17.0 Update 4 and Update 5 have a workaround implemented in the compiler and will avoid PLT relocations for the problem calls, but only under the -fPIC option.
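For example, assuming the affected binary is named ./app (a hypothetical name), the workaround can be applied at launch time like this:

$ LD_BIND_NOW=1 ./app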

This issue has been resolved in the upcoming 18.0 Update 1.

Build an Autonomous Mobile Robot with the Intel® RealSense™ Camera, ROS*, and SAWR


Overview

The Simple Autonomous Wheeled Robot (SAWR) project defines the hardware and software required for a basic "example" robot capable of autonomous navigation using the Robot Operating System* (ROS*) and an Intel® RealSense™ camera. In this article, we give an overview of the SAWR project and also offer some tips for building your own robot using the Intel RealSense camera and SAWR projects.

Mobile Robots – What They Need

Mobile robots require the following capabilities:

  • Sense a potentially dynamic environment. The environment surrounding robots is not static. Obstacles, such as furniture, humans, or pets, are sometimes moving, and can appear or disappear.
  • Determine current location. For example, imagine that you are driving a car. You need to specify "Where am I?" in the map or at least know your position relative to a destination position.
  • Navigate from one location to another. For example, to drive your car to your destination, you need both driver (deciding on how much power to apply and how to steer) and navigator (keeping track of the map and planning a route to the destination) skills.
  • Interact with humans as needed. Robots in human environments need to be able to interact appropriately with humans. This may mean the ability to recognize an object as a human, follow him or her, and respond to voice or gesture commands.

The SAWR project, based on ROS and the Intel RealSense camera, covers the first three of these requirements. It can also serve as a platform to explore how to satisfy the last requirement: human interaction.

A Typical Robot Software Stack

To fulfill the above requirements, a typical robot software stack consists of many modules (see Figure 1). At the bottom of the stack, sensor hardware drivers, including those for the Intel RealSense camera in the case of the SAWR, deliver environmental information to a set of sensing modules. These modules recognize environmental information as well as human interaction. Several sources of information are fused to create various models: a world model, an estimate of the robot state (including position in the world), and command inputs (for example, voice recognition).

The Plan module decides how the robot will act in order to achieve a goal. For mobile robotics, the main purpose is navigating from one place to another, for which it is necessary to calculate obstacle-free paths given the current world model and state.

Based on the calculated plan, the Act module manages the actual movement of the robot. Typically, motor control is the main function of this segment, but other actions are possible, such as speech output. When carrying out an action, a robot may also be continuously updating its world model and replanning. For example, if an unexpected obstacle arises, the robot may have to update its model of the world and also replan its path. The robot may even make mistakes (for example, its estimate of its position in the world might be incorrect), in which case it has to figure out how to recover.

Autonomous navigation requires a lot of computation to do the above tasks. Some tasks can be offloaded to the cloud, but due to connectivity and latency issues this is frequently not an option. The SAWR robot can do autonomous navigation using only onboard computational resources, but the cloud can still be useful for adding other capabilities, such as voice control (for example, using Amazon Voice Services*).


Figure 1. A typical robot software stack.

Navigation Capabilities - SLAM

Simultaneous localization and mapping (SLAM) is one of the most vital capabilities for autonomous mobile robots. In a typical implementation, the robot navigates (plans paths) through a space using an occupancy map. This map needs to be dynamically updated as the environment changes. In lower-end systems, this map is typically 2D, but more advanced systems might use a 3D representation such as a point cloud. This map is part of the robot’s world representation. The “localization” part of SLAM means that in addition to maintaining the map, the robot needs to estimate where it is located in the map. Normally this estimation uses a probabilistic method; rather than a single estimated location, the robot maintains a probability distribution and the most probable location is used for planning. This allows the robot to recover from errors and reason about uncertainty. For example, if the estimate for the current location is too uncertain, the robot could choose to acquire more information from the environment (for example, by rotating to scan for landmarks) to refine its estimate.

In the default SAWR software stack, the open source slam_gmapping package is used to create and manage the map, although there are several other options available, such as cartographer and rgbd-slam. This module is continually integrating new sensor data into the map and clearing out old data if it is proven incorrect. Another module, amcl, is used to estimate the current location by matching sensor data against the map. These modules run in parallel to constantly update the map and the estimate of the robot’s position. Figure 2 shows a typical indoor environment and a 2D map created by this process.


Figure 2. Simultaneous localization and mapping (SLAM) with 2D mapping.

Hardware for Robotics

Figure 3 shows the hardware architecture of the SAWR project. Like many robotics systems, the architecture consists of a master and slave system. The master takes care of high-level processing (such as SLAM and planning), and the slave takes care of real-time processing (such as motor speed control). This is similar to how the brain and spinal reflexes work together in animals. Several different options can be used for this model, but typically a Linux* system is used for the master and one or more microcontroller units (MCUs) are used for the slave.


Figure 3. Robot architecture.

In this article, Intel RealSense cameras are used as the primary environmental sensor. These cameras provide depth data and can be used as input to a SLAM system. The Intel® RealSense™ camera R200 or Intel® RealSense™ camera ZR300 are used in the current SAWR project. The Intel® RealSense™ camera D400 series, shown in Figure 4, will soon become a common depth camera of choice; since it provides similar data with improved range and accuracy, and uses the same driver, an upgrade is straightforward. As for drivers, the librealsense and realsense_ros_camera drivers are available on GitHub*. You can use any Intel RealSense camera with them.


Figure 4. Intel® RealSense™ Depth Camera D400 Series.

For the master computer, you can choose from various hardware, including an Intel® NUC with Intel® Core™ i5 and Intel® Core™ i7 processors (see Figure 5). This choice provides maximum performance for robotics development. You can also use OEM boards, such as one of the Aaeon UP* boards, for rapid prototype-to-production work. Even the diminutive Aaeon UP Core* has enough performance to do SLAM. The main requirement is that the board runs Linux. The SAWR software stack uses ROS, which runs best under Ubuntu*, although it is possible to install it under other distributions, such as Debian* or Yocto*.


Figure 5. Intel® NUC.

SAWR Basic Mobile Robot

The following is a spec overview of the SAWR basic mobile robot, shown in Figure 6, which is meant to be an inexpensive reference design that is easy to reproduce (the GitHub site includes the files to laser-cut your own frame). The SAWR software stack can be easily adapted to other robot frames. For this design, the slave computers are actually embedded inside the Dynamixel servos. The MCUs in these smart motors take care of low-level issues like position sensing and speed control, making the rest of the robot much simpler.

Computer: Aaeon UP board

Camera: Intel RealSense camera

Actuation: Two Dynamixel MX-12W* smart servos with magnetic encoders

Software: Xubuntu* 16.04 and ROS Kinetic*

Frame: Laser-cut acrylic or POM, Pololu sphere casters, O-ring tires and belt transmission

Other: DFRobot 25W/5V power regulator

Extras: Jabra Speak* 510+ USB speakerphone (for voice I/O, if desired)

Instructions and software: https://github.com/01org/sawr


Figure 6. SAWR basic mobile robot.

One of the distinctive parts of the SAWR project is that both the hardware and the software have been developed in an open source style. The software is based on modifying and simplifying the Open Source Robotics Foundation Turtlebot* stack, but adds a custom motor driver using the Dynamixel Linux* SDK. For the hardware, the frame is parametrically modeled using OpenSCAD* and then converted to laser-cut files using Inkscape*. You can download all the data from GitHub, and then make your own frame using a laser cutter (or a laser-cutting service). Most of the other parts are available from a hardware store. Detailed instructions, assembly, and setup plans are available online.

Using an OEM Board for Robotics

When you choose an OEM board for robotics, such as an UP board for SAWR or any other robotics system, using active cooling to get higher performance is strongly recommended. Robotics middleware usually consumes a high level of CPU resources, and a lack of CPU resources can translate into low quality or low speed of autonomous movement. With active cooling, you can maintain the CPU's highest speed indefinitely. In particular, with active cooling the UP board can turbo and sustain a much higher clock rate than it can without.

You may be concerned about the power required for active cooling and higher clock rates. However, power consumption is not usually a limiting factor in robotics, because motors are usually the primary power load. In fact, instead of the basic UP board, you can select the UP Squared*, which has much better performance.

Another issue is memory. The absolute minimum is 2 GB, but 4 GB is highly recommended. The SLAM system uses a lot of memory to maintain the world state and position estimate. Remember that the OS needs memory too, and Ubuntu tends to use about 500 MB doing nothing. So a 4 GB system has about 7x the space available for applications compared to a 1 GB system (3.5 GB versus 0.5 GB free), not just 4x.

ROS Overview

Despite its name, ROS is not an OS, but a middleware software stack that can run on top of various operating systems, although it is primarily used with Ubuntu. ROS supports a distributed, concurrent processing model based on a graph of communicating nodes. Thanks to this basic architecture, you can not only easily network together multiple processing boards on the same robot if you need to, but you can also physically locate boards away from the actual robot by using Wi-Fi* (with some loss of performance and reliability, however). From a knowledge base perspective, ROS has a large community with many existing open source nodes supporting a wide range of sensors, actuators, and algorithms. That and its excellent documentation are good reasons to choose ROS. From a development and debugging perspective, various powerful and attractive visualization tools and simulators are also available and useful.

Basic ROS Concepts

This section covers the primary characteristics of the ROS architecture. To learn more, refer to the ROS documentation and tutorials.

  • Messages and topics (see Figure 7). ROS uses a publish and subscribe system for sending and receiving data on uniquely named topics. Each topic can have multiple publishers and subscribers. Messages are typed and can carry multiple elements. Message delivery is asynchronous, and topics are the recommended mechanism for most interprocess communication in ROS (a minimal publisher sketch appears after this list).


    Figure 7. Messages and topics.

  • Service calls (see Figure 8). Service calls use synchronous remote procedure call semantics, also known as “request/response.” When using service calls, the caller blocks communication until a response is received. Due to this behavior, which can lead to various problems such as deadlocks and hung processes, you should consider whether you really need to build your communication with service calls. They are primarily used for updating parameters, where the buffering for messages creates too much overhead (for example, for updating maps) or where synchronization between activities is actually needed.


    Figure 8.  Service calls.

  • Actions (see Figure 9). Actions are used to define long-running tasks with goals, the possibility of failure, and where periodic status reports are useful. In the SAWR software stack actions are mainly used for setting the destination goal and monitoring the progress of navigation tasks. Actions generally support asynchronous goal-directed behavior control based on a standard set of topics. In the case of SAWR, you can trigger a navigation action by using Rviz (the visualizer) and the 2D Nav Goal button.


    Figure 9. Actions.

  • Parameters (see Figure 10). Parameters are used to set various values for each node. A parameter server provides typed constant data at startup, and the latest version of ROS also supports dynamic parameter update after node launch. Parameters can be specified in various ways, including through the command line, parameter files, or launch file parameters.


    Figure 10. Parameters.

  • Other ROS concepts. There are several other important concepts relevant to the ROS architecture.
    • Packages: Collections of files used to implement or specify a service or node in ROS, built together using the catkin build system (typically).
    • Universal Robot Description Format (URDF): XML files describing joints and transformations between joints in a 3D model of the robot.
    • Launch files: XML files describing a set of nodes and parameters for a ROS graph.
    • Yet Another Markup Language: Used for parameter specification on the command line and in files.
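To make the publish/subscribe model concrete, here is a minimal roscpp publisher sketch. The node name, topic name, and message contents are illustrative, and this snippet is not part of the SAWR stack.

#include <ros/ros.h>
#include <std_msgs/String.h>

int main (int argc, char **argv)
{
	ros::init (argc, argv, "talker");   // register this node with the ROS master
	ros::NodeHandle nh;
	// advertise a topic named "chatter" with a queue size of 10
	ros::Publisher pub = nh.advertise<std_msgs::String> ("chatter", 10);
	ros::Rate rate (1);                 // publish once per second
	while (ros::ok ()) {
		std_msgs::String msg;
		msg.data = "hello";
		pub.publish (msg);          // every subscriber to "chatter" receives this message
		rate.sleep ();
	}
	return 0;
}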

ROS Tools

A lot of powerful development and debug tools are available for ROS. The following tools are typically used for autonomous mobile robots.

  • Rviz (see Figure 11). Visualize various forms of dynamic 3D data in context: transforms, maps, point clouds, images, goal positions, and so on.


    Figure 11. Rviz.

  • Gazebo. Robot simulator, including collisions, inertia, perceptual errors, and so on.
  • Rqt. Visualize graphs of nodes and topics.
  • Command-line tools. Listen to and publish on topics, make service calls, initiate actions. Can filter and monitor error messages.
  • Catkin. Build system and package management.

ROS Common Modules for Autonomous Movement

The following modules are commonly used for autonomous mobile robots, and SAWR adopts them as well.

  • tf (tf2) (see Figure 12). The coordinate transform library, one of the most important packages in ROS. Thanks to tf, you can manage all coordinate values, including the position of the robot or the relations between the camera and wheels. To handle the various categories of coordinates, it uses several distinctive concepts such as frames and trees.


    Figure 12. tf frame example.

  • slam_gmapping. ROS wrapper for OpenSlam's Gmapping. gmapping is one of the most famous SLAM algorithms. While still popular, there are also several alternatives now for this function.
  • move_base. Core module for autonomous navigation. This package provides various functions, including planning a route, maintaining cost maps, and issuing speed and direction commands for motors.
  • Robot_state_publisher. Publishes the 3D poses of the robot links, which are important for a manipulator or humanoid. In the case of SAWR, the most important data maintained by this module is the position and orientation of the robot and the location of the camera relative to the robot’s position.

Tips for Building a Custom Robot using the SAWR Stack

SAWR consists of the following subdirectories, which you can use as-is if you want to utilize the complete SAWR software and hardware package (see Figure 13). You can also use them as a starting point for your original robot with the Intel RealSense camera. Also below are tips for customizing the SAWR stack for use with other robot hardware.

  • sawr_master: Master package, launch scripts.
    • Modify if you change another ROS module.
  • sawr_description: Runtime physical description (URDF files).
    • Modify urdf and xacro files according to your robot’s dimension (check with tf tree/frame).
  • sawr_base: Motor controller and hardware interfacing.
    • Prepare your own motor controller and odometry libraries.
  • sawr_scan: Camera configuration.
  • sawr_mapping: SLAM configuration.
    • You can begin as-is if you use the same Intel RealSense camera configuration with SAWR.
  • sawr_navigation: Move-base configuration.
    • Modify and tune parameters of global/local costmap, move_base. This is the most difficult part of tuning your own hardware.


Figure 13. SAWR ROS node graph viewed by rqt_graph.

Conclusion

Autonomous mobile robotics is an emerging area, but the technology for mobile robotics is already relatively mature. ROS is a key framework for robot software development that provides a wide range of modules covering many areas of robotics. The latest version is Lunar, the 12th generation. Robotics involves all aspects of computer science and engineering, including artificial intelligence, computer vision, machine learning, speech understanding, the Internet of Things, networking, and real-time control—and the SAWR project is a good starting point for developing ROS-based robotics.

About the Author

Sakemoto is an application engineer in the Intel® Software and Services Group. He is responsible for software enabling and also works with application vendors in the area of embedded systems and robotics. Prior to his current job, he was a software engineer for various mobile devices including embedded Linux and Windows*.

Unreal Engine* 4 Optimization Tutorial, Part 4


Optimization Viewmodes

You can change the way in which the scene is rendered when editing the project by changing the View Mode within the scene viewport. Under this tab is the Optimization Viewmodes section; each of these views will help identify certain optimizations that you can make. We will go over them in the order in which they appear.

Figure 1:  Finding the selection of optimization viewmodes.

This documentation doesn't cover the Texture Streaming Accuracy portion of the Optimization Viewmodes, as those viewmodes are more relevant to the creation of 3D models, but for information on what they show see Epic's documentation on texture streaming.

Light Complexity

The Light Complexity view shows how our scene is being lit by our lights, the influence area of those lights, and how much performance cost they can incur throughout the scene.

When we add a point light to the center of the scene, we can see the influence this light has on the complexity of the lighting in the scene.

Figure 2:  Single point light complexity.

If we create more point lights in the same area, we begin to see the cost of the overlapping through the visualizer. As more and more lights begin to overlap, the influenced areas will become red, then white.

Figure 3:  Light Complexity of multiple point light sources.

The performance hit comes from each pixel in the overlap needing to be shaded by multiple sources. The cost of this on static lights is primarily a hit on the time it takes to bake the lighting of the scene, but when using moveable or stationary lighting the cost of dynamic lighting and shadows is increased.

To reduce the impact on scene performance, lights that cause overlap should be removed, switched to static, or their position and influence should be adjusted.

Figure 4:  Lesser complexity by adjusting light radius.

Lightmap Density

The density of the lightmap for objects in your scene shows the texel density of the lighting effects that will be placed on those objects. For the viewport visuals, the greener the color-coded shade, the better the texel density of the lightmap for the object; while blue is less dense and red is denser. Because each lightmap is a texture, your texture streaming budget can be quickly used up if the number of objects in your scene goes up, or you use higher lightmap resolutions to increase the visual fidelity of the render.

Figure 5:  Scene view of lightmap density.

Figure 6:  Up close view of different lightmap densities.

To change the lightmap density for any object in your scene, adjust the Overridden Light Map Resolution value (default of 64) to another that better fits the texel density of that object. As with other textures, sticking to powers of two (2, 4, 8, 16, 32, 64, and so on) will reduce wasted rendering.

Figure 7:  Overridden Light Map Resolution.

Below, we can see a streetlight in the scene with the default lightmap resolution of 64. When its value is switched to 16, we see its color change from red to green; a better density, and cheaper lightmap for streaming.

Figure 8:  Object with a dense lightmap.

Figure 9:  Same object with an optimal lightmap density.

Stationary Light Overlap

In the Unreal Engine* (UE*) the stationary light plays a unique role in the development process. However, there is a limit to the number of stationary lights that may overlap, and going over that limit causes a hit on performance.

Figure 10:  Collection of stationary lights and their overlaps.

When more than four stationary lights overlap, any additional stationary lights are forced to work as moveable lights, increasing the performance cost of each additional light. If we switch all the lights to be stationary we can see their influence over each other. The lighter the green, the fewer overlaps.

Figure 11:  Stationary lights with too many overlaps.

After four overlaps the color changes to red. Any light involved in the red overlap will be rendered as a moveable light; displayed as a red X over the light’s billboard. To fix this issue, some of the offending lights must be removed, switched to static, or have their influence area reduced or moved.

Figure 12:  Stationary lights crossed out, showing that they will be used as moveable lights instead.

Shader Complexity (& Quads)

This view allows you to see the number of shaders being calculated per pixel in your current view, with green being the lowest, and red and white being the highest (most expensive).

If we look at Robo Recall*, we see that the shader complexity of the whole scene is relatively light, with most of the cost going into the street around the city. Epic Games* made a large effort to reduce their shader complexity to help hit their 90 frames per second goal for the VR title.

Figure 13:  View of a scene with optimized shaders.

However, looking at a forest scene, we see that the shader complexity of the trees and their leaves is extremely high and unoptimized.

Figure 14:  View of a scene with a large number of complex shaders.

To reduce the complexity of your shaders, the number of materials and the number of texture lookups within those materials needs to be reduced.

In addition, one of the biggest hits to performance with shader complexity comes from opacity, a main component of foliage and particle systems, and the reason the trees have such a high complexity.

Figure 15:  Up close view of how transparencies affect shader complexity.

To reduce the number of opacity materials in your scene, you can design around it (Robo Recall has trees that are actually solar panel towers shaped to look like trees), or use level of detail (LOD) for your foliage and particles to reduce the number of meshes with opacity at a distance. Unfortunately, this is a problem that needs to be thought about early in development, or made up for with model production later in the project's timeline.

Quad Overdraw

When multiple polygons overlap and are rendered within the same quad on the graphics processing unit, the performance of the scene suffers, because these extra renders are calculated and then discarded.

This issue is caused by the complexity of the meshes and the distance they are from the camera; the smaller the visible triangles of a mesh, the more are within a quad. If we look at the two pictures we see Robo Recall, with some overdraw on the edges of models and at a distance, and our forest scene, where there is a large amount of overdraw occurring within the trees around the whole forest. As with shader complexity, opacity is a major performance hit and causes a lot of overdraw.

Figure 16:  Scene view of quad-overdraw.

Figure 17:  Scene view of a large amount of quad-overdraw.

To reduce the amount of overdraw, use LODs to reduce the poly count of models at a distance, and reduce transparencies in particle systems and foliage.

LOD Coloration

Knowing when your LODs switch is important for the visuals of your game, and the LOD coloration view gives you that information. LODs are crucial to optimizing any title, as they save on rendering details that cannot be seen from a distance and the triangle count of your models. Along with what was previously mentioned, LODs also help with quad overdraw and shader complexity performance issues.

Adding LODs can be done with modeling software or by using auto LOD generation (UE 4.12+), and the switch distance can be adjusted to work for your scene.

Figure 18:  Scene view of level of details.

Figure 19:  Skeletal mesh lineup of LODs.

Back to Part 3

Open book: Fractured Space developed in broad daylight


The original article is published by Intel Game Dev on VentureBeat*: Open book: Fractured Space developed in broad daylight. Get more game dev news and related topics from Intel on VentureBeat.

Fractured Space

“I was looking at what people were playing, and changes in the world and it was League of Legends and World of Tanks…and we were in the middle of making a space game. So it was a bit of a movie executive decision: a bit of Tanks, a bit of League, but set in space… Yeah, we could do that.”

That’s how James Brooksby, CEO of Edge Case Games, the studio behind Fractured Space, light-heartedly introduced how this large-scale, space-based capital ship combat game took shape.

But his motivation and direction tapped into greater ambitions including a vision that Fractured Space could be “the biggest space game in the world.”

Brooksby is clearly a keen student of the games industry, its foibles, trends, and fads, and it’s those observations that have powered Edge Case Games. The team has worked together in the U.K. for a number of years under a few different guises, most recently before Fractured Space as Born Ready Games, creating Strike Suit Zero.

Changes in the predilections of gamers as they shifted from PC gaming with Quake and Counter-Strike into console gaming forced adjustments to the type of game being made, and even who had the skillsets at the moment to execute a different plan.

A significant change was the emergence of the free-to-play model for Western developers, as being used by League and Tanks. But while the idea for Fractured Space was percolating in 2012, the team had to wait until it had shipped Strike Suit Zero in 2013 and then chase down funding to build a prototype, meaning this new game didn’t start until 2014.

Above: Citing familiar space influences like Star Wars, Star Trek, and Battlestar Galactica, Fractured Space presents everything as big.

Start your engines

This group had previously used its own internally developed game engine and “tinkered with Unity, but then Unreal went free, just at the right time, so we were early adopters of that,” says Brooksby.

Possibly the most significant decision made at the inception of Fractured Space was what Brooksby describes as “brave.” In the emerging age of Early Access, coupled with community development through the likes of YouTube and Twitch streamers, the plan was “to build a game entirely in the open… go really, really early access and build it in broad daylight. Warts and all,” says Brooksby.

While designed to incorporate community feedback at every step of development, it wasn't necessarily a widely lauded decision within the studio. “Three months into development we released what was almost the first playable build on to Steam,” he says, “despite nearly everyone in the studio saying ‘don't do this,’ and ‘my work is so not finished.’”

This strategy didn’t simply impact how much feedback the team would have to manage from its community—alongside the potential of a negative reaction to such an early build and open process. It also had a significant effect on the development timeline, pushing a planned two-year project out much further.

Above: Players control their capital ships and managed loadouts, which means every detail in combat has to be perfectly balanced.

“We learned that you can’t build anywhere near as quickly when you have to fix the bugs, keep the service live, and regularly updated,” says Brooksby.

That prompted development changes as well as tweaks to the Edge Case team to accommodate three important strands. The first, and core aspect of traditional development was creating the new features that would go into the game. Next was addressing and fixing bugs in the live version of the game so that the players would stay engaged and returning, even when they knew this was still very much work-in-progress. The third was producing the live content that would be added into the game over the course of this transparent process.

Brooksby also had to accept that this process would either require a larger team size to maintain the originally anticipated development cycle, or accept it was just going to need the extra time.

“Another side of this was that we also needed different type of people on the team, and we brought in more community managers and someone very senior to manage them. You can’t say you’re going to engage with the community and then just give it lip service,” says Brooksby.

Above: Visually, Fractured Space has a beautiful take on space.

Progressive development

In addition to introducing the free-to-play model and releasing so early while maintaining the game as a live service, Edge Case Games is employing extensive data collection to manage its development. Timelines are split into six-week periods, and at the culmination of each, the teams, community managers, and data analysts come together to identify what the data is saying, and what the community is saying.

Several whiteboards—a familiar sight in any game studio—are plastered with sticky notes plotting the next features to come, those down the line, and the wish list for the future. Data and community feedback can be instrumental on moving some ideas forward in the chain, and some backwards.

“Not only are we very customer-centric, but a wonderful part of modern gaming is the data coming back from everyone playing the game. We take it very seriously, and everyone in the studio gets involved in the different reams of data,” says Brooksby.

Above: If you want to see all the details—warts and all—of game development, Edge Case Games have opened their development kimono.

One of the main additions to the original vision that addresses this new community reality is the spectator camera which is “still not quite where we want it to be,” adds Brooksby. Despite the transparent philosophy, Brooksby did ‘fess up to a couple of secrets they still had to introduce, suggesting they would take advantage of emerging technologies. Whether that’s Twitter integration to command ships, Discord technology allowing communication and more, the community can only speculate at this point. But it’s good to know the fan base that has followed the game from its very early access will have the latest and greatest to support the core gameplay.

For Brooksby, Fractured Space is largely the game that was envisaged back in 2012, and he has a clear path mapped out for its future to support that ambition of making it accessible (a recent optimization pass allowed the game to be played on lower-end PCs and laptops, which proved to have a very positive effect on the number of players joining the game, and more importantly, staying with it.) Remember, they have the data if you stay for just a few seconds!

So was that movie executive decision the right one, the best one? “It wasn’t in a cynical way,” says Brooksby, “but an excited one, because I’d love to play that game.”

And he has been from the very early beginnings.

Midair takes flight


The original article is published by Intel Game Dev on VentureBeat*: Midair takes flight. Get more game dev news and related topics from Intel on VentureBeat.

Midair takes flight

“You don’t get 20 volunteers because they’re doing the same thing, you get 20 volunteers because they’re doing things that are important to their life.”

Chris Matthews, CEO at Archetype Studios, clearly gets where games like this need to be.

And by “like this”, he’s referring to the jetpack shooter, a genre that really launched with Tribes and, ahem, ascended with Tribes Ascend. The format is typical to most PC shooters, but the physics mechanics of skiing and sliding and jumping create a truly different kind of environment, and one that requires different skills to execute effectively.

“We grew up on this stuff,” says Matthews, and that passion led to a new experience. “When you play a game like Midair, it’s a very different experience… you’re not locked to the ground, and it’s much freer. It creates interesting moments you don’t see in other games.”

It’s a tradition that shows no sign of slowing. “It’s a passion that has lasted since the 90s,” he adds. Having worked as interns at Garage Games on Legions Overdrive, the team sees the evolution as natural, understanding that the community exists to support the gameplay.

And so Matthews began to ask questions: “Why don’t we make a community-based version of this game? Can we make a brand new IP, our own thing?” And the evolving world of coverage and monetization allowed this to happen. “We started it as Project Z, building on what we thought the physics and the look should be.”

Kicking off, this group tested with 40 people in a community space “and it was very raw,” says Matthews. But the feedback was positive and engendered more support not only around the community, but around the world. Adding a composer in the U.K. and a graphics designer in Latvia and an art lead in Finland helped prove that point, as well as the global ongoing appeal of a game with jetpack physics.

“And we did the same thing a year later and we had more maps and 80 people. It was very different,” he adds. “And we got great feedback and more interest in the game… so then we had the chance to do a Greenlight campaign.”


How it works today

Like many projects, approaching Kickstarter and hitting goals was also a huge part of the origin story. “We launched the Kickstarter campaign with a goal of $100,000, by which time our visuals had improved so much. Then after the Kickstarter campaign, we made stretch goals, and people wanted to contribute at a high rate.”

It’s partly driven by nostalgia, but also the fine-tuning of physics that Matthews’ team is implementing to both modernize and retain a conviction to the origins.

“It’s a stratification of the community of people with games they played in the past, such as Legions Overdrive or Tribes Ascend… but there’s still a community who played Tribes 1 and 2…and the way the jetpacks are different, the pieces are different… in many cases you see this feedback conflicting,” says Matthews.

So in this clash of cultures, Matthews finds a lot of detail in research. “We take it to heart,” he says of the challenge to determine what gamers are looking for, and what the evidence suggests they want.

“We realized people were saying it doesn’t feel right,” he says. Then physics entered the equation as the team realized that controls of typical first-person shooters were getting in the way.


Playing the future

Matthews recognizes that his small team is a tiny fish in a very large pond. Beyond wishing his team had a Destiny 2-like budget, there are many other challenges that he recognizes do not exist in a vacuum. “It’s not a casual shooter,” he says, though he accepts that there needs to be a balance to attract the broader audience alongside those who already “get” the jetpack shooter style.

“We’re trying to observe the successes and failures of other games to see if we can appeal to all jetpack shooter fans,” he adds. Part of that is the research and part of it the observations. “Like muscle memory, you have to go to the gym for three months or so for it to kick in.”

Like Matthews’ father, who was playing Tribes back when it first emerged and experienced those initial trials and tribulations, there’s a clear understanding of the changes in culture, time, and expectations. “We’re talking to people who have never seen this before, or see this as something they can’t achieve. That’s because we want them to try these things because it’s challenging,” he says.

It’s a fascinating challenge of what might be considered old-school style gameplay with what a new audience expects. That also fits with ideas around eSports, where Matthews thinks the game will fit, but won’t push it into a pigeonhole. “We need to prove that we can develop and deliver on a game that players want before we can showcase what we want to do in eSports,” he says.

And that’s where Midair sits in a space that might look familiar, but is actually unique. These are notions not lost on the team. “Not only do we have the nostalgia…but it is also a passion.”

That should be the best combination when a 20-year-old franchise is reinvented.

Indie games + press + peers = MIX


The original article is published by Intel Game Dev on VentureBeat*: Indie games + press + peers = MIX. Get more game dev news and related topics from Intel on VentureBeat.


Few professional fraternities have a bond as fierce as the one between indie game developers. Whether sharing stories, suggestions, and solutions at meet-ups or simply offering moral support, help is easy to find. But where time-taxed indie devs can struggle is in the nuances of the business: promoting, marketing, and selling games, and generating attention from the games media.

Having experienced these issues first-hand as ‘Creative Domepiece’ at Interabang Entertainment, Justin Woodward recognized an opportunity to help. With MIX already in development as a source of support and promotion for indie developers, Woodward met long-time games industry marketer, Joel Dreskin. Motivated by the notion that the support mechanism that Woodward championed could be expanded to a larger footprint of indie game developers, MIX took its next steps.

“We just wanted to leverage our expertise and knowledge of the industry so that we could help all indie studios,” says Woodward. The outcome of this meeting and further discussion is the Media Indie Exchange (MIX) of which Woodward is Co-founder and Principal and Dreskin is Principal.

The challenges

Stories about success and failure, exultation and heartbreak, blood, sweat, and tears are littered across any collection of stories about indie game studios. Well, game development in general, truth be told. But for commonalities, they pretty much all start with a dream, even if it’s not well fleshed out.

They also often begin with just a handful of passionate folks sharing a vision. Since most teams understand what others are going through in pursuance of their dreams, an incredibly strong bond of fraternity has emerged. Often part of a similar demographic, with familiar skill sets and challenges (and, quite possibly, collective emboldening to “stick it to the Man,” “we’ll show [insert whoever chose not to support or fund a project]” or simply “you like beer? I like beer, too. Let’s go drink beer and talk”) indie developers are happy sharing tales among themselves.

Meet and greet

In these early, pre-MIX days, leading games media outlet IGN supported an incubator on its premises for indie developers around 2012, where Woodward was building Super Comboman, and figuring out how to generate press awareness from the belly of the games media beast.

“I ended up asking where we could hold an event, where we could show the game to press,” says Woodward. A supportive IGN knew what motivates the games press aside from coming to see great games. “So we ordered a bunch of appetizers, we ordered kegs, and we brought in a bunch of tables…and tons of press came,” he adds.

It was the early stages of the concept of an indie open house that would become a special event for showcasing games to press. Establishing a location and providing signage made it look rather impressive, and despite feeling like a special underground club, the open houses became popular.

Above: Meet and greet with press and peers at E3 to showcase top indie games.

The MIXer

“The whole idea was simply working with different devs and publishers to create a really great networking event,” says Woodward. Interested parties like the major platform publishers Sony, Microsoft, and Nintendo, were also included along with others looking to find talent and quality games, or simply support this collective goal.

“Press continues to be the main focus,” says Dreskin, “and that includes streamers as that’s the way coverage of this space is going. But also publishers and channels like Humble Bundle, Valve, and others…all as a method of providing opportunities for networking and building relationships.”

The MIX event at E3 saw some 55 games on display, providing the basis for this meet-and-greet setup to allow the formation of valuable partnerships. The games are also judged by a panel of experts pulled from around the games industry, which can help give an awareness boost to the top games on display. It also adds simple confidence as the games are voted on by peers and experts.

“We’re now doing five or six events a year, with our first one at PAX, and given the nature of that, we’re doing a press-only first hour!” says Dreskin. Support has come from companies like Intel, Gamemaker, Unity, and others “which is great to have companies who simply hope that small developers can succeed,” he adds.

The dominance of the Unity engine, and the Unreal engine becoming free, has been integral to lowering the barrier to entry for developer dreamers to take steps down that path. “But with all these tools like Unity, Gamemaker, and Unreal, it can also lead to an over-saturation. Due to the rigorous hours, most small studios didn’t have time to do the press outreach or even communicate with publishers,” says Dreskin. MIX events help solve many of those challenges by putting all the right people in one room at one time.

Adds Woodward, “we’re even conscious that a lot of developers are introverted, so we’re creating an atmosphere that is safe, and not as crazy as something like a PAX.”

The altruistic goal of MIX is simply to provide developers with an avenue to succeed. “There is a style to the games industry and how it needs to be managed, and if you don’t understand it, it’s hard to know where to look for advice,” says Woodward.

MIX handles all the logistics of event space setup, signage, and inclusion on the website, for low costs that recognize the budgetary limitations of most small studios. A one-time per year fee of $30 is required for a game to be submitted for consideration (see the FAQ and contact the MIX folks for more details).

“We just want to help developers succeed in this challenging space,” says Dreskin.

So say us all.

Get advice and find out more about the many MIX events. And see a list of latest events.

More details on the logistics can be found here.

En Masse: Shifting and sharing with the times


The original article is published by Intel Game Dev on VentureBeat*: En Masse: Shifting and sharing with the times. Get more game dev news and related topics from Intel on VentureBeat.


“There is no one thing that creates success,” says En Masse CEO Sam Kim.

While you might hear plenty of ‘hear, hear’ responses agreeing with that sentiment from across the game development fraternity, it belies a formula on which En Masse Entertainment has built a powerful and effective string of technologies and strategies.

After a founding in South Korea, and opening West in 2009, En Masse has introduced and expanded the online action-RPG Tera with impressive fortitude and longevity. Emerging at a time when games from South Korea and elsewhere were still trying to find a foothold in the western market, En Masse formed as the North American publishing wing for a global launch.

Positioned as a premium AAA title alongside powerhouse franchises and recent releases like Diablo III and Star Wars: The Old Republic, the challenge was clear and present. But after the 2012 launch, a shift in 2013 to a free-to-play model helped propel Tera’s user-base, and as a result, lay the foundation for a game that continues to thrive in today’s testing marketplace.

Left: Sam Kim, CEO, En Masse.

“User acquisition, retention, and even monetization—all those really big metrics—shot through the roof,” says Kim of the business adjustment to the free-to-play initiative. “We were able to leverage many actions around retention and engagement…so the scale of influx that came was greater than expected,” he adds.

Just five years ago this model would have likely been considered untenable. But Kim suggests that the community outreach and interaction was key to forging relationships that enabled Tera to succeed. “We found great success in Tera in the constant engagement with communities, in building features, and running events and web systems outside of the game that increased engagement inside the game,” he says.

And ‘outside the game’ is probably a key, under-appreciated aspect of generating a relationship that pays off for the in-game events.

“One thing we had fun with was a Kickstarter parody (where we created a tracking system, tier system, and profile page) that had a communicable format that was based on actions that occurred in-game. It had a thermometer that tracked events in-game that led to physical prizes, invites to events, etc.,” adds Kim.

This process (and commitment to its user-base) was not solely to embolden its players. The team got a big kick out of the reaction, too. “We found it a lot of fun as a publisher, and as gamers ourselves, but it also helped whether it was a bit of PR pick-up, or engagement in the game, or even in monetization.”

Above: Tera continues to enjoy considerable success, several years into its lifespan.

The business of a business

The business of games in the free-to-play market has shifted considerably, which is something a company like En Masse has embraced. As Kim describes it, Twitch serves as “a key piece in the awareness formula.” It also serves as a reminder to more traditional media outlets that their impact is decreasing: “[We see] a decrease in performance marketing (banner ads, more traditional areas). They are getting more expensive, but less effective.”

This returns to the out-of-game enterprise to engage and retain players. “There are so many games launching across platforms that it’s harder to keep someone’s attention,” Kim adds. “So it comes down to how we’re augmenting out-of-game relationship with players.”

Above: En Masse’s newest game, Closers, aims to ape classics like Golden Axe. And that can only be a good thing.

Naturally, it’s not easy. Most game launches are challenging with this ever-evolving climate of coverage and awareness. For a company like En Masse, that means building on its core strengths.

“For Tera 40-50 percent are players coming back to us after a long period of inactivity. We call it a win-back. And a large percent of those are serial win-backs. We know life happens, but if you come back and then you come back again, that’s awesome. That passion, there’s nothing that can fake it or recreate it…it’s authentic.”

While maintaining and embracing the current portfolio, the market moves forward, requiring adjustments and additions. So Tera Console is coming, as is Closers (release date undecided) that Kim describes as “an arcade stand-up beat-em-up style, like Teenage Mutant Ninja Turtles or Golden Axe (if you want to go way, way back). So Persona 5 art style meets the MMORPG aspect, with the level up of RPGs.”

Technology, of course, drives much of the conversation, even if it’s a case of being reticent about the opportunities. “I’ve seen some amazing content that I believe is compelling,” says Kim, adding “I’m a big believer in VR and AR, but it’s gated by that adoption…the more it gets out there the better for developers.”

It’s a notion not unheard among game developers hoping to adopt the next big thing. But current technologies in the works allow a company like En Masse to thrive. “The cloud has been so good for us,” says Kim. “When we launch games, sure, we’re a North American entity, but it’s a global audience. The U.S. might only account for 30-40 percent of our audience, and we get a good amount in Europe and South East Asia. Taking to the cloud opens the doors to the services we can offer to the user across the board. It means we might not have to create physical presences in Brazil or Frankfurt, so with increasingly global infrastructure, we can support those efforts from one location.”

The game is most certainly changing and figuring out the best direction for your company within this environment is key. Short-term benefit or long-term relationship?

As Sam Kim says, “Our vision is one that requires a long-lasting quality relationship with the player. Not one that’s just level one to five and you’re done. But one that would benefit from a long-tail relationship.”

So it be said.


UEBS: More than penguins vs. the Santa army


The original article is published by Intel Game Dev on VentureBeat*: UEBS: More than penguins vs. the Santa army. Get more game dev news and related topics from Intel on VentureBeat.


“It was when I released the video of 11,000 penguins versus the Santa Claus army that I realized that this could be something,” says Robert Weaver, CEO of Brilliant Game Studios, of his truly massive scale combat game, Ultimate Epic Battle Simulator.

Some 1.3 million views on YouTube and, perhaps more tellingly, 12,000 likes to only 543 dislikes (clearly from people without any sense of humor, or PETA members with a disdain for the wholesale slaughter of the adorable squawking buggers…or those who love Santa just a little too much) reveals the heart of the story. When you can pit tens of thousands of characters on screen in, as the title clearly describes, Ultimate Epic Battles, you have YouTube gold.

This wasn’t exactly what Weaver envisioned when he decided to strike out on his own after five years of contract programming work in the games industry, including the preceding two years as a solo independent programmer. “I was doing work for smaller studios and investors, and I was in the middle of probably the best contract I’d had, when I had, I guess, a ‘calling’ to break out and self-publish.”

Above: Is anyone rooting for the army of Santas in this scenario?

It was no random epiphany or crisis of conscience, however, says Weaver. “The reason was the rise of VR. I saw the Vive and immediately knew I had to pursue this market. At the time, the market was pretty empty and I figured that for a single developer, it could pay off.”

That’s an important distinction. Weaver working solo means just that. The contract work and development of UEBS is all his own. The limited studio footprint meant that his prediction of a VR game supporting a solo goal was accurate. “I created a game called The Last Sniper VR,” he says, “and that game has sold over 10,000 units, which is pretty good for how small that market is.”

It also afforded Weaver the opportunity to start work on his next VR project, envisioned as an open-world Robin Hood game. “In the game, I wanted large-scale battles,” says Weaver, “so I started figuring out how to increase the number of characters on screen. I was going through a few different ideas, and suddenly had a breakthrough, and I had tens of thousands of characters on screen.”

Working within the Unity engine, this was quite the revelation, and unlike anything that had so far been demonstrated on the platform. Not that Unity took much notice, apparently, as despite this technical marvel built on its engine, the company hasn’t had any contact with Weaver.

Above: Bodies don’t disappear amid these massive battles, meaning they pile into dead-men mountains that alter the battlefield.

That doesn’t matter to Weaver who took his technical breakthrough and ran with it. “The first thing I thought was ‘Wow,’ this is incredible, and could be a game in itself,” he says.

The revelation led Weaver to conclude that the simplest thing he could do with this technology was to make a battle simulator. “I had seen some success of titles with similar concepts, but figured this kind of technology could change the game a lot.”

Not only was this about impressing gamers with unheard of technological accomplishments, but also speaking the language of the audience, particularly as it pertained to comedy. “It’s just so ridiculous the things you can do with those numbers. The entertainment is endless,” he says.

Within a year, Weaver had evolved his Robin Hood VR idea into a successful Early Access release that was engaging successfully with gamers on the platforms they most inhabit. “All the credit of the popularity of this game goes to YouTube,” says Weaver. It was a calculated plan, illustrating a keen understanding of the factors that can help drive engagement with a community and YouTube success. Of course, armies of penguins are going to help every step of the way.

Make ‘em laugh

“I spent a lot of time making videos and promoting them because I was seeing the success, some of the videos getting millions of views. I did spend a great deal of time communicating with the fans, and constantly making video and blog updates,” he says. Using this medium can be daunting and time-consuming for smaller studios, let alone a standalone developer, but Weaver’s commitment to the cause was clearly worth it.

“It’s something you have to do,” he says, clearly ceding a portion of the success away from the technology and gameplay and on to the community, comedy, and potential. “I did regular updates, just trying to blow their minds as often as I could.”

Dealing with the feedback is a process of moderation, patience, and mental fortitude. “Whenever I go on forums I have to limit my time there,” says Weaver. “There’s a lot of positivity, but a lot of negativity. People don’t hesitate to say anything on the internet. People get quite hostile, and I learned that quite quickly. And I learned ways to keep my sanity and keep enjoying what I’m doing.”

A good rule of thumb for most developers is to take both the praise and criticism with a grain of salt, even though it’s easy to focus on the tiny details of the negative and gloss over the praise of the positive. “I do like seeing what people don’t like, because it can help, but if you spend too much time, it can really get to you. Especially when it’s your baby, and you’ve put everything you have into it,” Weaver adds.

Above: Nobody…nothing beats Chuck Norris!

For a game like UEBS and its unique premise, it’s hardly surprising that the community contributed ideas that enhanced the vision Weaver presented himself. One such internet-friendly addition was to include the caricature of Chuck Norris in the game (renamed Chunk Norris for copyright and image rights). Long a staple of internet memes across various forums, the Chunk Norris character quickly became a fan favorite in the game. “I believe someone suggested him way back, and I thought that would be great. I pushed those old internet jokes about how Chuck Norris can’t die. Literally, there’s nothing you can do to kill him. People have tried all these combinations. And you think he’ll die because his health is low, but when it gets to zero, it resets!”

Made for mods

With Weaver working on the core game engine and features there was a certain reliance on the community to produce content that would keep the audience engaged. Whether it was throwing 20,000 zombies against 1,300 WWII soldiers, and watching tens of thousands of tracers pepper the battlefield or witnessing Roman legions succumb to a single Chunk Norris, the canvas is there for gamers to paint as they see fit. “I was worried there would not be enough mods,” says Weaver, “and the learning curve involved.”

But the community has flocked to the game to express its creativity, taking advantage of Weaver’s added features like taking control of individuals among the massed ranks so that you can rally troops to execute a particular strategy. “I’m quite sure most people who are importing characters had never opened Unity in their lives,” says Weaver, “but in a matter of hours, people had it figured out.”

Weaver accepts that the process requires some core Unity knowledge, though he has worked to make the tools as easy as possible. “Most of the problems stem from trying to bring a character in that isn’t properly rigged, especially when making a character from scratch…there’s just no way around that learning curve,” he says.

Going forward, with the 1.0 release of the game now in the wild, Weaver’s priority is stability. “There are some obvious problems with the game,” he admits, “particularly with modding, because mods can conflict with each other, and generate errors when importing them. Getting that stable is my current goal…content comes second.”

Above: Added AI routines make the armies navigate more complex environments such as the inner walls of a castle.

And reinforcements are on the way to achieve these goals. Weaver revealed that he is setting up a small studio in Vancouver to both support UEBS and work on the next project. Some options requested by fans—such as having more than eight teams engaged in these massive battles—simply aren’t possible due to technical (in this case memory) challenges. However, Weaver has demonstrated that 100,000 units is possible, though with obvious slow-down, and it all depends on your machine’s horsepower to ensure a smooth gameplay experience.

Still, with content flowing from the community, UEBS has plenty of legs and new, creative, YouTube-friendly situations to explore.

Have fun storming the castle!

People Can Fly v3.0


The original article is published by Intel Game Dev on VentureBeat*: People Can Fly v3.0. Get more game dev news and related topics from Intel on VentureBeat.


“Shooters are in our DNA,” says Sebastian Wojciechowski, CEO of independent-then-Epic-Games-owned-then-independent Polish studio, People Can Fly. “If we started People Can Fly again we’d not be able to pitch any other game; not that it’s a bad thing, it’s an important pedigree of our studio,” he says.

Knowing your core skillset and making waves with innovations in that space is a smart way to keep focused and establish your brand identity. For People Can Fly, that identity was formulated back in 2005 when the studio legally came into being, though the original personnel had been working on the first game since 2002.

While the faces may have changed over the past 15 years, that core competency hasn’t, though the studio has had to roll with the punches of an unforgiving industry. The studio’s first game, the critically acclaimed shooter Painkiller, released in 2004, laid a positive foundation by pairing a powerful technology core with dynamic action-shooter gameplay. It threw multiple enemies on-screen at one time, akin to a more hardcore version of the cartoon-like Serious Sam shooter series.

Its popularity in the days before widespread digital distribution still afforded the opportunity to add an expansion pack and Xbox port as the team worked on its sophomore project. The dark, supernatural adventure game Come Midnight was in development for publisher THQ, but was unceremoniously canceled in 2006, throwing the independent studio into a period of serious uncertainty.

Above: A scene from the horror/supernatural/detective thriller, Come Midnight that was canceled by THQ in 2006.

Turning to the Unreal engine, and by association, Epic Games, the team impressed the engine-holders who, of course, also enjoyed design success with the Gears of War franchise. So, version two of People Can Fly started to take shape when they began building out Bulletstorm, a shooter in the vein of Painkiller (and Unreal), but with a game show-style motif of creative combo attacks and team play. Under Epic’s guidance, Bulletstorm was a hit among a certain section of the shooter audience, but didn’t satisfy the story propulsion ideals of others clamoring for the next big thing from the storied developer and publisher.

“The adventure [with Epic Games] was super-positive,” says Wojciechowski. “We got additional experience working with the best team in the game dev industry. For the core of our team, who have been here a long time, they were a part of this adventure and able to get experience working in the U.S. on different games,” he adds.

Making games fly

Wojciechowski is pleased to see the emergence of more studios in Poland as the barrier to entry is lowered by available technology and distribution. In the long-term, it can all serve to help build People Can Fly’s position as a premium employer, not just in Poland, but for anyone looking to work on AAA shooters.

“A lot of small and medium-sized studios have emerged,” he says, “and now we have a couple thousand developers in garage industries of three or four people. Then there are those of 10 to 40 people…so even though we don’t have a lot of support on the educational side, those that are determined can still work on games with their friends.”

This increased pool of largely self-taught talent can also help further People Can Fly, where Wojciechowski reveals that most of the core development team have been employed for an average of nine years. “We have a team experienced in working on multiple AAA productions,” he adds, “so anyone joining the team can get experience that they wouldn’t at any other studio in the country.”

Above: Bulletstorm introduced a broad point-scoring system for the most creative kills, adding value to using the environment as well as core shooting skills.

People Can Fly worked on Bulletstorm and the signature series entry Gears of War: Judgment, which led in 2012 to Epic buying the remaining shares so that the studio became a wholly-owned subsidiary. “Around 2014, we realized the studio wanted something more. While working with Epic Games was great, we were known for doing our own games, and enjoyed having creative ownership of the project,” says Wojciechowski.

“At the beginning of 2015 we talked to the Epic executives and they were fine with us going our own way. It was the middle of 2015 when People Can Fly version 3.0 emerged,” he adds. “We started a new adventure with the philosophy to create games we wanted to play, and work on daily.”

Unannounced…with plans

With the flexibility to pitch publishers—a relatively new experience for a studio that had been wholly owned for the past few years—People Can Fly was determined to stay close to its roots. “I would love to say we had more experience, but PCF is a well-known studio. Publishers know us from creating AAA shooters, and if we pitch in that genre, they expect it will be a good game. Second, we had a very good design pitch. And third, we created a tech demo to showcase what we’re capable of doing,” says Wojciechowski.

While he can’t reveal any more specifics about the game-in-development at this point, he does profess to keeping the team’s eyes on just one important ball. “Talking to other CEOs, they always say the challenge is to keep teams focused on the project. It’s important not to cannibalize it; when you have to deliver a milestone on one project, it can mean another gets behind schedule. It’s a problem to work on two projects under one roof,” he says.

Despite any warning signals, he is confident about the pursuit of the new game, and won’t be swayed by current vogue trends like PUBG and the influence of YouTube and Twitch streamers. “We have a very well thought-out project that we pitched to publishers, and really believe in what we’re doing,” says Wojciechowski. “We have a strong target that we’re marching towards.”

That march involves using the UE4 engine, a technology with which this studio has extensive experience. “We love shooters. But with the pedigree, with our talent, we knew our game had to be a shooter,” he adds.

How version 3.0 of People Can Fly fares depends on all the vagaries of the games industry, which Wojciechowski is quick to accept, though he hopes the project after this one is a sequel. “The one single thing that I believe is important in this industry to succeed is luck! And you don’t buy that,” he says.

No doubt, but that pedigree and focus should ensure that luck doesn’t factor into it.

Primordian’s weird and alien world was created by a World of Warcraft art lead


The original article is published by Intel Game Dev on VentureBeat*: Primordian’s weird and alien world was created by a World of Warcraft art lead. Get more game dev news and related topics from Intel on VentureBeat.


“Someone brought one of the headsets into work, and that was literally the moment I realized I had to work on VR. It was that impressive to me.”

Jason Morris had been working at Blizzard for 10 years as an art lead on games like World of Warcraft and Titan before the sci-fi MMO was cancelled. But the moment he got his hands on a VR headset he knew that was what he had to work on, even if it meant starting from scratch and doing everything slowly by founding his own company, Stonepunk Studios.

Primordian is the result of that; a VR FPS set on an alien world at the center of the universe. It’s very much a world of two halves, as one side is constantly shrouded in darkness, while the other enjoys infinite daylight. This has given rise to two distinct cultures who typically keep to their own half of the planet, but a solar eclipse is coming, inspiring those stuck on the dark side to launch an invasion. Players will be one of those invaders.

Aside from the music, Morris is doing everything himself. “I love every aspect of the process of putting games together,” he says enthusiastically. His experiences working with engineers, designers, and producers at Blizzard have given him some insights, but there’s also been a lot of trial and error, and a great many YouTube tutorials.

Since getting into VR, Morris has devoured every single FPS that’s been designed for the headsets. He’s been studying them, trying to figure out what mechanics feel most appropriate for Primordian. How should the player move? How much should they be able to interact with the world? There’s been a lot of experimentation.

“I’ve had the game out to people for about a year or so, just to get their thoughts on the smallest things. Like how does it feel picking up and dropping weapons? A lot of the feedback was that it felt sort of clunky, picking up and dropping things, and even though I spent two months working on that system, I ended up scrapping it. It just wasn’t adding to the pick-up-and-play, intuitive feel that I’m looking for.”

His ultimate goal is to make Primordian the sort of game you can understand immediately. You put the headset on, and you know exactly what to do. That’s why he opted for free locomotion, instead of letting players simply teleport. He doesn’t want to have to tell players what to do, however, or break their immersion. That means no tutorials and no obvious interface. He hopes players won’t need them.

The key to creating that immersive, intuitive feel is the world itself. Everything trickles down from there. “I was trying to think about the world as a whole, and one of the great things I learned from working at Blizzard was thinking about the world as a character, the biggest character.” Weapons — the main way players will interact with the game — are the most obvious example of how the world impacts the rest of the experience.


World at war

“First I laid out the areas of the world you could go through, and then I looked at what creatures I would add there, what plants I would make, and then I built all the weapons around what’s there. If there’s a certain tree in an area, or a grub, or another creature, then the local people would probably make their weapons out of it.”

One weapon is simply a hollow log with a grub living inside it. A piece of meat stuck on the front of the log makes the grub spit, and that spit can harm enemies. Most weapons are made out of wood or living creatures, including beetles that act like drones. Morris has created around 30 so far, but there are only going to be 16 in the game. Sometimes he ends up scrapping them because they simply don’t fit with the world, even if they feel really good to use.

Not all weapons are guns or gun-like; there are melee weapons too. The melee system was hard to design for motion controls, Morris recalls. When he let people test it, some would swing wildly, barely looking at what they were doing, while others would merely wiggle their sword. In both cases, they’d still be damaging enemies, but it didn’t seem satisfying.

“It took three months of playing every single day to figure it out. Now, the sword will do damage based on the velocity of your swing, and the rotation. If you swing it slightly, it will be pretty weak, but if you do a really big swing, it will charge the sword and give it two or three times more damage.” It’s a bit of a workout, he admits. It could be that fighting on an alien planet is a good way to keep fit. It’s still a work-in-progress, however, and Morris continues to develop it every day.


Primordian will contain a linear, single-player story, sending players on a quest to destroy temples of light so that the inhabitants of the dark side can take over the whole world. Morris is taking his inspiration from the likes of Half-Life and Half-Life 2, where the story drew players through the world organically. When it launches in early access in the near future, players will get a taste of that narrative. That’s also when Morris hopes to start working on the multiplayer version again.

Although several maps have been made and concepts played around with, the multiplayer is on the back-burner while Morris works on weapons, balance, and the single-player mode. He’s got some very interesting ideas, though. He envisions a sort of tribe-based deathmatch, where players can pick from a list of character models inspired by enemies and NPCs. Everyone choosing the same model will become part of the same tribe, but others can join them through a ritual.

“There are all sorts of balance problems with it, I’m sure,” Morris confesses. “But I just love the idea of it, that we could end up with nine people against one person, and I’ve had fun moments like that in other games, like Doom multiplayer.”

For the time being, Morris is content to continue experimenting with the rest of the game, as well as using feedback from testers. Whenever he gives out a code, he starts playing again and taking notes, trying to imagine what new players might try to do, or what they would need to know. Even when he doesn’t get any feedback, he’s still happy to know that people are playing.

“What really motivates me is the idea around the world and exploring.”

Hands-On AI Part 19: Music Dataset Search


A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers

This is the 19th article in the AI Developer Journey Tutorial Series. The previous articles discussed the AI fundamentals and deep learning for images. Now that a set of emotion-eliciting images has been selected, it is time to do the analogous step with the music generation side of the project. The search for a music dataset is the topic of this article.

The goal of this project is to create an app that takes in a set of images, determines the emotional valence of these images, and generates a piece of music that fits this emotion. Given the selected approach to emotion-modulated music generation (see the (Project Planning) article), two sets of music data need to be found:

  • A training data set for the long short-term memory (LSTM) neural network used for melody completion and harmonization.
  • A set of base melodies, which will be modulated using the emotion-based modulation algorithm (to be further explored in a future article).

Training Dataset for LSTM Neural Network

BachBot* was the model used for melody completion and harmonization1.

To define a set of criteria for the dataset, this article illustrates some of the potential challenges faced by algorithm-generated music.

A connectionist (neural network) paradigm for generating music uses regularity learning and generalization. However, music as an art form is diverse and anything but regular. Conventions in one genre of music may break the rules of another genre. For example, extended and altered chords are commonplace in jazz, but almost never found in Baroque and Renaissance music. Furthermore, many genres of music have very few or ill-defined regularities. These factors may cause trouble for a model that is trying to learn regularities. Therefore, the selected music should all be from one style that is somewhat regular.

Another important aspect to consider is the complexity of music (number of instruments, harmonic vocabulary, and so on). Clearly, completing only the melody would not be sufficient (or very interesting!) for the purposes of this project. However, generating overly complex music from a single melody line would not yield good results. Thus, a balance of complexity and feasibility is required.

Furthermore, to generate an effective and robust LSTM neural network, a large amount of training data is required. With a large dataset, it is also important that it is in a format that is easily adaptable to code representations—converting from image or PDF files to a usable format is a large project in and of itself!

Lastly, the dataset should be out of copyright (in the public domain) and/or licensed for non-commercial use. It is important to note here that although the music may be in the public domain, the particular arrangement, sheet music or encodings of that music may still be under copyright protection5.

Therefore, the criteria are as follows. The training dataset:

  • Must contain samples of music that have shared regularities.
  • Must be sufficiently large.
  • Should be in a format that is easily adapted to code representations.
  • Should be out of copyright (in public domain), and/or licensed for non-commercial use.
  • Must have an apt balance of complexity and feasibility in the selected music.

The dataset used by BachBot is a collection of chorales written by Johann Sebastian Bach and found in the music21* toolkit. Given this set of criteria, it can be seen why the BachBot dataset was ideal:

Most music in the Baroque period followed specific guidelines and practices (rules of counterpoint)6.  Furthermore, chorales share the same structure and arrangement (four voices: soprano (melody), alto, tenor, and bass, grouped in a series of phrases). These standard practices result in a dataset in which samples share many regularities. Additionally, the output of the music-generation algorithm can be tested against these rules to qualitatively evaluate success.

The Bach chorales have a good level of complexity without being infeasible in a music-generation model. Chorales used diatonic (based on a single scale) harmony, and the availability of four voices gives just enough room to make music that is sufficiently interesting.

Bach wrote over 100 chorales in his lifetime. If this number doesn’t seem very large at first, remember that chorales are just one type of composition; in the music domain, it is rare to find this many compositions of a single type by one composer.

A collection of Bach chorales in MusicXML* format was compiled by Margaret Greentree, and is available as a part of the music21 corpus3. Music21 is a Python* based toolkit for computer-aided musicology that is freely available on the web 2. The MusicXML format of the Bach chorales is already a code representation of musical notation! An example of the MusicXML format is shown in Figure 1.

Furthermore, music21 provides a set of tools that allow for easy manipulation of these files. Another advantage of the music21 corpus is that the included music is either out of copyright in the United States and/or are licensed for non-commercial use4.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC
    "-//Recordare//DTD MusicXML 3.0 Partwise//EN"
    "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="3.0">
  <part-list>
    <score-part id="P1">
      <part-name>Music</part-name>
    </score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key><fifths>0</fifths></key>
        <time><beats>4</beats><beat-type>4</beat-type></time>
        <clef><sign>G</sign><line>2</line></clef>
      </attributes>
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>


Figure 1: An example MusicXML* file that represents ‘middle C’ on the treble clef 7.
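To get a sense of how easily music21 manipulates files like this, here is a minimal sketch (assuming a working music21 installation) that loads one chorale from the bundled corpus and performs a few common operations. The corpus path 'bach/bwv66.6' is a standard music21 example and is used here only for illustration.

from music21 import corpus

# Load one chorale from the bundled corpus (a standard music21 example path).
chorale = corpus.parse('bach/bwv66.6')

print(chorale.analyze('key'))          # estimate the key of the piece
soprano = chorale.parts[0]             # the top (melody) voice
transposed = chorale.transpose('M2')   # transpose the whole score up a major second

One-line operations like these are what make the MusicXML plus music21 combination convenient for the later preprocessing steps.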

As you can see, the corpus of Bach chorales found in the music21 toolkit soundly satisfies all the requirements for a training dataset for the LSTM neural network. Hence, the project is able to proceed using BachBot as the music completion algorithm.

Base Melodies

Five basic melodies were selected after considering several parameters. First of all, as with the training dataset, each melody must be out of copyright (in the public domain) and/or licensed for non-commercial use.

The entertaining and interactive nature of our project imposes another condition: input melodies have to be popular and recognizable. Among music in the public domain, American, English, and French folk and children’s songs satisfy this condition best of all. We have also chosen two composed (non-traditional) melodies because of their high popularity.

Later processing (the rearranging algorithm and BachBot) requires melodies to be in a proper source format such as a musical instrument digital interface (MIDI) file or music score. After being rearranged for a particular mood, each melody (as a MIDI file) will be converted into the MusicXML format for BachBot.
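As a rough illustration of that conversion step, music21’s converter module can read a MIDI file and write it back out as MusicXML. The file names below are placeholders, not part of the actual project pipeline.

from music21 import converter

# Parse a rearranged melody stored as MIDI (placeholder file name).
melody = converter.parse('aura_lee_rearranged.mid')

# Write it out as MusicXML so it can be fed to BachBot.
melody.write('musicxml', fp='aura_lee_rearranged.xml')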

Our algorithm for rearranging a melody according to the mood is an experimental project, so to decrease the complexity of processing and to obtain a more or less predictable artistic result we have chosen simple, monophonic (one line, unharmonized) melodies. Additionally, as BachBot was trained on diatonic and tonal-based music, the selected melodies should also follow this musical subsetting for the best result.

It should be noted that a rearranged version of a melody may itself be under copyright even when the original melody is in the public domain. The list of melodies, and the websites where we found our MIDI files, is included in the references 8, 9, 10.

List of base melodies:

Aura Lee (George R. Poulton/W. W. Fosdick, 1861)

Happy Birthday to You (Patty and Mildred J. Hill, 1893)

Brother John (unknown/traditional, 1780)

Old McDonald (unknown, 1917)

Twinkle, Twinkle, Little Star (traditional, 1761)

Conclusion

All in all, the search for the music data itself was not a very time-consuming task. However, a lot of careful consideration was required to come up with criteria for a dataset to optimize performance for the purposes of each project. Regardless of the project, datasets should be unrestricted by copyright.

Now that the dataset has been found, the project can proceed to collecting, storing, and processing this data.

References and Links

1. Liang, F. (2016). BachBot: Automatic composition in the style of Bach chorales Developing, analyzing, and evaluating a deep LSTM model for musical style (Unpublished master's thesis, 2016). University of Cambridge.

2. Cuthbert, M., & Ariza, C. (2008). Music21 Documentation. Retrieved May 24, 2017, from http://web.mit.edu/music21/doc/index.html

3. Cuthbert, M., & Ariza, C. (2008). List of Works Found in the music21 Corpus. Retrieved May 25, 2017, from http://web.mit.edu/music21/doc/about/referenceCorpus.html

4. Cuthbert, M., & Ariza, C. (2008). Music21 Authors, Acknowledgments, Contributing, and Licensing. Retrieved May 24, 2017, from http://web.mit.edu/music21/doc/about/about.html

5. Copyright and the Public Domain. (n.d.). Retrieved May 24, 2017, from http://www.pdinfo.com/copyright-law/copyright-and-public-domain.php

6. Zbikowski, L. (2009). Guidelines for Species Counterpoint. Retrieved May 24, 2017, from http://hum.uchicago.edu/classes/zbikowski/species.html

7. Hello World. (n.d.). Retrieved May 25, 2017, from http://www.musicxml.com/tutorial/hello-world/

8. Best Known Popular Public Domain Songs, http://www.pdinfo.com/pd-music-genres/pd-popular-songs.php

9. Folk Songs, http://www.pdmusic.org/folk.html

10. Folklore, http://www.csufresno.edu/folklore/

Find more helpful resources at the Intel® Nervana™ AI Academy.

Hands-On AI Part 20: Music Data Collection and Exploration


A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers

This is the 20th article in the AI Developer Journey Tutorial Series and it continues the description of data collection and preparation in articles (Image Data Collection) and (Image Data Exploration) with a discussion on data collection and exploration for the music data. Be sure to check out previous articles in this series for help on team formation, project planning, dataset search, and other related topics.

The goal of this project is to:

  • Create an application that takes in a set of images.
  • Extract the emotional valence of the images.
  • Output a piece of music that fits the emotion.

This project’s approach to creating emotion-modulated music is to use an algorithm (Emotion-Based Music Transformation) to alter a base melody according to a specific emotion, and then harmonize and complete the melody using a deep learning model. To do this, the music datasets that are required are:

  • A training dataset for the melody completion algorithm (Bach chorales).
  • A set of popular melodies that serve as a template for emotion modulation.

Music Data Collection and Exploration

Bach Chorales—Music21* Project1

The choice to use Bach chorales as the training dataset for the music generation was explained in detail in the (Music Dataset Search) article. In that article, the music21* project was briefly introduced. Here, the music21 corpus access will be discussed in more detail.

Music21 is a Python* based toolkit for computer-aided musicology, and includes a complete collection of Bach chorales as a part of its core corpus. Thus, data collection was as simple as installing the music21 toolkit (instructions available for macOS*, Windows*, and Linux*).

Once installed, the set of Bach chorales may be accessed using the following code:

from music21 import corpus
for score in corpus.chorales.Iterator(numberingSystem='bwv', returnType='stream'):
    pass
    # do stuff with scores here

Figure 1: Iterating through all Bach chorales.

Alternatively, the following code returns a list of the filenames of all Bach chorales, which can then be processed with the parse function:

from music21 import corpus
chorales = corpus.getBachChorales()
score  = corpus.parse(chorales[0])
# do stuff with score

Figure 2: Getting a list of all Bach chorales.

Exploring the Data

Once a dataset has been collected (or accessed in this case), the next step is to examine and explore the features of this data.

The following code will display a text representation of the music file:

>>> from music21 import corpus
>>> chorales = corpus.getBachChorales()
>>> score = corpus.parse(chorales[0])
>>> score.show('text')

{0.0} <music21.text.TextBox "BWV 1.6  W...">
{0.0} <music21.text.TextBox "Harmonized...">
{0.0} <music21.text.TextBox "PDF ©2004 ...">
{0.0} <music21.metadata.Metadata object at 0x117b78f60>
{0.0} <music21.stream.Part Horn 2>
    {0.0} <music21.instrument.Instrument P1: Horn 2: Instrument 7>
    {0.0} <music21.stream.Measure 0 offset=0.0>
        {0.0} <music21.layout.PageLayout>
        {0.0} <music21.clef.TrebleClef>
        {0.0} <music21.key.Key of F major>
        {0.0} <music21.meter.TimeSignature 4/4>
        {0.0} <music21.note.Note F>
    {1.0} <music21.stream.Measure 1 offset=1.0>
        {0.0} <music21.note.Note G>
        {0.5} <music21.note.Note C>
        {1.0} <music21.note.Note F>
        {1.5} <music21.note.Note F>
        {2.0} <music21.note.Note A>
        {2.5} <music21.note.Note F>
        {3.0} <music21.note.Note A>
        {3.5} <music21.note.Note C>
    {5.0} <music21.stream.Measure 2 offset=5.0>
        {0.0} <music21.note.Note F>
        {0.25} <music21.note.Note B->
        {0.5} <music21.note.Note A>
        {0.75} <music21.note.Note G>
        {1.0} <music21.note.Note F>
        {1.5} <music21.note.Note G>
        {2.0} <music21.note.Note A>
        {3.0} <music21.note.Note A>
    {9.0} <music21.stream.Measure 3 offset=9.0>
        {0.0} <music21.note.Note F>
        {0.5} <music21.note.Note G>
.
.
.

>>> print(score)
<music21.stream.Score 0x10bf4d828>

Figure 3: Text representation of a chorale.

Figure 3 shows a text representation of the chorale as a music21.stream.Score object. While it is interesting to see how music21 represents music in code, it is not very helpful for the purpose of examining the important features of the data. Therefore, software that can visualize the scores is required.

As mentioned in Emotion Recognition from Images Model Tuning and Hyperparameters, scores in the music21 corpus are stored as MusicXML* files (.xml or .mxl). A free application that can view these files in staff notation is Finale NotePad* 2 (an introductory version of the professional music notation suite Finale*). Finale NotePad is available for Mac and Windows. When installing NotePad on macOS, the OS may in some cases block the installer if it has not been digitally signed. Take a look at (Mac) Security settings prevent running MakeMusic installers to avoid this problem. Once Finale NotePad is downloaded, run the following code to configure music21 with Finale NotePad:

>>> import music21
>>> music21.configure.run()

We can now run the same code as in Figure 3 but with score.show() instead of score.show(‘text’). This will open up the MusicXML file in Finale, which looks like this:

Figure 4: First page of a Bach chorale in staff notation.

This format gives a clearer visual representation of the chorales. Looking at a couple of the chorales confirms that the data is what we expected it to be: short pieces of music with (at least) four parts (soprano, alto, tenor, and bass), separated into phrases by fermatas.
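Those expectations can also be checked programmatically. A minimal sketch, again using the standard corpus path 'bach/bwv66.6' purely as an example, lists the part names and counts the notes carrying fermatas, which mark the phrase endings:

from music21 import corpus, expressions

score = corpus.parse('bach/bwv66.6')   # any chorale from the corpus

# The expected four-part texture: soprano, alto, tenor, and bass.
print([p.partName for p in score.parts])

# Fermatas mark the ends of phrases in a chorale.
fermatas = [n for n in score.recurse().notes
            if any(isinstance(e, expressions.Fermata) for e in n.expressions)]
print(len(fermatas), 'notes carry a fermata')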

A common thing to do as a part of data exploration is to calculate some descriptive statistics. In this case we could find out how many times each key or pitch is used in the corpus. An example of how to calculate and visualize the number of times each key is used is shown below.

from music21 import corpus
import matplotlib.pyplot as plt

chorales = corpus.getBachChorales()
key_counts = {}

# Parse each chorale, estimate its key, and tally how often each key appears.
for chorale in chorales:
    score = corpus.parse(chorale)
    key = score.analyze('key').tonicPitchNameWithCase
    key_counts[key] = key_counts.get(key, 0) + 1

# Plot the key frequencies as a bar chart.
ind = list(range(len(key_counts)))
fig, ax = plt.subplots()
ax.bar(ind, list(key_counts.values()))
ax.set_title('Frequency of Each Key')
ax.set_ylabel('Frequency')
plt.xticks(ind, list(key_counts.keys()), rotation='vertical')
plt.show()


Figure 5: Frequency of each key in the corpus. Minor keys are labelled as lowercase and major keys are labelled as uppercase letters. Flats are notated with a ‘-’.

Below are some other statistics about the corpus.


Figure 6: Distribution of pitches used over the corpus3.


Figure 7: Note occurrence positions calculated as offset from the start of measure in crotchets3.

The descriptive statistics that are interesting to calculate will differ for each project. However, they can generally help to get a grasp on what kind of data you have, and even guide certain steps in the preprocessing. These statistics can also serve as a baseline to see the effects of preprocessing on the data.
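As one more example of this kind of exploration, a rough pitch-class count over the whole corpus (in the spirit of Figure 6) could be computed as sketched below; note that parsing every chorale takes a while, and this is only an illustrative approach rather than the exact script used for the figures.

from collections import Counter
from music21 import corpus

# Tally how often each pitch class appears across all chorales.
pitch_counts = Counter()
for chorale in corpus.chorales.Iterator(returnType='stream'):
    pitch_counts.update(p.name for p in chorale.pitches)

# Print the ten most common pitch classes and their counts.
for name, count in pitch_counts.most_common(10):
    print(name, count)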

Base Melodies

Musical instrument digital interface (MIDI) files for the base melodies were simply downloaded from the Internet (for a discussion of the selection process, and links, see the Music Dataset Search article).

Conclusion

Data collection for the music data was a relatively straightforward process and only involved installing the music21 toolkit. Exploration of the dataset involved looking at the different representations of the score as well as calculating descriptive statistics on the data.

Now, as all the relevant data has been collected and explored, the project can move on to the exciting part of implementing the deep learning models!

References and Links

1. Cuthbert, M., & Ariza, C. (2008). Music21 Documentation. Retrieved May 24, 2017, from http://web.mit.edu/music21/doc/index.html

2. Finale NotePad [Computer software]. (2012). Boulder, CO: Makemusic. https://www.finalemusic.com/products/finale-notepad/

3. Liang, F. (2016). BachBot: Automatic composition in the style of Bach chorales Developing, analyzing, and evaluating a deep LSTM model for musical style (Unpublished master's thesis, 2016). University of Cambridge.

Find more helpful resources at the Intel® Nervana™ AI Academy.

Hands-On AI Part 21: Emotion-Based Music Transformation


A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers

In this article we present our ideas about emotion-based transformations in music. In the first part, theoretical aspects are presented, including score examples for each transformation. The second part consists of brief notes about the implementation process: the tools used, and the challenges and limitations that we faced.

Theory

There are two basic mediums of expression in music—pitch and rhythm. These mediums will be used as parameters to rewrite our melody in the selected mood.

In musical theory, when we speak about pitch in melody, we speak about relations between tones. A system of musical notes based on a pitch sequence is called a scale. The intervals, that is, the width of each step in the sequence, can differ. This difference, or its absence, creates relations between tones and melodic tendencies, where stabilities and attractions are perceived as expressions of mood. In the western musical tradition, for a simple diatonic scale, the position of a tone on the scale relative to the first note is called a scale degree (I-II-III-IV-V-VI-VII). According to these stabilities and attractions, the scale degree gives a tone its function in the system. This makes the scale degree concept very useful for analyzing a simple melodic pattern and for coding it in a way that allows new values to be assigned.

On the primary level, we have to choose artistically suitable scales to create any particular mood. So, if we need to change the mood of a melody, we have to investigate its functional structure using the described scale concept and then assign new values to the existing scale degree pattern. Like a map, this pattern also contains information about directions and periods in a melody. 
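To illustrate how a melodic fragment can be coded as scale degrees and then reassigned to a different scale, here is a minimal music21 sketch; the scale choices, pitch names, and variable names are illustrative assumptions, not part of the project's actual transformation script:

from music21 import scale, pitch

c_major = scale.MajorScale('C')

# Map a short melodic fragment onto scale degrees (I = 1, II = 2, ...)
melody = [pitch.Pitch(p) for p in ['G4', 'G4', 'A4', 'G4', 'C5', 'B4']]
degrees = [c_major.getScaleDegreeFromPitch(p) for p in melody]
print(degrees)  # [5, 5, 6, 5, 1, 7], i.e. V-V-VI-V-I-VII

# Reassign the same degree pattern to another scale (here A natural minor)
a_minor = scale.MinorScale('A')
new_melody = [a_minor.pitchFromDegree(d) for d in degrees]
print([str(p) for p in new_melody])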

For every particular mood, we will also use some extra parameters to make our new melodies more expressive, harmonic, and meaningful.

Rhythm is a way to organize sounds in time. It includes such information as the order in which tones appear, their relative lengths, the pauses between them, and accents. This is how periods are created to structure musical time. If we need to preserve the original melodic pattern, we need to preserve its rhythmical structure. Only a few rhythmic parameters need to be changed (the lengths of notes and pauses), and this is still artistically sufficient. Also, to make the rewritten melody more expressive, we can take into account several extra ways to accent the mood.

ANXIETY could be expressed through the minor scale and more energetic rhythm. Our original scale degree pattern is V-V-VI-V-I-VII  V-V-VI-V-II-I   V-V-V(upper)-III-I-VII-VI   IV-IV-III-I-II-I.


Figure 1: Original pattern.

The original melody is written in a major scale, which differs from the minor by three notes: in the major scale the III, VI, and VII degrees are major (high), while in the minor scale they are a half-tone lower. So if we need to change the scale, we just replace the high degrees with low ones. But to create a more specific effect of anxiety, we keep the VII degree major (high); this increases the instability of that tone and sharpens the scale in a specific way.

To make rhythm more energetic, we can create a syncopation, or an off-beat interruption of the regular flow of the rhythm by changing the position of some notes. In this case, we will move several similar notes one beat further.


Figure 2: ANXIETY transformation.

SADNESS can also be expressed through the simple minor scale, but it should be rhythmically calm. So we replace the major degrees with minor ones, including the VII. To make the rhythm calmer, we fill in pauses by extending the length of the notes before them.


Figure 3: SADNESS transformation.

To express AWE, we should avoid determined and strict intonations; this will be our principle of conversion. As noted above, degrees take on different meanings and tendencies depending on their distance from the first (I) degree. Movement from the IV to the I degree is very direct because of its function, and the intonation from the V to the I also sounds very direct. We will avoid those two intonations to create an effect of space and generality.

So in every part of the pattern where the V goes to the I, the IV goes to the I, or vice versa, we replace one or both of these notes with the closest other degrees. Rhythm can be changed the same way as for the SADNESS effect, simply by slowing down the tempo.


Figure 4: AWE transformation.

DETERMINATION is about powerful movement, so the simplest way to show it is to change the rhythm the same way as we did for ANXIETY, but we also need to cut the length of all notes except the last note of every period: V-V-VI-V-I-VII V-V-VI-V-II-I.


Figure 5: DETERMINATION transformation.

The major scale sounds positive and joyful as such, but to accent and express HAPPINESS/JOY we use a major pentatonic scale. It consists of the same degrees minus two, the fourth and the seventh: I-II-III-V-VI.

So every time we meet those two degrees in our pattern, we replace them with the closest available ones. To accent the simplicity that this scale provides, we use any descending melodic passage of five or more notes as a place to present the scale, step by step.


Figure 6: JOY transformation.

TRANQUILITY/SERENITY can be expressed not just by changing the scale, but by reorganizing the melodic movement. For this, we need to analyze the original pattern and define similar segments. The first note of every segment defines the harmonic context of the phrase; that is why those notes matter most of all: V-V-VI-V-I-VII    V-V-VI-V-II-I     V-V-V-III-I-VII-VI    IV-IV-III-I-II-I.

For the first segment we should use only these degrees, IV-V-I-VII; for the second, V-VII-II-I; the third, VI-VII-III-II; and the fourth, VII-IV-I-VII.

These sets of possible degrees are actually another type of musical structure: chords. Still, we can use them as a system for reorganizing a melody. Degrees are replaced by the closest ones from these prescribed chord patterns. If the whole segment starts from a lower tone than it did originally, we replace all of its degrees with the lower ones available from the prescribed pattern. Also, to create a delay effect, we split the length of every note into eighth notes and gradually decrease the velocity of those new eighth notes.


Figure 7: TRANQUILITY/SERENITY transformation.

To accent GRATITUDE, we use a stylistic reference in rhythm, creating an arpeggio-like effect: we come back to the first note at the end of every segment (phrase). We cut the last note of each segment to half its length and place the segment's first note in the freed space.


Figure 8: GRATITUDE transformation.

Practice

Python* and music21* toolkit

The transformation script was implemented using Python* and the music21* toolkit 1.

Music21 provides very flexible, high-level classes for manipulating musical concepts such as notes, measures, chords, scales, and so on. This makes it possible to operate directly in the domain, rather than performing low-level manipulations on raw musical instrument digital interface (MIDI) data from a file. However, working with MIDI files directly in music21 isn't always suitable, especially when it comes to score visualization 2. Therefore, for visualization and algorithm implementation, it is more convenient to convert the source MIDI files to musicXML* 3. Moreover, musicXML is the input format for BachBot*, which is the next stage in our processing chain.

Conversion can be performed via Musescore* 4:

  • for musicXML output:
    musescore input.mid -o output.xml
  • for MIDI output:
    musescore input.xml -o output.mid

Jupyter*

The music21 toolkit is well integrated with Jupyter* 5. In addition, the integration with Musescore makes it possible to show the score right in the Jupyter notebook and to listen to the results via an integrated player during development and experimentation.


Figure 9: Jupyter* notebook with code, score, and player.

The score display feature is especially useful for collaboration between the programmer and the musician-theoretician. The combination of Jupyter's interactive nature, music21's domain-specific representation, and the simplicity of the Python language makes this workflow especially promising for this kind of cross-disciplinary research.

Implementation

The transformation script was implemented as a Python module, so it can be called directly:

python3 emotransform.py --emotion JOY input.mid

Or, via an external script (or Jupyter):

from emotransform import transform
transform('input.mid', 'JOY')

In both cases the result is an emotion-modulated file.

Transformations related to note degree changes (ANXIETY, SADNESS, AWE, JOY) are based on the music21.Note.transpose function combined with an analysis of the current and target note degree positions. Here we use the music21.scale module and its functions for building the needed scale from any root note 6. For retrieving the root note of a particular melody we can use the analyze('key') function from the music21.Stream module 7.
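A minimal sketch of this approach is shown below. It lowers the III and VI degrees of a major-key melody by a semitone (the ANXIETY-style scale change described earlier); the function name, file paths, and the single-voice assumption are illustrative, not the actual emotransform.py code:

from music21 import converter, scale

def toward_minor_sketch(midi_path):
    """Illustrative sketch: flatten the III and VI degrees of a major-key melody."""
    melody = converter.parse(midi_path)
    key = melody.analyze('key')                    # e.g. C major
    major = scale.MajorScale(key.tonic)

    for note in melody.recurse().getElementsByClass('Note'):
        degree = major.getScaleDegreeFromPitch(note.pitch)
        if degree in (3, 6):                       # major III and VI: lower by a half step
            note.transpose(-1, inPlace=True)
    return melody

# transformed = toward_minor_sketch('input.mid')
# transformed.write('midi', 'output.mid')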

Phrase-based transformations (DETERMINATION, GRATITUDE, TRANQUILITY/SERENITY) require additional research, which will allow us to detect the beginnings and endings of phrases reliably.

Conclusion

In this article we presented the core idea behind emotion-based music transformation: manipulating the position of a particular note on a scale relative to the tonic (the note degree), the piece's tempo, and the musical phrase. The idea was implemented as a Python script. However, theoretical ideas are not always easy to implement in the real world, so we encountered some challenges and identified possible directions for future research, mostly related to musical phrase detection and its transformations. The right choice of tools (music21) and research in the field of music information retrieval are key to resolving such tasks.

Emotion-based transformation is the first stage in our music processing chain; the next one is feeding transformed and prepared melody to BachBot.

References and Links

1. music21: a toolkit for computer-aided musicology, http://web.mit.edu/music21/

2. Parsing MIDI Files, Parsing midi files in Music21

3. musicXML, http://www.musicxml.com/

4. Create, play and print beautiful sheet music, https://musescore.org/

5. Jupyter, http://jupyter.org/

6. music21.scale, Music21 scale module

7. music21 Stream, Music21 Stream.analyze

Find more helpful resources at the Intel® Nervana™ AI Academy.

Hands-On AI Part 22: Deep Learning for Music Generation 1—Choosing a Model and Data Preprocessing


A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers

This is the 22nd article in the Hands-On AI Developer Journey Tutorial Series and it focuses on the first steps in creating a deep learning model for music generation, choosing an appropriate model, and preprocessing the data.

This project uses the BachBot* model1 to harmonize a melody that has been through the emotion-modulation algorithm.

Music Generation—Thinking About the Problem

The first step in solving many problems with artificial intelligence is reducing them to a fundamental problem that is known to be solvable by artificial intelligence. One such problem is sequence prediction, which is used in translation and natural language processing applications. Our task of music generation can be reduced to a sequence prediction problem, where we predict a sequence of musical notes.

Choosing a Model

There are several types of neural networks to consider for a model: the feedforward neural network, the recurrent neural network, and the long short-term memory network.

Neurons are the basic abstractions that are combined to form neural networks. Essentially, a neuron is a function that takes in an input and returns an output.


Figure 1: A neuron1.

Layers of neurons that take in the same input and have their outputs concatenated can be combined to make a feedforward neural network. A feedforward network achieves strong performance through the composition of nonlinear activation functions across many layers (described as deep).


Figure 2: A feedforward neural network1.

A feedforward neural network works well in a wide variety of applications. However, a drawback that prevents it from being useful for the music composition (sequence prediction) task is that it requires a fixed input dimension (music can vary in length). Furthermore, feedforward neural networks do not account for previous inputs, which makes them poorly suited to sequence prediction. A model that is better suited for this task is the recurrent neural network (RNN).

RNNs solve both of these issues by introducing connections between the hidden nodes so that the nodes in the next time step can receive information from the previous time step.


Figure 3: An unrolled view on an RNN1.

As can be seen in the figure, each neuron now takes in input from both the previous layer and the previous time step.

A technical problem faced by RNNs with longer input sequences is the vanishing gradient problem, meaning that the influence of earlier time steps is quickly lost. This is a problem in music composition because there are important long-term dependencies that need to be captured.

A modification to the RNN called long short-term memory (LSTM) can be used to solve the vanishing gradient problem. It does this by introducing memory cells that are carefully controlled by three types of gates. See Understanding LSTM Networks for details 3.

Thus, BachBot proceeded by using an LSTM model.

Preprocessing

Music is a very complex art form and includes dimensions of pitch, rhythm, tempo, dynamics, articulation, and others. To simplify music for the purpose of this project, only pitch and duration were considered. Furthermore, each chorale was transposed to the key of C major or A minor, and note lengths were time quantized (rounded) to the nearest semiquaver (16th note). These steps were taken to reduce the complexity and improve performance while preserving the essence of the music. Key and time normalizations were done using the music21* library4.

def standardize_key(score):
    """Converts into the key of C major or A minor.
    Adapted from https://gist.github.com/aldous-rey/68c6c43450517aa47474"""
    # conversion tables: e.g. Ab -> C is up 4 semitones, D -> A is down 5 semitones
    majors = dict([("A-", 4), ("A", 3), ("B-", 2), ("B", 1), ("C", 0), ("C#", -1),
                   ("D-", -1), ("D", -2), ("E-", -3), ("E", -4), ("F", -5), ("F#", 6),
                   ("G-", 6), ("G", 5)])
    minors = dict([("A-", 1), ("A", 0), ("B-", -1), ("B", -2), ("C", -3), ("C#", -4),
                   ("D-", -4), ("D", -5), ("E-", 6), ("E", 5), ("F", 4), ("F#", 3),
                   ("G-", 3), ("G", 2)])

    # transpose score
    key = score.analyze('key')
    if key.mode == "major":
        halfSteps = majors[key.tonic.name]
    elif key.mode == "minor":
        halfSteps = minors[key.tonic.name]
    tScore = score.transpose(halfSteps)

    # transpose key signature
    for ks in tScore.flat.getKeySignatures():
        ks.transpose(halfSteps, inPlace=True)
    return tScore

Figure 4: Code to standardize key signatures of the corpus into either C major or A minor 2.

Quantizing to the nearest semiquaver was done using music21's Stream.quantize() function. Below is a comparison of statistics about the dataset before and after preprocessing:


Figure 5: Use of each pitch class before (left) and after preprocessing (right). Pitch class refers to pitch without regard for octave 1.


Figure 6: Note occurrence positions before (left) and after preprocessing (right)1.

As can be seen in Figure 5, transposition into C major and A minor had a large impact on the pitch classes used in the corpus. In particular, there are increased counts for the pitches in C major and A minor (C, D, E, F, G, A, B). There are smaller peaks at F# and G# due to their presence in the ascending version of A melodic minor (A, B, C, D, E, F#, and G#). On the other hand, time quantization had a considerably smaller effect. This is due to the high resolution of the quantization (analogous to rounding to many significant figures).
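Taken together, the preprocessing of a single chorale might look like the following sketch, which combines the standardize_key() function from Figure 4 with music21's Stream.quantize(); the quantize arguments shown here are an assumption for quantizing to semiquavers, and BachBot's actual pipeline lives in its datasets script:

from music21 import corpus

chorale = corpus.parse('bach/bwv269')   # any chorale from the music21 corpus

# Transpose into C major / A minor using the function from Figure 4
normalized = standardize_key(chorale)

# Time-quantize offsets and durations to the nearest semiquaver:
# quarterLengthDivisors=(4,) divides each crotchet into four frames
quantized = normalized.quantize(quarterLengthDivisors=(4,),
                                processOffsets=True,
                                processDurations=True,
                                inPlace=False)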

Encoding

Once the data had been preprocessed, the chorales needed to be encoded into a format that can be easily processed by an RNN. The format required is a sequence of tokens. The BachBot project opted for encoding at the note level (each token represents a note) instead of the chord level (each token represents a chord). This decision reduced the vocabulary size from 128^4 potential chords to 128 potential notes, which improves performance.

An original encoding scheme was created for the BachBot project 1. A chorale is broken down into semiquaver time steps, which are called frames. Each frame contains a sequence of tuples representing the musical instrument digital interface (MIDI) pitch value of the note, and whether it is tied to a previous note at the same pitch (note, tie). Notes within a frame are ordered by descending pitch (soprano → alto → tenor → bass). Each frame may also have a fermata that signals the end of a phrase, represented by (.). START and END symbols are appended to the beginning and end of each chorale. These symbols cause the model to initialize itself and allow the user to determine when a composition is finished.

START
(59, True)
(56, True)
(52, True)
(47, True)
|||
(59, True)
(56, True)
(52, True)
(47, True)
|||
(.)
(57, False)
(52, False)
(48, False)
(45, False)
|||
(.)
(57, True)
(52, True)
(48, True)
(45, True)
|||
END

Figure 7: Example encoding of two chords. Each chord is a quaver in duration, and the second one has a fermata. ‘|||’ represents the end of a frame1.

def encode_score(score, keep_fermatas=True, parts_to_mask=[]):
    """
    Encodes a music21 score into a List of chords, where each chord is represented with
    a (Fermata :: Bool, List[(Note :: Integer, Tie :: Bool)]).
    If `keep_fermatas` is True, all `has_fermata`s will be False.
    All tokens from parts in `parts_to_mask` will have output tokens `BLANK_MASK_TXT`.
    Time is discretized such that each crotchet occupies `FRAMES_PER_CROTCHET` frames.
    """
    encoded_score = []
    for chord in (score
            .quantize((FRAMES_PER_CROTCHET,))
            .chordify(addPartIdAsGroup=bool(parts_to_mask))
            .flat
            .notesAndRests): # aggregate parts, remove markup
        # expand chord/rest s.t. constant timestep between frames
        if chord.isRest:
            encoded_score.extend((int(chord.quarterLength * FRAMES_PER_CROTCHET)) * [[]])
        else:
            has_fermata = (keep_fermatas) and any(map(lambda e: e.isClassOrSubclass(('Fermata',)), chord.expressions))

            encoded_chord = []
            # TODO: sorts Soprano, Bass, Alto, Tenor without breaking ties
            # c = chord.sortAscending()
            # sorted_notes = [c[-1], c[0]] + c[1:-1]
            # for note in sorted_notes:
            for note in chord:
                if parts_to_mask and note.pitch.groups[0] in parts_to_mask:
                    encoded_chord.append(BLANK_MASK_TXT)
                else:
                    has_tie = note.tie is not None and note.tie.type != 'start'
                    encoded_chord.append((note.pitch.midi, has_tie))
            encoded_score.append((has_fermata, encoded_chord))

            # repeat pitches to expand chord into multiple frames
            # all repeated frames when expanding a chord should be tied
            encoded_score.extend((int(chord.quarterLength * FRAMES_PER_CROTCHET) - 1) * [
                (has_fermata,
                    map(lambda note: BLANK_MASK_TXT if note == BLANK_MASK_TXT else (note[0], True), encoded_chord))
            ])
    return encoded_score

Figure 8: Code used to encode a music21* score using the specified encoding scheme2.

Conclusion

This article discussed some of the early steps in implementing a deep learning model using BachBot as an example. In particular, it discussed the advantages of RNN/LSTM for music composition (which is fundamentally a problem in sequence prediction), and the critical steps of data preprocessing and encoding. Because the steps taken for preprocessing and encoding are different in each project, we hope that the considerations described in this article will be helpful.

Check out the next article for information about training/testing the LSTM model for music generation and how this model is altered to create a model that completes and harmonizes a melody.

References and Links

1. Liang, F., Gotham, M., Johnson, M., and Shotton, J. (2017). Automatic stylistic composition of Bach chorales with deep LSTM. 18th International Society for Music Information Retrieval Conference, Suzhou, China.
https://ismir2017.smcnus.org/wp-content/uploads/2017/10/156_Paper.pdf

2. Liang, F. (2016) BachBot Datasets [Computer program]. Available at https://github.com/feynmanliang/bachbot/blob/master/scripts/datasets.py

3. Olah, Christopher (2015) Understanding LSTM Networks [Web log post]. (2015). Retrieved from http://colah.github.io/posts/2015-08-Understanding-LSTMs/.

4. Cuthbert, M., & Ariza, C. (2008). Music21 Documentation. Retrieved May 24, 2017, from http://web.mit.edu/music21/doc/index.html.

Find more helpful resources at the Intel® Nervana™ AI Academy.


Hands-On AI Part 23: Deep Learning for Music Generation 2—Implementing the Model


A Tutorial Series for Software Developers, Data Scientists, and Data Center Managers

At this point in the tutorial, all the relevant datasets have been found, collected, and preprocessed. For more information about these steps please check out the earlier articles in this series. The BachBot*1 model was used to harmonize the melody. This article describes the processes of defining, training, testing, and modifying BachBot.

Defining a Model

In the previous article (Deep Learning for Music Generation 1-Choosing a Model and Data Preprocessing), it was explained that the problem of automatic composition can be reduced to a problem of sequence prediction. In particular, the model should predict the most probable next note given the previous notes. This type of problem is well suited to a long short-term memory (LSTM) neural network. Formally, the model should predict P(xt+1 | xt, ht-1), a probability distribution over the possible next notes (xt+1) given the current token (xt) and the previous hidden state (ht-1). Interestingly, this is exactly the same operation performed by recurrent neural network (RNN) language models.

In composition, the model is initialized by the START token (see the previous article for more about the encoding scheme), and then picks the next most-likely token to follow it. After this, it continues to pick the most probable next token using the current note and the previous hidden state until it generates the END token. There are temperature controls, which introduce a degree of randomness to prevent BachBot from composing the same piece over and over again.
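A minimal sketch of temperature-controlled sampling from a predicted distribution is shown below (plain NumPy, independent of the Torch implementation BachBot actually uses; the model interface in the commented loop is hypothetical):

import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    """Sample a token index; temperature < 1 sharpens, > 1 flattens the distribution."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Composition loop (sketch): start from START, keep sampling until END
# tokens = [START]
# while tokens[-1] != END:
#     logits = model.predict_next(tokens)                 # hypothetical model call
#     tokens.append(sample_with_temperature(logits, 0.9))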

Loss

In training a prediction model, there is typically a function to be minimized, called the loss, that describes the difference between the model's prediction and the ground truth. BachBot minimizes the cross entropy loss between the predicted distribution over (xt+1) and the actual target distribution. Cross entropy loss is a good starting point for a wide range of tasks, but in some cases you may need your own loss function. Another valid approach is to try several loss functions and keep the model that minimizes the loss on the validation set.

Training/Testing

In training the RNN, BachBot used the correct token as xt+1, instead of the model's own prediction. This process, known as teacher forcing, is used to aid convergence, as the model's predictions will naturally be poor at the beginning of training. In contrast, during validation and composition, the model's prediction (xt+1) is fed back as the input for the next prediction.

Other Considerations

Practical techniques that were used in this model to improve performance, and are common in LSTM networks, are gradient norm clipping, dropout, batch normalization, and truncated backpropagation through time (BPTT).

Gradient norm clipping mitigates the problem of the exploding gradient (the counterpart to the vanishing gradient problem, which was solved by using an LSTM memory cell architecture). When gradient norm clipping is used, gradients that exceed a certain threshold are clipped or scaled.

Dropout is a technique that causes certain neurons to randomly turn off (dropout) during training. This prevents overfitting and improves generalization. Overfitting is a problem that occurs when the model becomes optimized for the training dataset, and is less applicable to samples outside of the training dataset. Dropout often worsens training loss, but improves validation loss (more on this later).

Computing the gradient of an RNN on a sequence of length 1000 costs the equivalent of a forward and backward pass on a 1000-layer feedforward network. Truncated BPTT is used to reduce the cost of updating parameters during training. This means that errors are only propagated a fixed number of time steps backward. Note that learning long-term dependencies is still possible when using truncated BPTT, as the hidden states have already been exposed to many previous time steps.
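To make these techniques concrete, here is a compact PyTorch sketch of one training step that combines teacher forcing, dropout, and gradient norm clipping; BachBot itself was written in Lua Torch, so this is an illustrative restatement rather than the project's code, and the layer sizes are only examples:

import torch
import torch.nn as nn

class NoteLSTM(nn.Module):
    def __init__(self, vocab_size=128, embed_dim=32, hidden_dim=256,
                 num_layers=3, dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers,
                            dropout=dropout, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        output, hidden = self.lstm(self.embed(tokens), hidden)
        return self.out(output), hidden

model = NoteLSTM()
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

def train_step(batch):
    # batch: LongTensor of token indices, shape (batch, seq_len + 1)
    inputs, targets = batch[:, :-1], batch[:, 1:]       # teacher forcing: ground-truth inputs
    logits, _ = model(inputs)
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # gradient norm clipping
    optimizer.step()
    return loss.item()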

Parameters

The parameters that are relevant in RNN/LSTM models are:

  • The number of layers. As this increases, the model may become more powerful but slower to train. Also, having too many layers may result in overfitting.
  • The hidden state dimension. Increasing this may improve model capacity, but can cause overfitting.
  • Dimension of vector embeddings
  • Sequence length/number of frames before truncating BPTT.
  • Dropout probability. The probability that a neuron drops out at each update cycle.

Finding the optimal set of parameters will be discussed later in the article.

Implementation, Training and Testing

Choosing a Framework

Nowadays, there are many frameworks that help to implement machine learning models in a variety of languages (even JavaScript*!). Some popular frameworks are scikit-learn*, TensorFlow*, and Torch*.

Torch 3 was selected as the framework for the BachBot project. TensorFlow was tried first; however, at the time it used unrolled RNNs, which overflowed the graphics processing unit's (GPU's) RAM. Torch is a scientific computing framework that runs on LuaJIT*, a fast just-in-time compiler for the Lua language, and it has strong neural network and optimization libraries.

Implementing and Training the Model

Implementation will clearly vary depending on the language and framework you end up choosing. To see how LSTMs were implemented using Torch in BachBot, check out the scripts used to train and define BachBot. These are available on Feynman Liang's GitHub* site 2.

A good starting place for navigating the repository is 1-train.zsh. From there you should be able to find your way to bachbot.py.

Specifically, the essential script that defines the model is LSTM.lua. The script that trains the model is train.lua.

Hyperparameter Optimization

To find the best hyperparameter settings, a grid search was used on the following grid.

Parameter                        Values
Number of layers                 1, 2, 3, 4
Hidden state dimension           128, 256, 384, 512
Dimension of vector embeddings   16, 32, 64
Sequence length                  64, 128, 256
Dropout probability              0.0, 0.1, 0.2, 0.3, 0.4, 0.5

Figure 1: Parameter grid used in BachBot* grid search 1.

A grid search is an exhaustive search over all the possible combinations of parameters. Other suggested hyperparameter optimizations are random search and Bayesian optimization.
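A grid search can be written in a few lines of plain Python; the sketch below enumerates the same grid as Figure 1 and assumes a hypothetical train_and_validate() function that trains a model and returns its validation loss:

from itertools import product

grid = {
    'num_layers': [1, 2, 3, 4],
    'hidden_dim': [128, 256, 384, 512],
    'embed_dim':  [16, 32, 64],
    'seq_length': [64, 128, 256],
    'dropout':    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
}

best_loss, best_params = float('inf'), None
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    val_loss = train_and_validate(**params)   # hypothetical: train, then return validation loss
    if val_loss < best_loss:
        best_loss, best_params = val_loss, params

print(best_params, best_loss)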

The optimal hyperparameter set found by the grid search was: number of layers = 3, hidden state dimension = 256, dimension of vector embeddings = 32, sequence length = 128, and dropout = 0.3. 

This model achieved 0.324 cross entropy loss in training, and 0.477 cross entropy loss in validation. Plotting the training curve shows that training converges after 30 iterations (≈28.5 minutes on a single GPU) 1.

Plotting training and validation losses can also illustrate the effect of each hyperparameter. Of particular interest is dropout probability:


Figure 2: Training curves for various dropout settings1.

From Figure 2 we can see that dropout indeed prevents overfitting: although dropout = 0.0 has the lowest training loss, it has the highest validation loss, whereas higher dropout probabilities lead to higher training losses but lower validation losses. The lowest validation loss in BachBot's case was achieved with a dropout probability of 0.3.

Alternate Evaluation (optional)

For some models, especially for creative applications such as music composition, loss may not be the most appropriate measure of success. Instead, a better measure could be subjective human evaluation. 

The goal of the BachBot project was to automatically compose music that is indistinguishable from Bach’s own compositions. To evaluate this, an online survey was conducted. The survey was framed as a challenge to see whether the user could distinguish between BachBot’s and Bach’s compositions. 

The results showed that people who took the challenge (759 participants, varying skill levels) could only accurately discriminate between the two samples 59 percent of the time. This is only 9 percent above random guessing! Take The BachBot Challenge yourself!

Adapting the Model to Harmonization

BachBot can now compute P(xt+1 | xt, ht-1), the probability distribution of the possible next notes given the current note and the previous hidden state. This sequential prediction model can then be adapted into one that harmonizes a melody. This adapted harmonization model is required for harmonizing the emotion-modulated melody for the slideshow music project.

In harmonization, a predefined melody is provided (typically the soprano line), and the model must then compose music for the other parts. A greedy best-first search under the constraint that melody notes are fixed is used for this task. Greedy algorithms involve making choices that are locally optimal. Thus, the simple strategy used for harmonization is described as follows:

Let xt be the tokens in the proposed harmonization. At time step t, if the note is given as the melody, xt equals the given note. Otherwise xt is the most likely next note as predicted by the model. The code for this adaptation can be found on Feynman Liang’s GitHub: HarmModel.lua, harmonize.lua.
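The greedy strategy can be sketched as follows (plain Python with a hypothetical step-wise model interface; the actual implementation is in HarmModel.lua and harmonize.lua):

def harmonize_greedy(model, melody, start_token):
    """melody: list where element t is the fixed (soprano) token at step t, or None if free."""
    tokens = [start_token]
    hidden = None
    for t in range(len(melody)):
        probs, hidden = model.step(tokens[-1], hidden)   # hypothetical: P(x_{t+1} | x_t, h_{t-1})
        if melody[t] is not None:
            next_token = melody[t]                       # melody note is fixed by the constraint
        else:
            next_token = max(range(len(probs)), key=lambda i: probs[i])  # greedy best choice
        tokens.append(next_token)
    return tokens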

Below is an example of BachBot’s harmonization of Twinkle, Twinkle, Little Star, using the above strategy.


Figure 3: The BachBot* harmonization of Twinkle, Twinkle, Little Star (in the soprano line). Alto, tenor and bass parts were filled in by BachBot 1.

In this example, the melody to Twinkle, Twinkle, Little Star is provided in the soprano line. The alto, tenor and bass parts are then filled by BachBot using the harmonization strategy. This is what that sounds like.

Despite BachBot's decent performance on this task, the model has certain limitations. Specifically, it doesn't look ahead in the melody; it uses only the current melody note and the past context to generate notes. When people harmonize melodies, they can examine the whole melody, which makes it easier to infer appropriate harmonizations. Because this model can't do that, future constraints may produce surprises that cause mistakes. To address this, a beam search may be used.

Beam searches explore multiple trajectories. For example, instead of taking only the most probable note (as is currently done), a beam search may take the four or five most probable notes and explore each of them. Exploring multiple options can help the model recover from mistakes. Beam searches are commonly used in natural language processing applications to generate sentences.
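For reference, here is a compact beam-search sketch over the same hypothetical step-wise model interface used above; the beam width and scoring are assumptions for illustration:

import math

def beam_search(model, length, start_token, beam_width=5):
    beams = [([start_token], 0.0, None)]            # (tokens, log-probability, hidden state)
    for _ in range(length):
        candidates = []
        for tokens, score, hidden in beams:
            probs, new_hidden = model.step(tokens[-1], hidden)
            for tok, p in enumerate(probs):
                if p > 0:
                    candidates.append((tokens + [tok], score + math.log(p), new_hidden))
        # keep only the highest-scoring partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]                              # best-scoring sequence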

Emotion-modulated melodies can now be put through this harmonization model to be completed. The way this is done is detailed in the final article describing application deployment.

Conclusion

This article used BachBot as a case study in discussing the considerations involved in building a creative deep learning model. Specifically, it covered techniques that improve generalization and accelerate training for RNN/LSTM models, hyperparameter optimization, evaluation of the model, and ways to adapt a sequence prediction model for completion (or generation).

All of the parts of the Slideshow Music project are now complete. The final articles in this series will discuss how these parts are put together to form the final product. That is, they will discuss how the emotion-modulated melodies may be provided as an input to BachBot’s harmonization model, and deployment of the completed application.

References and Links

1. Liang, F., Gotham, M., Johnson, M., & Shotton, J. (2017). Automatic stylistic composition of Bach chorales with deep LSTM. 18th International Society for Music Information Retrieval Conference, Suzhou, China.

2. Liang, F. (2016) BachBot [Computer program]. Available at https://github.com/feynmanliang/bachbot/blob/master/scripts/

3. Collobert, R., Farabet, C., Kavukcuoglu, K., & Chintala, S. (2017). Torch (Version 7). Retrieved from http://torch.ch/.

Find more helpful resources at the Intel® Nervana™ AI Academy.

Using Intel® Math Kernel Library with Arduino Create


Overview

This article presents use cases and examples that make use of the Intel® Math Kernel Library (Intel® MKL). Arduino Create*, a cloud-based IDE, and UP2*, a single-board computer based on Intel's Apollo Lake platform, are used for the examples. The use cases are intended to expose the user to the capabilities provided by the Intel® MKL, and the examples provide short code samples that implement the use cases.

Note: Any hardware device that meets the Intel® MKL requirements can be used as the target hardware.

Requirements

Hardware Requirements

  • UP2 (recommended) or
  • Hardware device containing an Intel processor with SSE2 (Streaming SIMD Extensions 2) support

See Intel® MKL requirements for supported hardware.

Software Requirements

About the Intel® Math Kernel Library (Intel® MKL)

Software applications that require mathematical functions such as matrix multiplication can achieve an increase in performance (faster response times) by leveraging the Intel® MKL. See https://software.intel.com/en-us/mkl/features/benchmarks for a complete review and comparison of Intel® MKL benchmarks for Intel® Core™ i7, Intel® Xeon® and Intel® Xeon Phi™ processors.

With an offering of hundreds of functions, developers can choose which areas make the most sense to optimize based on the requirements of the application. Figure 1 below shows the components of the Intel® MKL, two of which are highlighted and are the focus of this article.


Figure 1 Components offered in the Intel® MKL

Component   Name                               Description
BLAS        Basic Linear Algebra Subprograms   GEMM - General Matrix Multiplication
LAPACK      Linear Algebra Package             SVD - Singular Value Decomposition

Table 1 - MKL Components referenced in this article

The performance improvements gained by using the Intel® MKL can only be achieved on IA-32 or Intel® 64 architectures that support at a minimum the SSE2 instruction set, which includes most CPUs released since the Pentium® 4 processor.

The Intel® MKL basic requirements can be found here: https://software.intel.com/en-us/articles/intel-mkl-111-system-requirements

Intel® MKL Applications for Edge Devices

This section provides an overview of three use cases related to data compression and image manipulation; the Code Samples section provides the code required to implement them.

Being closest to the origin of data collected in the real world, Internet of Things (IoT) devices are an ideal candidate for response time optimization. Two ways to improve response time on an edge device are to 1) execute the code on the device itself or 2) reduce the size of the data that needs to be transferred for analysis.

Making a decision on the device itself allows for the quickest response time because it avoids forwarding data to another service or the cloud for analysis and awaiting a response. Alternatively, there are scenarios where it is not possible to compute at the edge, in which case the data must be forwarded to a separate system for analysis. Reducing and compressing the data helps to improve response time in those situations. Finally, filtering and scientific functions are required at the edge more than ever; the ability to quickly compute, transform, and derive results is critical for near real-time responses.

Two example areas that can benefit from a performance boost at the edge include:

  • Image manipulation and analysis
  • Data compression

Because images can be represented in memory in a matrix form, they can be manipulated and analyzed using familiar and powerful linear algebra algorithms. A few types of transformations and their real world applications include: Scaling (Matrix Multiplication), Translations (Matrix Addition and Transpose), and Compression using Singular Value Decomposition.

The Intel® Developer Zone offers many resources targeted to assist developers with the Intel® MKL, including a nice overview of matrix fundamentals.

Use Case 1: Noise Filtering Using Matrix Addition

Using matrix addition and subtraction, the elements of an image matrix can be shifted and filtered individually. In the field this can be used to reduce contrast, change colors, or shift pixel values entirely. Below is an example that adds 100,000 to every element of a grayscale image matrix whose value is below 100,000.



Figure 4 Before and after effects of adding 100,000 to each pixel that has a value < 100,000

Use Case 2: Image flipping Using Matrix Transposing

The values in a matrix can be transposed, or flipped around the diagonal, to change the orientation of an image. Here is an example output of transposing a 200x200 grayscale bitmap image. Look closely and you will notice it is not simply rotated, but reflected around the diagonal.



Figure 5: Before and after matrix transform of a 24-bit grayscale image

Use Case 3: Image Size Reduction Using Singular Value Decomposition

A simple grayscale bitmap image that is 200x200 pixels takes up just over 40K of memory: at one byte per pixel, 200 * 200 = 40,000 bytes, and this is not including the roughly one-kilobyte header.

The singular value decomposition (SVD) theorem states that any MxN matrix can be factored into the product of three matrices which, when truncated and recombined, provide an approximation of the original matrix that is often acceptable for the purpose at hand. Luckily, the MKL has a built-in routine for computing the SVD of a matrix; nevertheless, here is a quick overview of how it works.

Any given matrix A that represents an image of MxN pixels (where M = rows and N = columns) can be written as the product of three matrices: a matrix of left singular vectors (U), a matrix of right singular vectors (VT), and a diagonal matrix of real singular values (S).

To calculate the SVD, the eigenvectors of AAT form the columns of U and the eigenvectors of ATA form the columns of V. Similarly, the singular values in S are computed by taking the square roots of the eigenvalues of AAT or ATA. Once calculated, these three matrices can be truncated by keeping only a number of singular vectors and values smaller than M, then multiplied together to obtain a very close approximation of the original matrix. Figure 5.2 below shows SVD being used in the real world to reduce an image's size by up to 90%.

Figure 5.2: Before and after matrix compression of a 24-bit grayscale image from 40K to 4K
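As a quick illustration of the idea (NumPy here, purely for exposition; the MKL examples later in this article call the LAPACK routines directly), keeping only the k largest singular values reconstructs a close approximation of the original matrix:

import numpy as np

A = np.random.rand(200, 200)          # stand-in for a 200x200 grayscale image
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 20                                # keep only the 20 largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Storage drops from 200*200 values to roughly k*(200 + 200 + 1) values
error = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(f"relative reconstruction error: {error:.3f}")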

Integration with Arduino Create

Arduino Create is a cloud-based IDE for programming IoT devices. It comes complete with a management dashboard, so users can remotely manage and program their IoT devices, pushing code as effortlessly as if the devices were directly connected. To get started, visit http://create-intel.arduino.cc/ and create an account.

The examples in this section use UP2 hardware running the Ubuntu 16.04 operating system with Intel® MKL 2017; however, the overview applies to any supported, compatible stack.
 

 

Figure 6 Screenshot of Arduino Create devices dashboard

 

Installation

Before getting started, the Intel® MKL libraries will need to be installed on the board and configured properly.

To install the Intel® MKL on Ubuntu, visit http://software.intel.com/mkl and follow the download instructions. Registration is required prior to downloading; however, it is completely free. Select the Intel® Performance Libraries for the operating system you will be working with, along with the latest version. Next, click the Intel® MKL link to start the download.


Figure 7: Screenshot of Intel's MKL download options

After downloading, initiate the installation by unpacking the archive, running the installer, and setting the environment variables. The default installation folder is /opt/intel/mkl/.

tar -zxvf [Name of MKL file].tgz
cd [unpacked folder]
sudo ./install.sh OR sudo ./install_GUI.sh (if running on a desktop GUI)
cd [install folder] (Default is /opt/intel/mkl/bin/)
source ./mklvars.sh intel64 (This script sets environment variables for your platform)

Figure 8: Installation instructions after downloading Intel® MKL for Linux

The Intel® MKL comes with many examples that help developers get up and running as fast as possible. Explore the /examples/ subfolder under the default installation folder (typically /opt/intel/mkl) and modify the code to suit your specific requirements. The code examples outlined in this document have been taken from the Intel® MKL default examples and migrated to work with the Arduino-style program structure. Migration to Arduino from C simply means ensuring that the standard setup() and loop() functions are available and that the <ArduinoMKL.h> header, a wrapper for the Intel® MKL libraries, is referenced. Note that any libraries referenced in code must be locatable in the Arduino cloud at compile time. Table 2 below provides an example of the common actions taken to migrate example code from the Intel® MKL into Arduino*.

Intel® MKL Example C source:

#include "MKLSpecificHeader.h"

int main(int argc, char* argv[]) { func(); }
func1() { /* poll or set up callbacks */ }

Arduino Migrated source:

#include "ArduinoMKL.h"

setup() { func(); }
loop() { /* poll, set up callbacks, or do nothing */ }

Table 2 - Example code structure migration required for MKL C to Arduino

Verify Installation

Now that the Intel® MKL is installed, return to the Arduino Create Cloud IDE (URL) and run a sample application that leverages the MKL. On the navigation menu, select Libraries and then search for 'MKL'. Open and explore the mkl-lab-solution example, which demonstrates simple matrix multiplication using DGEMM (double precision general matrix multiplication).


Figure 9 Screenshot of Arduino Create Libraries with search requested for MKL

Next, open the Serial Monitor window by clicking Monitor on the navigation menu on the left. This will bring up the familiar debugging window available in the standard Arduino IDE that allows interfacing with the program as well as printing out debug statements. The Arduino Create IDE should now show both the source code and the Serial Monitor window as shown in Figure 10.


Figure 10 Arduino Create Editor and Monitor window

At this stage, the program can either be verified or uploaded directly to the board. Figure 11 shows an example of the mkl-lab-solution ready for upload to a device named 'Up2Ubuntu'. During this process, the sketch is compiled against the MKL in the cloud as part of the verification step; the MKL libraries are dynamically linked and referenced when the program executes on the target device.


Figure 11: Arduino Create upload sketch to device

As shown in Figure 12, the bottom pane shows the compiler output, concluded by a results summary that indicates the program size and the percentage of storage space used.


Figure 12 Build output from Arduino Create shows the Process ID number (PID)

By logging into the target platform, the process ID can be verified using ps -A and even monitored by running top -p 2001


Figure 13 Output of ps -A shows that the matching process ID is indeed executing

In the Arduino Create IDE, notice that the monitor window is requesting the size of the matrices to multiply. Using a value less than seven will show the output of the matrix multiplication, allowing you to verify the results manually if desired. Explore the code and try out different values. Figure 14 shows the output of executing the sample lab solution.


Figure 14: Output of mkl-lab-solution with a matrix size less than 7

Code Samples

BLAS

Example 1: Double Precision General Matrix Multiplication (DGEMM)

Now that we have the basic example working, the code can be modified to examine the performance difference between standard matrix multiplication and the MKL's DGEMM. We will refactor the example code into a few functions to help with readability, implement a very basic classic matrix multiplication (CMM) algorithm, and provide a testing interface to vary the matrix size and number of runs. The code snippet in Figure 18 provides a general guideline for testing the performance differences, with redundant code from the mkl-lab-solution omitted.

Intel’s Software Developer Zone offers many resources targeted to assist developers with the MKL, including a nice overview of matrix fundamentals:

https://software.intel.com/en-us/mkl-developer-reference-c-matrix-fundamentals

The specific matrix multiplication routine leveraged in this example can be referenced here:

https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm

void cblas_dgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double alpha, const double *a, const MKL_INT lda, const double *b, const MKL_INT ldb, const double beta, double *c, const MKL_INT ldc);

Figure 15: MKL BLAS GEMM Routine Signature Definition

// ... includes, matrix allocation, and timing variables omitted (see mkl-lab-solution)

void setup()
{
  // Maximize the number of threads used by the MKL
  max_threads = mkl_get_max_threads();
  printf(" Requesting Intel(R) MKL to use %i thread(s) \n\n", max_threads);
  mkl_set_num_threads(max_threads);

  // ...
  printf("\n\nEnter matrix size OR -1 to exit");
  scanf("%d", &N);
  MM_Standard();
  MM_Optimized();
}

// Multiply a and b with the MKL's DGEMM routine, storing the result in c
void Dgemm_multiply(double* a, double* b, double* c, int N)
{
  double alpha = 1.0, beta = 0.;
  cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, N, N, N, alpha, b, N, a, N, beta, c, N);
}

void MM_Optimized()
{
  start = clock();
  Dgemm_multiply(a, b, c, N);
  stop = clock();
  time_Optimized = (double)(stop - start) / CLOCKS_PER_SEC;
}

// Classic triple-loop matrix multiplication for comparison
void MM_Standard()
{
  int row, col, k;
  start = clock();
  for (row = 0; row < N; row++) {
    for (col = 0; col < N; col++) {
      for (k = 0; k < N; k++) {
        c[N*row+col] += a[N*row+k] * b[N*k+col];
      }
    }
  }
  stop = clock();
  time_Manual = (double)(stop - start) / CLOCKS_PER_SEC;
}

void loop() {
  exit(0);
}

Figure 18 - Sample code snippets used to compare CMM and MKL DGEMM

Example 2: Matrix Addition

Adding two matrices together can be accomplished using the mkl_domatadd() routine (part of the ?omatadd family). Matrix addition can be applied to images to create interesting effects, as shown in the introductory section. The example code below adds the matrices 'Matrix_Input' and 'Matrix_Fade' together and stores the output in an array called 'Matrix_Out'.

https://software.intel.com/en-us/mkl-developer-reference-c-mkl-omatadd

void mkl_domatadd (char ordering, char transa, char transb, size_t m, size_t n, const double alpha, const double * A, size_t lda, const double beta, const double * B, size_t ldb, double * C, size_t ldc);

Figure 19 - MKL BLAS Matrix Addition Routine Signature Definition

// Pseudocode for adding matrices: Matrix_Out = 1.0 * Matrix_Input + 1.0 * Matrix_Fade

…
mkl_domatadd ('R', 'N', 'N', height, width, 1.0, Matrix_Input, height, 1.0, Matrix_Fade, height, Matrix_Out, height);

Figure 20: Code example for Matrix Addition

Example 3: Matrix Transpose

Another example usage of the MKL BLAS is transposing a matrix of data. Transposing a matrix converts its row values to column values, an operation that is foundational to other, more complex linear algebra operations. In the MKL, transposition is bundled into a matrix copy routine, so you can copy matrices while electing to transpose all or part of them: https://software.intel.com/en-us/mkl-developer-reference-c-mkl-imatcopy

void mkl_simatcopy (const char ordering, const char trans, size_t rows, size_t cols, const float alpha, float * AB, size_t lda, size_t ldb);

Figure 21 - MKL BLAS Matrix Copy Signature Definition

Example of using mkl_simatcopy transposition

Source matrix:
----------
1  1  1  1  1  1  1  1  1  1
0  1  0  0  0  0  0  0  0  0
0  0  1  0  0  0  0  0  0  0
0  0  0  1  0  0  0  0  0  0
0  0  0  0  1  0  0  0  0  0
0  0  0  0  0  1  0  0  0  0
0  0  0  0  0  0  1  0  0  0
0  0  0  0  0  0  0  1  0  0
0  0  0  0  0  0  0  0  1  0
0  0  0  0  0  0  0  0  0  1

-----------
Transposed matrix:
1  0  0  0  0  0  0  0  0  0
1  1  0  0  0  0  0  0  0  0
1  0  1  0  0  0  0  0  0  0
1  0  0  1  0  0  0  0  0  0
1  0  0  0  1  0  0  0  0  0
1  0  0  0  0  1  0  0  0  0
1  0  0  0  0  0  1  0  0  0
1  0  0  0  0  0  0  1  0  0
1  0  0  0  0  0  0  0  1  0
1  0  0  0  0  0  0  0  0  1
[Figure 22 – Output of Transposed matrix example]

  size_t n=10, m=10; /* rows, cols of source matrix */
  float src[]= {
 	1,1,1,1,1,1,1,1,1,1,
 	0,1,0,0,0,0,0,0,0,0,
 	0,0,1,0,0,0,0,0,0,0,
 	0,0,0,1,0,0,0,0,0,0,
 	0,0,0,0,1,0,0,0,0,0,
 	0,0,0,0,0,1,0,0,0,0,
 	0,0,0,0,0,0,1,0,0,0,
 	0,0,0,0,0,0,0,1,0,0,
 	0,0,0,0,0,0,0,0,1,0,
 	0,0,0,0,0,0,0,0,0,1
  };

  printf("\nExample of using mkl_simatcopy transposition\n\n");

  printf("Source matrix:\n----------\n");
  print_matrix(n, m, 's', src);

  //Copy matrix and transpose using Row-major order
  mkl_simatcopy('R' /* row-major ordering */,'T' /* A will be transposed */,
            	m   /* rows */,
            	n   /* cols */,
            	1.  /* scales the input matrix */,
            	src /* source matrix */,
            	m   /* src_lda */,
            	n   /* dst_lda */);

  printf("\n-----------\nTransposed matrix:\n");
  print_matrix(n, m, 's',src);

LAPACK

Example 4: SVD for compression

A final use case worth mentioning is related to data compression. When dealing with large matrices at the edge, it can be very beneficial to employ a size reduction on a matrix of data. Large matrices can take up a lot of memory, which is a concern for local storage and during transport if the data needs to be sent to another location over a lower-bandwidth medium. Using a popular linear algebra technique, singular value decomposition, a matrix can be significantly reduced in size to help solve these problems at the edge.

The MKL has built-in support for computing the SVD of matrices of both real and complex numbers, and includes example source code in the MKLHOME/examples/lapacke/ folder. As with all examples in this article, the code can be migrated from the C examples deployed with the MKL directly to Arduino through the migration steps outlined in Table 2.

The specific routine leveraged in this example can be referenced here: https://software.intel.com/en-us/node/521150

lapack_int LAPACKE_sgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m, lapack_int n, float* a, lapack_int lda, float* s, float* u, lapack_int ldu, float* vt, lapack_int ldvt, float* superb );

Figure 24 - MKL LAPACK General Singular Value Decomposition Signature Definition


Figure 25: Arduino Create Monitor SVD output

void setup() {
  // ...
  info = LAPACKE_dgesvd( LAPACK_ROW_MAJOR, 'A', 'A', m, n, a, lda,
                         s, u, ldu, vt, ldvt, superb );
  /* Check for convergence */
  if( info > 0 ) {
    printf( "The algorithm computing SVD failed to converge.\n" );
    exit( 1 );
  }
  /* Print singular values */
  print_matrix( "Singular values", 1, n, s, 1 );
  /* Print left singular vectors */
  print_matrix( "Left singular vectors (stored columnwise)", m, n, u, ldu );
  /* Print right singular vectors */
  print_matrix( "Right singular vectors (stored rowwise)", n, n, vt, ldvt );
  // ...

Figure 26 Code snippet for calling the dgesvd routine

Conclusion

The Intel® Math Kernel Library provides highly optimized math functions and algorithms designed for Intel hardware. For applications that use complex math such as matrix algebra or singular value decomposition, and that require faster response times than unoptimized software can provide, the MKL can deliver much quicker response times. Whether running in the cloud or at the edge, Intel offers hardware designed to provide the optimizations demanded by math-intensive, scientific applications.

Appendix: Code Vectorization

A core capability that makes the Intel® MKL enhancements possible is code vectorization through SSE. Code vectorization ensures that, at compile time, single instructions operate on multiple data elements in parallel using the SSE registers. SSE (Streaming SIMD Extensions) is an architectural implementation of code vectorization. Introduced with the release of the Intel® Pentium® III processor in 1999, SSE added eight dedicated 128-bit floating point registers and 70 new instructions, enhancing performance for operations that are repeated across different sets of data.

Another feature leveraged by the Intel® MKL is Advanced Vector Extensions (AVX), an additional instruction set included in the Intel® Core™ microarchitecture; however, it is not available on Pentium® or Celeron® processors.

How to Determine if Your System Supports Intel® MKL

To determine the hardware capabilities of a particular CPU, take a look at /proc/cpuinfo:

cat /proc/cpuinfo | grep 'avx'
cat /proc/cpuinfo | grep 'avx2'
cat /proc/cpuinfo | grep 'sse'

Figure 2: Unix commands to determine processor capabilities

Developers interested in calling the SSE directly should check out the Intel® Intrinsics Guide. It offers C APIs for SSE and SSE2, giving a developer direct access to SIMD features without requiring experience with assembly language.

Note: This is completely separate from the Intel® MKL; however, it is worth mentioning in the context of SSE.

Intel’s Link Line Advisor for Building Linker Flags

Intel’s Link Line Advisor is a web-based tool to help developers quickly construct linker flags (for linking libraries) that meet their platform requirements. Your platform requirements are the inputs to the advisor (for example, 64 bit integer interface layer) and the output is a link line and compiler options.


Figure 3: Screenshot of Intel's Link Line Advisor that helps to build linker flags

About the Author

Matt Chandler has been a senior software and applications engineer with Intel since 2004. He is currently working on scale-enabling projects for the Internet of Things, including software vendor support for smart buildings, device security, and retail digital signage vertical segments.

References

Intel® Math Kernel Library 2017 Install Guide

Intel® Math Kernel Library Cookbook

Intel® Math Kernel Library In-Depth Training

Using Intel® Math Kernel Library with Java

Announcing Arduino Create* support for Intel®-based platforms and the UP Squared* Grove IoT Development Kit


Arduino Create* Support of Intel®-based Platforms for IoT

We worked closely with Arduino* to bring easy-to-use rapid prototyping capabilities to professional IoT developers using Intel®-based platforms to develop and deploy industry use cases. Our goal is to provide a consistent, simple software experience that supports the developer journey from prototype to product to deployment. With a combination of Intel® architecture support, the Arduino Create platform, and Ubuntu* and Wind River Pulsar* Linux* OS support, you now have access to an even more expansive set of tools, libraries, and code samples for IoT development across verticals. Arduino Create now also provides a clear path to commercial deployment via project export to the Intel® System Studio tool suite for solution optimization and debug.

UP Squared* Grove IoT Development Kit

The UP Squared* Grove* IoT Development Kit was a collaboration between AAEON*, Canonical*, SEEED*, Arduino*, and Intel. It is a high-performance kit that provides a clear path to production, simple setup and configuration with a pre-installed Ubuntu* OS, expanded I/O to help with rapid prototyping, and an intuitive way to incorporate complex and advanced libraries. Everything is bundled into one package, along with prototyping sensors and a clear path to industrial-grade sensors for commercial deployments.
Here are some highlights:
  • Reduce set up time with native integration of UP Squared Grove Development Kit with Arduino Create
  • Pre-installed custom Ubuntu* Server 16.04 OS on the UP Squared Grove Development Kit
  • Simple getting started experience in Arduino Create for Intel®-based IoT platforms running Ubuntu on Intel® Atom, Intel® Core, or Intel® Xeon processors.  
  • Integrated libraries and SDKs such as the UPM sensor libraries supporting more than 400 sensors, OpenCV, Intel® Math Kernel Library, Amazon Web Services (AWS)*, Microsoft Azure*, and more
  • Supports the ability to run multiple sketches / programs at the same time
  • Export your sketch to a CMake project providing an easy development bridge to Intel® System Studio 2018 
  • Integrates mraa, the hardware abstraction layer by Intel, into the Arduino core libraries, enabling support for all Intel® platforms (see the minimal mraa sketch after this list)
  • Coordinated with key partners to enable a robust path to commercial scalable deployments:
    • Intel offers a range of higher-end processor families to migrate to as compute needs evolve with commercial deployments, and offers an advanced software tool suite, Intel® System Studio, with optimization and debug tools
    • AAEON* offers reduced lead times from development to mass production with UP Squared boards with complete customization services for your deployment needs  
    • Canonical* offers professional services to support through the solution deployment such as legal, image customization, factory process, training, and security  
    • SEEED* offers a simple migration path to industrial-grade sensors with customization services for mass production and scale
    • Arduino* offers export feature to a CMake project for application optimization and debugging
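As a rough sketch of what the mraa integration mentioned above provides underneath the Arduino core libraries, the following minimal C program toggles a GPIO pin through the mraa C API. The pin number is an arbitrary assumption; consult your board's pinout before running it.

#include <stdio.h>
#include <unistd.h>
#include <mraa/gpio.h>

int main(void)
{
    /* Pin 13 is only an example; the mapping differs per board */
    mraa_gpio_context led = mraa_gpio_init(13);
    if (led == NULL) {
        fprintf(stderr, "Failed to initialize GPIO pin\n");
        return 1;
    }

    mraa_gpio_dir(led, MRAA_GPIO_OUT);

    /* Blink the attached LED ten times at 1 Hz */
    for (int i = 0; i < 10; i++) {
        mraa_gpio_write(led, 1);
        usleep(500000);
        mraa_gpio_write(led, 0);
        usleep(500000);
    }

    mraa_gpio_close(led);
    return 0;
}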

What Makes this Combination of Hardware and Software so Powerful?

  • Use the UP Squared Grove Development Kit from prototyping to mass production
  • Access the power of Intel® architecture through simple device and project setup
  • Development made easy on the familiar Arduino environment with integrated examples and built-in libraries
  • Export to CMake project and bridge to Intel® System Studio for access to optimizations, analyzers and debug tools  

Use the UP Squared* Grove Development Kit from Prototyping to Mass Production

This developer kit is ideal for building complex IoT deployments through rapid prototyping. It includes a single-board computer (the UP Squared board) pre-configured with the Ubuntu 16.04 OS, and comes bundled with several common prototyping IoT sensors, including an LCD, a rotary angle sensor, a light sensor, a button, a temperature and humidity sensor, and an LED. The UP Squared Grove Development Kit can be deployed with a 3G kit or an M.2 2230 Wi-Fi kit and an industrial-grade chassis to enable production usages. The kit is available for purchase on up-shop.org.
 

Access the Power of Intel® Architecture Through Simple Device and Project Setup

You can now get started building IoT solutions that demand high performance and compute, using commercial-grade hardware at the edge, as easily as you would a simple 'maker' project at home. Setup can be done in a matter of minutes, and the platform supports Linux-class target devices for commercial solutions with the ability to manage your sketches remotely. The customized flow for the UP Squared Grove Development Kit makes the setup even easier.
 

Development Made Easy on Familiar Arduino* Environment with Integrated Examples and Built-in Libraries

Arduino Create is a high-performance, easy-to-use, web-based tool that enables you to develop your first IoT application in minutes thanks to the combination of a simple installation process, code sharing functionality, and the Arduino programming libraries. It includes examples specifically built for the UP Squared Grove IoT Dev Kit, as well as libraries such as OpenCV and the Intel® Math Kernel Library (Intel® MKL).
 

Export to CMake Project and Bridge to Intel® System Studio for Access to Optimizations, Analyzers and Debug Tools

A new feature enables you to export your sketch as a CMake project, making it easy to bridge to more advanced tools such as Intel® System Studio 2018. Why export or bridge from Arduino Create? You gain access to:
  • Power and performance optimization capabilities
  • Advanced debug and trace tools
  • Optimized compilers and highly tuned libraries
We would love to hear from you. Any and all input is most welcome!
 

Learn More

For more information on the UP Squared Grove IoT Development Kit go to our new page 
To learn more about Arduino Create with Intel®-based platforms for IoT check out this link

Get Started

Intel® Xeon® Scalable Processors Deliver Performance Boosts for IT and Developers


Ongoing improvements to Aerospike* software and Intel® architecture deliver substantial performance gains with the latest generation of the database and Intel® Xeon® Scalable processors.

This year, Intel launched the highly anticipated Intel® Xeon® Scalable platform, our highest-performance, most versatile data center platform ever. Intel is constantly increasing the performance of its platforms, and this relentless improvement benefits everyone from developers to end users. Some features of the new scalable platform include more processor cores, increased memory bandwidth, Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and an array of other advancements. Each new generation of Intel platforms delivers new performance levels and capabilities out of the box, but this debut is the biggest in a decade, and it offers developers and IT an opportunity to achieve even further performance gains with software optimizations.

Aerospike* Sees Four Times the Performance Increase

One great example is Aerospike, a hybrid memory architecture database. Aerospike has enhanced their software with capabilities like auto-tuning, in order to take advantage of new Intel microarchitectures. The Intel Xeon platform is based on the new Intel® Mesh Architecture design, which provides significant per-core advances and a lower-latency L1/L2 cache hierarchy that directly benefits customers’ analytics efforts with Aerospike’s data-intensive, latency-sensitive workloads. Increased hardware parallelism of up to 28 cores per system socket complements the enhanced core affinity that is built into the latest version of Aerospike Database*, leading to better performance, more efficient workloads, and most importantly faster actionable insights! The Intel Xeon platform also has a redesigned memory subsystem and support for up to 24 DIMMs of DDR4-2667 RAM, allowing database servers the ability to scale up system memory to aid in the handling of very large data sets.

Thanks to these features and Aerospike’s optimizations, Aerospike saw up to a 4X1 increase in throughput by replacing systems that were a few years old with ones featuring Intel® Xeon® Platinum 8180 processors and Aerospike 3.12.1 software. To hear more about what Aerospike was able to achieve and how this enables them to deliver a better customer experience, check out their blog “Aerospike Redefines “Fast” Thanks to Intel® Xeon® Scalable Processors.” We also published a solution brief, which goes into more detail on how these impressive achievements were possible.

New Platform Equals Business Advantages for IT

The gains that Aerospike saw with Intel Xeon Scalable processors translate into real business benefits like agility and higher performance. Intel Xeon Scalable processors offer an outstanding hardware foundation for organizations that want to provide customers with a great experience by maintaining low latency, scaling almost without limit, protecting their business with mission-critical reliability and uptime, and doing all of this while containing costs with outstanding performance per server. Intel has delivered a platform that provides businesses with all the necessary tools to modernize their data centers and increase their performance, agility, and security. This new platform can help IT managers run and grow their business, respond to competitive threats, and provide better response time to end-users. There simply has never been a better time to refresh your infrastructure to take full advantage of these solution innovations.

What Developers Can Do to Optimize

Aerospike saw significant performance benefits thanks to their ongoing optimizations for Intel-based platforms. Most applications will see compelling performance gains “out of the box,” but for developers that want the guide map to squeeze out every ounce of performance possible, Intel has created a wide array of developer tools to help optimize software for our latest platforms. Visit the Intel® Developer Zone at https://software.intel.com/en-us for all of the tools you need to develop on Intel hardware and software. And for details on how to optimize for the new Intel® Xeon® Scalable processors, visit https://software.intel.com/en-us/articles/intel-xeon-processor-scalable-family-technical-overview.

1: For more information on the performance and system configuration please see #7 at www.intel.com/xeonconfigs.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Intel® Parallel Computing Center at Freie Universität Berlin


Freie Universität Berlin

Principal Investigators:

Prof. Dr. Knut Reinert holds the chair of the Algorithms in Bioinformatics group in the Institute of Bioinformatics. In addition, he is a Max Planck fellow at the Max Planck Institute for Molecular Genetics. He and his team focus on the development of novel algorithms and data structures for problems in the analysis of biomedical mass data. Previously, Knut was at Celera Genomics, where he worked on bioinformatics algorithms and software for the Human Genome Project, which assembled the very first human genome.

Description:

New technologies have reduced the cost of sequencing by many orders of magnitude in the last decade. In recent years, next generation sequencing (NGS) data have begun to appear in many clinically relevant applications, such as resequencing of cancer patients, disease-gene discovery and diagnostics for rare diseases, microbiome analyses, and gene expression profiling. The management-consulting firm McKinsey recently endorsed NGS as one of the most disruptive technologies that will transform life, business, and the global economy. Its prospective economic impact is broad and may well change the biomedical field. It promises to transform how doctors diagnose and treat cancer and other diseases, possibly extending lives. With rapid sequencing and advanced computing power, scientists can systematically test how genetic variations bring about specific traits and diseases, rather than relying on trial and error.

Unfortunately, a lack of expertise or programming infrastructure often makes it impossible or very time-consuming to develop bioinformatics solutions that meet the growing demand. The analysis of sequencing data is demanding because of the enormous data volume and the need for fast turnaround time, accuracy, reproducibility, and data security. This requires a large variety of expertise: algorithm design, strong implementation skills for analyzing big data on standard hardware and accelerators, statistical knowledge, and specific domain knowledge for each medical problem. Consequently, the development of tools is often fragmented, driven mainly by academic groups and SMEs (Small and Medium Enterprises) with different levels of expertise in the required domains.

We aim to address this problem by enabling academic groups and SMEs to significantly accelerate their time to market for innovative technical solutions in medical diagnostics, by providing an open source software development kit (SDK) that enables researchers and software engineers to build efficient, hardware-accelerated, and sustainable tools for the analysis of medical NGS data. In this proposal we specifically address the tight integration of the Intel® Xeon® and Intel® Xeon Phi™ processor families to provide fast, well-tested, algorithmic components for medical next generation sequencing (NGS) analysis by extending the existing and well-established C++ library SeqAn. Using the library will enable academic groups and SMEs to develop and maintain their own hardware-accelerated, efficient tools for medical NGS analysis at an unprecedented time-scale. To achieve this, we plan to fully integrate modern multicore hardware support for core data structures such as string indices and pairwise sequence alignment algorithms. This will make modern hardware accelerators available to non-expert programmers. In addition, we will combine data parallelism with compute parallelism as a strategy to reduce the computational effort required to process many genomes at once in main memory. This will allow a seamless scale-up of tools that need to process the very large data volumes associated with a large number of individual genomes, a challenge clearly visible in medical applications.

Publications:

Related Websites:

Reinert lab website: http://www.reinert-lab.de/
SeqAn website: http://www.seqan.de/
