Introduction
This first article in a series gives a high-level introduction to WebAssembly (wasm). The language is not fully formed and many changes are still to come, but this article gives you a rough idea of its current state. We will provide follow-up articles with news of changes as they come along.
Wasm’s goal is to enable higher performance for JavaScript*. It defines a new, portable, size- and load-time-efficient file format suitable for compilation to the web. It will use existing web APIs and is destined to become an integral part of web technology. Although “WebAssembly” contains the word web, the language is not destined solely for browsers; the goal is to provide the same technology for non-web uses as well. This opens the door to much broader coverage and potential traction for wasm.
As with all dynamically typed languages, it is difficult to get anything close to native performance when running JavaScript, so developers have tried for many years to reuse existing native (C/C++) code in web pages. Alternatives such as NaCl and PNaCl run in web pages alongside JavaScript. The most practical is asm.js, a JavaScript subset restricted to features that can be compiled into code approaching native performance. None of these alternatives has been universally accepted by web browsers as the solution for performance and code reuse. Wasm is an attempt to fix that.
Wasm is not meant to replace JavaScript but instead to provide a means to attain near-native performance for key parts of an application, web-based or not. For this reason, browser, web, compiler, and general software engineers are working to create the definition of the technology. One of the goals is to define a new platform-independent binary code format for the web.
Near-native-performing code on web pages could transform how new features are brought to the web. In native code, new features are often provided by an SDK, which supplies a native library for access. Web pages are not permitted to access those native libraries for security reasons, so web support for new capabilities often requires complex standardized APIs offered by the web browser, because pure JavaScript libraries are too slow.
With wasm, those standard APIs could be much simpler, and operate at a much lower level, with cached, multithreaded, and SIMD-capable dynamic link libraries of wasm providing functionality not currently possible. For example, instead of complex standard APIs for facial recognition or 3D image construction for use with a 3D camera, a much simpler standardized API could just provide access to the raw 3D data stream, with a wasm module doing the processing that a native SDK library now does. This would allow for downloading and caching of commonly used dynamic link libraries and rapid access to new capabilities from the web, long before the standardization process could complete.
This article provides an introduction to a fast-evolving technology. It remains relatively high-level, as many aspects of the wasm specification are still in flux.
More details about the high-level goals of wasm can be found here:
https://github.com/WebAssembly/design/blob/master/HighLevelGoals.md
General Overview
Design Versus Spec
The original design documents of wasm can be found in the design repository (repo) linked above, which contains these notable files:
- AstSemantics.md: an overview of the format itself.
- MVP.md: the Minimum Viable Product, that is, the requirements for the first iteration of wasm.
- HighLevelGoals.md: the high-level goals of wasm and the use cases it is trying to solve.
To define precisely, and also to verify, the decisions recorded in the design documents, the spec repo contains an OCaml interpreter of the wasm language. It also contains a test suite folder with some initial wasm tests, ranging from general integer and floating-point calculations to memory operations.
A few other tools in the main wasm GitHub repository use the same test suite to test for regressions. For example, both wasm-to-llvm-prototype and wasmint use it for their regression tests.
Prototypes
The base GitHub page for wasm contains various active projects. We mention a few of them in this article, and there are many others of note. We recommend that the reader peruse the various projects to see where the wasm community is putting its efforts, but you can divide the repositories into five major groups:
- Design and specification: the “bible” of the wasm definition.
- binaryen: C/C++ to wasm and sometimes more.
- sexpr-wasm-prototype: wasm to a binary format.
- wasm-to-llvm-prototype, wasm-jit-prototype, wasmint: wasm to a runtime/executable code.
- polyfill-prototype-2: JavaScript to wasm.
Many of these repositories are proof-of-concept repositories, meaning that they try to get something up and running to gather experience and do not necessarily represent the final result. The engineers working on them are often experimenting with wasm to determine how everything works; examples are the various binary formats being tested, such as the one in polyfill-prototype-2 or the v8 binary format.
WebAssembly: An Initial View
Modules, the Larger Construction Blocks
Wasm defines modules as its highest construct, containing functions and memory allocation requests. A wasm module is a unit of distributable, executable code. Each module has its own separate linear memory space, imports and exports, and code. This could be an executable, a dynamic link library (in future versions of wasm) or code to execute on a web page (used where ECMAScript 6* modules can be used).
Though the current test files in the test suite allow multiple modules to be defined in the same file, it is thought that this will not be the case in the final version. Instead, it will probably be common to have a single big module for an entire program; most C/C++ programs will therefore be translated into a single wasm module.
Functions
Wasm is statically typed: the return value and all parameters are typed. For example, this line from the i32.wast file in the test suite repository shows an addition of two parameters:
(func $add (param $x i32) (param $y i32) (result i32) (i32.add (get_local $x) (get_local $y)))
All are 32-bit integers. The line reads like this:
- Declaration of a function named $add.
- It has two parameters, $x and $y, both of which are 32-bit integers.
- The result is a 32-bit integer.
- The function’s body is a 32-bit addition.
- The left side is the value found in the local variable/parameter $x.
- The right side is the value found in the local variable/parameter $y.
- Since there is no explicit return node, the return is the last instruction of the function, hence this addition.
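For comparison, here is a literal C counterpart of this function; the C version is only an illustration of the typing, not generated output:

```c
#include <stdint.h>

/* C counterpart of:
   (func $add (param $x i32) (param $y i32) (result i32) ...)
   Both parameters and the result are 32-bit integers, and the
   function body is a single 32-bit addition. */
int32_t add(int32_t x, int32_t y) {
  return x + y;
}
```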
There will be more information about functions and wasm code later in this article.
Similar to an AST
Wasm is generally defined as an Abstract Syntax Tree (AST), but it also has some control-flow concepts and a definition of local variables allowing the handling of temporary calculations. S-expressions, short for symbolic expressions, form the current text format for wasm (although it has been decided that this will not be the final wasm text representation); the format works well for explaining wasm in this document. If you ignore the general control-flow constructs such as ifs, loops, and blocks, wasm calculations are in AST format. For example, for the following calculation:
(3 * x + x) * (3 * x + x)
You could see the following wasm code:
(i32.mul
  (i32.add (i32.mul (i32.const 3) (get_local 0)) (get_local 0))
  (i32.add (i32.mul (i32.const 3) (get_local 0)) (get_local 0))
)
This means that the wasm-to-native compiler would have to perform common subexpression elimination itself to ensure good performance. To alleviate this, wasm allows the code to use local variables to store temporary results.
Our example can then become:
(set_local 1
  (i32.add (i32.mul (i32.const 3) (get_local 0)) (get_local 0))
)
(i32.mul
  (get_local 1)
  (get_local 1)
)
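The same transformation can be written in C; this hypothetical function shows the role that local 1 plays in the wasm code (the function name is illustrative):

```c
#include <stdint.h>

/* Equivalent of the wasm snippet: a local variable stores the common
   subexpression 3*x + x once, which is then multiplied by itself. */
int32_t square_of_sum(int32_t x) {
  int32_t t = 3 * x + x;   /* set_local 1 */
  return t * t;            /* (i32.mul (get_local 1) (get_local 1)) */
}
```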
There are discussions about where the optimizations should lie:
- In the compilation from the original code, for example C/C++, to wasm
- In the compilation from wasm to the binary code for the target architecture, whether in the browser or in the non-web case
Memory
The memory subsystem for wasm is called linear memory: the module can request a given size of memory, which starts at address 0. Loads and stores can use either constant addresses, as in simple examples, or variables containing addresses.
For example, this would store the integer 42 at the address location 0:
(i32.store (i32.const 0) (i32.const 42))
Wasm defines sign and zero extension for the memory operations. An operation can also specify the alignment of the memory access, in case the architecture can take advantage of it for better code generation. Finally, the operations take an offset parameter to permit, for example, loads of a structure field.
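As a rough mental model, linear memory can be sketched in C as a flat byte array indexed from address 0; the helper names below are illustrative, not part of any wasm API:

```c
#include <stdint.h>
#include <string.h>

/* A module-requested region of linear memory, starting at address 0. */
static uint8_t linear_memory[65536];

/* Sketch of (i32.store <addr> <value>). memcpy is used so the sketch
   stays correct even for unaligned addresses. */
static void i32_store(uint32_t addr, int32_t value) {
  memcpy(&linear_memory[addr], &value, sizeof value);
}

/* Sketch of (i32.load <addr>). */
static int32_t i32_load(uint32_t addr) {
  int32_t value;
  memcpy(&value, &linear_memory[addr], sizeof value);
  return value;
}
```

With these helpers, the store example above, `(i32.store (i32.const 0) (i32.const 42))`, corresponds to `i32_store(0, 42)`.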
Wasm-to-LLVM Prototype
The wasm-to-LLVM prototype, the tool I am contributing to the wasm effort, provides a means to compile wasm code directly into x86 code via the LLVM compiler. Though wasm is intended for use in the web browser, there are plans to use wasm in non-web scenarios, as defined in the high-level goals of the language.
The general structure of the wasm-to-LLVM prototype is to parse the wasm test file, using the open source tools flex and bison, into an Intermediate Representation (IR). A pass (and there will most likely be more in the future) then walks the IR before code generation is performed via the LLVM compiler.
Figure 1: wasm-to-LLVM prototype.
Figure 1 shows the base structure of the tool. It takes as input the temporary textual file format for wasm, the S-expression format described earlier. The S-expressions are parsed by a lexical and semantic parser implemented with the flex and bison tools. Once parsed, an internal Intermediate Representation (IR) is created, and a pass system changes it slightly before generating the LLVM IR. The LLVM IR is then sent to the LLVM optimizer before Intel® x86 code is generated.
A First Wasm Example
Here we include one basic example: a simple sum of the values in an array. The example demonstrates that wasm is easy to understand, as well as a few things to keep in mind.
Wasm is intended to be generated by a compiler, and C/C++ will be among the first source languages used to generate it. There is not yet a dedicated compiler that transforms C/C++ into wasm, although LLVM has already seen some wasm-related work committed to the project, and we will likely see other compilers, such as GCC and MSVC, support the wasm language and environment. While writing wasm by hand will be rare, it is interesting to look at it and understand how the language is meant to interact with the browser/OS and the underlying architecture.
Sum of an array
;; Initialization of the two local variables is done as before:
;; local 1 is the sum variable initialized to 0
;; local 2 is the induction variable and is set to the
;; max element and is decremented per iteration
(loop
  (if_else (i32.eq (get_local 2) (i32.const 0))
    (br 1)
    (block
      (set_local 1 (i32.add (get_local 1) (i32.load (get_local 2))))
      (set_local 2 (i32.sub (get_local 2) (i32.const 4)))
    )
  )
  (br 0)
)
Note: this is not necessarily the optimal way of doing things. It is a useful example to demonstrate key parts of the wasm language and logic. A future article will explore various constructs that can better mimic the underlying hardware’s instructions.
As the example above shows, a loop is defined by a loop node. Loop nodes can define the names of the start and exit blocks or be anonymous, as here. To better understand the code, note that the loop uses two local variables: local 1 is the sum variable while walking the array, and local 2 is the induction variable being updated. Local 2 actually represents the pointer to the current cell to be added.
Here is the C-counterpart of the code:
// Initialization is done before
// local_1 is the sum variable initialized to 0
// local_2 is the induction variable and is set to
// the max element and is decremented per iteration
do {
  if (local_2 == start) {
    break;
  }
  local_1 = local_1 + *local_2;
  local_2--;
} while(1);
Loops in wasm actually work like do-while constructs in C. They also do not implicitly loop back; we need an explicit branch at the end of the wasm loop. The “br 0” node at the end branches to the top of the loop; the 0 represents the level of the loop nest we want to reach from here.
The loop starts by checking whether another iteration is needed. If the test is true, we execute a “br 1”, which, as you might infer, goes out one level of the loop nest. Since there is only one level here, we leave the loop.
In the loop, notice that the code uses a decrementing pointer to reach the start of the array. In the C version, there is a convenient variable called start, representing the start of the array the code is summing.
In wasm, since the memory layout starts at address 0 and this is the only array in this oversimplified example, the start address of the array is arbitrarily set to 0. If we put the array anywhere else, the comparison would look more like the C version, comparing local variable 2 with a parameter-passed offset.
Notice the difference in the handling of the induction variable’s update between the C and wasm versions. In C, the language lets the programmer simply decrement the pointer by one, and the compiler transforms this into an actual decrement by four later down the line. In wasm, we are already at a very low level, so the decrement by four is already explicit.
Finally, as we have shown, (get_local 2) is the loop counter and (get_local 1) is the actual sum of the vector. Since we are doing the sum in 32-bit, the code uses the i32.add and i32.load opcodes. In this example, the vector we are summing is at the beginning of the linear memory region.
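For readers who want to experiment, the loop can be turned into a self-contained C function. The names are illustrative, and note that, exactly as in the wasm snippet as presented, the element at the start address itself is never added:

```c
#include <stdint.h>

/* Mirrors the wasm loop: walk the array with a decrementing pointer
   (local 2) and accumulate into sum (local 1), exiting when the
   pointer reaches the start address. */
int32_t sum_down(const int32_t *start, const int32_t *last) {
  int32_t sum = 0;              /* local 1 */
  const int32_t *p = last;      /* local 2: points at the last element */
  do {
    if (p == start) {           /* (i32.eq (get_local 2) ...) -> (br 1) */
      break;
    }
    sum += *p;                  /* i32.add of an i32.load */
    p--;                        /* i32.sub by 4 bytes, i.e. one cell */
  } while (1);                  /* (br 0) loops back to the top */
  return sum;
}
```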
Wasm-to-LLVM Code Generation
The wasm-to-llvm-prototype generates the following loop code quite easily:
.LBB4_2:
  movl %edi, %edx
  addl (%rdx,%rcx), %eax
  addl $-4, %edi
  jne .LBB4_2
This is quite a tight loop, considering the original wasm text format code. Now take the following equivalent C version:
for (i = 0; i < n; i++) {
  sum += tab[i];
}
The GNU* Compiler Collection (GCC), version 4.7.3, at the -O2 optimization level produces the following code, which is similar in its logic:
.L11:
  addl (%rdi,%rdx,4), %eax
  addq $1, %rdx
  cmpl %edx, %esi
  jg .L11
In the wasm case, we have generated a countdown loop and use the subtraction result directly as the condition for the loop jump; in the GCC case, a comparison instruction is required. On the other hand, the GCC version does not need the additional move instruction that the wasm/LLVM version uses.
At -O3, GCC vectorizes the loop. This shows that the wasm-to-llvm prototype still has a bit of work to do in order to generate optimal code (vectorized or not). This will be described in a future article.
Conclusion
This article introduced you to the wasm language and showed a single simple example of a sum of an array. Future articles will dive more into detail about wasm and its different existing tools.
Wasm is a new language and a lot of tools are being developed to help determine its potential, to support it, and to understand what it can and cannot do. There are many elements about wasm that need to be determined, explained, and explored.
About the Author
Jean Christophe Beyler is a software engineer in the Intel Software and Solutions Group (SSG), Systems Technologies & Optimizations (STO), Client Software Optimization (CSO). He focuses on the Android compiler and ecosystem but also delves into other performance-related and compiler technologies.