This article was taken from a blog posting on IDZ by Leigh Davies at Intel Corp, highlighting work and results completed by Leigh and his colleague Filip Strugar in the new AA technique being referred to as Conservative Morphological Anti-Aliasing. Below is the content of the blog along with the available code project download for your examination
This sample presents a new, image-based, post-processing antialiasing technique referred to as Conservative Morphological Anti-Aliasing and can be downloaded here[SF1] . The technique was originally developed by Filip Strugar at Intel for use in GRID2 by Codemasters®, to offer a high performance alternative to traditional multi sample anti-aliasing (MSAA) while addressing artistic concerns with existing post-processing antialiasing techniques. The sample allows CMAA to be compared with several popular post processing techniques together with hardware MSAA in a real time rendered scene as well as to an existing image. The scene is rendered using a simple HDR technique and includes basic animation to allow the user to compare how the different techniques cope with temporal artifacts in addition to static portions of the image.
Figure 1: CMAA Sample using HDR and animating geometry
MSAA has long been used to reduce aliasing in computer games and significantly improve their visual appearance. Basic MSAA works by running the pixel shader once per pixel but running the coverage and occlusion tests at higher than normal resolution, typically 2x through 8x, and then merging the results together. While significantly faster than super sampling it still represents a significant additional cost compared to no anti-aliasing and is difficult to implement with certain techniques– for example, this sample uses a custom fullscreen pass needed to get correct post HDR tone-mapping MSAA resolve [Humus article in ShaderX6] [6].[SF2]
An alternative to MSAA is to use an image-based post-process anti-aliasing (PPAA), which became practical with GPU ports of Morphological antialiasing (MLAA) [Reshetov 2009] [1] and further developments such as “Enhanced Subpixel Morphological Antialiasing“(SMAA) [2] and NVidia’s “Fast approximate anti-aliasing” (FXAA) [3]. Compared to MSAA, these PPAA techniques are easy to implement and work in scenarios where MSAA does not (such as deferred lighting and other non-geometry based aliasing), but lack adequate sub-pixel accuracy and are less temporally stable. They also cause perceptible blurring of textures and text, since it is difficult for edge-detection algorithms to distinguish between intentional colour discontinuities and unwanted aliasing caused by imperfect rendering.
Currently two of the most popular PPAA algorithms are:
1. SMAA is an algorithm based on MLAA but with a number of innovations and improvements, and with a number of quality/performance presets. It implements advanced pattern recognition and local contrast adaptation, and the more expensive variations use temporal super-sampling to reduce temporal instability and improve quality. The SMAA algorithm version referenced in this document is the latest public code v2.7.
2. FXAA is a much faster effect. However, FXAA has simpler colour discontinuity shape detection, causing substantial (frequently unwanted) image blurring. It also has fairly limited kernel size by default, so it doesn't sufficiently anti-alias longer edge shapes, while increasing the kernel size impacts performance significantly. FXAA algorithm version referenced in this document is v3.8 unless otherwise specified (newest v3.11 was added to the sample in the last minute, in addition to 3.8).[SF3]
In this sample we introduce a new technique called Conservative Morphological Anti-Aliasing (CMAA). CMAA addresses two requirements that are currently not addressed by existing techniques:
1. To run efficiently on low-medium range GPU hardware, such as integrated GPUs, while providing a quality anti-aliasing solution. A budget under 3ms was used as a guide when developing the technique at a resolution of 1600x900 running on a 15watt, 4th Generation Intel® Core™ processor.
2. To be minimally invasive so it can be acceptable as a replacement to 2xMSAA in a wide range of applications, including worst case scenarios such as text, repeating patterns, certain geometries (power lines, mesh fences, foliage), and moving images.
CMAA is positioned between FXAA and SMAA 1x in computation cost (0.9-1.2x the cost of default FXAA 3.11 and 0.45-0.75x the cost of SMAA 1x) on Intel 4th Generation HD Graphics hardware and above[SF4] . Compared to FXAA, CMAA provides significantly better image quality and temporal stability as it correctly handles edge lines up to 64 pixels long and is based on an algorithm that only handles symmetrical discontinuities in order to avoid unwanted blurring (thus being more conservative). When compared to SMAA 1x it will provide less anti-aliasing as it handles fewer shape types but also causes less blurring, shape distortion, and has more temporal stability (is less affected by small frame-to-frame image changes).
CMAA has four basic logical steps (not necessarily matching the order in the implementation):
- Image analysis for colour discontinuities (afterwards stored in a local compressed 'edge' buffer). The method used is not unique to CMAA.
- Extracting locally dominant edges with a small kernel. (Unique variation of existing algorithms).
- Handling of simple shapes.
- Handling of symmetrical long edge shape. (Unique take on the original MLAA shape handling algorithm.)
Step 1: Image analysis for colour discontinuities (edges)
Edge detection is performed by comparing neighboring colours using:
- Sum of per-channel Luma-weighted colour difference in sRGB colour space (default)
- Luminance value calculated from the input in sRGB colour space (faster)
- Weighted Euclidean distance [6] (highest quality, slowest)
An edge (discontinuity) exists if the difference of neighboring pixel values is above a preset threshold (which is determined empirically).
Figure 2: Showing results of a default edge detection algorithm:
dot( abs(colorA.rgb-colorB.rgb), float3(0.2126,0.7152,0.0722)) > fThreshold
Step 2: Locally dominant edge detection (or, non-dominant edge pruning)
This step serves a similar function to “local contrast adaptation” in SMAA and “local contrast
test” in FXAA but with a smaller kernel. For each edge detected in Step 1, colour delta value above threshold (dEc) is compared to that of neighboring 12 edges (dEn):
Figure 3: The edge remains an edge if its dEc > lerp( average(dEn), max(dEn), ldeFactor), where ldeFactor is empirically chosen (defaults to 0.35).
This smaller local adaptation kernel size is somewhat less efficient at increasing effective edge detection range. However, it is more effective at preventing blurring of small shapes (such as text), reducing local shape interference from less noticeable edges, avoiding some of the pitfalls of large kernels (visible kernel-sized transition from un-blurred to blurry), and has better performance.
Step 3: Handling of simple shapes
Edges detected in step 1, and refined in step 2, are used to make assumptions about the shape of the underlying edge before rasterization (virtual shape). For simple shape handling, all pixels are analyzed for existence of 2, 3 and 4 edge aliasing shapes, and colour transfer is applied to match the virtual shape colour coverage and achieve the local anti-aliasing effect (Figure 4). While this colour transfer is not always symmetrical, the amount of shape distortion is minimized to sub-pixel size.
Figure 4: 2-edge, 3-edge and 4-edge shapes; reconstructed virtual shape shown in yellow; black arrows showing anti-aliasing colour blending direction
Step 4: Handling of symmetrical long edge shape.
4a. Each edge-bearing pixel is analyzed for a potential Z-shape, representing the center of the virtual shape rasterization pixel step (which is mostly a triangle edge). Criterion used for this detection is illustrated in Figure 5. Four Z-shape orientations (with 90° difference) are handled.
Figure 5: Z-shape detection criterion is true if edges illustrated blue are present while red ones are not; green arrows show subsequent edge tracing
4b. For each detected Z-shape, the length of the edge to the left/right is determined by tracing the horizontal (for two horizontal Z-shapes) edges on both sides, and stopping if none are present on either side, or a vertical edge is encountered (Figure 6).
4c. The edge length from the previous step is used to reconstruct the location of the virtual shape edge and apply colour transfer (to both sides of the Z-shape) to match coverage that it would have at each pixel. This step overrides any anti-aliasing done in Step 3 on the same pixels.
Figure 6: Long edge (Z-shapes): edge length tracing marked blue, with Z shape at center; reconstructed virtual shape shown in yellow; black arrows showing anti-aliasing colour blending direction
The inherent symmetry of this approach better preserves overall image average colour and shape, ignores borderline cases, and better preserves original shapes while also being more temporally stable. One pixel (or few pixels) changes do not induce drastic colour transfer and shape modification (when compared to SMAA 1X, FXAA 3.8/3.11[SF5] and older MLAA-based techniques).
Figure 7: Typical detection and handling of symmetrical Z shapes (circled in yellow)
Figure 8: All CMAA shapes: original image, edge detection and final anti-aliased image (with/without edges)
The sample UI allows a direct comparison between several anti-aliasing techniques that are selectable from within a drop down menu along with several debug features. All the techniques can be viewed in high detail using a zoom box that can be enabled from the UI and positioned by using a right mouse click. For both CMAA and SAA additional debug information is shown that highlights the actual edges that have been detected by the algorithm; slider allows you to adjust the threshold used for edge detection; In the case of CMAA both the edge threshold and the non-dominant edge removal threshold can be modified.
The effect on performance caused by modifying the threshold can be viewed if the application is run with vsync disabled. GPU performance metrics are displayed in the upper left hand corner of the application showing the overall cost of rendering the scene and the time taken in the post processing anti-aliasing code. When viewing the stats, additional debug information such as the zoombox and the edge view should be disabled as they both lower performance by forcing sub-optimal code paths to be used. When viewing CMAA in the zoombox with ”Show Edges” enabled the zoombox will also animate showing the effects of applying CMAA to the image, this doesn’t affect the rest of the display.
For precise profiling of each technique, “Run benchmark for: …” button can be used to activate automatic multiple frame sampling and comparison, with results (cost delta compared to the base non-AA version) displayed in a message box after the run is finished.[SF6]
Figure 9: Debug information including zoombox and edge detection overlay.
In addition to showing the effect of the various post-processing effects on the real-time scene the application allows a static image to be loaded and used as the source for the effect; the currently supported file format is PNG. A synthetic sample image is provided in the samples media directory (Figure 9). Attempting to run the sample with 2x and 4x MSAA will have no effect as these would normally affect the image source but CMAA, SMAA, FXAA and SAA can all be applied to the image. This feature quickly allows anyone considering using any of the post processing techniques in the sample to load images taken from their own application and experiment with the various threshold parameters.
The following figures show a number of quality and performance comparisons:[SF7]
Figure 10: Performance impact (frames per second) of an older implementation of CMAA and MSAA on a Consumer Ultra-Low Voltage (CULV) i7-4610Y CPU with HD Graphics 4200 GPU, in GRID2 by Codemasters®[SF8]
Figure 11: Cost and scaling of CMAA 1.3 and other post-process anti-aliasing effects measured using the sample from the article, applied 10 screenshots from various games, averaged, on Intel 4th generation CPUs (HD 5000 and GD5200 graphics) and AMD R9-290X, using different resolutions for different GPUs[SF9]
Figure 12: Quality comparison for text and image anti-aliasing. CMAA 1.3 manages high quality anti-aliasing of the image while preserving text and without over blurring the geometry.
Figure 13: Quality comparison for synthetic shapes and game scenes with high frequency textures. CMAA preserves original high frequency texture data better than FXAA 3.11 and SMAA 1x, while still applying adequate anti-aliasing (although below the quality level of SMAA 1x).
Figure 14: Quality comparison in a game 3D scene. CMAA preserves most of the original high frequency texture data and original geometry shapes, while still applying adequate anti-aliasing.[SF10]
Figure 15: Impact of various techniques on GUI elements. Any post-process AA should always be applied before GUI to avoid unwanted blurring, but there are cases when this is unavoidable (such as in a driver implementation or if the GUI is part of the 3D scene)[SF11]
References:
[1] MLAA, Reshetov, A. (2009). Morphological antialiasing. In HPG’09: Proceedings of the
Conference on High Performance Graphics 2009, pages 109–116, New York,NY, USA. ACM
[2] SMAA, Jorge Jimenez and Jose I. Echevarria and Tiago Sousa and Diego Gutierrez 2012,
JIMENEZ2012_SMAA,
"SMAA: Enhanced Morphological Antialiasing", Computer Graphics Forum (Proc.
EUROGRAPHICS 2012)
[3] NVidia "Fast approximate anti-aliasing" (FXAA), Timothy Lottes (2011),
http://developer.download.nvidia.com/assets/gamedev/files/sdk/11/FXAA_WhitePaper.pdf
[4] Venceslas Biri, Adrien Herubel, and Stephane Deverly. 2010. Practical morphological
antialiasing on the GPU. In ACM SIGGRAPH 2010 Talks (SIGGRAPH '10). ACM, New York, NY,
USA, , Article 45 , 1 pages. DOI=10.1145/1837026.1837085
http://doi.acm.org/10.1145/1837026.1837085 http://igm.univ-mlv.fr/~biri/mlaa-gpu/MLAAGPU.pdf
[6] "Post-tonemapping resolve for high quality HDR antialiasing in D3D10" in ShaderX6[SF12]
[SF1]Need to update for new version 1.3
[SF2]This is a new addition to the sample
[SF3]Self-explanatory
[SF4]New optimisations, new numbers
[SF5]Added 3.11
[SF6]New functionality
[SF7]The whole section below was replaced by new data/images since latest changes made the old ones completely obsolete
[SF8]The reason I point this out is that I think it is important to say that this is a low power part so that no one can draw wrong conclusion from the numbers, which are by the way pretty awesome (for 11.5W part at 1600x900).
[SF9]This is very useful to know for anyone thinking about implementing it into their own engine
[SF10]All of these comparisons are new – old one was obsolete, and I made a number of additional examples.
[SF11]Not sure if this figure is needed
[SF12]Not sure if we want to rearrange numbers in order of appearance in the text