Introduction
In this FAQ we answer questions people have asked about persistent memory programming. If you don’t find the information you need here, visit the Intel® Developer Zone’s Persistent Memory (PMEM) Programming site. There you’ll find articles, videos, code samples, and links to other resources to support your work in this exciting technology.
About the Persistent Memory Development Kit
The Persistent Memory Development Kit (PMDK), formerly known as the Non-Volatile Memory Library (NVML), is a growing collection of libraries and tools designed to support development of persistent memory-aware applications. The PMDK project currently supports ten libraries, targeted at various use cases for persistent memory, along with language support for C, C++, Java*, and Python*. It also includes tools like the pmemcheck plug-in for Valgrind, and an increasing body of documentation, code examples, tutorials, and blog entries. The libraries are tuned and validated to production quality and are issued with a license that allows their use in both open- and closed-source products. The project continues to grow as new use cases are identified.
Why was NVML renamed the Persistent Memory Development Kit (PMDK)?
The reason for the name change and how it affects developers is explained in this blog on the pmem.io website: Announcing the Persistent Memory Development Kit. Pmem.io is the official website for the PMDK.
Basic Persistent Memory Concepts
This section contains frequently asked questions about basic persistent memory concepts.
What is persistent memory?
Persistent memory is:
- Byte-addressable like memory
- Persistent like storage
- Cache coherent
- Load/store accessible—persistent memory is fast enough to be used directly by the CPU
- Direct access—no paging from a storage device to dynamic random-access memory (DRAM)
What is DAX?
Direct Access (DAX) is the mechanism that enables direct access to files stored in persistent memory without the need to copy the data through the page cache.
DAX removes the extra copy by performing reads and writes directly to the storage device. Without DAX support in a file system, the page cache is generally used to buffer reads and writes to files; it also provides the pages that are mapped into user space by a call to mmap. For more information, refer to the article Direct Access for files at The Linux Kernel Archives.
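The memory-mapping model behind DAX can be sketched with ordinary POSIX calls. The file path and function name below are made up for illustration; on a conventional file system this goes through the page cache, while on a DAX-mounted file system the very same stores would reach persistent memory directly:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file, store into the mapping, flush, and read it back.
 * On a DAX file system identical code touches persistent memory
 * directly; here it goes through the page cache. */
int map_write_read(const char *path, char *out, size_t outlen) {
    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4096) != 0) {
        close(fd);
        return -1;
    }
    char *base = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }
    strcpy(base, "hello, mapped file");   /* plain store, no write() call */
    msync(base, 4096, MS_SYNC);           /* flush through the page cache */
    strncpy(out, base, outlen);
    munmap(base, 4096);
    close(fd);
    return 0;
}
```

Note that the application updates the file with ordinary stores; no read() or write() system calls are involved after the mapping is established.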
What is the persistent memory-aware file system?
A persistent memory-aware file system detects whether there is DAX support in the kernel. If so, when an application memory maps a file on this file system, it gets direct access to the persistent region. EXT4* and XFS* on Linux*, and NTFS* on Windows*, are examples of persistent memory-aware file systems.
In order to get DAX support, the file system must be mounted with the “dax” mount option. For example, on the EXT4 file system you can mount as follows:
mkfs -t ext4 /dev/pmem0
mount -o dax /dev/pmem0 /mnt/pmem
What is the difference between file system DAX and device DAX? When would you use one versus the other?
File system DAX is used when an application needs file system features such as file permissions, access control, and so on.
Device DAX is the device-centric equivalent of file system DAX. It allows memory ranges to be allocated and mapped without an intervening file system.
With device DAX, a user's application accesses the data directly. Both paths require support from the operating system kernel.
What is the SNIA* NVM Programming Model for persistent memory?
This Storage Networking Industry Association (SNIA) specification defines recommended behavior between various user space and operating system (OS) kernel components supporting non-volatile memory (NVM). This specification does not describe a specific API. Instead, the intent is to enable common NVM behavior to be exposed by multiple operating system-specific interfaces. Some of the techniques used in this model are memory mapped files, DAX, and so on. For more information, refer to the SNIA NVM Programming Model.
How is memory mapping of files different on byte-addressable persistent memory?
Though memory mapping of files is an old technique, it plays an important role in persistent memory programming.
When you memory map a file, you are telling the operating system to map the file into memory and then expose this memory region into the application’s virtual address space.
For an application working with block storage, when a file is memory mapped, the region appears to be byte-addressable, but what is actually happening behind the scenes is page caching: the operating system pauses the application to do I/O, because the underlying storage can only be accessed in blocks. So even if a single byte is changed, an entire 4K block is moved to storage, which is not very efficient.
For an application working with persistent memory, when a file is memory mapped, this region is treated as byte-addressable (cache line) storage and page caching is eliminated.
What is atomicity—why is this important when working with persistent memory?
In the context of visibility, atomicity defines what other threads can observe of a store. In the context of power-fail atomicity, it is the largest store that cannot be torn by a power failure or other interruption. On x86 CPUs, stores to memory have an atomicity guarantee of only 8 bytes. In a real-world application, data updates often consist of chunks larger than 8 bytes; anything larger than 8 bytes is not power-fail atomic and may result in a torn write.
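The 8-byte limit can be sketched in C. The type and function names below are illustrative only, not part of any PMDK API:

```c
#include <stdint.h>

/* An aligned 8-byte store compiles to a single store instruction,
 * so a power failure cannot tear it. */
typedef struct { _Alignas(8) uint64_t value; } record8;

/* This 16-byte record needs two 8-byte stores; a power failure between
 * them leaves a torn record (new lo, old hi). */
typedef struct { uint64_t lo, hi; } record16;

void update8(record8 *r, uint64_t v) {
    r->value = v;             /* one power-fail-atomic store */
}

void update16(record16 *r, uint64_t lo, uint64_t hi) {
    r->lo = lo;               /* store 1 */
    /* <- a crash here tears the update */
    r->hi = hi;               /* store 2 */
}
```

Updates like the second one are the reason persistent memory programs need mechanisms such as the BTT (for sector atomicity) or transactions (for arbitrary-size atomicity), covered below.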
What is BTT and why do we need it to manage sector atomicity?
The Block Translation Table (BTT) provides atomic sector update semantics for persistent memory devices. It prevents torn writes for applications that rely on sector writes. The BTT manifests itself as a stacked block device, and reserves a portion of the underlying storage for its metadata. It is an indirection table that re-maps all the blocks on the volume, and can be thought of as an extremely simple file system whose sole purpose is to provide atomic sector updates.
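The indirection idea behind the BTT can be shown with a toy sketch. This is not the kernel's on-media BTT layout; the names and sizes are made up. The key point is that the sector data is written to a spare block first, and only then is a single 8-byte map entry switched, so a reader never observes a half-written sector:

```c
#include <stdint.h>
#include <string.h>

#define SECTOR  512
#define NBLOCKS 8            /* 7 logical blocks + 1 spare */

typedef struct {
    uint64_t map[NBLOCKS - 1];   /* logical block -> physical block */
    uint64_t free_block;         /* current spare physical block */
    uint8_t  data[NBLOCKS][SECTOR];
} btt;

void btt_init(btt *b) {
    for (uint64_t i = 0; i < NBLOCKS - 1; i++)
        b->map[i] = i;
    b->free_block = NBLOCKS - 1;
}

void btt_write(btt *b, uint64_t lba, const uint8_t *src) {
    uint64_t old = b->map[lba];
    memcpy(b->data[b->free_block], src, SECTOR); /* fill the spare first */
    b->map[lba] = b->free_block;  /* 8-byte map switch: the atomic step */
    b->free_block = old;          /* old block becomes the new spare */
}

const uint8_t *btt_read(const btt *b, uint64_t lba) {
    return b->data[b->map[lba]];
}
```

A crash before the map switch leaves the old sector intact; a crash after it leaves the new sector fully in place. Either way, no torn sector is visible.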
I would like to purchase new servers and devices that support persistent memory programming. What are the hardware and software requirements that must be met to support a persistent memory application?
There are platform and software requirements. You’ll need servers based on the future Intel® Xeon® Scalable processor family platform, code-named Cascade Lake. These platforms are targeted to be delivered in 2018. From a software perspective, you’ll need a Linux or Windows* distribution that supports persistent memory file systems like EXT4, XFS (Linux), and NTFS (Windows), and includes persistent memory device drivers.
What are some of the challenges of adapting software for persistent memory?
The main challenges of implementing persistent memory support are:
- Ensuring data persistence and consistency
- Ability to detect and handle persistent memory errors
What is the importance of flushing in persistent memory programming?
When an application does a write, the data is not guaranteed to be persistent until it reaches a power-fail protected domain. There are two ways of ensuring that writes reach the failure-protected domain: either flush (and fence) after writing, or extend the failure-protected domain to include the CPU caches, which is what eADR does. On platforms with eADR there is no need for manual cache-line flushing.
What are the options available for flushing CPU caches from user space?
There are three instructions available:
- CLFLUSH flushes one cache line at a time. CLFLUSH is a serialized instruction for historical reasons, so if you have to flush a range of persistent memory, looping through it and doing CLFLUSH will mean flushes happen one after another.
- CLFLUSHOPT can flush multiple cache lines in parallel. Follow this instruction with SFENCE, since it is weakly ordered. That's the optimization referred to by the OPT in the opcode name. For more details on the instructions, search for the topic CLFLUSH—Flush Cache Line in the document Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D and 4.
- CLWB behaves like CLFLUSHOPT, with the caveat that the cache line may remain valid in the cache.
Also, there is an Advanced Configuration and Power Interface (ACPI) property that tells you whether cache flushing is automatic. If it is not, you'll need to implement the flushing yourself. To support developers working on generic server platforms, new interfaces are being created to check the ACPI property and let you skip the flushes when possible.
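A user-space flush loop can be sketched as follows. This uses plain CLFLUSH (available on all x86-64 CPUs via the SSE2 intrinsic `_mm_clflush`); the function name and cache-line constant are illustrative. In practice, libpmem's pmem_persist() selects CLFLUSHOPT or CLWB automatically when the CPU supports them:

```c
#include <emmintrin.h>   /* _mm_clflush, _mm_sfence (SSE2) */
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64

/* Flush every cache line covering [addr, addr + len). */
void flush_range(const void *addr, size_t len) {
    uintptr_t p = (uintptr_t)addr & ~((uintptr_t)CACHELINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    for (; p < end; p += CACHELINE)
        _mm_clflush((const void *)p);
    /* CLFLUSH is already ordered, but CLFLUSHOPT/CLWB loops must end
     * with SFENCE; keeping the fence makes the same pattern correct
     * for all three instructions. */
    _mm_sfence();
}
```

With CLFLUSHOPT or CLWB substituted into the loop, the flushes can proceed in parallel and the trailing SFENCE becomes mandatory.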
Why do we need transactions when working with persistent memory?
Transactions are used to update chunks of data larger than the 8-byte power-fail atomicity guarantee. If the execution of a transaction is interrupted, the transactional semantics assure the application that the annotated section of code is power-fail atomic: after a restart, either all of its updates are visible or none are.
Why was the PCOMMIT instruction removed?
The reason for removing the PCOMMIT instruction is that today's platforms with non-volatile dual in-line memory modules (NVDIMMs) are expected to support asynchronous DRAM refresh (ADR) to flush the contents from a memory controller when there is a power failure. Since the sole purpose of PCOMMIT was to flush the contents from the memory controller this instruction was deprecated, and references to this instruction have been removed from PMDK. See the Deprecating the PCOMMIT Instruction blog for details.
Can Intel® Transactional Synchronization Extensions (Intel® TSX) instructions be used for persistent memory?
As far as the CPU is concerned, persistent memory is just memory, and the CPU can execute any type of instruction on it. The problem here is atomicity. Intel® TSX is implemented at the cache layer, so any flush of the cache will naturally have to abort the transaction; and if we don't flush until after the transaction succeeds, failure atomicity and visibility atomicity may be out of sync.
Do PMDK libraries use Intel TSX instructions?
No.
Basic Questions on Persistent Memory Programming with PMDK
Why use PMDK?
PMDK is designed to solve persistent memory challenges and facilitate the adoption of persistent memory programming. Use of PMDK is not a requirement for persistent memory programming, but it offers developers well-tested, production-ready libraries and tools in a comprehensive implementation of the SNIA NVM programming model.
Does PMDK work on non-Intel NVDIMMs?
Yes.
Is the use of PMDK required to access Intel® persistent memory NVDIMMs?
PMDK is not a requirement but a convenience for adopting persistent memory programming. You can either use PMDK libraries as binaries or you can choose to reference the code for the libraries if you are implementing persistent memory access code from scratch.
What is the difference between SPDK and PMDK?
PMDK is designed and optimized for byte-addressable persistent memory. These libraries can be used with NVDIMM-Ns in addition to Intel persistent memory NVDIMMs powered by 3D XPoint™ storage media.
The Storage Performance Development Kit (SPDK) is a set of libraries for writing high-performance storage applications that use block IO.
The difference is that PMDK is focused on persistent memory and SPDK is focused on storage, but the two sets of libraries work fine together if you happen to need them both at the same time.
What language bindings are provided for PMDK?
All the libraries are implemented in C, and we provide custom bindings for libpmemobj in C++, Java*, and Python*. At this time, the Java and Python bindings are works in progress.
I am not using transactions. Is there a library I can use from PMDK to access persistent memory?
Yes. libpmem is a simple library that detects the types of flush instructions supported by the CPU, and uses the best instructions for the platform to create performance-tuned routines for copying ranges of persistent memory.
Is there a library that supports transactions?
Yes. There are three: libpmemobj, libpmemblk, and libpmemlog.
- libpmemobj provides a transactional object store, providing memory allocation, transactions, and general facilities for persistent memory programming.
- libpmemlog provides a pmem-resident log file. This is useful for programs like databases that append frequently to a log file.
- libpmemblk supports arrays of pmem-resident blocks, all the same size, that are atomically updated. For example, a program keeping a cache of fixed-size objects in pmem might find this library useful.
Can I use malloc to allocate persistent memory?
No. PMDK provides an interface to allocate and manage persistent memory.
How are PMDK libraries tested? Are they tested on real NVDIMMs?
These libraries were functionally validated on persistent memory emulated using DRAM. We are in the process of testing with actual hardware.
Is PMDK part of any Linux* or Windows* distributions?
Yes. PMDK libraries, but not tools, are included in Linux distributions from Suse*, Red Hat Enterprise Linux*, and Ubuntu*, and the list may grow in the future.
For Microsoft Windows, PMDK libraries, but not tools, are included in Windows Server 2016 and Windows® 10. For details, see the pmem.io blog PMDK for Windows.
To get the complete PMDK, download it from the PMDK GitHub repository.
Does PMDK support ARM64*?
Currently only 64-bit Linux* and Windows* on x86 are supported.
Are there examples of real-world applications using PMDK?
Yes. For example, we have added persistent memory support to Redis*, which enables additional configuration options for managing persistence. In particular, when running Redis in Append Only File mode, all commands can be saved in a persistent memory-resident log file, instead of a plain-text append-only file stored on a conventional hard disk drive. Persistent memory-resident log files are implemented using the libpmemlog library.
To learn more about the persistent memory implementation of Redis, including build instructions, visit the Libraries.io GitHub site pmem/redis.
PMDK—libpmem
When should I use libpmem versus libpmemobj?
libpmem provides low-level persistent memory support. If you’ve decided to handle persistent memory allocation and consistency across program interruptions yourself, you will find the functions in libpmem to be useful. Most developers use libpmemobj, which provides a transactional object store, memory allocation, transactions, and general facilities for persistent memory programming. It is implemented using libpmem.
Do 3D XPoint™ storage-based devices like Intel® Optane™ memory module 32 GB PCI Express* M.2 80 mm MEMPEK1W032GAXT support libpmem?
No. PMDK is designed and optimized for byte-addressable persistent memory devices only.
What is the difference between pmem_memcpy_persist and pmem_persist?
The difference is that pmem_persist does not copy anything, but only flushes data to persistence (out of the CPU cache). In other words:
pmem_memcpy_persist(dst, src, len) == memcpy(dst, src, len) + pmem_persist(dst, len)
PMDK—libpmemobj
Why does libpmemobj run slowly on SSDs?
PMDK is designed and optimized for byte-addressable persistent memory, while SSDs are block based. Running libpmemobj on an SSD requires a translation from byte to block addressing, which adds time to every transaction. It also requires moving whole blocks between the SSD and memory for reads and for flushing writes.
How does an application find objects in the memory mapped file when it restarts after a crash?
Libpmemobj defines memory mapped regions as pools and they are identified by something called a layout. Each pool has a known location called root, and all the data structures are anchored off of root. When an application comes back from a crash it asks for the root object, from which the rest of the data can be retrieved.
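The offset-based anchoring can be sketched in plain C. This is a conceptual illustration, not the libpmemobj API (which uses PMEMoid handles and pmemobj_root()); the struct and macro names here are invented. The essential idea is that objects are reached by offsets from the pool base rather than raw pointers, so the data remains valid even if the pool is mapped at a different address after a restart:

```c
#include <stdint.h>

/* The root structure lives at a known offset (here, offset 0). */
typedef struct {
    uint64_t head_off;        /* offset of the first node, 0 = empty */
} pool_root;

typedef struct {
    uint64_t next_off;        /* offset of the next node, 0 = end */
    int value;
} node;

#define NODE_AT(base, off) ((node *)((char *)(base) + (off)))

/* After a restart: remap the pool, read the root, walk the offsets. */
int first_value(void *pool_base) {
    pool_root *root = (pool_root *)pool_base;
    if (root->head_off == 0)
        return -1;            /* empty store */
    return NODE_AT(pool_base, root->head_off)->value;
}
```

Because only offsets are stored in the pool, no pointer fix-up pass is needed when the application reopens the pool after a crash.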
What do the following terms mean in the context of libpmemobj?
- Object Store: Object store refers to treating blobs of persistence as variable-sized objects (as opposed to files or blocks).
- Memory pool: Memory mapped files are exposed as something called memory pools.
- Layout: A string of your choice used to identify a pool.
Does libpmemobj support local and remote replication?
Yes, libpmemobj supports both local and remote replication, through the sync option of the pmempool command or the pmempool_sync() API from the libpmempool(3) library.
Support for Transactions
Is there support for transactions that span multiple memory pools where each pool is of a different type?
No. There is no support for transactions that span multiple memory pools, whether the pools are of the same type or of different types.
Multithread Support
How are pmem-aware locks handled across crashes? Or, how is concurrency handled in libpmemobj?
libpmemobj keeps track of a generation number that is increased each time a pmemobj pool is opened. When a pmem-aware lock, such as a PMEM mutex, is acquired, the lock is checked against the pool's current generation number to see whether this is its first use since the pool was opened. If so, the lock is initialized. So, if you have a thousand locks held and the machine crashes, all those locks are effectively dropped, because the generation number changes when the pool is reopened. This prevents you from having to somehow find all the locks and iterate through them.
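The generation check can be sketched as follows. This is a conceptual model, not libpmemobj's actual internals (which also guard against racing initializers); the type and function names are invented:

```c
#include <pthread.h>
#include <stdint.h>

typedef struct {
    uint64_t generation;       /* pool generation at last init */
    pthread_mutex_t mutex;     /* volatile state; garbage after a crash */
} pmem_mutex;

void pmem_mutex_lock(pmem_mutex *m, uint64_t pool_generation) {
    if (m->generation != pool_generation) {
        /* First use since the pool was (re)opened: whatever state the
         * mutex held before the crash is discarded. */
        pthread_mutex_init(&m->mutex, NULL);
        m->generation = pool_generation;
    }
    pthread_mutex_lock(&m->mutex);
}

void pmem_mutex_unlock(pmem_mutex *m) {
    pthread_mutex_unlock(&m->mutex);
}
```

A lock that was "held" at crash time simply carries a stale generation number, so it is reinitialized, not unlocked, on its first acquisition after reopen.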
Thread Safety
Are pool management functions thread safe? For example, thread1 creates/opens/closes file1 and thread2 creates/opens/closes file1. Are these actions thread safe?
No. Pool management functions are not thread safe. The reason is the shared global state that we can't put under a lock for runtime performance reasons.
Is pmem_persist thread safe?
pmem_persist's only job is to make sure that the passed memory region is flushed out of the CPU caches; it does not care what is stored in that region, so calling it from multiple threads is safe. Store and flush are separate operations: if you want to store and persist atomically, you have to do the locking around both operations yourself.
PMDK—Pools
Pool Creation
Is there a good rule of thumb to determine what percentage of a pmemobj pool is usable or how big the pool should be if I want to allocate N objects of specific size?
libpmemobj uses roughly 4 kilobytes for each pool + 512 kilobytes per 16 gigabytes of static metadata. For example, a 100 gigabyte pool would require 3588 kilobytes of static metadata. Additionally, each memory chunk (256 kilobytes) used for small allocations (<= 2 megabytes) uses 320 bytes of metadata. Also, each allocated object has a 64-byte header.
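The rule of thumb above can be written as arithmetic. The function name is illustrative; this covers only the static metadata, not the per-chunk 320 bytes or the 64-byte per-object headers:

```c
#include <stdint.h>

/* Static metadata estimate: 4 KiB per pool, plus 512 KiB for every
 * started 16 GiB of pool size. */
uint64_t static_metadata_kib(uint64_t pool_size_gib) {
    uint64_t per16 = (pool_size_gib + 15) / 16;   /* round up */
    return 4 + 512 * per16;
}
```

For a 100 gigabyte pool this gives ceil(100/16) = 7, so 4 + 7 × 512 = 3588 kilobytes, matching the figure quoted above.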
I am trying to create a 100+ GB persistent memory pool. How do I use pmempool if I need a large pool?
One way is to ensure that you have enough persistent memory reserved before you use pmempool with the create command. For more details, type the command man pmempool-create.
Create a blk pool file of size 110GB
$ pmempool create blk --size=110G pool.blk
Create a log pool file of maximum allowed size
$ pmempool create log -M pool.log
Is there support for multiple pools within a single file?
No. Having multiple pools in a single file is not supported. Our libraries support concatenating multiple files to create a single pool.
Expanding Existing Pool Size
What is a good way to expand a persistent memory pool; for example, when I begin with a single 1 GB mapped file and later the program runs out of persistent memory?
We are often asked whether a pool can grow after creation. It cannot, but you can use a holey (sparse) file to create a huge pool and then rely on the file system to do everything else. This usage model doesn't seem to satisfy most people, as it is not how traditional storage solutions work. For details, see Runtime extensible zones in the PMEM GitHub* repository.
What is a good way to expand a libpmemobj pool; for example, when I begin with a single 1 GB mapped file and later the program runs out of persistent memory?
When using a file on a persistent memory-aware file system, all our libraries rely on file system capability to support sparse files. This means that you just create a file as large as you could possibly want, and the actual storage memory use would be only what is actually allocated.
However, with device DAX, that is no longer an option, and we are planning on implementing a feature that would allow pools to grow automatically in the upcoming release.
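The sparse-file approach mentioned above can be sketched with POSIX calls. The function name is made up; the point is that ftruncate sets a large logical size while the file system allocates blocks only for regions actually written:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a sparse ("holey") file of the given logical size and return
 * the number of 512-byte blocks the file system actually allocated. */
long long make_sparse(const char *path, off_t size) {
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    struct stat st;
    fstat(fd, &st);
    close(fd);
    return (long long)st.st_blocks;
}
```

A pool created on such a file can have a very large logical size while consuming only as much storage as the application has actually allocated from it.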
Deleting Pools
Is there a way to delete a memory pool via libpmemobj?
No. The pmemobj_close() function closes the memory pool and does not delete the memory pool handle. The object store itself lives on in the file that contains it and may be re-opened at a later time.
You can delete a pool using one of the following options:
- Deleting the file from the file system that you memory mapped (object pool).
- Using the pmempool "rm" command.
Persistent Memory over Fabrics
What is the purpose of librpmem and rpmemd?
librpmem and rpmemd implement persistent memory over fabric (PMOF). Persistent memory over fabric enables replication of data remotely between machines with persistent memory.
librpmem is a library in PMDK that runs on the initiator node, and rpmemd is a new PMDK daemon that runs on each remote node that data is replicated to. The design uses the OpenFabrics Alliance (OFA) libfabric application-level API for the backend Remote Direct Memory Access (RDMA) networking infrastructure.
PMDK—Miscellaneous
Debugging
How do you enable libpmemlog debug logs?
Two versions of libpmemlog are typically available on a development system:
- The default version, linked with the -lpmemlog option. It is optimized for performance, skips checks that would impact performance, and never logs any trace information or performs any run-time assertions.
- The debug version, installed under /usr/lib/PMDK_debug or /usr/lib64/PMDK_debug, which contains run-time assertions and trace points. To use it, set the environment variable LD_LIBRARY_PATH to /usr/lib/PMDK_debug or /usr/lib64/PMDK_debug, depending on where the debug libraries are installed on the system.
The trace points in the debug version of the library are enabled using the environment variable PMEMLOG_LOG_LEVEL.
Glossary
ADR (Asynchronous DRAM Refresh)
A platform-level feature where the power supply signals other system components that power failure is imminent, causing the Write Pending Queues in the memory subsystem to be flushed.
NVDIMM
A non-volatile dual in-line memory module. Intel will release an NVDIMM based on 3D XPoint Memory Media toward the end of 2018.
Persistent Domain or Power-fail Protected Domain
When storing to pmem, this is the point along the path taken by the store where the store is considered persistent.
WPQ (Write Pending Queue)
Write pending queues are buffers in the memory subsystem that hold stores on their way to the NVDIMMs; ADR flushes them when a power failure is signaled.
Resources
- Intel Developer Zone Persistent Memory Programming site
- Persistent Memory Programming with PMDK at pmem.io
- Persistent Memory Google Group
- Persistent Memory Programming on GitHub
- The SNIA NVM Programming Model
- Webinar: Persistent Memory Programming Using Non-volatile Memory Libraries (now part of the Persistent Memory Developer Kit (PMDK))
- Code Samples at https://github.com/pmem/PMDK-examples