dm-cache
dm-cache is a device mapper target in the Linux kernel, written by Joe Thornber, Heinz Mauelshagen and Mike Snitzer, that allows the creation of hybrid volumes. It allows one or more fast storage devices, such as flash-based solid-state drives (SSDs), to act as a cache for one or more slower hard disk drives (HDDs).
The design of dm-cache requires three physical devices (origin, cache and metadata) for the creation of one hybrid volume. Cache policies, in the form of separate modules, determine how the caching is actually performed.
Overview
Using dm-cache makes it possible to use SSDs as another level of indirection within the data storage access paths, improving speeds by using fast SSDs as caches for slower hard disk drives. That way, the gap between SSDs and HDDs can be bridged: the speed of costly SSDs is combined with the cheap storage capacity of traditional HDDs.[1] dm-cache can also be used to improve performance and reduce the load of storage area networks.[2][3]
Cache policies, in the form of separate modules, determine how the caching is actually performed. These pluggable cache policies can be used to change the algorithms that select which blocks are promoted (moved from HDD to SSD), demoted (moved from SSD to HDD), kept in sync, cleaned and so on.[4]
With the default multiqueue policy, SSDs store the data associated with random reads and random writes, which takes advantage of near-zero seek times, the most prominent feature of SSDs. Sequential I/O is not cached, because HDDs already handle such operations well and caching them would cause rapid invalidation of the SSD cache. Not caching sequential I/O also helps extend the lifetime of the SSDs used as caches.[5]
History
Another dm-cache project with similar goals was announced by Eric Van Hensbergen and Ming Zhao in 2006, as the result of internship work at IBM.[6]
Later, Joe Thornber, Heinz Mauelshagen and Mike Snitzer developed their own take on the concept, which resulted in the inclusion of dm-cache in the Linux kernel mainline; it was merged in kernel version 3.9, released on 28 April 2013.[4][7]
Design
A mapped virtual cache device is created by specifying three physical devices:[5]
- origin device – provides slow primary storage (usually an HDD)
- cache device – provides a fast cache (usually an SSD)
- metadata device – records the placement of blocks and their dirty flags, as well as other internal data required by a policy (per-block hit counts etc.); such a device cannot be shared between virtual cache devices, and mirroring it is recommended.
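As an illustration of how the three devices come together, the following minimal sketch formats a device-mapper table line for the cache target, following the field order given in the kernel's cache documentation (metadata device, cache device, origin device, block size in sectors, feature arguments, policy and policy arguments). The function name `cache_table_line`, the device paths and the origin size are assumptions made for this example; the block size and write mode fields are discussed in the paragraphs that follow.

```python
def cache_table_line(origin_sectors, metadata_dev, cache_dev, origin_dev,
                     block_sectors=512, features=("writeback",),
                     policy="default", policy_args=()):
    """Format a device-mapper table line for the cache target.

    All device paths and sizes passed in are hypothetical examples.
    policy_args is a flat sequence of key/value pairs.
    """
    fields = [
        0, origin_sectors, "cache",           # start sector, length, target name
        metadata_dev, cache_dev, origin_dev,  # the three devices
        block_sectors,                        # caching extent size in 512-byte sectors
        len(features), *features,             # feature argument count, then arguments
        policy, len(policy_args), *policy_args,
    ]
    return " ".join(str(field) for field in fields)


line = cache_table_line(41943040, "/dev/mapper/metadata",
                        "/dev/mapper/ssd", "/dev/mapper/origin")
print(line)
```

The resulting string is the kind of table that could be passed to `dmsetup create` with its `--table` option; the sketch only builds the text and does not touch any device.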
The block size, equal to the size of a caching extent, is configurable only when a virtual cache device is created. Recommended sizes are 256–1024 KB, and the size has to be a multiple of 64 sectors (32 KB). Making caching extents bigger than HDD sectors is a compromise between the size of the metadata and the potential for wasting cache space. Caching extents that are too small increase the metadata size, both on the metadata device and in kernel memory. Caching extents that are too large increase the amount of wasted cache space, because whole extents are cached even when only some of their parts have high hit rates.[4][8]
Both write-back and write-through policies are supported for caching write operations. With the write-back policy, writes to cached blocks go to the cache device only, and such blocks are marked as dirty in the metadata. With the write-through policy, write requests are not reported as completed until the data reaches both the origin and the cache device, so no clean blocks become marked as dirty.[4]
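The trade-off can be made concrete with a little arithmetic. The sketch below counts how many caching extents, and therefore how many per-block metadata entries, a cache of a given size is split into. The 64 GiB cache size is a hypothetical figure, and the 64-sector (32 KB) granularity check reflects how the kernel's cache target is understood to validate block sizes; treat both as assumptions of this example.

```python
SECTOR_BYTES = 512
MIN_BLOCK_SECTORS = 64  # 32 KB granularity, assumed from the kernel's cache target


def cache_block_count(cache_bytes, block_kib):
    """Return how many caching extents fit into a cache of cache_bytes."""
    block_sectors = block_kib * 1024 // SECTOR_BYTES
    if block_sectors == 0 or block_sectors % MIN_BLOCK_SECTORS != 0:
        raise ValueError("block size must be a multiple of 64 sectors (32 KB)")
    return cache_bytes // (block_kib * 1024)


cache_bytes = 64 * 1024**3  # hypothetical 64 GiB cache device
for block_kib in (32, 256, 1024):
    # Fewer, larger extents mean less metadata but coarser caching.
    print(block_kib, "KiB ->", cache_block_count(cache_bytes, block_kib), "blocks")
```

Going from 32 KiB to 1024 KiB extents reduces the number of tracked blocks by a factor of 32, at the cost of caching a full 1024 KiB even when only a small part of it is hot.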
Decommissioning a virtual cache device is performed by the cleaner policy, which effectively flushes all dirty blocks from the cache device to the origin device.[5]
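The difference between the two write modes, and the effect of cleaning, can be illustrated with a toy model. This is a minimal sketch that uses dictionaries in place of devices; the class `ToyCache` and its methods are hypothetical and do not mirror the kernel implementation.

```python
class ToyCache:
    """Toy model of write-back and write-through handling for cached blocks."""

    def __init__(self, mode):
        if mode not in ("writeback", "writethrough"):
            raise ValueError("mode must be 'writeback' or 'writethrough'")
        self.mode = mode
        self.origin = {}    # stands in for the origin device
        self.cache = {}     # stands in for the cache device
        self.dirty = set()  # dirty flags, as kept in the metadata

    def write(self, block, data):
        """Write to a block that is assumed to be in the cache already."""
        self.cache[block] = data
        if self.mode == "writeback":
            # Only the cache is updated; the origin copy is now stale.
            self.dirty.add(block)
        else:
            # Write-through: the origin is updated as well, block stays clean.
            self.origin[block] = data

    def clean(self):
        """Copy every dirty block to the origin, as a cleaner would."""
        for block in sorted(self.dirty):
            self.origin[block] = self.cache[block]
        self.dirty.clear()


wb = ToyCache("writeback")
wb.write(7, b"new")
print(wb.dirty, wb.origin)  # block 7 is dirty, origin not yet updated
wb.clean()
print(wb.dirty, wb.origin)  # no dirty blocks, origin holds the data
```

In write-back mode the cache cannot be safely removed until `clean()` has run, which is the situation the cleaner policy addresses.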
The rate of data migration (both promotions and demotions) can be throttled down to a configured speed, so that normal I/O to the origin and cache devices is preserved.[4]
Cache policies
As of October 2013, two cache policies are distributed with the Linux kernel mainline:[5]
- multiqueue
- This policy has two sets of 16 queues – one set for entries waiting for the cache, and another one for those already in the cache. Cache entries in the queues are aged based on logical time. Entry into the cache is based on variable thresholds, and queue selection is based on the hit count of an entry. This policy aims to take different cache miss costs into account, and to adjust to varying load patterns automatically. Large, sequential I/Os are left to be performed by the origin device, since HDDs tend to have good bandwidth. Contiguous I/Os are tracked internally, so they can be routed around the cache.
- cleaner
- This policy writes back all dirty blocks in a cache, so it can be decommissioned.
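The idea of tracking contiguous I/Os so that they can be routed around the cache can be sketched as follows. This is a simplified illustration, not the multiqueue policy's actual algorithm: the class `SequentialDetector`, its threshold and the sample I/O stream are all assumptions made for the example.

```python
class SequentialDetector:
    """Flags I/Os that continue a contiguous run of at least `threshold` requests."""

    def __init__(self, threshold=4):
        self.threshold = threshold
        self.next_sector = None  # sector at which a contiguous I/O would start
        self.run_length = 0      # number of contiguous I/Os seen so far

    def should_bypass(self, sector, length):
        """Return True if this I/O looks sequential and should skip the cache."""
        if sector == self.next_sector:
            self.run_length += 1
        else:
            self.run_length = 1  # a jump starts a new run
        self.next_sector = sector + length
        return self.run_length >= self.threshold


detector = SequentialDetector(threshold=3)
ios = [(0, 8), (8, 8), (16, 8), (500, 8), (24, 8)]  # (start sector, length)
decisions = [detector.should_bypass(sector, length) for sector, length in ios]
print(decisions)
```

Only the third request, which extends a run of three contiguous I/Os, is routed to the origin device; the jump to sector 500 resets the run, so later requests are again candidates for caching.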
References
- [1] Petros Koutoupis (2013-11-25). "Advanced Hard Drive Caching Techniques". linuxjournal.com. Retrieved 2013-12-02.
- [2] "dm-cache: Dynamic Block-level Storage Caching". Florida International University. Retrieved 2013-10-09.
- [3] Dulcardo Arteaga; Douglas Otstott; Ming Zhao. "Dynamic Block-level Cache Management for Cloud Computing Systems" (PDF). Florida International University. Retrieved 2013-12-02.
- [4] Joe Thornber; Heinz Mauelshagen; Mike Snitzer. "Documentation/device-mapper/cache.txt". Linux kernel documentation. kernel.org. Retrieved 2013-10-07.
- [5] Joe Thornber; Heinz Mauelshagen; Mike Snitzer. "Documentation/device-mapper/cache-policies.txt". Linux kernel documentation. kernel.org. Retrieved 2013-10-07.
- [6] Eric Van Hensbergen; Ming Zhao (2006-11-28). "Dynamic Policy Disk Caching for Storage Networking" (PDF). IBM Research Report. IBM. Retrieved 2013-12-02.
- [7] "Linux 3.9". 1.3. SSD cache devices. kernelnewbies.org. 2013-04-28. Retrieved 2013-10-07.
- [8] Jake Edge (2013-05-01). "LSFMM: Caching – dm-cache and bcache". LWN.net. Retrieved 2013-10-07.