The end of Dennard scaling has exposed a widening gap between the demand to process ever-growing datasets and the capability of modern computers to do so. In a quest to make data processing faster and more power-efficient, researchers are revisiting the decades-old idea of near data computing which brings computation closer to the data. The data management and the computer architecture communities have compatible but diverging views on the problems that lie in the path of making near data computing a reality. This blog post points to two instances of this divergence where a holistic perspective may lead to promising new solutions:
1. Near data computing ≠ near memory computing. Pushing some compute capability near or inside DRAM is not sufficient, as cost-benefit analysis shows that it is not economical to keep massive datasets in DRAM. Hence, one needs to look to the next level of the storage hierarchy, currently flash memory, to realize the performance and power benefits of near data computing.
2. Software support to tackle silent data corruption. If computation is pushed closer to the flash memory chips, the throughput of error-correcting coding (ECC) needs to increase by an order of magnitude. However, the reliability of flash memory cells is deteriorating with technology scaling and more sophisticated ECC codes have prohibitively high implementation complexity. One can no longer assume that the hardware alone will be able to detect and correct data corruption at line rate. A promising solution is to synergistically using software and hardware to make data processing resilient to data corruption.
Why datasets will never fit in memory: the “five minute rule”
A large body of work on near-data computing has focused on in-memory computing, in which logic is added to the memory cell arrays to carry out simple computations. However, it is not economical to store massive datasets in main memory. In a seminal 1987 paper, Jim Gray and Franco Putzolu observed that whether a data item should reside in memory or not is most often a matter of simple cost-benefit analysis. The prescience of the argument still resonates today:
In some situations, response time dictates that data be main memory resident because disc accesses introduce too much delay. These situations are rare. More commonly, keeping data main-memory resident is purely an economic issue. When does it make economic sense to make data resident in main memory? A good rule of thumb is:
THE FIVE MINUTE RULE
Data referenced every five minutes should be memory resident.
The original rule is inextricably tied to 1980s pricing, but researchers have been updating the cost-benefit calculations every 10 years. (Mark your calendars: the next publication window opens in 2026!) Surprisingly, the five minute rule still holds 30 years after its original publication between DRAM memory and flash memory. In essence, the five minute rule today says that massive datasets will not be accessed frequently enough to justify the cost of keeping them entirely in DRAM over the much-cheaper flash. This dovetails with the fierce competition among storage vendors to offer affordable all-flash arrays (AFA) for public and private clouds. Fully realizing the performance and power benefits of near data computing requires looking to the next level of the storage hierarchy: bringing compute inside flash-based storage devices.
The data management community has studied how to offload query processing to the microprocessor inside the SSD controller. It has been shown that offloading selection and aggregation SQL queries to the SDD improves performance and energy consumption by 3× for a commercial database system. (This work in turn builds on Active Disks and Intelligent Disks that exploited the computational power of embedded processors inside spinning hard disks.) The limitation is that the microcontroller inside the SSD device quickly becomes a bottleneck during querying because it is neither powerful nor versatile enough to run general purpose programs. One proposal to bypass this limitation calls for using an FPGA between the host and the device for query acceleration.
These prior efforts push computation between the host CPU and the flash controller, which means that their data processing capability is limited by the speed of the interface or the flash controller. Bringing computation even closer to the data requires pushing some compute logic below the flash memory controller. This leverages the higher bandwidth between the NAND dies and the flash controller, which is currently underutilized. The challenge with placing compute logic closer to the NAND dies is that computation now needs to be cognizant of the possibility of silent data corruption.
Data management over unreliable storage
The reliability of storage media is deteriorating due to technology scaling. As cells shrink, detecting signal over noise becomes harder. In addition, as cells get closer to each other, disturbance errors between neighboring cells are observed, and can even break memory isolation between VMs in the cloud. As more bits are packed onto a flash cell, going from single-level flash cells (SLC) to denser MLC and TLC, the voltage differences between the levels shrinks and flash becomes less reliable. Furthermore, the raw bit error rates (RBER) of SSDs in the field can be one order of magnitude greater than what was predicted in lab tests. These trends suggest that much stronger error-correcting capabilities will be needed in the near future. This is not easy, as ECC decoders already require about 0.5M gates to keep up with the throughput of NAND flash. Increasing the codeword length is not a solution either, as the decoder complexity increases more than linearly with the codeword length. In short, pushing computation below the flash controller would require a much higher transistor and power investment for faster ECC decoding than what the compute logic itself would use! More radical approaches to error detection and correction that transfer part of the work to software are needed.
The data management community is starting to pay more attention at how data corruption can be detected in software. A paper that will be presented next month in the annual SIGMOD conference presents a software error detection mechanism that is tailored for in-memory column stores. Error detection is buffered by a data hardening step that encodes the data to make corruption detectable and a data softening step that decodes the data but leaves them vulnerable to corruption. The solution adopts arithmetic coding as a software-based error coding scheme which requires an integer multiplication for data hardening and an integer division for data softening. The appealing feature of arithmetic codes is that they give the database system the ability to directly work on the encoded data representation during query processing. For example, because multiplication is distributive, encoded data can be added directly to compute the encoded sum instead of performing a costly “decode, add, re-encode” sequence in every addition. In essence, arithmetic coding allows for computation to be done before error detection for common numerical operations.
Challenges and opportunities for error correction
Long BCH or low-density parity-check (LDPC) codes are typical choices for error-correcting codes in SSDs due to their good error-correcting performance and relatively low implementation complexity. Nevertheless, it is very difficult for such decoders to reach a throughput of 30 GB/sec or higher, as needed for in-flash computing. Adopting shorter codes seems essential to reduce the decoding complexity and latency. One could develop decoding strategies that are hierarchical, with lower levels done in hardware to correct simple errors involving a few bits, and higher levels done in software to catch infrequent multi-bit errors. In addition, the ability to only decode parts of a page would be powerful if combined with knowledge of the access pattern and the query intent.
Things get more interesting if one considers when is the most appropriate time to check for data corruption when processing a complex query. Database systems internally convert a SQL query into a query plan that describes the exact sequence of relational operations (such as project, filter, join or aggregate) that will be performed by the database system to answer the query. Data encoding and decoding can be considered as additional operations whose placement will be optimized automatically: before a query is executed, the database system perform a cost-based analysis to instruct the hardware whether it is better to check for data corruption before, during or after an expensive operation such as a join. Developing error-correcting strategies that jointly optimize the decoder hardware implementation with the database content and query workload is a promising open problem.
About the author: Spyros Blanas is an Assistant Professor in the Department of Computer Science and Engineering at The Ohio State University. His research interest is high performance database systems. Xinmiao Zhang and Radu Teodoerscu contributed to the development of the ideas in this post.