(This blog post celebrates the two-year anniversary of the world’s first commercial processor with die-stacking technology, the AMD Fury X GPU, which was released on 6/16/2015.)
Many of you who attended MICRO 2013 may still remember the keynote speech “Die-stacking is happening” by AMD’s Bryan Black, who predicted at the time that die-stacking technology would soon arrive in mainstream products rather than remain just a research topic in academia. Eighteen months later, in the summer of 2015, AMD released the Fury X, the world’s first commercial GPU with 3D-stacked memory, which integrated 4GB of HBM (High Bandwidth Memory) and created a lot of excitement in the market. Very recently, AMD and Nvidia announced their latest GPU architectures, Vega and Volta, respectively, both of which integrate 16GB of HBM.
What is Die-Stacking Technology?
Die-stacking technology is also known as three-dimensional integrated circuits (3D ICs). The concept is to stack multiple layers of integrated circuits (ICs) vertically and connect them with vertical interconnects called through-silicon vias (TSVs). 3D integration offers many benefits for IC design: (1) shorter interconnect wires, which improve performance and reduce power consumption; (2) higher memory bandwidth; (3) support for heterogeneous integration, which could enable novel architecture designs; and (4) smaller form factors, since adding a third dimension to the conventional two-dimensional layout increases packaging density, shrinks the footprint, and potentially lowers cost. Consequently, 3D integration is one of the promising solutions for overcoming the barriers in interconnect scaling, offering an opportunity to continue improving performance with CMOS technology.
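To make the wire-length benefit concrete, here is a rough first-order sketch. It assumes a square die whose area is split evenly across the stacked layers, and it ignores the TSV length and the area the TSVs themselves occupy. The 400 mm² die size and the function names are hypothetical, chosen only for illustration.

```python
import math


def footprint_side_mm(total_area_mm2: float, layers: int) -> float:
    """Side of the square footprint when the area is split evenly over `layers`."""
    return math.sqrt(total_area_mm2 / layers)


def worst_case_wire_mm(total_area_mm2: float, layers: int) -> float:
    """Corner-to-corner Manhattan distance within one layer.

    Assumption: the vertical hop through TSVs is negligible next to
    millimeter-scale horizontal wires, so it is left out.
    """
    return 2 * footprint_side_mm(total_area_mm2, layers)


if __name__ == "__main__":
    area = 400.0  # hypothetical planar die area in mm^2
    for layers in (1, 2, 4):
        print(layers, round(worst_case_wire_mm(area, layers), 2))
```

Under these assumptions the longest horizontal wire shrinks roughly with the square root of the layer count: folding the hypothetical die into four layers halves it from 40 mm to 20 mm. Real designs will deviate from this estimate because of TSV overhead, thermal constraints, and floorplanning.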
Early Die-stacking Architecture Efforts
3D integration technology has been an active research topic since the late 1990s and early 2000s. IBM was one of the pioneers in studying the process technology. As fabrication of 3D integrated circuits became viable, it became imperative to develop EDA tools and architectural techniques for exploring the processor design space with 3D technology. Intel was the first to explore possible directions for re-architecting microprocessors with die-stacking technology. It first demonstrated a single-core processor partitioned into two layers in 2004. Three years later, in 2007, Intel demonstrated a prototype two-layer many-core processor, with 20MB of SRAM stacked on top of an 80-core layer, providing 1TB/s of bandwidth between the memory and the logic layer. The research community was very excited by Intel’s efforts and looked forward to commercial products arriving soon.
3D Stacked Processors: Are We There Yet?
However, no Intel 3D products reached the market after Intel’s demonstrations in 2004 and 2007. Why? Even when an emerging technology such as die-stacking is proven technically feasible, product teams may not adopt it because of other challenges, such as cost, business models, or the lack of killer applications (see the IEEE Micro article “3D stacked processor: Are we there yet?”). For example, stacking DRAM on top of processor layers may create thermal dissipation challenges, and customizing the DRAM dies for each processor design could increase the overall cost. In addition, the technology needs killer applications that can fully exploit the high memory bandwidth that a die-stacked architecture provides.
Step Back Towards Thriving: from 3D to 2.5D
To address these challenges, one possible solution is to step back and adopt an interposer-based 2.5D approach. In this approach, the design of the 3D-stacked DRAM and the design of the logic die are decoupled. Memory vendors (such as Hynix) focus on designing many-layer 3D-stacked DRAM that conforms to an industry standard (such as JEDEC’s HBM), while processor vendors (such as AMD or Nvidia) focus on the design of the logic dies. The 3D-stacked DRAM and the CPU/GPU die are then placed side by side on a silicon interposer.
Following this direction, AMD pioneered the approach and brought die-stacking architecture to mainstream computing with the Fury X, the world’s first commercial GPU with HBM, released in 2015 with 4GB of High Bandwidth Memory. Since then, other companies have followed with variations for different application domains. Nvidia’s Pascal GPU is packaged with 16GB of HBM for AI applications; Intel’s Knights Landing Xeon Phi CPU is packaged with 16GB of MCDRAM (Multi-Channel DRAM), a variant of the Hybrid Memory Cube (HMC) design, for High Performance Computing (HPC) applications. Beyond CPU/GPU architectures with 3D-stacked memory, Xilinx announced in 2016 that its Virtex UltraScale+ FPGAs would integrate HBM.
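To see why a wide interface on an interposer matters, here is a back-of-the-envelope peak-bandwidth calculation. The HBM figures are the commonly cited first-generation numbers (a 1024-bit interface per stack at 1 Gb/s per pin, four stacks on the Fury X) and are not stated in this post. The GDDR5 configuration is an illustrative comparison point only, and the function name is hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gbit_per_pin: float, stacks: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s: width x per-pin rate x stacks / 8 bits per byte."""
    return bus_width_bits * gbit_per_pin * stacks / 8


if __name__ == "__main__":
    # Assumed first-generation HBM: 1024 bits per stack, 1 Gb/s per pin, 4 stacks.
    hbm = peak_bandwidth_gb_s(bus_width_bits=1024, gbit_per_pin=1.0, stacks=4)
    # Illustrative off-package GDDR5: 384-bit bus at 7 Gb/s per pin.
    gddr5 = peak_bandwidth_gb_s(bus_width_bits=384, gbit_per_pin=7.0)
    print(hbm)    # 512.0
    print(gddr5)  # 336.0
```

The sketch shows the design trade: HBM runs each pin much slower than GDDR5 but uses far more pins, which is practical only because the interposer provides dense, short connections between the memory stacks and the logic die. These are theoretical peaks, and sustained bandwidth is lower in practice.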
Technology vs. Architecture: An Evolving Interaction
In summary, die-stacking technology was first investigated more than two decades ago and inspired architects to explore various possible processor architectures (fine-granularity logic-on-logic stacking, then memory-on-logic stacking, and finally stacked memory with logic on an interposer) before it became a mainstream architecture. It reminds me of the classic paper by John Hennessy and Norm Jouppi, “Computer Technology and Architecture: An Evolving Interaction”, published in 1991. In the article, the authors claimed that “The interaction between computer architecture and IC technology is complex and bidirectional”: the characteristics of technologies affect the decisions architects make by influencing performance, cost, and other system attributes, and developments in computer architecture in turn affect the viability of different technologies.
3D integration technology will play an even more important role in future architecture design. Historically, technology scaling and architectural innovation have played equally important roles in improving microprocessor performance, as indicated by the article “CPU DB: Recording Microprocessor History”. Intel also followed a “Tick-Tock” model to drive the development of its microprocessor designs, with every “tick” representing a process shrink of the previous microarchitecture and every “tock” designating a new microarchitecture. However, as technology scaling slows down and the “end of Moore’s law” is predicted (even Intel changed its model from “Tick-Tock” to “Tick-Tock-Tock”), shrinking in two dimensions becomes more difficult. Consequently, as described in the recently released CCC whitepaper “Arch2030: A Vision of Computer Architecture Research over the Next 15 Years”, going vertical offers a new dimension of scalability in chip design, enabling the integration of more transistors in a single system despite the end of Moore’s law. In particular, going beyond the current 2.5D integration and TSV-based 3D integration, monolithic 3D integration (growing transistors layer by layer on a single silicon substrate) with extremely fine vertical interconnects could open a new, rich field of architectural possibilities.