The first day opened with a keynote from Josep Torrellas of UIUC, who argued that research will and should become more interdisciplinary over time. Three reasons support this view: (1) our community is growing, including internationally, which means more opportunity for broad research; (2) many newly popular topics go beyond core architecture research areas (security, programming, OS, algorithms, circuits, bio, communication); and (3) government funding is not growing, and most opportunities involve multiple research areas. How to balance core and interdisciplinary research, in terms of both funding and research priorities, remains an important question.
The second keynote was by Michael Garland, the director of research at NVIDIA. He spoke about the challenges of low-level programming for parallel hardware and argued that we need higher-level programming constructs that enable parallelism. A simple example of spawning multiple GPU threads and waiting on them showed that poor abstractions can lead to both a lack of parallelism and high synchronization overhead. To address these challenges, he first described the coming support in NVIDIA compilers for C++17’s Parallel Algorithms, which enables transparent acceleration on GPUs. He then introduced Legate, a new project that allows programs using the popular NumPy array-programming library to be transparently parallelized by changing a single import statement. It builds on the Legion programming model and highly parallel, decentralized task scheduling across many nodes and GPUs.
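To make the single-import idea concrete, here is a minimal sketch of what such a drop-in replacement might look like from the user's side. The module name `legate.numpy` is an assumption for illustration only (the talk summary does not give the exact import path), and the function `cosine_similarity` is a hypothetical example. The sketch falls back to stock NumPy when the accelerated module is not installed.

```python
# Assumption: `legate.numpy` is an illustrative name for a NumPy-compatible
# drop-in module; it is not confirmed by the talk summary above.
try:
    import legate.numpy as np  # hypothetical accelerated replacement
except ImportError:
    import numpy as np  # stock NumPy fallback


def cosine_similarity(x, y):
    """Cosine similarity written only against the NumPy array API."""
    numerator = np.dot(x, y)
    denominator = np.sqrt(np.dot(x, x)) * np.sqrt(np.dot(y, y))
    return float(numerator / denominator)


# The application code below is identical regardless of which import succeeded.
a = np.arange(1.0, 5.0)        # [1.0, 2.0, 3.0, 4.0]
b = np.arange(1.0, 5.0) * 2.0  # same direction as a, twice the length
result = cosine_similarity(a, b)
```

The point of the design is that the application code never names the backend: only the import line decides whether the array operations run on a single CPU or are distributed by a runtime.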
Chris Lattner (SiFive) and Tatiana Shpeisman (Google) made the case for MLIR, a new flexible compiler intermediate representation. They pointed out that many compiler ecosystems (e.g., LLVM, TensorFlow) have a variety of graph IRs at different levels. These similar-but-different technologies duplicate infrastructure, are fragile, and offer poor support for understanding cross-cutting failures. MLIR’s goal is to provide domain-specific optimization capability while being general enough for multiple levels of compiler infrastructure and for multiple communities. The design of MLIR borrows heavily from LLVM, but enables greater flexibility and extensibility. One of the key innovations in MLIR is the ability to express “dialects,” which are families of operations and types useful at a particular level. Chris ended the talk by making an impassioned case for replacing Clang and LLVM IR with MLIR, even though it will be a long and difficult task.
Best Paper Session
The best paper award went to “SIGMA”, by Qin et al. from Georgia Tech and Intel. SIGMA is a reconfigurable architecture for DNN training that offers greater flexibility in kernel shapes and efficient support for sparsity. This is enabled by a novel reduction tree network and a supporting compiler. The runner-up was by Lin et al. from USC and Oregon State, who developed a deep reinforcement-learning framework for exploring NoC design. These works highlight the importance of architecture innovation for machine learning, and machine learning’s potential to enable powerful architecture optimization.
Other works in this session included “EMSim” by Sehatbakhsh et al. from Georgia Tech, which enables simulation of electromagnetic side-channel signals from processor pipelines, and “Impala” by Sadredini et al. from Virginia, which develops an algorithm/architecture co-design for pattern-matching automata.
Test of Time Award
The HPCA Test of Time (ToT) award recognizes the most influential papers published in prior editions of HPCA that have had significant impact in the field. The Test of Time award this year went to “A Delay Model and Speculative Architecture for Pipelined Routers” from HPCA-7 (2001) by Li-Shiuan Peh and Bill Dally. This work was notable both for changing the focus of the community from off-chip to on-chip networks and for becoming a standard reference and analytical modeling tool for the community.
One highlight was the “Back to the Future” Vision talks, hosted by Yan Solihin. Steven Swanson (UCSD) took us on a journey to a parallel universe called “core-world”, in which core memory (non-volatile memory, NVM) rather than DRAM had been the dominant memory technology for the last several decades. In core-world, many of the challenges with NVM could have been solved earlier, and it is up to us to invest sufficient time in our world for NVM to “catch up” to where we could have been in core-world. Hsien-Hsin Sean Lee of Facebook discussed the limits of specialization for ML workloads and beyond. While it seems like we are hitting the physical limits of transistors, we are still off by 8 orders of magnitude from fundamental physical limits. This suggests that other technologies (e.g., photonics) may be necessary to continue scaling. David Kaeli, from Northeastern University, discussed the necessity, challenges, and new directions for extreme-parallelism multi-GPU systems.
The two industry sessions were also exceptionally well-attended, and half of these works focused on practical aspects of machine learning. For example, Udit Gupta described the importance of recommendation systems employed at Facebook (they make up the majority of the ML workload) and their unique architecture challenges (memory bandwidth). Daniel Richins discussed the end-to-end performance challenges of deep learning workloads in edge data centers, and found that tasks like data format conversion introduce an “AI tax” that should be addressed to achieve high performance.
Workshops and Tutorials
HPCA featured a well-rounded set of tutorials spanning a wide variety of areas, which certainly supported the call for interdisciplinary research. Tutorials included a full-day quantum-programming tutorial by researchers from NC-State, an AI benchmarking tutorial on AIBench and surrounding methodological issues by researchers from ICT, and a tutorial on the multi-GPU simulator Akita/MGPUSim from Northeastern University. Several workshops/tutorials returned from previous years, covering the topics of accelerating biology/bioinformatics, cloud runtimes, and cognitive architectures.
One trend in this year’s HPCA is that many papers (including 3/4 in the best paper session) have open-sourced their tools and frameworks. This is clearly to the benefit of our community, yet raises questions about whether and how our community should evaluate and promote such contributions. Of note was CGO’s approach of having a “tools papers” track, in which artifact evaluation was mandatory. In fact, CGO papers which chose to have artifact evaluation were significantly more likely to be accepted. This will be increasingly relevant for the architecture community to consider going forward.
Finally, we hope that HPCA is not the last in-person architecture conference of 2020, and that we can continue to have meaningful in-person interactions in a post-coronavirus world for the remainder of the year. Fingers crossed. 🤞
About the authors: Tony Nowatzki is an Assistant Professor of Computer Science at the University of California – Los Angeles. Newsha Ardalani is a Research Scientist at Baidu Research. Her current work focuses on hardware/software co-design for extremely large-scale deep learning applications.
Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.