The 50th Annual International Symposium on Microarchitecture was held earlier this week in Cambridge, Massachusetts.
PC Chairs Joel Emer and Daniel Sanchez kicked things off by sharing some data on conference submissions and reviewing. MICRO-50 used a revision-based model similar to MICRO 2015. Memory systems (the most popular topic among MICRO submissions this year) were well-represented in the program, with one session on DRAM and another on persistent memory. The best-paper runner-up (discussed more below) also focused on improving the performance of accessing persistent memory.
Accelerators were another popular area, with two sessions on GPUs, and one on other kinds of accelerators. Deep learning was a special focus within the topic of accelerators, with an entire session dedicated to it. A best-paper nominee, the DeftNN paper, also focused on accelerating deep learning by pruning unutilized parts of the deep neural network and moving computation closer to memory. Tuesday’s keynote, from Microsoft’s Doug Burger, also showed how the company is using FPGA hardware at scale to accelerate deep learning. Our community is pursuing many different directions to improve the performance of this important workload, from bit-level to algorithm-level optimizations. It will be interesting to see if research continues to pursue different layers or if a consistent abstraction evolves. The accelerator and memory system trends flowed together in the “In/Near Memory Computing” session, which included a paper on Oracle’s now-cancelled RAPID project that built near-memory accelerators into the memory controller.
In addition to the expected categories of papers, I was struck by the breadth of topics covered at MICRO this year. In my estimation, it was much broader than typical recent architecture conferences. There were three papers on quantum computing, as our community grapples with the system design issues for this radically different computing fabric. These papers surely would have been their own session were one of them not the best paper award winner (discussed more below). Back in the classical realm, other papers described a mixed-signal accelerator for solving PDEs, architectural trade-offs for processors made from biodegradable transistors, using perceptron branch predictors for power management in brain-machine interfaces in rats, and a thought-provoking paper using statistics to model the risk that a processor design will fall short of its performance expectations. We’ve come a long way from SimpleScalar.
Keynote: “To a Trillion and Beyond: the Future of the Internet of Things”
Krisztián Flautner, VP of Technology at ARM, gave the opening keynote titled “To a Trillion and Beyond: the Future of the Internet of Things”. His talk discussed the importance of data and the difficulty of obtaining trustworthy systems. The data collected by IoT devices is a valuable resource, so companies tend to hoard it, even though society might be better off if they shared it. For example, if 1 trillion miles of driving are needed to achieve reliable autonomous vehicles, it may be difficult or impossible for a single carmaker to accrue sufficient experience. But if experience were shared across carmakers, safe autonomous vehicles could be realized sooner. The second part of the keynote discussed how trust can be achieved via stability, governance and transparency. Humans have an unfortunate tendency to trust based on superficial characteristics such as facial features. For systems like autonomous vehicles that demand high levels of trust, we must strive to achieve this along several dimensions while avoiding human biases.
“Legends of MICRO” panel
After lunch the “Legends of MICRO” panel was held with Tom Conte, Matt Farrens, Gearold Johnson, Yale Patt and Nick Tredennick as panelists and Rich Belgard as moderator. Margaret Martonosi from Princeton respectfully read a statement before the panel began. She raised awareness about the unfortunate bias that has characterized MICRO’s history — no female PC chairs in the past 26 years, just two female keynote speakers in the conference’s 50-year history (Margaret being one of them). As she read the statement, more and more audience members stood up in support of her message, culminating in a standing ovation. A community conversation about how to address these issues was deferred to the MICRO business meeting (see below), which drew record attendance.
The panel itself discussed conference history, such as how MICRO got its logo (it’s a “uA”, for microarchitecture, under the ISCA architecture pyramid) and how it successfully rebooted itself in the late 1980s, upgrading from the “Workshop on Microprogramming” in 1987 to the “Symposium on Microarchitecture” that we know today in 1991 (with a stop at “Workshop and Symposium on Microprogramming and Microarchitecture” in 1990, for completeness).
In the evening, at the Boston Aquarium, another panel discussion was held with Arvind, Bob Colwell, Phil Emma and Josh Fisher as panelists, and moderator Srini Devadas. Bob Colwell and Josh Fisher shared their experience at MultiFlow Computer, building one of the earliest VLIW processors and a compiler to target it. Their core observations of the amount of ILP available in ostensibly-sequential code would have a powerful impact on processor design, especially embedded systems and DSPs, and would influence Bob’s later work at Intel making out-of-order designs a commercial reality. Asked to share advice for future generations of architects, Phil Emma encouraged moving beyond the limited interface that current instructions adopt, to richer data structures and more sophisticated operations. Arvind noted that it’s important to study the most sophisticated designs (typically general-purpose processors) to deepen one’s skills. These fancy techniques are often applicable in surprising ways, such as using way prediction to save energy even in microcontrollers.
Keynote: “Specialization and Accelerated AI at Hyperscale”
Day two began with a keynote from Doug Burger, Distinguished Engineer at Microsoft, on “Specialization and Accelerated AI at Hyperscale”. He described Microsoft’s work using FPGAs to support streaming computation at huge scale within production datacenters. Given the mass and breadth of code running in Microsoft’s Azure public cloud, reconfigurable hardware is a natural choice to support a wide range of use-cases and to allow developers to gradually “harden” (i.e., port software to an FPGA implementation) portions of their applications. Burger shared the history of the Catapult platform, starting with FPGAs added to Bing servers to accelerate web search. Their current design couples an FPGA with a high-speed network card, and allows the FPGA to process all traffic going into and out of a server. With this in-network programmable hardware, it becomes possible to run an entire microservice on FPGAs, wholly decoupling it from conventional servers. This, in turn, has helped enable Microsoft’s Brainwave project for high-performance deep learning. With Brainwave, FPGAs provide key components of deep neural network inference as a hardware microservice, allowing for very low latency. Because of the programmable hardware, optimizations such as reduced-precision data types are very natural to implement. Burger emphasized the importance of system-level design: reconfigurable hardware has led to new datacenter organizations. Many of these innovations have resulted from working in the “cracks between sub-fields”, e.g., between architecture and security, networking or distributed systems.
At the awards ceremony, 6 folks were inducted into the MICRO Hall of Fame; membership requires 8 or more MICRO papers. Bill Dally, Chita Das, Reetuparna Das, Daniel Jiménez, Aamer Jaleel and Yuan Xie were presented plaques for their contributions to MICRO.
Guang Gao from the University of Delaware was presented with the B. Ramakrishna (Bob) Rau Award, “For contributions to compiler techniques and microarchitectures for instruction-level and thread-level parallel computing.” In his speech, he noted the importance of thinking at the intersection of microarchitecture, compilers and the OS. Hardware heterogeneity, e.g., due to machine learning accelerators, makes this work challenging but also full of opportunities.
The MICRO Test-of-Time Award was presented to Mikko Lipasti and John Shen for their paper “Exceeding the Dataflow Limit via Value Prediction” from MICRO-29 in 1996. The authors noted graciously that at the time of their work, 3 other groups were simultaneously investigating value prediction. While the modest performance benefits and power costs of value prediction initially dampened industrial enthusiasm, several industrial teams are currently investigating its potential.
The second day closed with the MICRO business meeting, where it was revealed that MICRO-51 next year will be in Fukuoka (“foo-coh-kuh”), Japan. The main event, however, was an extended discussion of diversity within the MICRO community. Attendance at the business meeting was extraordinary – standing room only the entire time. The conversation was wide-ranging and covered many topics. I’ll cover a smattering here, but Adrian Sampson has tweeted about many over on @acmsigarch. Attendees discussed term limits for Steering Committee members, cooldowns on PC membership, a new Diversity Chair or Ombudsperson role, the need for more concrete data on gender and ethnicity within the community (the conference organizers estimated the number of female MICRO attendees at 11%, based on the jacket sizes given out), special programming for 1st-time attendees, and the need to track graduating PhDs so they can reliably be considered for PC membership and conference organization roles.
Best Paper Session
On day three, the conference ended with the best paper nominees session. It was great to see people sticking around until the end of the conference: attendance was comparable with the keynote talks. The best paper award went to “An Experimental Microarchitecture for a Superconducting Quantum Processor”, by Xiang Fu et al. from Delft University of Technology. Fu presented the first microarchitecture for a quantum computer. The QuMA (for Quantum MicroArchitecture) design bridges the gap between the classical and quantum domains, showing that conventional digital logic can be used to control underlying qubits. The evaluation was particularly impressive: the authors implemented their microarchitecture on an FPGA and used it to run a program on a real 1-qubit quantum computer. The best-paper runner-up was “Hardware Supported Persistent Object Address Translation” by Tiancong Wang et al. from NC State. In anticipation of upcoming persistent memory technologies, their work proposes hardware support to translate the ObjectIDs of persistent objects into virtual addresses, as persistent objects can be mapped anywhere within the virtual address space. ObjectIDs effectively form a new kind of address space, and hardware translation support is needed to allow fast access to persistent objects. The other two best-paper nominees were the DeftNN deep learning accelerator (discussed above), and the intriguing DEMIS paper. DEMIS showed that compiler and architectural techniques can have a significant impact on the electromagnetic radiation that a processor emits. This radiation in turn can interfere with WiFi or LTE signals and reduce their bandwidth, but a processor can be designed to adapt dynamically to avoid interference.
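To make the ObjectID translation idea from the runner-up paper a little more concrete, here is a rough software sketch of what such a translation step might look like. It is not the authors' design: the 32/32-bit split of an ObjectID into pool id and offset, the `Translator` class, and the small FIFO lookaside cache are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumption for illustration only: an ObjectID is a 64-bit value whose
# upper 32 bits name a persistent pool and whose lower 32 bits are an
# offset inside that pool.
OFFSET_BITS = 32
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split_object_id(oid: int):
    """Split a hypothetical ObjectID into (pool_id, offset)."""
    return oid >> OFFSET_BITS, oid & OFFSET_MASK


@dataclass
class Translator:
    """Software stand-in for an ObjectID -> virtual address translation unit."""
    pool_table: Dict[int, int] = field(default_factory=dict)  # pool id -> base VA
    lookaside: Dict[int, int] = field(default_factory=dict)   # small cache of pool_table
    capacity: int = 4
    hits: int = 0
    misses: int = 0

    def map_pool(self, pool_id: int, base_va: int) -> None:
        # A pool can be mapped anywhere in the virtual address space, so a
        # remap must invalidate any cached translation for that pool.
        self.pool_table[pool_id] = base_va
        self.lookaside.pop(pool_id, None)

    def translate(self, oid: int) -> int:
        pool_id, offset = split_object_id(oid)
        base = self.lookaside.get(pool_id)
        if base is None:
            self.misses += 1
            base = self.pool_table[pool_id]  # raises KeyError if the pool is unmapped
            if len(self.lookaside) >= self.capacity:
                # Evict the oldest cached entry (FIFO, via dict insertion order).
                self.lookaside.pop(next(iter(self.lookaside)))
            self.lookaside[pool_id] = base
        else:
            self.hits += 1
        return base + offset


if __name__ == "__main__":
    t = Translator()
    t.map_pool(7, 0x7F0000000000)
    oid = (7 << OFFSET_BITS) | 0x40
    print(hex(t.translate(oid)), t.hits, t.misses)
```

The point of the sketch is only that every access to a persistent object goes through this lookup first, which is why a slow software path hurts and why dedicated hardware support is attractive.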
About the author: Joseph Devietti is an Assistant Professor in the Department of Computer & Information Science at the University of Pennsylvania.