Computer Architecture Today

Informing the broad computing community about current activities, advances and future directions in computer architecture.

The 44th ISCA just wrapped up. This year it was held in Toronto, which proved to be a great location owing in no small part to great local organizing and participation. I find attending ISCA to be the best way to quickly read the pulse of the broader Computer Architecture community. This year again, it did not disappoint. Here’s a quick summary of what I observed (TL;DR: machine learning is still a thing).

Before the main conference officially kicked off we were treated to a new edition of the “Visioning” workshop series, this time focused on “Trends in Machine Learning”. It was the best-attended workshop I have ever seen at an architecture conference, with over 160 registrations. The speaker lineup was outstanding, with a great mix of talks by people from companies that are heavily invested in ML as well as academic research labs. Notably, many of the speakers’ backgrounds were in core AI/machine learning fields. The high-order bits of the presentations included the positive impact new hardware has had on advancing ML applications, new initiatives to deploy AI on low-power “edge” devices, and new algorithms and optimization methods (binarization came up in multiple talks). We heard from NVIDIA, Google and Intel on their hardware accelerator efforts, and from the Baidu AI Lab an interesting thought exercise on the limits of learning on CMOS.

The main conference featured two great keynotes. On Monday we heard from Intel Senior Fellow Mark Bohr on “CMOS Scaling Trends and Beyond”. Dr. Bohr made a compelling case that the imminent demise of Moore’s Law is perhaps not so imminent. Even though Intel is slowing the cadence of its new technology iterations, the 14nm and 10nm nodes have brought better-than-average density scaling (Intel calls it “hyperscaling”). The result is continued 0.5X scaling every 2 years or so. In response to a question from the audience, Dr. Bohr acknowledged that this applies to logic scaling, and that on-chip SRAM faces bigger challenges. Another interesting technology mentioned in the talk was Intel’s solution for integrating multiple heterogeneous dies in a package with a common interface: the multi-die silicon bridge, embedded in the interconnect substrate. Also mentioned was the elusive Extreme Ultraviolet Lithography that seems to always be a few years away from production.

The second keynote on Tuesday morning featured Partha Ranganathan from Google, who gave us “More Moore: Thinking Outside the (Server) Box.” This was another great presentation that touched on the system and application aspects of beyond-Moore computing. Dr. Ranganathan made the case that we are now at an interesting crossroads in computing: the intersection between exploding demand for computation from new applications and slowing supply of computation capabilities. The solutions he suggested are broad-ranging but can perhaps be summarized as thinking about the datacenter as the new computer, optimizing across the hardware/software boundary, and tailoring the architecture to the application. The talk also included a call to explore software-defined infrastructure, which pushes more software control and intelligence into previously “dumb” devices. He gave networking and storage as examples of success stories in this space.

Undoubtedly the most anticipated regular paper in the conference was Google’s “In-Datacenter Performance Analysis of a Tensor Processing Unit”, which presented the first generation of the TPU design. The key components of the architecture are a 256×256 systolic array for matrix operations, capable of 64K multiply-accumulate operations per cycle, along with large accumulator and buffer memories. We also heard some interesting statistics on the breakdown of workloads running on Google servers. While our community is focused on CNN designs, these workloads represent only 5% of what Google runs. The vast majority are multilayer perceptrons (61%) and LSTMs (23%). The Q&A provided the opportunity for some spirited discussion about the paper’s evaluation and fair comparisons between the TPU and modern GPUs. It was good to see some friendly performance competition in the accelerator space!
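For readers who want to check the “64K” figure, here is a back-of-the-envelope sketch. It assumes one multiply-accumulate unit per cell of the 256×256 array, each busy on every cycle. The function names and the clock value in the usage lines are placeholders of mine for illustration, not numbers from the talk.

```python
# Illustrative arithmetic only. The array dimensions come from the post;
# everything else (names, clock value) is a hypothetical placeholder.
ARRAY_ROWS = 256
ARRAY_COLS = 256


def macs_per_cycle(rows: int, cols: int) -> int:
    """Peak multiply-accumulates per cycle, assuming one MAC unit per array cell."""
    return rows * cols


def peak_ops_per_second(rows: int, cols: int, clock_hz: int) -> int:
    """Peak operations per second, counting each MAC as a multiply plus an add."""
    return 2 * macs_per_cycle(rows, cols) * clock_hz


peak_macs = macs_per_cycle(ARRAY_ROWS, ARRAY_COLS)
print(peak_macs)          # 65536
print(peak_macs // 1024)  # 64, i.e. "64K"

# Hypothetical clock frequency, chosen only to show how the formula is used.
PLACEHOLDER_CLOCK_HZ = 1_000_000_000
print(peak_ops_per_second(ARRAY_ROWS, ARRAY_COLS, PLACEHOLDER_CLOCK_HZ))
```

This is a peak number only; sustained throughput depends on how well a given workload keeps the array fed from the buffer memories.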

The rest of the program was again dominated by hardware for machine learning and other accelerator designs, spread over a total of three sessions. In the same session as Google’s TPU, we heard about SCALEDEEP, a scalable deep learning architecture designed for training, from Purdue and Intel; and about SCNN, an accelerator for sparse and compressed CNNs, from NVIDIA, MIT, Berkeley and Stanford.

Security also received plenty of attention this year, with two sessions focusing on the subject. This included some very interesting work out of Georgia Tech on using electromagnetic emanations to non-invasively detect malware attacks. The system captures electromagnetic signatures of program execution using external antennas and detects malware by observing deviations from the expected profile of those signatures. I found this to be one of the most original ideas of this year’s ISCA. One of the security sessions also featured two papers on approaches for encrypting and obfuscating the memory bus to prevent side channel attacks.

A power and energy session featured a paper from University of Minnesota on improving the efficiency of voltage regulation by considering and adapting to thermal effects; and a paper from University of Wisconsin on a novel approach to power gating the clock distribution network that is aware of its hierarchical structure and attempts to consolidate gated units to improve efficiency.

Memory consistency challenges were attacked in at least three papers. Work from University of Illinois examined relaxed atomics on heterogeneous systems. A paper from University of Murcia and Uppsala University looked at “Non-Speculative Load-Load Reordering in TSO.” A third paper from University of Michigan proposed MTraceCheck, a solution for verifying non-deterministic behavior of memory consistency models after tape-out.

In the reliability session I noted MeRLiN from University of Athens and Universitat Politecnica de Catalunya, a novel approach for dramatically speeding up fault injection campaigns into RTL by identifying and isolating relevant execution phases.

New computing models were explored in work from University of Michigan on parallelizing automata processors, and from EPFL and University of Edinburgh on “The Mondrian Data Engine,” a hardware-software co-design of data analytics operators for near-memory processing.

This year we were also treated to a panel that pondered whether “the death of Moore’s Law will make Computer Architecture livelier than ever.” The discussion was moderated by Prof. Margaret Martonosi and included a reference to the Monty Python “Dead Parrot” skit, which I found quite topical. The main points made by the panelists were that things will change and that change will be good for the field. However, we need to make some substantial efforts and investments. Hsien-Hsin Sean Lee from TSMC envisioned an Architect 2.0 who is no longer one piece of a rigid stack, but the epicenter of an application-driven computing model. Babak Falsafi welcomed the new heterogeneous world and argued that integration, specialization and approximation will be key. He urged a focus on systems and solutions, not just the CPU, and acknowledged a need for new tools and methodologies. And speaking of new tools, Mark Hill made a compelling case for open source hardware. He argued that it would drive lower-cost and more rapid innovation in both academia and industry, much like open source software has served as a foundation for both research and the startup ecosystem. The panelists’ presentations generated a lot of discussion and audience questions but remarkably little disagreement. It seems that our field has finally reached consensus!…

To sum up, I found our community to be optimistic about its future and energized by the new challenges it faces. The diversity of topics covered by the conference is greater than it has ever been. It looks like the disruption to traditional performance scaling has freed us to think more broadly about our role in the bigger computing picture and pushed us to attack bolder research problems. I felt more excitement than ever about what comes next.

I am already looking forward to next year’s ISCA in Los Angeles!

Radu Teodorescu is an Associate Professor in the Department of Computer Science and Engineering at Ohio State University.