Computer Architecture Today

Informing the broad computing community about current activities, advances and future directions in computer architecture.

The 24th International Symposium on High-Performance Computer Architecture (HPCA) was held recently in picturesque Vienna, Austria. Below is a quick overview of HPCA-24.

Conference logistics

Yuan Xie (program chair) provided an overview of the submission and review process. HPCA received a record 260 submissions and accepted 54 papers on topics ranging from memory systems to genomics. The conference used the rebuttal + revision model most recently used at MICRO-50, and the authors of about 30% of the submissions chose revision over rebuttal. HPCA also accepted papers from industry in a separate industrial track (8 out of 33 submissions), which were presented in two separate sessions. The conference also featured three papers in a “Best of CAL” session, as selected by the editorial board of Computer Architecture Letters.

Due to timing constraints, HPCA did not feature lightning talk sessions this year. However, authors were given the option of uploading their lightning talks to YouTube. Many authors did, and these videos are linked from the conference website (including a rap video!).

Keynotes

As HPCA was co-located with CGO and PPoPP, we were treated to three excellent keynotes. The keynotes by Margaret Martonosi (What is the role of Architecture and Software Researchers on the Road to Quantum Supremacy?) and Sara-Jane Dunn (Biological Computation) were forward-looking and mapped out research challenges for the future, while Peter Sewell (From confusion to clarity: hardware concurrency programming models 2008-2018) reflected on the journey to formalize hardware concurrency models. I look forward to seeing formally defined quantum and biological computers at HPCA!

Margaret gave the HPCA keynote and spoke about the role of systems researchers in helping realize the promise of quantum computing. Quantum computers can solve problems in chemistry, simulation, etc. that are currently intractable with classical computers. While different groups are building prototype quantum computers, much work remains in developing the hardware and software ecosystem around them to make these computers useful to regular programmers. Margaret identified high-level programming languages, compilers, error correcting codes, control software, and debugging tools, among others, as important tools that need to be developed for quantum computers. For example, due to high error rates and quantum decoherence, quantum algorithms rely on error correction. However, error correction in quantum computers is quite expensive, requiring 10-50 physical qubits (quantum bits) for each logical qubit. Several different ECCs and implementations have been proposed; however, as Javadi-Abhari et al. show, none of them is universally better than the others. So there is a need to develop hybrid, application-aware ECC mechanisms, a task that systems researchers are well suited for. With companies like Intel, IBM, and Google, along with many startups, building and demonstrating prototypes, now is the time to develop abstractions and ecosystems to help programmers get the most out of quantum computers.

In the interest of space, I am omitting other keynote summaries. They can be found here instead.

Awards

The best paper award went to “Amdahl’s Law in the Datacenter Era: A Market for Fair Processor Allocation” by Zahedi et al. from Duke University. This paper proposes a datacenter processor allocation framework that guarantees fairness and yet outperforms existing allocation frameworks. The key insight behind this work is to use Amdahl’s law to model how an application’s performance changes with its core allocation. Using Amdahl’s law as a starting point, the authors develop a practical model that accurately predicts application performance across varying core counts and data sets. Furthermore, the authors develop a market and a bidding process that finds a fair, Pareto-optimal resource allocation equilibrium.
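For readers who want a reminder of the starting point: Amdahl’s law predicts that a program whose parallelizable fraction is p achieves a speedup of 1 / ((1 - p) + p / n) on n cores. The minimal Python sketch below evaluates only that textbook formula. It is not the paper’s refined model or its market mechanism, and the function name and the example fraction of 0.9 are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core predicted by textbook Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0..1).
    cores: number of cores allocated to the application (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical application that is 90% parallelizable.
    for cores in (1, 2, 4, 8, 16, 64):
        print(f"{cores:3d} cores -> {amdahl_speedup(0.9, cores):.2f}x")
```

The diminishing returns at higher core counts are what make the law a natural basis for deciding how many cores each application should bid for.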

The inaugural HPCA Test of Time award went to “Dynamically exploiting narrow width operands to improve processor power and performance” by David Brooks and Margaret Martonosi from HPCA ’99. This seminal work on power-efficient processor architectures proposes mechanisms to dynamically detect and exploit sub-width ALU operations. Brooks and Martonosi proposed an operand-based (as opposed to opcode-based) mechanism for detecting sub-width operations, which they leverage to power gate unused ALU gates and thus improve power efficiency. Alternatively, the solution can be used to pack multiple such operations together to improve performance.
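To make the idea of operand-based detection concrete, here is a minimal Python sketch, under stated assumptions, of checking at run time whether both source operands of an ALU operation are narrow. The 64-bit word width, the 16-bit threshold, the zero-detect rule on the upper bits, and the function names are all illustrative assumptions, not details taken from the paper.

```python
WORD_BITS = 64    # assumed datapath width
NARROW_BITS = 16  # assumed threshold for calling an operand "narrow"


def is_narrow(value: int, narrow_bits: int = NARROW_BITS,
              word_bits: int = WORD_BITS) -> bool:
    """Return True if every bit above the low narrow_bits is zero.

    The check looks at the operand's value, not at the opcode, which is
    the sense in which the detection is operand-based.
    """
    word = value & ((1 << word_bits) - 1)  # view value as a machine word
    return (word >> narrow_bits) == 0


def upper_alu_can_be_gated(a: int, b: int) -> bool:
    """Both operands are narrow, so the upper ALU bits would sit idle."""
    return is_narrow(a) and is_narrow(b)


if __name__ == "__main__":
    print(upper_alu_can_be_gated(17, 42))       # both fit in 16 bits
    print(upper_alu_can_be_gated(17, 1 << 20))  # second operand is wide
```

A hardware implementation would perform the equivalent check with zero-detect logic on the upper operand bits; the same signal could instead be used to pack several narrow operations into one ALU pass.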

Business meeting

HPCA-25 (2019) will be held in Washington, DC, and there were three bids for HPCA-26 (2020): Beijing, Edinburgh, and San Diego.

Rajeev Balasubramonian (co-program chair for HPCA-25) provided an overview of the two changes being considered for the HPCA-25 review process:

  1. Reviewers will be given an expected ratio of accept/reject scores based on the number of submissions/acceptances and historical ratios. These guidelines are expected to help reviewers calibrate their reviews (not be too positive or negative).
  2. Authors re-submitting papers will have an option of submitting a “diff” of the original paper. If a reviewer is going to review a paper that they reviewed previously, they can ask the chair to view the diff. This change is expected to improve the consistency of reviews and reduce reviewer load.

Rajeev also expressed concern over authors uploading submissions to arXiv before the conference review process is complete. While he acknowledged the usefulness of arXiv, he noted that posting there de-anonymizes authors and hinders the double-blind review process. Such concerns are also being raised in other CS communities. He suggested that authors weigh the benefits of posting early against the cost of disrupting the double-blind review process.

Please contact the HPCA-25 program chairs, Rajeev and Viji, if you have any suggestions or feedback on the changes being considered.

Diversity

Recent efforts by Natalie Enright Jerger and Kim Hazelwood to bring to light the lack of diversity in our community seem to be having the intended effect. I found that the topics of diversity and promoting inclusion were part of the hallway conversations and general consciousness, more than at any previous conference I attended. The announcement at the business meeting of a bias busting workshop at ISCA ’18 was received with enthusiasm. Furthermore, the conference also featured a meet and greet for Women In Architecture and a panel on Women in Academia and Industry. Such events go a long way in encouraging researchers from underrepresented minority groups. These are all great first steps, but much work needs to be done. As a community, it behooves us to follow Kathryn McKinley’s suggestions.

Papers

Papers on cache and memory systems dominated the program at HPCA, including one of the best paper nominees, but they came in very different flavors: conventional cache and memory systems, GPU cache/memory systems, persistent/non-volatile memories, secure memories, and in-memory computing. The conference also featured interesting papers on neural network accelerators, security, GPUs, and insights from industry.

Among the papers on memory hierarchy design, “Reducing Data Transfer Energy by Exploiting Similarity within a Data Transaction” by Lee et al., one of the best paper nominees, reduces data transfer energy by exploiting the observation that transferring a ‘0’ incurs less cost than a ‘1’ and deploys different data encoding schemes to maximize the number of 0s transferred. “SIPT: Speculatively Indexed, Physically Tagged Caches” by Zheng et al. proposes a novel cache indexing mechanism for L1 caches that simultaneously enables large, high performance, and energy-efficient L1 caches. “Making Memristive Neural Network Accelerators Reliable” by Feinberg et al. observes that prior art on analog, in-memory neural network accelerators does not account for errors, which in turn lead to higher classification errors, and proposes a novel arithmetic-code-based error correction mechanism to reduce classification errors.
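To illustrate why similarity within a transaction can help when 0s are cheaper to transfer than 1s, here is a minimal Python sketch of one possible encoding: send the first word unchanged and XOR every later word with it, so that bits shared with the first word become 0s. This base-XOR scheme, the function names, and the sample data are assumptions made for illustration; they are not the specific encodings proposed by Lee et al.

```python
from typing import List


def count_ones(words: List[int]) -> int:
    """Total number of 1 bits that would be driven for these words."""
    return sum(bin(word).count("1") for word in words)


def xor_with_base(words: List[int]) -> List[int]:
    """Keep the first word as the base and XOR every later word with it.

    Because the base is sent unmodified, applying the same function to
    the encoded words restores the original data.
    """
    if not words:
        return []
    base = words[0]
    return [base] + [word ^ base for word in words[1:]]


if __name__ == "__main__":
    # Hypothetical transaction: four words that share their upper bits.
    transaction = [0xFFFF0001, 0xFFFF0002, 0xFFFF0003, 0xFFFF0004]
    encoded = xor_with_base(transaction)
    print("ones before encoding:", count_ones(transaction))
    print("ones after encoding: ", count_ones(encoded))
    print("round trip ok:", xor_with_base(encoded) == transaction)
```

When the words in a transaction are dissimilar, an encoding like this can increase the number of 1s, which is one reason a design might choose among several encoding schemes rather than rely on a single one.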

Among the papers on security, “D-ORAM: Path-ORAM Delegation for Low Execution Interference on Cloud Servers with Untrusted Memory” by Wang et al. and “Secure DIMM: Moving ORAM Primitives Closer to Memory” by Shafiee et al. both reduce the overheads of address-obfuscating memories (like ORAM) by shifting most of the necessary computation closer to the memory devices, onto the buffer chip available on some modern DIMMs. “SYNERGY: Rethinking Secure-Memory Design for Error-Correcting Memories” by Saileshwar et al. designs efficient and reliable memories by co-designing memory encryption and error-correcting mechanisms, as opposed to the conventional approach of designing them independently. “Are Coherence Protocol States Vulnerable to Information Leakage?” by Yao et al. demonstrates how attackers can manipulate the coherence state of shared cache blocks in hardware cache coherence protocols to build timing channels on production hardware (!).

Among the papers on novel architectures, “A Case for Packageless Processors” by Pal et al. argues that processor packages are the main culprit behind the limited memory bandwidth available in modern systems and proposes Silicon Interconnection Fabric based packageless processors to deliver increased performance and reduced area consumption. “Routerless Networks-on-Chip” by Alazemi et al. observes that routers in interconnection networks incur significant power and area overheads and proposes a router-less architecture to tackle these overheads.

The industrial track sessions featured papers from Intel, AMD, Google, Microsoft, ICT, and Facebook. “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective” by Hazelwood et al. provides a fascinating overview of the scale and diversity of machine learning workloads at Facebook and presents system design guidelines learned in practice. “Amdahl’s Law in Big Data Analytics: Alive and Kicking in TPCx-BB (BigBench)” by Richins et al. presents a comprehensive microarchitectural study of big data applications, using the newly established industry standard performance benchmark for big data analytics, and makes the surprising observation that these workloads do not exhibit sufficient thread level parallelism and can benefit from optimizations that improve single thread performance. “Memory Hierarchy for Web Search” by Ayers et al. investigates the role of memory hierarchy performance in commercial web search applications, shows that there is significant reuse of data that is not captured by modern cache hierarchies, and accordingly proposes new memory hierarchy designs.

About the author: Aasheesh Kolli is a post-doc researcher at VMware Research and will join the Computer Science and Engineering Department at Penn State as an Assistant Professor in Fall ’18. [Website | @aasheeshkolli on Twitter]