Computer Architecture Today

Informing the broad computing community about current activities, advances and future directions in computer architecture.

In late March 2025, computer systems researchers from around the globe descended upon Rotterdam, the Netherlands, to participate in another edition of ASPLOS. The event was distinguished from previous installments for a number of reasons: the conference is celebrating its 30th anniversary and was co-located with EuroSys for the first time in its history. It also featured a new Contest Track that we both had the pleasure of organizing.

While contest tracks are new to the architecture community, they are no strangers to other ACM conferences. For example, the electronic design automation conference ICCAD has offered competitions since its 30th edition in 2012, and SIGMOD has routinely hosted a student research competition since 2017. But for researchers working at the intersection of architecture, compilers, programming languages, and operating systems, the pleasure of competing against other teams on targeted industrial problems has (until now) been out of reach.

The ASPLOS/EuroSys 2025 Contest Track involved topics contributed by two corporate sponsors — Google and Amazon Web Services — which were released in late 2024. For roughly three months, a total of thirty teams worked diligently on their solutions to these problems, addressing key challenges that arise in hardware-accelerated machine learning. A driving hypothesis of the event was that one or more contestants might stumble upon a discovery that would significantly advance the state of the art on these topics. As it turns out, the teams did not disappoint!

The contest topic sponsored by Google focused on intra-operator parallelism for distributed deep learning, tasking competitors with finding cost-minimizing assignments that partition large tensor operations across arrays of interconnected hardware accelerators. This NP-hard problem is a critical challenge faced by production ML compiler toolchains, which must balance the computation performed across devices against the communication required to synchronize results. To facilitate the design of new algorithms, Google also released a new benchmark suite specifically for the contest, composed of very large graphs (tens of thousands of nodes each) derived from real industrial models.
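To give a flavor of the optimization problem (a toy sketch with made-up numbers, not Google's actual formulation or the contest's benchmark format), each operator picks one sharding strategy, and the objective sums per-node compute costs plus per-edge resharding costs between adjacent operators:

```python
from itertools import product

# Toy sketch of the intra-operator sharding objective (hypothetical data).
# node -> list of compute costs, one per candidate sharding strategy
node_costs = {
    "matmul": [10.0, 6.0],   # strategy 0: replicate, strategy 1: shard rows
    "relu":   [4.0, 3.0],
}

# (src, dst) edge -> resharding cost indexed [src_strategy][dst_strategy]
edge_costs = {
    ("matmul", "relu"): [[0.0, 5.0],
                         [5.0, 0.0]],
}

def total_cost(assignment):
    """Cost of a strategy assignment: node name -> chosen strategy index."""
    cost = sum(node_costs[n][s] for n, s in assignment.items())
    for (u, v), table in edge_costs.items():
        cost += table[assignment[u]][assignment[v]]
    return cost

# Exhaustive search is exponential in general (the problem is NP-hard),
# but trivial for this two-node example.
best = min(
    (dict(zip(node_costs, choice))
     for choice in product(*(range(len(c)) for c in node_costs.values()))),
    key=total_cost,
)
```

Here the cheapest joint choice shards both operators the same way, avoiding the resharding penalty; at production scale, with tens of thousands of nodes, exhaustive search is hopeless and heuristics take over.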

The Amazon Web Services contest focused on its new Neuron Kernel Interface (NKI) library, a tile-based Python DSL that enables high-performance, “down to the metal” access to Amazon’s Trainium/Inferentia family of ML accelerator chips. Competitors needed to identify parts of a PyTorch Llama 3.2 1B model that could be implemented as NKI kernels (operators, fused operators, layers, or even the whole model!) and replace them in the original model to achieve better performance than the existing Neuron compiler produces. Amazon supplied up to $500 in AWS credits per team, for a total of $5,500 in credits.

In a special session held during the Workshop & Tutorial days, participants shared key insights into the techniques they employed in their solutions. For example, a number of submissions to the Google topic leveraged previously unexplored variants of stochastic local search. Likewise, contestants in the Amazon topic demonstrated that NKI, which is still in beta, could largely replace kernels written in low-level instructions for the majority of a forward pass in common language-model architectures. In particular, contestants’ kernels handled smaller shapes with impressive improvements; the winning group’s result was enabled in part by a novel micro-kernel programming design. Below are some photos from the award ceremony.
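For readers unfamiliar with the family of methods mentioned above, here is a minimal, generic sketch of stochastic local search over sharding assignments (our own illustration with invented costs, not any team's actual algorithm): repeatedly perturb one node's strategy at random and keep the move only if it does not worsen the best cost seen so far.

```python
import random

# Illustrative stochastic local search (generic sketch, hypothetical data).
# costs[n][s] is the compute cost of strategy s at node n; adjacent nodes
# that disagree on strategy pay a fixed resharding penalty.
costs = {0: [9.0, 4.0], 1: [5.0, 5.0], 2: [3.0, 8.0]}
edges = [(0, 1), (1, 2)]
PENALTY = 2.0

def cost(assign):
    c = sum(costs[n][assign[n]] for n in costs)
    c += sum(PENALTY for u, v in edges if assign[u] != assign[v])
    return c

def local_search(steps=200, seed=0):
    rng = random.Random(seed)
    assign = {n: rng.randrange(len(costs[n])) for n in costs}  # random start
    best, best_cost = dict(assign), cost(assign)
    for _ in range(steps):
        n = rng.choice(list(costs))              # perturb one node's strategy
        old = assign[n]
        assign[n] = rng.randrange(len(costs[n]))
        if cost(assign) <= best_cost:            # accept non-worsening moves
            best, best_cost = dict(assign), cost(assign)
        else:
            assign[n] = old                      # revert worsening moves
    return best, best_cost

best, best_cost = local_search()
```

Real variants layer restarts, simulated-annealing-style acceptance of worsening moves, or problem-specific neighborhoods on top of this skeleton; the appeal is that each move only needs an incremental cost update, which scales to graphs with tens of thousands of nodes.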

The prize-winning entries to both topics — which originated from locations including the United States, Germany, Sweden, Japan, China, and elsewhere — exhibited never-before-seen improvements that are likely to influence the frontier of machine learning systems. And given the prize packages totaling over $85,000, we hope these teams will have the resources to continue their groundbreaking work long into the future!

In light of the strong participation rate for both contests—and the exceptional quality of top submissions—we believe that similar competitions have the potential to play a key role in accelerating progress within the computer systems community. In particular, given the multidisciplinary nature of ASPLOS, we expect that other topics might be especially well-suited to this format. Of course, the success of this track ultimately depends on participation from industry. We strongly encourage professionals from other companies to consider contributing new problems in the future. 

Many thanks to Emily Webber, the co-organizer of the Amazon Web Services competition, to Aninda Manocha and Ziyang Xu for their invaluable assistance in organizing the technical aspects of the contest, and to Kamran Khan for help in sponsoring the contest from Amazon. We also thank Pratik Fegade, Rainier Aliment, Samrat Ghosh, and Zongwei Zhou for their help in coordinating and sponsoring the contest from Google.  Finally, we owe a debt of gratitude to members of the ASPLOS 2025 Organizing Committee (particularly Lieven Eeckhout and Chris Rossbach) for allowing us to affiliate our contest track with the conference.

About the Authors:

Emery Berger is a Professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst and an Amazon Scholar at Annapurna Labs / Amazon Web Services. He served as co-program chair for ASPLOS ’21.  Michael D. Moffitt is a Member of Technical Staff at Google, where he has worked for over ten years.

Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.