I am grateful to everyone who responded to my post in April that asked why machine learning has had limited impact on computer systems research. In addition to commenting on this website and sending me email, people discussed this question at length on reddit and ycombinator. This post summarizes and responds to some of the points that were raised.
Many people disagreed with the premise of my question and argued that computer systems research has already been fundamentally transformed by machine learning.
The most frequently cited example was auto-tuning, which is a technique for finding optimal values for program parameters such as loop-unroll factors and tile sizes that minimize running time or energy. Instead of using analytical models to estimate running time or energy for given parameter values, auto-tuning generates a program with those parameter values and executes it to find the running time or energy requirements. Optimal parameter values are found by heuristic search in the parameter space, using techniques like simulating annealing and genetic algorithms.
Auto-tuning was introduced by Jim Demmel and Jack Dongarra in their work on dense linear algebra library generators like ATLAS. Saman Amarasinghe, Mary Hall, and Mike O’Boyle among others have been instrumental in making auto-tuning part of the arsenal of techniques we have today for implementing general-purpose programming languages.
I take a backseat to no one in my appreciation of auto-tuning but even in these days of grade inflation, the bar for calling something “transformative” must be set high. My personal opinion is that auto-tuning and heuristic search methods do not meet that bar.
Several people mentioned that machine learning has been deployed in data-centers to reduce cooling costs by 15-20%. I did not know this, but knowing this leaves me neither shaken nor stirred. Improving the fuel efficiency of cars by 15-20% is interesting. Autonomous cars in urban environments are mind-boggling. There’s a difference.
A few people took exception to my assertion that regression can be useful for building models of the running time and energy behavior of programs. One person was kind enough to send me a link to the Wikipedia page on Rice’s Theorem. In return, I sent him this link to an observation by Bertrand Russell that has always delighted me: “All exact science is dominated by the idea of approximation.”
The most challenging question, asked by several people in different ways, was “If machine learning is the solution, what is the problem?” In other words, what transformative impact do I believe machine learning can have on the way we build or program computer systems?
Instead of taking the easy way out and writing something like “We’ll know it when we see it” or “If I knew that, I would be doing it myself instead of writing blog posts,” let me suggest a way in which machine learning might be used to transform how computers are programmed.
Over the course of my career, I have taught introductory programming to thousands of undergraduates, and my blackboard and my staff have instructed them as they walked fearfully through the shadow of the valley of class hierarchies, dynamic storage allocation, null pointer exceptions, and JIT compilation. Yet most of them were not CS majors – for whom programs are artifacts worthy of study in their own right – but ECE, ORIE, or Physics majors for whom programs are just a means to an end such as data analysis or digital signal processing.
Can we can build systems that will let non-CS majors use computers without having to wallow in the mire of current programming languages and environments? What I have in mind is not yet another programming language but a system that will let you specify what you want computed, leaving it to the system to figure out how this might be accomplished.
The key problem of course is how to specify what you want computed. Attempts over the decades to use formal specifications like Hoare logic have not been successful. For many years, students in the introductory programming course at Cornell were taught to write Hoare logic specifications before writing programs. They ended up of course learning neither logic nor programming (I on the other hand enjoyed that material immensely as a junior faculty member there!).
Is it possible to write informal specifications that are nevertheless precise enough to permit programs to be generated from them?
Actually we write such specifications all the time. When we give students a programming assignment in a CS 1 or CS 2 class, we give them an informal, natural language specification of the program we want them to write. If the students have the right prerequisites, they understand what we want them to do. If something is not clear, they ask for clarifications. In the end, most of them turn in programs that implement the informal specifications we give them; they may have programming errors but they do not turn in a program to compute Mersenne primes if they are asked to write a sorting program.
Can we build a programming system that is at least as smart as a CS freshman? The “Turing test” for such system would be a demonstration that it can read the assignments in an introductory programming course, write the required programs, and get a grade of B or better. Can we build systems that get better at programming when they are given feedback on their “course work” like we give our students? Can we build systems that read pseudocode from textbooks and papers, and produce programs?
Such systems would be milestones towards the ultimate goal of a system that can write its own programs for more complex problems and run them on complicated architectures, meeting requirements like efficiency and fault-tolerance.
Keshav Pingali (firstname.lastname@example.org) is the W.A.”Tex” Moncrief Chair of Grid and Distributed Computing in the Department of Computer Science and the Institute for Computational Engineering and Science (ICES) at the University of Texas at Austin.