The smartphone is the most pervasive mobile computing device on the planet. There are over 2.1 billion devices worldwide, and this number is rising sharply as smartphone penetration increases in emerging markets like China and India. By 2020, there will be 6 billion smartphones globally, with over 1 million new subscribers each day for the next six years. To put this number into perspective, Gartner research estimates that Google has 2.5 million servers. If we take all of the major cloud service providers into account (Google, Amazon, Microsoft, and Facebook), a liberal estimate lands us at about 10 million servers worldwide. The smartphone-to-server ratio is therefore roughly 200:1 and widening, which raises the questions of how we got here and what lies ahead for mobile computer architecture.
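The ratio follows directly from the figures above. A minimal Python sketch of the arithmetic, using only the article's own rounded, liberal estimates:

```python
# Figures quoted in the text; the server count is a rounded, liberal estimate.
SMARTPHONES = 2.1e9     # devices in use worldwide
CLOUD_SERVERS = 10e6    # Google, Amazon, Microsoft, and Facebook combined

def devices_per_server(devices: float, servers: float) -> float:
    """Return the number of smartphones per cloud server."""
    return devices / servers

ratio = devices_per_server(SMARTPHONES, CLOUD_SERVERS)
print(f"smartphone-to-server ratio: {ratio:.0f}:1")  # → smartphone-to-server ratio: 210:1
```

The exact quotient is 210:1, which the text rounds to 200:1.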
The seismic shift toward pervasive mobile computing started with the arrival of the first Apple iPhone in June 2007. Its 412 MHz single-core ARM application processor delivered an unparalleled user experience on a 3.5-inch, full touch-screen display. Since then, mobile application processors have evolved considerably to deliver desktop-like performance. The iPhone 7’s A10 processor has a peak clock frequency of 2.34 GHz and uses a quad-core asymmetric architecture that integrates two high-performance and two energy-efficient cores onto the same die. Newer designs such as MediaTek’s Helio X20 feature the world’s first Tri-Cluster CPU, with 10 processing cores (deca-core) offering varying degrees of performance.
Today’s mobile system-on-a-chip (SoC) processors boast several high-end features: high clock frequencies, aggressive microarchitectures, multicore designs, asymmetric architectures, heterogeneity and domain-specific acceleration. Arguably, mobile SoC processors are unmatched in their design and orchestration complexity.
Advancements in mobile processors have not come without challenges. Two problems have consistently plagued mobile devices: battery life and heat dissipation. There is no Moore’s Law for batteries, and mobile devices are passively cooled. While transistor counts have doubled roughly every two years since 1975, battery energy density has doubled only about once every 10 years for more than two decades. With the end of Dennard scaling, power density no longer stays constant as transistors shrink, so heat dissipation has become a major issue for SoCs in devices that lack active cooling. It should therefore come as no surprise that we have all experienced mobile devices running hot. Humans can tolerate skin contact with surfaces up to about 45 degrees Celsius; beyond that, we experience the onset of pain. Modern mobile devices often operate at this threshold, and occasionally exceed it. The iPad is known to reach 47 degrees Celsius when running compute-intensive tasks such as playing games or downloading files.
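To see how quickly these two curves diverge, the sketch below compounds both doubling rates over a 20-year horizon. The doubling periods (2 years for transistors, 10 years for battery energy density) come from the text; the 20-year horizon is an illustrative choice.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` for a quantity that doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)

HORIZON_YEARS = 20  # illustrative: "more than two decades"
transistors = growth_factor(HORIZON_YEARS, 2)   # Moore's Law cadence cited in the text
battery = growth_factor(HORIZON_YEARS, 10)      # battery energy-density cadence cited in the text
print(f"transistors: {transistors:.0f}x, battery: {battery:.0f}x, gap: {transistors / battery:.0f}x")
# → transistors: 1024x, battery: 4x, gap: 256x
```

Over the same two decades, the compute available to drain the battery grows 256 times faster than the battery itself.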
Today’s smartphones have largely coped with limited battery life and heat dissipation thanks to their limited and intermittent usage model. An average smartphone user session lasts less than five minutes, with long periods of idleness between touch-screen events, which allows power-saving techniques such as “run-to-idle” (finish the work quickly, then drop into a low-power idle state) to deliver all-day battery life. But the decade-long era of passive, touch-driven mobile computing may be on the verge of a paradigm shift. With the strong push toward machine learning (ML) and AR/VR on smartphones, applications will become more active and demanding. Instead of waiting on touch events, smartphone applications will proactively process the content rendered on the display (e.g., a camera image or a website) to extract semantically rich information that makes the user experience more engaging. This transition from passive to active applications will likely exert new levels of processing pressure on the mobile SoC.
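Run-to-idle trades a short burst of high power for a longer stretch of very low idle power. A minimal energy model of that trade-off, assuming hypothetical power numbers for two operating points (not measurements from any real SoC), compares racing to idle against running slowly for the same fixed amount of work inside the same time window:

```python
def window_energy_j(work_gcycles: float, freq_ghz: float, active_w: float,
                    idle_w: float, window_s: float) -> float:
    """Energy over a fixed window: run the work at `freq_ghz`, then idle for the remainder."""
    busy_s = work_gcycles / freq_ghz
    if busy_s > window_s:
        raise ValueError("this operating point cannot finish the work within the window")
    return active_w * busy_s + idle_w * (window_s - busy_s)

# Hypothetical numbers: 1 billion cycles of work triggered by a touch event, in a 2 s window.
WORK, WINDOW, IDLE_W = 1.0, 2.0, 0.05
race = window_energy_j(WORK, freq_ghz=2.0, active_w=2.0, idle_w=IDLE_W, window_s=WINDOW)
slow = window_energy_j(WORK, freq_ghz=1.0, active_w=1.2, idle_w=IDLE_W, window_s=WINDOW)
print(f"race-to-idle: {race:.3f} J, run-slow: {slow:.3f} J")
```

With these assumed numbers racing wins (about 1.075 J versus 1.25 J) because halving the frequency does not halve active power, and idle power is small. If active power fell more steeply with frequency, the slower point could win instead, which is why the technique depends on cheap idle states and on the long idle gaps that bursty, touch-driven usage provides.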
So, the questions that need attention are how computer architects can help design the next generation of mobile SoC processors, given the massive proliferation of these devices, the looming battery and heat dissipation challenges, and the emergence of new application types. To help answer these questions and guide the systematic development of future mobile SoCs, below are the “Ten Commandments of Mobile Computer Architecture Research”:
- Thou shalt not use benchmarks in place of real-world interactive applications.
- Thou shalt not cherry-pick applications by popularity.
- Thou shalt not ignore the Web browser.
- Thou shalt not presume microarchitectural improvements translate to user satisfaction.
- Thou shalt not drop a frame.
- Thou shalt not assume execution is deterministic across runs.
- Thou shalt not believe software behaves identically across all devices.
- Thou shalt not ignore energy and thermal consequences of innovations.
- Thou shalt not ignore the intellectual property (IP) blocks.
- Thou shalt not presume this list is complete.
Commandment #1: SPEC CPU benchmarks are not representative of real mobile applications, and neither are their mobile counterparts, such as Geekbench and AnTuTu. Yet these benchmarks are often used in research papers. They exercise the CPU and GPU steadily for long periods of time. However, most mobile applications are user-driven, interactive, and bursty. Therefore, conclusions drawn from studying long-running, steady-state benchmarks can lead to weak bottleneck analysis and suboptimal designs. Architects may be misled into optimizing for the wrong bottlenecks and making incorrect trade-offs. It is important to study real-world mobile applications driven by user inputs.
Commandment #2: A common practice is to download the top N applications from the App Store, where N is typically 10 in the academic literature, and conduct detailed (micro)architectural analysis studies on them. Application popularity may seem like a reasonable metric for selecting applications, but using the top 10 applications to determine how to design future processors may amount to misplaced faith. The job of an architect is to design processors for the future, not the past. The mobile application ecosystem evolves rapidly: over 60,000 new applications are released each month across the Apple iOS and Google Play stores. Therefore, it would be prudent for architects to study a broad set of applications and not let popularity determine which applications are used for workload deep dives.
Commandment #3: The browser is the canonical application on a mobile device, akin to gcc in the SPEC CPU benchmark suite. More Web traffic flows through mobile devices than through desktops. The browser is a complex application with large memory and code footprints; it relies on several threads and processes, and its asynchronous execution model stresses virtually every aspect of a mobile SoC, including the application processor, GPU, video/audio decoders, and networking and communication IP blocks. Moreover, a single browser can render any of the billions of webpages on the Internet, so its execution characteristics vary widely with its input. Mobile SoC vendors consider the browser the toughest application to optimize.
Commandment #4: “Computer Architecture: A Quantitative Approach” by John L. Hennessy and David A. Patterson taught us to take a strong, rigorous approach to quantitatively measuring the performance of simulated or real hardware. But much has changed in the 10 years since the arrival of the smartphone. In a mobile device, the measure of performance is not how fast a processor can compute; rather, it is the device’s ability to deliver user-perceivable improvements in satisfaction. Doubling the TLB from 32 to 64 entries may seem like the right trade-off to alleviate a performance bottleneck at the expense of power consumption. But if the change does not produce a measurable, user-perceivable improvement in performance (or satisfaction), then it is a wasteful trade-off.
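One way to check whether a microarchitectural gain is user-perceivable is to push it through Amdahl’s law to end-to-end frame latency and ask whether any frame crosses the display deadline (16.67 ms at 60 FPS). The sketch below assumes, hypothetically, that TLB-miss handling accounts for 5% of each frame’s time and that the larger TLB halves that component; the frame times are made up for illustration.

```python
FRAME_BUDGET_MS = 1000.0 / 60  # 60 FPS display deadline

def amdahl_latency_ms(latency_ms: float, fraction: float, local_speedup: float) -> float:
    """Amdahl's law: new latency when only `fraction` of the work is sped up by `local_speedup`."""
    return latency_ms * ((1.0 - fraction) + fraction / local_speedup)

def missed_deadlines(frames_ms: list[float]) -> int:
    """Count frames that exceed the per-frame budget."""
    return sum(1 for t in frames_ms if t > FRAME_BUDGET_MS)

# Hypothetical per-frame latencies (ms); hypothetical TLB share of 5%, halved by the upgrade.
before = [9.0, 12.0, 15.0, 18.0, 30.0]
after = [amdahl_latency_ms(t, fraction=0.05, local_speedup=2.0) for t in before]
print(f"missed before: {missed_deadlines(before)}, after: {missed_deadlines(after)}")
```

Every frame gets 2.5% faster, yet the same two frames still miss the deadline, and the frames that already met it gain nothing visible. Under these assumptions, the extra TLB power buys no user-perceivable improvement.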
Commandment #5: Most mobile users are fidgety. An average mobile user taps, types, swipes, or clicks on their device 2,617 times a day. Except for video streaming applications, the majority of mobile applications are event-driven: the processor is waiting for user events. So what matters most to end users is touch responsiveness, i.e., the time it takes to render a frame to the display in response to a touch input. To ensure “buttery smooth” user interface (UI) performance, the system as a whole must consistently maintain 60 frames per second (FPS) without any dropped (or delayed) frames, commonly referred to as “jank.” Maintaining 60 FPS means all processing for a frame (computing, networking, and rendering) must complete within 16.67 ms. Dropped frames in UI applications lead to a poor user experience, but dropped frames in AR/VR applications can make users feel nausea and discomfort after a few minutes. Therefore, it is important to focus on fine-grained metrics, such as application jank and the tail of the user-experience latency distribution, in addition to traditional metrics like end-to-end application execution time.
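The 16.67 ms figure is simply 1000 ms divided by 60 frames. The sketch below uses that budget to count janky frames and report a tail percentile for a list of hypothetical frame times (not measured data), illustrating why an average alone can hide jank:

```python
import math

TARGET_FPS = 60
FRAME_BUDGET_MS = 1000.0 / TARGET_FPS  # about 16.67 ms per frame

def jank_frames(frame_times_ms: list[float], budget_ms: float = FRAME_BUDGET_MS) -> int:
    """Count frames whose total time (compute, network, render) overran the budget."""
    return sum(1 for t in frame_times_ms if t > budget_ms)

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct=90 gives the p90 tail)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical frame times (ms) following a burst of touch events; not measured data.
frames = [8.1, 9.4, 15.2, 16.0, 22.5, 9.8, 10.1, 34.0, 12.3, 11.7]
mean_ms = sum(frames) / len(frames)
print(f"budget {FRAME_BUDGET_MS:.2f} ms | mean {mean_ms:.1f} ms | "
      f"jank {jank_frames(frames)}/{len(frames)} | p90 {percentile(frames, 90):.1f} ms")
```

Here the mean frame time (about 14.9 ms) is comfortably under budget, yet two of ten frames overrun it and the 90th-percentile frame takes 22.5 ms; only the jank count and the tail metric expose the stutter a user would see.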
Commandment #6: Run-to-run variation in workload performance is a severe problem in mobile systems, even when an application runs in isolation. There are multiple sources of variation in a mobile device: thermal throttling, dynamic recompilation (at least in Android), background killing of applications, nondeterministic networking and communication, etc. Therefore, it is prudent to study the effect of one’s optimizations across multiple scenarios, such as varying levels of background activity and IP block traffic congestion, to ensure that the improvements persist, particularly on a real device. Ideally, one would rely on statistical cutoffs (e.g., significance tests over repeated runs) to build confidence that an improvement is real.
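One way to apply a statistical cutoff is a permutation test over repeated runs: if the two configurations were really interchangeable, shuffling the run labels should often produce a difference as large as the observed one. The sketch below is an exact one-sided version in plain Python; the load times are hypothetical, and the 0.05 cutoff is a conventional choice rather than anything prescribed here.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(baseline: list[float], candidate: list[float]) -> float:
    """Exact one-sided permutation test for 'candidate is faster than baseline'.

    Enumerates every way to relabel the pooled runs and returns the fraction of
    relabelings whose mean improvement is at least as large as the observed one.
    """
    pooled = baseline + candidate
    observed = mean(baseline) - mean(candidate)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), len(candidate)):
        picked = set(chosen)
        cand = [pooled[i] for i in picked]
        base = [pooled[i] for i in range(len(pooled)) if i not in picked]
        total += 1
        if mean(base) - mean(cand) >= observed - 1e-9:  # tolerance for float ties
            extreme += 1
    return extreme / total

# Hypothetical page-load times (ms) from five runs of each configuration on one device.
baseline_ms = [420.0, 431.0, 455.0, 440.0, 428.0]
candidate_ms = [395.0, 401.0, 388.0, 410.0, 399.0]
p = permutation_p_value(baseline_ms, candidate_ms)
print(f"p = {p:.4f}; significant at 0.05: {p < 0.05}")
```

Exact enumeration grows combinatorially (10 runs per side already means 184,756 relabelings), so with many runs one would sample random permutations instead. The runs themselves should also span the background-activity and congestion scenarios described above.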
Commandment #7: Some applications are written with user experience and hardware capabilities in mind, and may silently take different execution paths depending on the hardware features they query. Popular applications may disable sophisticated features on some hardware because parts of the application would not run fast enough there to provide a responsive user experience (UX). Product software teams make such trade-offs, and they are correct for today’s hardware, but an architect exploring future designs needs to be wary of these de-optimizations: a workload profiled on today’s devices may never exercise the code paths that more capable hardware would enable.
Commandment #8: It is tempting to tout the performance improvements of hardware solutions without fully considering their consequences. Given that battery life and heat dissipation are first-order constraints for mobile processors, it is necessary to measure and quantify both the instantaneous power peaks and the resulting energy consumption of new ideas.
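Both quantities can be computed from a sampled power trace, such as one logged by an external power monitor. A minimal sketch, assuming timestamped samples in watts and hypothetical trace values, integrates energy with the trapezoidal rule and reports the peak:

```python
def energy_and_peak(times_s: list[float], power_w: list[float]) -> tuple[float, float]:
    """Return (energy in joules, peak power in watts) for a sampled power trace.

    Energy is the trapezoidal-rule integral of power over time.
    """
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need matching timestamp and power lists with at least two samples")
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j, max(power_w)

# Hypothetical traces sampled every 100 ms for the same task on a baseline and a modified design.
t = [0.0, 0.1, 0.2, 0.3, 0.4]
baseline = energy_and_peak(t, [1.0, 1.0, 1.0, 1.0, 1.0])
candidate = energy_and_peak(t, [1.0, 3.0, 1.0, 0.2, 0.2])
print(f"baseline: {baseline[0]:.2f} J, peak {baseline[1]:.1f} W")
print(f"candidate: {candidate[0]:.2f} J, peak {candidate[1]:.1f} W")
```

In this made-up example the candidate concentrates its work into a spike, but it uses about 0.48 J instead of 0.40 J and triples the instantaneous peak, a cost that a speedup number alone would hide.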
Commandment #9: Unlike CPUs and GPUs, which have been continuously optimized for decades and treated as indivisible architectural units, SoCs are, by definition, a modular collection of special-purpose compute, communication, and storage IP units. In a high-performance SoC, IP blocks dominate the die area. The Apple A8 SoC has close to 30 IP blocks, and the application processor occupies less than 20% of the die area. IP blocks are typically licensed from third parties and assembled via standard interfaces (e.g., ARM AXI) to meet a targeted purpose. Modular SoC design makes computation within a single IP fast, power efficient, and space efficient. But whenever IPs need to communicate, their default behavior is to drain data to memory, interrupt or poll, and wait for the CPU to hand off the data from one IP to the next. There is an opportunity for researchers to learn how to build chips with the efficiency of an Intel CPU or an Nvidia GPU but the flexibility of an SoC.
Commandment #10: Mobile is a rapidly and continuously evolving domain. It would be presumptuous to assume that any one set of commandments can ever be complete. So, the tenth commandment reserves the exclusive right to state that the current list is incomplete and that the aforementioned commandments should be improved upon over time.
Many of the commandments will be hard to abide by in practice, due to a lack of proper tools or established methodologies, and that is intentional. These commandments hold mobile computer architecture research to high standards. The challenge for next-generation mobile computer architects is to develop and unleash the necessary vessels (tools and methodologies) that will enable more systematic benchmarking and evaluation of mobile systems in place of today’s ad hoc practices. Those who read the commandments closely enough shall see several opportunities for new research.
Acknowledgments: The author thanks Allan Knies (SoC Performance and Simulation Lead, Google), Manu Gulati (Lead SoC Architect, Google), Hongil Yoon (SoC Performance Architect, Google), and Xiaoyu Ma (SoC Performance Architect, Google) for their feedback on this article. The views expressed here are solely those of the author and do not necessarily represent or reflect the views of Google, where the author is currently employed as a Visiting Researcher.
About the author: Vijay Janapa Reddi is a professor in the Department of Electrical and Computer Engineering at The University of Texas at Austin. More information can be found at http://3nity.io/~vj.