In this post, I argue that software and data interoperability, a hallmark of the modern computer ecosystem, is at the core of several widespread security problems. An alternative approach is to tie data and potentially software to specific physical instances of a digital system. While such an approach would eliminate the ease with which we can share software and data, it could eliminate incidents where data is lost to attackers or where systems are attacked by malware.
We are used to instances of software running on different computer systems as long as some basic requirements are met (e.g., same instruction set architecture, same operating system, etc.). Similarly, we expect that most computer systems can read (and write) data that is stored in some memory. This interoperability has been a driver of the widespread adoption of information technology platforms.
Looking at current hardware and software security research, however, we see a number of technologies under development that limit this type of broad interoperability. The question at the core of this blog post is: what happens if we continue these trends and reach the point where both data and software are tied to individual digital devices? While we do give up the convenience of interoperability, I argue that we can gain significant benefits with respect to system security and data security.
Individualizing digital systems
Semiconductor chips are typically mass-produced, and each instance of a chip is basically a clone with the same design. As a result, all chips with the same design have the same functionality, can run the same software, and can access the same data. But for many practical reasons, including security, numerous techniques have been developed to differentiate between instances of a chip. For example, serial numbers can be used for tie-breaking protocols in homogeneous distributed environments; trusted platform modules can store cryptographic keys that are unique to an instance of a chip; and, more recently, physical unclonable functions (PUFs) have used variations in the physical characteristics of an instance to generate unique, unforgeable identifiers.
With the ability to attribute uniqueness to different instances of the same chip, it is then possible to represent data and potentially software in such a way that only that specific instance can process them. For example, data can be encrypted in such a way that only a specific target system can decrypt them correctly.
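As a rough illustration of what "only a specific target system can decrypt" could mean, the following Python sketch binds data to a per-device secret. Everything here is an assumption made for illustration: the names `seal_for_device` and `open_on_device` are hypothetical, the random `device_secret` merely stands in for a key held in a TPM or derived from a PUF, and the HMAC-based keystream is a toy construction chosen so the example runs with the standard library only. It is not a vetted cipher and not a method proposed in this post.

```python
import hashlib
import hmac
import os


def _derive(device_secret: bytes, label: bytes) -> bytes:
    """Derive a purpose-specific key from the (hypothetical) device secret."""
    return hmac.new(device_secret, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"),
                            hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def seal_for_device(device_secret: bytes, plaintext: bytes) -> bytes:
    """Encrypt and authenticate data so that it matches one device only."""
    enc_key = _derive(device_secret, b"enc")
    mac_key = _derive(device_secret, b"mac")
    nonce = os.urandom(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_on_device(device_secret: bytes, blob: bytes) -> bytes:
    """Recover the data; fails if the blob was sealed for another device."""
    enc_key = _derive(device_secret, b"enc")
    mac_key = _derive(device_secret, b"mac")
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("data does not match this device")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


# Usage: data sealed for device A opens on A and is rejected on B.
device_a = os.urandom(32)  # stand-in for A's hardware-held secret
device_b = os.urandom(32)  # stand-in for B's hardware-held secret
sealed = seal_for_device(device_a, b"medical record")
print(open_on_device(device_a, sealed))
try:
    open_on_device(device_b, sealed)
except ValueError as err:
    print("device B:", err)
```

A real design would keep the secret inside the hardware and use a standard authenticated cipher; the sketch only shows the binding between a representation of the data and one device.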
Today, data encryption is commonly used for communication, but processors typically store and process data in cleartext in local memory. While some operating system protections are in place to isolate data, data loss can occur when malware is able to circumvent protection mechanisms and access data.
A potential next step that takes individualization of digital systems further is to represent all data and software in some encrypted form that is unique to the underlying hardware. Based on device characteristics, such as those captured by PUFs, data is processed correctly only as long as the data and software representations match the specific device.
While the idea of representing data and software in an encrypted fashion unique to every instance of a device might seem extreme, similar approaches are already being deployed in other contexts. In cloud computing, for example, encryption techniques are being explored to enable both encrypted storage and processing of data on a cloud platform. The cloud system can perform processing steps (e.g., search or even computation in the case of fully homomorphic encryption) without ever having cleartext access to the stored data. Thus, it is not unrealistic to expect that similar features might become available on digital devices to uniquely represent data and software.
The core challenge in preventing malware spread or data loss is to develop an ability to “contain” data and software. In the current ecosystem of interoperability, anyone (or any device) can create software that can be run on another device or read data that has been stored on and leaked from another device. Thus, there is no fundamental technical barrier that keeps malware from spreading or data loss from occurring.
In the approach where data and software are unique to a specific digital device, an explicit authorization is necessary (by giving access to secret information unique to the device) to create software that can be executed or represent data that can be accessed. Without such a step, software does not work on a device and data read from the device cannot be interpreted.
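The authorization step for software can be sketched in Python as follows. This is a deliberate simplification under stated assumptions: the `Device` class, `export_authorization_key`, and `build_for_device` are hypothetical names; the device secret is a random value standing in for hardware-held information; and the program is only tagged with a device-specific HMAC rather than being compiled or encrypted into a device-specific form, which is what the post actually envisions.

```python
import hashlib
import hmac
import os


class Device:
    """A device that only runs programs prepared with its own secret."""

    def __init__(self) -> None:
        # Stand-in for secret information unique to this device.
        self._secret = os.urandom(32)

    def export_authorization_key(self) -> bytes:
        """Explicit authorization: hand the secret to a trusted builder."""
        return self._secret

    def run(self, program: bytes, tag: bytes) -> str:
        """Refuse any program whose tag does not match this device."""
        expected = hmac.new(self._secret, program, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise PermissionError("program was not built for this device")
        return f"running {len(program)} bytes"


def build_for_device(authorization_key: bytes, program: bytes) -> bytes:
    """Produce the device-specific tag; requires the authorization key."""
    return hmac.new(authorization_key, program, hashlib.sha256).digest()


# Usage: an authorized build runs; an attacker's guess does not.
device = Device()
key = device.export_authorization_key()
program = b"print('hello')"
print(device.run(program, build_for_device(key, program)))
try:
    device.run(b"malicious payload", os.urandom(32))
except PermissionError as err:
    print("rejected:", err)
```

Without the key, an attacker would have to guess a 256-bit tag, which corresponds to the "guess (or brute-force)" barrier described in the scenarios later in the post.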
This idea of limiting the functionality of a system to only explicitly allowed actions (i.e., “off-by-default”) to enhance information security has also been applied to other systems. For example, the Internet is designed to allow any system to communicate with any other system in the network. However, in practice, this ability is constrained by firewalls that only forward traffic that matches local policies.
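The "off-by-default" principle can be made concrete with a minimal Python sketch of a default-deny filter. The class name `OffByDefaultFirewall`, the rule fields, and the addresses are all hypothetical and chosen only for illustration; real firewalls match on far richer criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One explicitly allowed flow (hypothetical, simplified fields)."""
    src: str
    dst: str
    port: int


class OffByDefaultFirewall:
    """Forwards nothing unless a matching rule was added explicitly."""

    def __init__(self) -> None:
        self._allowed = set()

    def allow(self, src: str, dst: str, port: int) -> None:
        self._allowed.add(Rule(src, dst, port))

    def permits(self, src: str, dst: str, port: int) -> bool:
        # Default deny: the absence of a rule means the traffic is dropped.
        return Rule(src, dst, port) in self._allowed


# Usage: traffic is blocked until a local policy allows it.
fw = OffByDefaultFirewall()
print(fw.permits("10.0.0.5", "10.0.0.9", 443))
fw.allow("10.0.0.5", "10.0.0.9", 443)
print(fw.permits("10.0.0.5", "10.0.0.9", 443))
```

The individualized-device approach applies the same default to software and data: nothing runs or decrypts unless it was explicitly prepared for the device.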
What does a world of individualized software and data look like?
We already have ecosystems where software is tightly controlled. For example, apps for certain hardware platforms are only available through a single storefront. In such an environment, it is conceivable that an individualized instance of the app is created for the target device on each download. Similarly, data could be encrypted such that it only works on a single target device.
An important question in this context is who controls what software can be instantiated for what device and what data can be processed where. While I do not have answers to this important question, the key observation is that an individualized IT ecosystem enables mechanisms to make such decisions explicitly. Thus, policies can be put in place to describe what processing is allowed where and which data can be used on which systems.
To illustrate the power of such an environment, I briefly describe three scenarios:
- Malware protection: To execute a program, a compiled instance of the software needs to be created that matches the specific device on which it will be run. This can be done by either compiling the code from scratch to match the device or by adapting an existing, compiled version of the code. In both cases, device-specific cryptographic information is necessary. Thus, only entities that are authorized to use or manage the device are able to create an executable that would work on the device. If an attacker tried to inject malware into a system (e.g., through a malicious email attachment), they would have to guess (or brute-force) this device-specific information. Thus, an attack would not be successful.
- Control of private data: To access data, a version of the data needs to be created that can be read on a specific device. This step requires the owner or administrator of the data to explicitly grant permission to access the data. Once the data are customized for a given system, they can be read or written on this system without further verification. However, if the data were copied to another system, interpretation of those data would not be possible since the representation matches only the initial system. Thus, a user could ensure that personal information would be limited to a set of known systems (unless the communication is authorized, see below).
- Communication: To move data between systems, their representation would need to be adapted to the receiving side. This could be done by the sender if both end-systems trust each other or by a trusted third party if the systems do not trust each other. In either case, a policy check would need to be performed to verify that transfer of data from one system to another is permissible. With such an approach, the original owner of the data could control distribution.
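The communication scenario, with a trusted third party that checks policy and adapts the representation for the receiver, might be sketched in Python as follows. All names (`TrustedBroker`, `seal`, `unseal`, the device identifiers) are hypothetical, the per-device keys are random stand-ins for hardware-held secrets, and the cipher is a toy HMAC-based construction used only so the example runs with the standard library.

```python
import hashlib
import hmac
import os


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher built from HMAC-SHA256 (illustration only)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hmac.new(key, b"enc" + nonce + offset.to_bytes(8, "big"),
                       hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def seal(key: bytes, data: bytes) -> bytes:
    """Create a representation of the data that matches one device key."""
    nonce = os.urandom(16)
    body = _xor_stream(key, nonce, data)
    tag = hmac.new(key, b"mac" + nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    """Interpret the data; fails if the representation is for another key."""
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, b"mac" + nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("representation does not match this device")
    return _xor_stream(key, nonce, body)


class TrustedBroker:
    """Third party that checks policy, then re-wraps data for the receiver."""

    def __init__(self) -> None:
        self._keys = {}        # device id -> device key
        self._allowed = set()  # permitted (sender, receiver) pairs

    def register(self, device_id: str, key: bytes) -> None:
        self._keys[device_id] = key

    def permit(self, sender: str, receiver: str) -> None:
        self._allowed.add((sender, receiver))

    def transfer(self, sender: str, receiver: str, blob: bytes) -> bytes:
        if (sender, receiver) not in self._allowed:
            raise PermissionError(f"transfer {sender} -> {receiver} denied")
        return seal(self._keys[receiver], unseal(self._keys[sender], blob))


# Usage: the data owner permits phone -> laptop, and nothing else.
phone_key, laptop_key = os.urandom(32), os.urandom(32)
broker = TrustedBroker()
broker.register("phone", phone_key)
broker.register("laptop", laptop_key)
broker.permit("phone", "laptop")

on_phone = seal(phone_key, b"contact list")
on_laptop = broker.transfer("phone", "laptop", on_phone)
print(unseal(laptop_key, on_laptop))
```

In this sketch the broker briefly sees the cleartext, so it has to be fully trusted. The post leaves open who plays that role and how the policy is expressed.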
Clearly, such an ecosystem is very different from what we are used to. No longer could we easily distribute software and data without an adaptation or verification step. In return, we would, however, address difficult security problems with which we are struggling today. Is this tradeoff worthwhile to achieve more secure systems and protection of data?
About the Author: Tilman Wolf is Senior Associate Dean and Professor of Electrical and Computer Engineering at the University of Massachusetts Amherst.
Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.