Agentic Security: Lessons from Computer Architecture

What does speculative execution in a processor, and the predictor that drives it (such as a branch predictor), have to do with AI agents? The two may seem very different, yet at a high level of abstraction they are similar.

Both speculate: A processor predicts which way a branch will go and begins executing instructions along the predicted path before the branch has resolved. An AI agent infers a user’s intent, then reads and writes files, executes programs, makes network calls, and so on, before it knows whether its interpretation of that intent is right.

Because prediction can fail, both systems require rollback mechanisms. In a processor, once a misprediction is detected, the wrong-path work disappears (from the programmer’s point of view). When an agent is told it is wrong, or figures that out itself, it may revise its plan and redo the work after rolling back to a good checkpoint.

Both systems litter and leave residues: While a processor can recover from a misprediction without any programmer-visible effects, under the covers wrong-path execution perturbs on-chip microarchitectural structures such as caches. The Spectre attack (2018) showed that this residue can be observed through covert channels. AI agents have a similar problem. When an agent, or its human user, notices a mistake and corrects it, the failed attempt can leave at least two types of residue: a) residue that is easily observable, such as bad outputs and stale files or processes, and b) residue that is harder to find, track, or undo, such as the timing and volume of network requests or model context summaries shipped to third-party servers.
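The easily observable kind of residue can at least be detected mechanically. The sketch below is a minimal illustration, not a complete tool: it snapshots a directory before and after a hypothetical failed agent attempt and reports what changed. The function names (`snapshot`, `overt_residue`, `demo`) and the file names are assumptions made up for this example. Note that it sees only files; it says nothing about type b) residue such as network timing or data shipped elsewhere.

```python
import os
import tempfile


def snapshot(root):
    """Return {relative_path: (size, mtime_ns)} for every file under root."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            state[os.path.relpath(full, root)] = (info.st_size, info.st_mtime_ns)
    return state


def overt_residue(before, after):
    """Compare two snapshots and list files that were added, removed, or changed."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return {"added": added, "removed": removed, "changed": changed}


def demo():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "notes.txt"), "w") as f:
            f.write("keep me")
        before = snapshot(root)
        # Hypothetical failed attempt by an agent: it leaves a stray file behind.
        with open(os.path.join(root, "scratch.tmp"), "w") as f:
            f.write("leftover")
        return overt_residue(before, snapshot(root))
```

Calling `demo()` reports `scratch.tmp` as added and nothing else, which is exactly the kind of litter a user could notice and complain about.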

Both systems can also be tricked and steered through adversarial inputs: In Spectre, the attacker trains the on-chip predictor state by executing a chosen pattern, then supplies an adversarial input that causes the victim to transiently execute along the trained path, a path it should not take architecturally. Although that transient execution is later squashed, its microarchitectural residue can still be measured. Malicious prompts can play a similar role in AI agents: they can steer the system toward actions that are later corrected or denied but that, in the process, leave behind litter that attackers can use.

Given these similarities, we can ask two questions.

1) Can AI agents completely eliminate easily observable “architectural” residues on mispredictions?

2) What are the dangers/risks of hidden “microarchitectural” residue left behind by AI agents?

Regarding architectural residue, processors can hide speculative wrong-path work cleanly because the ISA defines what counts as visible committed state. Currently there is no equivalent for AI agents: the absence of an interface that precisely defines operations, state, the lifetime of state, and the triggers for misprediction recovery makes these systems hard to reason about and fertile ground for leakage.
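To make the idea of such an interface concrete, here is a minimal sketch of what one small piece of it might look like: every operation is declared together with its undo, effects stay provisional until an explicit commit, and a misprediction triggers a squash that undoes pending work in reverse order. The `Action` and `SpeculativeJournal` names are hypothetical and invented for this illustration; no existing agent framework is implied. It also covers only effects that are declared through the journal, so anything an action does outside of its `undo` remains residue.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One declared operation: a name, its effect, and how to reverse it."""
    name: str
    do: Callable[[], None]
    undo: Callable[[], None]


@dataclass
class SpeculativeJournal:
    """Hypothetical ISA-like layer: effects are provisional until commit()."""
    pending: List[Action] = field(default_factory=list)
    committed: List[str] = field(default_factory=list)

    def execute(self, action: Action) -> None:
        # Perform the effect, but remember how to reverse it.
        action.do()
        self.pending.append(action)

    def commit(self) -> None:
        # The interpretation was right: pending work becomes visible state.
        self.committed.extend(a.name for a in self.pending)
        self.pending.clear()

    def squash(self) -> None:
        # Misprediction: undo pending work, most recent action first.
        while self.pending:
            self.pending.pop().undo()


def demo():
    store = {}
    journal = SpeculativeJournal()
    journal.execute(Action("write a", lambda: store.update(a=1),
                           lambda: store.pop("a", None)))
    journal.commit()
    journal.execute(Action("write b", lambda: store.update(b=2),
                           lambda: store.pop("b", None)))
    journal.squash()  # the second write was a wrong guess
    return store, journal.committed
```

In `demo()`, the committed write to `a` survives while the squashed write to `b` disappears, mirroring how a processor retires correct-path work and discards wrong-path work.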

While observable residue is a serious problem, it is also solvable to some degree: if one is satisfied with imprecise, best-effort work, one simple option is to prompt the agent to clean up after itself. A really smart agent should, in theory, be able to use mechanisms like transactions, two-phase commit, distributed undo protocols, disposable containers and VMs, sandboxes, access controls, versioning, and information flow tracking to minimize overt residues. However, if we want to do better than prompting, we will probably need an ISA-like layer.
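As one illustration of the disposable-environment idea, the sketch below lets an agent work on a scratch copy of a directory and publishes the copy only if the work finishes without an error; otherwise the scratch copy is simply thrown away. The `disposable_workspace` helper and the file names are assumptions made up for this example, and the final swap uses two renames, so it is not atomic. It handles only file-system residue inside the target directory, which is the overt kind.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def disposable_workspace(target):
    """Yield a scratch copy of target; publish it only if the body succeeds."""
    parent = os.path.dirname(os.path.abspath(target))
    # Keep the scratch area next to the target so renames stay on one filesystem.
    scratch_root = tempfile.mkdtemp(prefix="agent-scratch-", dir=parent)
    scratch = os.path.join(scratch_root, "work")
    shutil.copytree(target, scratch)
    try:
        yield scratch
        # Reached only when the body raised nothing: swap the copy into place.
        os.rename(target, os.path.join(scratch_root, "old"))
        os.rename(scratch, target)
    finally:
        # Discard the failed attempt, or the old version after a successful swap.
        shutil.rmtree(scratch_root, ignore_errors=True)


def demo():
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "project")
        os.mkdir(target)
        with open(os.path.join(target, "data.txt"), "w") as f:
            f.write("v1")
        try:
            with disposable_workspace(target) as ws:
                with open(os.path.join(ws, "junk.tmp"), "w") as f:
                    f.write("x")
                raise RuntimeError("wrong interpretation of intent")
        except RuntimeError:
            pass
        after_failure = sorted(os.listdir(target))
        with disposable_workspace(target) as ws:
            with open(os.path.join(ws, "data.txt"), "w") as f:
                f.write("v2")
        with open(os.path.join(target, "data.txt")) as f:
            after_success = f.read()
        return after_failure, after_success, sorted(os.listdir(base))
```

In `demo()`, the failed attempt leaves no `junk.tmp` in the project, the successful attempt updates `data.txt`, and no scratch directories remain afterward.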

The second, and harder, question is about what happens to hidden/microarchitectural residues. In general, cleanup of this type of residue is hard because it is often left in places no one thinks to inspect, or in places users cannot practically inspect because those parts are proprietary or distributed across organizational boundaries. With AI agents, the problem is also broader in scope than in a processor because it spans a larger number of technology layers, from model context to hardware, both local and remote. Further, an agent’s speculation window may last seconds or minutes, compared with nanoseconds in a processor. That longer window creates more opportunity for residue to diffuse. It is highly unlikely that we can simply prompt the agent to clean up hidden/microarchitectural residue because, by definition, there is no architectural interface through which to observe or control microarchitectural state or work.

How likely are we to solve agentic littering? Who needs this problem solved? And, who should solve this problem?

In addition to technical aspects, economics and incentives often determine whether solutions are adopted. Here too we can look at processor misprediction recovery and compare it with AI agents.

Overt architectural and hidden microarchitectural residues have different economics and incentives at play.

Overt residues are easier to price. If an agent leaves behind a directory full of junk, consumes too many resources, or corrupts a file, that failure is visible to users. Users will complain, and because there are complaints, product teams can justify spending resources to fix them.

Hidden residues are harder to price. They may not produce an obvious effect like a crash, and they may require complex conditions to manifest. That makes any resulting harm harder to attribute accurately and consequently easier to dismiss. It also makes it harder for users to demand fixes, because users often cannot see the thing they are supposed to complain about.

Spectre, an issue that stems from adversarial steering and microarchitectural residue, was disclosed roughly eight years ago, and the broader class of this leakage has still not been completely fixed. This is not because principled technical solutions do not exist. It is because these solutions increase design complexity, hurt performance, change the hardware and software interface in ways that are not easy to adopt, or require coordination across vendors and different layers of the computing stack, all of which add recurring or non-recurring costs. Also, each layer can plausibly say that the residue cleanup should be handled by someone else. Vendors can also say that they have not seen large-scale attacks and that, given the risk profile, they do not have to protect against them.

The same pattern may emerge for AI agents and handling hidden/microarchitectural residues.

Each AI agent boundary is also an economic boundary. Each layer can plausibly say that the residue is someone else’s problem. The model provider can say the deployment should isolate side effects. The deployment/orchestrator can say the runtime should enforce cleanup. The runtime can say the operating system should provide better isolation. The hardware vendor can say software should avoid sensitive colocation. The user experiences the combined risk of all these but usually has the least ability to inspect or repair it!

The real answer is that every party involved in agentic execution should fix its own leaks and share responsibility for security and privacy. But each party also has reason to argue that the cost is too high, especially when the economic benefits are difficult to measure and the harms are difficult to attribute.

So the likely outcome here is not hard to guess. Hidden microarchitectural residue handling is treated as an afterthought, and agents end up reflecting the incentives that shaped them: agents get more capable, overt residue cleanup improves through ad hoc cleanup attempts, and a very long tail of hard-to-detect microarchitectural residues builds up and expands the attack surface.

The best chance for security is while these systems are being designed and deployed. AI-agent platforms designed now should at least treat residue management as a first-class design requirement. That, however, means finding ways to incentivize designers to care about hidden, microarchitectural residue before users are harmed. If we treat microarchitectural residue management as an optional, “nice-to-have”, “less-important-than-overt” security feature, we will spend the next decade patching a massive, distributed attack surface.

About the Author: Simha Sethumadhavan is a Professor in the CS department at Columbia University. He would like to thank Profs. Roxana Geambasu, Martha Kim, and Tanvir Ahmed Khan for thought-provoking comments and feedback.

Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.
