Editor’s note: With the continuing proliferation of LLMs and their capabilities, the academic community has started to discuss their potential role in the paper reviewing process. Some conferences are already piloting LLM assistance in their reviewing this year. To bring this discussion to the attention of our community, “Computer Architecture Today” is publishing two related blog posts. The post below is about the general role of LLMs in academic reviewing. The second post (to follow shortly) describes a rather extreme concrete proposal for our community. Enjoy reading and discuss!
The past few years have witnessed a staggering acceleration in the capabilities of large language models (LLMs). What began as an intriguing autocomplete toy has evolved into a set of sophisticated tools capable of summarizing research papers, drafting technical arguments, and even simulating expert-level discussion. As these models continue to improve at an astonishing rate, the question is no longer if they will change academic workflows, but how.
Academic reviewers are fatigued and overworked. Across many research communities, the state of academic reviewing is under visible strain. Reviewers are oversubscribed and often underappreciated. Program committee (PC) members juggle many papers in a short time frame, and engagement with each paper has become increasingly shallow. The consequences are easy to observe: rushed reviews, inconsistent decisions, and a growing sense of dissatisfaction among authors and reviewers alike. While there are exceptions, such as venues where quality is preserved through tighter scope or community norms, the overall trend is worrying.
LLMs are already here. LLMs are cheap, powerful, and remarkably easy to use. Whether it’s ChatGPT, Claude, or a domain-specialized model, they are already being used, quietly, in the academic review pipeline. Some reviewers use them to rewrite clumsy sentences from submitted papers. Others ask them to summarize a paper or critique its weaknesses. These uses, while technically prohibited in many venues, are happening nonetheless. The technology is simply too accessible and too helpful to ignore.
Do LLMs hallucinate? Yes, they do. They should only serve as aids for human authors and reviewers, with the final decisions being made by humans. However, there are architectures, such as multi-agent setups in which agents check each other’s work, that help reduce errors in LLM output; this approach is used, for example, in Dr. Copilot to help doctors in Romania. If LLMs can be used in safety-critical domains such as medicine, I believe they can be used in academia as well, especially as aids for human authors and reviewers.
Writing with LLMs
One common concern about using LLMs to write academic papers is that it dilutes the author’s intellectual contribution. If an AI generates entire paragraphs — or even entire papers — what part of the output belongs to the human author? This is a legitimate question. But it also misses the point of how most researchers actually use these tools today.
For thoughtful researchers, LLMs aren’t a shortcut around original thinking; they are a powerful tool for expressing that thinking more clearly. Professor Alex Kontorovich, for instance, described how he uses Claude to help formalize mathematical arguments and clarify intuition in early drafts. Frameworks like TheoremLlama can generate aligned formal proofs in Lean from natural language drafts — demonstrating real gains in expression and rigor.
These tools are not replacing human insight; they are refining it. If we reward clear communication, we should support tools that help authors communicate clearly, while preserving the core originality and insight of the research.
Reviewing with LLMs
The idea of reviewers using LLMs raises legitimate concerns. Chief among them is the fear that reviewers will rely on AI to write reviews rather than reading the paper carefully. That can undoubtedly be problematic. But the alternative often produces worse outcomes: shallow or disengaged reviews that authors and reviewers alike regret.
A ban on LLMs doesn’t guarantee thoughtful reviewing. Many poor reviews today are uninformed, inconsistent, or disengaged—not because tools are misused, but because reviewers are overburdened. Frustrated reviewers who rely on instinct alone may produce even less useful feedback.
Of course, there are real concerns to address:
- Confidentiality: Uploading submissions to public LLM APIs risks exposing unpublished content and violating author privacy.
- Accountability: Without disclosure, the line between a reviewer’s judgment and AI-generated critique becomes murky.
Again, though, the solution lies in regulated, transparent use – not a blanket ban. We can legalize and guide LLM use in reviewing by:
- Requiring the use of private, publisher-hosted LLMs
- Building workflows that prevent external API exposure
- Including reviewer disclosure mechanisms (e.g., “LLM used to check technical consistency”, similar to how reviewers can currently mention if they did a co-review with a student)
- Embedding auditable prompts (e.g., “spot-check equation consistency”, “summarize methodology errors”)
This approach preserves reviewer agency while mitigating privacy and integrity concerns. We should also be careful to manage the burden on reviewers: if the tooling adds too much overhead, reviewers simply will not use it. A minimal sketch of what such a logged, auditable workflow could look like appears below.
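The sketch below is only an illustration of the disclosure and audit-trail ideas above, written in Python. The query_private_llm helper, the log format, and the file name are hypothetical placeholders rather than any existing tool; the one idea the sketch encodes is that every reviewer prompt and model response is recorded, together with a hash of the submission, so that the exchange can be disclosed during discussion.

import hashlib
import json
from datetime import datetime, timezone

AUDIT_LOG = "review_llm_audit.jsonl"  # hypothetical per-venue audit log


def query_private_llm(prompt: str) -> str:
    # Placeholder for the venue's publisher-hosted, private model.
    # It must never forward submission text to a public API.
    raise NotImplementedError("wire this to the venue's vetted endpoint")


def audited_query(reviewer_id: str, paper_pdf: bytes, prompt: str) -> str:
    # Send a reviewer prompt to the private LLM and record it for audit.
    response = query_private_llm(prompt)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "reviewer": reviewer_id,
        # A hash identifies the submission without copying its text.
        "paper_sha256": hashlib.sha256(paper_pdf).hexdigest(),
        "prompt": prompt,
        "response": response,
    }
    with open(AUDIT_LOG, "a") as log:
        log.write(json.dumps(record) + "\n")
    return response


# Example of a disclosure-friendly query:
# audited_query("reviewer-42", open("submission.pdf", "rb").read(),
#               "Spot-check the consistency of Equations 3-5.")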
Separating Subjective from Objective Reviewing
The most compelling case for integrating LLMs into review workflows is restoring a goal the community has partially abandoned: verifying that papers are correct.
Currently, peer review is mostly a sanity check: are there any blatant errors? Are the claims plausible? Reviewers rarely have the capacity to evaluate proofs or replicate experiments in depth.
But with AI assistance, we can divide reviewing into two parts:
- Objective review: Does the reasoning hold? Are the proofs correct? Are claims consistent with artifacts?
- Subjective review: Is the contribution novel? Is the idea interesting? Does it advance the field?
Humans are best suited for subjective judgment. LLMs, on the other hand, could help automate objective checks: code-consistency comparisons, logical coherence, or reproducibility signals. LLMs may catch mistakes that human reviewers miss, but for now they cannot offer strong guarantees about correctness (I believe this will change in the future). Thus, they would serve as an aid in checking correctness, drastically reducing the effort required of human reviewers.
This dual-track approach could greatly raise review quality by concentrating human effort where insight is most necessary and offloading mechanical checking where AI excels.
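To make the objective track a little more concrete, here is one possible shape for an automated check pass, again only a sketch. The check names, the prompt wording, and the query_private_llm helper are assumptions for illustration; the output is meant to be attached to the human review as advisory notes, not treated as a verdict.

from dataclasses import dataclass


@dataclass
class ObjectiveCheck:
    # One automated check whose output is attached to the human review.
    name: str
    prompt_template: str


# Illustrative checks; a real venue would curate and version these prompts.
CHECKS = [
    ObjectiveCheck(
        name="claims-vs-artifact",
        prompt_template=(
            "List any claims in the paper excerpt below that are not "
            "supported by the attached results table. "
            "Answer 'none found' if they are consistent.\n"
            "Paper excerpt:\n{paper}\n\nResults table:\n{artifact}"
        ),
    ),
    ObjectiveCheck(
        name="proof-steps",
        prompt_template=(
            "Identify any step in the following proof sketch that does not "
            "follow from the previous steps:\n{paper}"
        ),
    ),
]


def run_objective_checks(paper_text, artifact_text, query_private_llm):
    # Run each check through the private LLM; humans interpret the output.
    findings = {}
    for check in CHECKS:
        prompt = check.prompt_template.format(
            paper=paper_text, artifact=artifact_text
        )
        findings[check.name] = query_private_llm(prompt)
    # The findings are advisory notes attached to the review, not verdicts.
    return findings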
Legalization Brings Structure
When we acknowledge that LLMs are here to stay, we can put frameworks in place to guide their use:
- Approved models only: restricting to private or vetted LLMs
- Transparent interfaces: ensuring no submission text is exposed to public APIs
- Prompt logging and audit trails: recording objective-review prompts and LLM responses so that they can become part of the discussion
- Training: teaching reviewers and authors to use LLMs ethically and effectively
- LLM review stage: optionally embedding an automated check phase focused on reproducibility, consistency, or formal correctness, before humans weigh in
These structures allow AI to support human review rather than substitute for it, much as compilers support programmers without writing code themselves.
How can we use LLMs today in the review cycle?
Here are three ways to use LLMs today:
- Artifact Evaluation: Currently, students and postdocs volunteer their time to evaluate artifacts. I see no reason why this cannot be done with the help of AI agents and significantly fewer human volunteers. This approach loses nothing and only increases efficiency for the entire community. The systems community is currently exploring this approach for some conferences.
- Aiding reviewers in understanding papers: Publisher-hosted private LLMs should be used to create summaries of submitted papers. These summaries should be shared with reviewers who want them. The private LLM should also be able to answer questions about the paper (“Do the authors run experiment X?”) without requiring reviewers to comb through the paper.
- Ensuring review quality: Today, PCs employ volunteers to comb through reviews and check for hostile reviews or reviews that don’t meet the venue’s standards (too short, too vague, etc.). Publisher-hosted private LLMs can be used to identify such reviews and notify the reviewers by email (a rough sketch of such a check follows this list).
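As one possible implementation of the last item, the sketch below combines a simple length heuristic with an optional pass through the hypothetical query_private_llm helper from the earlier sketches. The threshold and prompt wording are placeholders; the decision to contact a reviewer stays with the PC chairs.

def flag_low_quality_reviews(reviews, query_private_llm=None, min_words=100):
    # `reviews` maps a review id to its text. `query_private_llm` is the
    # hypothetical venue-hosted model from the earlier sketches. The flags
    # are suggestions for the PC chairs, not automatic actions.
    flagged = {}
    for review_id, text in reviews.items():
        reasons = []
        word_count = len(text.split())
        if word_count < min_words:
            reasons.append(f"short ({word_count} words)")
        if query_private_llm is not None:
            verdict = query_private_llm(
                "Does the review below contain hostile language, or is it "
                "too vague to be actionable? Answer 'yes: <reason>' or 'no'.\n"
                + text
            )
            if verdict.strip().lower().startswith("yes"):
                reasons.append(verdict.strip())
        if reasons:
            flagged[review_id] = reasons
    return flagged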
Thinking ahead, as AI agents improve, I believe much of the work of “herding” reviewers (ensuring reviews come in on time) can be delegated to them. Imagine an AI helper for PC chairs that automatically emails reviewers when their reviews are late, or urges them to come to a conclusion in online discussions.
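A first step toward such a helper does not even need an LLM. The sketch below, assuming a hypothetical assignment record with due dates and submission status, simply drafts overdue-review reminders for a PC chair (or a supervised agent) to send.

from datetime import date


def draft_reminders(assignments, today=None):
    # `assignments` is a list of dicts with hypothetical keys: 'reviewer',
    # 'email', 'paper_id', 'due' (a datetime.date), and 'submitted'.
    # The sketch only drafts text; a PC chair (or a supervised agent)
    # decides whether and when to send it.
    today = today or date.today()
    reminders = []
    for a in assignments:
        if not a["submitted"] and a["due"] < today:
            days_late = (today - a["due"]).days
            reminders.append({
                "to": a["email"],
                "subject": (f"Review for paper {a['paper_id']} is "
                            f"{days_late} day(s) overdue"),
                "body": (f"Dear {a['reviewer']},\n\n"
                         f"A friendly reminder that your review for paper "
                         f"{a['paper_id']} was due on {a['due']:%Y-%m-%d}. "
                         f"Please submit it when you can.\n"),
            })
    return reminders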
A Better Default
Banning LLMs by default doesn’t make us safer—or smarter. It makes us slower, more secretive, and increasingly disconnected from the tools shaping our communication workflows.
LLMs are here to stay. Let us figure out how to use them well.
Thanks to Venkat Arun, Shravan Narayan, Dixin Tang, Emmett Witchel, Daehyeok Kim, and Tianyin Xu for giving feedback on drafts of this article.
About the Author
Vijay Chidambaram is a professor at the University of Texas at Austin. His group, the UT Systems and Storage Lab, works on all things related to data and storage. He is the author of the free online book, “The CS Assistant Professor Handbook”.
Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.