Here are some key issues with our review process and some suggestions for the PC Chairs and PC members.
- PC chair’s message to the PC members before review starts
This message often says “there is no set acceptance rate, we are looking for good papers”. But most of our conferences rarely stray from their historical acceptance rates. And, more importantly, only around 80 papers can be discussed in 1.5 days, including some 15 “pre-accepts”, and almost all of the rest are automatic rejects (fewer than 5 papers get rescued). This 80-paper limit pretty much bounds the acceptance rate and is known even before the papers are submitted! Still, the chair’s message gives no guidance to the PC members, resulting in wide variability in PC members’ personal acceptance rates. The PC chairs should remind the reviewers of the expected range of personal acceptance rates based on the binomial distribution assuming the historical acceptance rate (see “Overwhelming statistical evidence that our review process is broken”).
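The expected spread of personal acceptance rates can be computed directly from the binomial distribution. A minimal sketch, assuming an illustrative historical acceptance rate of p = 0.2 and a review load of n = 20 papers per PC member (both numbers are assumptions for illustration, not figures from this post):

```python
# Sketch: if each assigned paper is independently accept-worthy with
# probability p (the historical acceptance rate), a PC member's accept
# count follows Binomial(n, p). The values n = 20 and p = 0.2 below are
# illustrative assumptions.
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def quantile(n, p, q):
    """Smallest k whose cumulative binomial probability reaches q."""
    cum = 0.0
    for k in range(n + 1):
        cum += binomial_pmf(k, n, p)
        if cum >= q:
            return k
    return n

n, p = 20, 0.2
lo, hi = quantile(n, p, 0.025), quantile(n, p, 0.975)  # lo=1, hi=8
print(f"With n={n} papers and rate p={p}, ~95% of PC members should")
print(f"mark between {lo} and {hi} papers as accepts")
print(f"(personal acceptance rate {lo/n:.0%} to {hi/n:.0%})")
```

Under these assumed numbers, a reviewer accepting anywhere from 5% to 40% of his/her pile is statistically unremarkable; rates outside that range are the ones the chair's message should flag.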
- PC Chairs often do not provide any uniform standards to the PC members
A big problem with our process is the lack of uniformity (across the papers submitted to a conference) and of monotonicity (as a paper improves from one submission to the next, so should its review scores). Our PC chairs assign reviews without establishing any uniform standards. Instead, PC chairs should give the PC explicit instructions, such as:
- (a) Novelty: No matter how much you dislike a paper, if you cannot produce a reference covering a claimed contribution then that contribution is novel. Similarly, no matter how much you like a paper, if there is a reference covering every claimed contribution, then the paper is not novel (the references have to pass the scrutiny of the reviewers). Novelty has become so subjective that this central metric has lost all its meaning. By giving a clear definition, PC chairs can restore this key metric’s importance.
- (b) Weak accept: Historically, a paper with an average score of weak accept would almost certainly get discussed and most likely get accepted. Currently, there is no such clear definition, and yet in the final overall ranking of the papers, just one weak accept versus a weak reject often makes the difference between the paper being discussed or being rejected without discussion. Such a definition may also help reduce unreasonable “rejects” for acceptable papers with minor flaws.
- (c) Intellectual conflict of interest: If a PC member reviews a paper that shows his/her work to be poor (the paper does not merely claim to be better; it shows the previous work to be flawed), then the PC member should be required to mark such an intellectual conflict in his/her review (not shown to the authors), and some co-reviewer should (anonymously) approve of the review. Our process disallows recent co-authors from reviewing each other’s papers but does not consider this obvious conflict of interest. Indeed, our process should, for example, allow authors to point to a result graph that shows a previous work to be poor and thereby include the previous work’s authors in the conflict list (under the chair’s scrutiny). Otherwise, our process creates a monopoly where conflicted “gatekeepers” can forever protect their potentially flawed work. Yes, this will reduce the number of eligible reviewers, but does that mean we should allow conflicted reviews without even requiring the conflict to be declared? Such a conflict also holds for a PC member who has a closely-related in-flight submission (to the same or a different conference).
Not every PC chair may agree with these particular suggestions; those who do not should establish their own uniform standards.
- PC members should agree on the accept/reject reasons and not just on the accept/reject score
Our process insists on consensus among the reviewers on the accept/reject decision but not on the underlying reasons. Currently, reviews can give vague reasons, or no reasons at all, to reject a paper. Requiring consensus on the reasons would not only help weed out flawed reviews (both positive and negative) but also help the authors in improving the work. Reviews should provide a list of key reasons and reviewers should not be allowed to see each other’s reviews before the rebuttal (to ensure review independence). In the rebuttal, the authors can flag the lack of consensus (or at least a high majority) on the reasons to reject. If flagged, the reviewers should try once more (without seeing each other’s reviews) in the first week after the rebuttal after which the authors can check if the reasons have converged (see #6 below). If not, then the PC chair should be involved in moving to a sane conclusion. Hopefully, only a handful of papers would need the chair’s attention.
- Reviews should rely more on facts and less on opinion
Reviews should be based on facts. Of course, opinions (“I feel this won’t work”, “this won’t be enough”, “I am not convinced”) cannot be eliminated completely, but they should be used sparingly and consciously. If a review cannot reject a paper based on facts, then the paper should be accepted (and vice versa).
- Full reviews, including all the scores, should be visible to authors at rebuttal
Some PC chairs hide the numerical scores from the authors during rebuttal claiming that the authors would rebut the scores and not the reviews. So what? If the authors rebut the scores without providing technical reasons then the rebuttal won’t work anyway. More importantly, without seeing the scores, the authors cannot catch reviews whose text and scores contradict each other, or two-line reviews with weak rejects.
- Reviews should be held accountable
Let’s use the rebuttal to empower the authors against poor reviews.
(a) Rebuttal has become useless: reviewers can ignore or dismiss the rebuttal or simply claim “not convinced”. Authors should be allowed to use the rebuttal to flag poor reviews and hold reviewers accountable. If a review is flagged, then the most positive reviewer for the paper should (anonymously) examine the review to decide whether the flag is valid and, if so, mark the review so it receives a lower (or zero) weightage in the overall score calculation. To discourage authors from flagging negative but legitimate reviews, any invalid flag could lead to immediate rejection even before the PC meeting. PC chairs should pay attention to PC members who accrue many valid flags across their reviews (the flag counts should be visible to the whole PC).
(b) PC members should update their scores within a week after the rebuttal. After this week, authors should be able to see the updated scores and flag a review that does not update its score even though the rebuttal completely refuted the review. This flag should be examined as above.
To enable such anonymous examination, and even free and open post-rebuttal discussions in general, the reviewers should not see each other’s names until the PC meeting.
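The down-weighting in (a) amounts to a weighted mean of the review scores, where a review judged invalid gets weight zero. A minimal sketch; the 1-5 score scale and the specific scores and weights below are illustrative assumptions, not values from any real review system:

```python
# Sketch of down-weighting flagged reviews in the overall score.
# Scores, weights, and the 1-5 scale are illustrative assumptions.
def overall_score(reviews):
    """Weighted mean of (score, weight) pairs; weight 0 drops a review."""
    total_weight = sum(w for _, w in reviews)
    if total_weight == 0:
        return None  # no usable reviews remain
    return sum(s * w for s, w in reviews) / total_weight

# Three reviews on a 1-5 scale; the lone '1' (strong reject) was
# flagged, judged invalid, and so its weight is set to 0.
before = overall_score([(4, 1.0), (4, 1.0), (1, 1.0)])  # -> 3.0
after  = overall_score([(4, 1.0), (4, 1.0), (1, 0.0)])  # -> 4.0
print(before, after)
```

The point of the sketch: a single zero-weighted outlier moves this hypothetical paper from the reject pile (3.0) into clear-discussion territory (4.0), which is exactly the leverage a valid flag should have.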
- Authors should be able to rebut all reviews, including late, post-rebuttal reviews
PC chairs often get late, post-rebuttal reviews, typically for papers whose existing reviews have low expertise. I have never seen a late review give an accept. These reviews are hurried and therefore have a higher chance of making mistakes. Yet, such a review is often the decisive vote on the paper (being a high-expertise review). But the authors don’t get to rebut such reviews. This is unfair. I am sure the authors would rebut even if it were from 2 am to 6 am on the day of the PC meeting. Often, the chair asks for the extra review a week before the PC meeting. At that point, the chair can alert the authors to plan for an extra rebuttal.
- PC-wide vote at the PC meeting
When there is no consensus among a paper’s reviewers, many PC chairs use a PC-wide vote. This is unfair. Three PC members have read the paper and discussed it for a month and still can’t reach a consensus, and yet the rest of the PC, which has not read the paper and has only heard a 5-minute summary, decides in under 20 minutes. In practice, vocal PC members dominate the discussion, and if they are negative, then the paper gets rejected. The PC-wide vote should be banned (e.g., PLDI does not use PC-wide votes) and replaced with a majority vote among the people who have read the paper.
The only way we are going to eliminate arbitrary, non-uniform, non-monotonic reviews is when authors can and co-reviewers do call out poor reviews (positive or negative). Let’s make 2018 the year when poor reviews vanish. Happy New Year!
About the Author: T. N. Vijaykumar is a Professor of Electrical and Computer Engineering at Purdue University and works on computer architecture.
Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.