Today at MSR, there was a plenary session on the use of double-blind reviewing (DBR), since this is the first year that the policy has been adopted at the conference. In DBR, authors of submitted papers do not know who reviews their papers, and the reviewers do not know who the authors of the papers they review are, nor their affiliations. Other venues, notably in the area of programming languages (e.g., PLDI and POPL), have been adopting DBR for a few years now. The main rationale for DBR is that it reduces bias in paper reviewing: although paper reviewing should be conducted only on technical and formal grounds, humans often inadvertently let emotions get in the way.

Even though this aims to be a summary of a plenary session, I do add some opinions of my own, which were not part of said session. Thus, in the interest of transparency, I make it clear that I am in favor of DBR, or even of Triple-Blind Review (where the reviewers of the same paper are also anonymous to each other), for reasons that were discussed (not by me) during the session.

One common argument against DBR, one I’ve heard repeated many times over the years, is that a reviewer who wants to discover the authors of a paper can easily do so, and thus the extra effort that DBR requires (more on this later) is not justified. However, an interesting experiment conducted by Michael Hicks, chair of SIGPLAN (as of May 2017) and a PC member of multiple previous conferences, suggests that this is not quite true. At the 29th IEEE Computer Security Foundations Symposium, after the reviewing process was through, Hicks asked the reviewers to guess the authors of the papers they had just reviewed. The experiment is documented in a blog post. He found that 2/3 of the reviewers were unable to guess the authors of the papers they reviewed and that, among the remaining 1/3, 1 out of 5 guessed wrong. Thus, only 26% of the reviewers were able to correctly guess the authors. Emery Berger conducted a similar process for PLDI 2016 (summarized in a comment on the aforementioned blog post), with similar findings. Moreover, both Hicks and Berger argue that the amount of extra work involved in making DBR work was smaller than expected. Berger goes so far as to state that “Having run this process, I personally find the notion that double-blind reviewing adds significant (or even modest) burden to be without foundation.”

All of this brings us to the plenary session discussion. According to Abram Hindle and Lin Tan, PC chairs of MSR 2017, the main tension involved in bringing DBR to MSR is one of Fairness vs. Open Science. On the one hand, DBR is believed to promote fairness, since it can reduce the influence of prejudices, misunderstandings, human emotions, and gaps in knowledge on the paper reviewing process. On the other hand, it is harder to share artifacts relevant to a scientific paper in a scenario where anonymity is crucial. This is a serious concern for MSR because the conference emphasizes the importance of open science and of artifact sharing. The importance of artifact sharing for the MSR community, even as early as the reviewing process, places an additional burden on the PC chairs.

In order to verify that a paper really did not include any information identifying the authors, Tan and Hindle adopted the following (potentially automatable) process. First, they manually checked the paper header to identify author names. Then, they converted each PDF file into a text file. Finally, they ran grep searching for terms such as “our”, “our prior”, “our work”, “github”, “funding”, etc. According to Hindle, the “our” heuristic was able to identify multiple cases of self-reference. In addition to this process, they sent reminders about the DBR process to authors who had submitted abstracts. They also did not allow PC members to know which papers had been submitted, beyond the ones assigned to them. Moreover, if anonymity seemed to be compromised, e.g., because a PC member told the chairs that she knew who the authors of a paper were, the paper was assigned to new reviewers.
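For concreteness, here is a minimal sketch of what the automated part of that check might look like. This is not the chairs’ actual script: it assumes the pdftotext tool (from poppler-utils) is available on the PATH, and the term list and file handling are merely illustrative.

```python
#!/usr/bin/env python3
"""Minimal sketch (not the chairs' actual script) of the submission check:
convert each PDF to text with pdftotext and flag lines containing terms
that commonly reveal authorship. Term list and paths are illustrative."""

import re
import subprocess
import sys
from pathlib import Path

# "our" on its own already subsumes "our prior" / "our work".
PATTERN = re.compile(r"\bour\b|github|funding", re.IGNORECASE)

def check(pdf: Path) -> None:
    txt = pdf.with_suffix(".txt")
    # pdftotext comes from poppler-utils: `pdftotext paper.pdf paper.txt`
    subprocess.run(["pdftotext", str(pdf), str(txt)], check=True)
    for lineno, line in enumerate(txt.read_text(errors="ignore").splitlines(), 1):
        if PATTERN.search(line):
            print(f"{pdf.name}:{lineno}: {line.strip()}")

if __name__ == "__main__":
    for arg in sys.argv[1:]:
        check(Path(arg))
```

Invoked over a directory of submission PDFs, a script along these lines would print each suspicious line with its location, leaving the actual judgment call to a human.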

After Tan and Hindle discussed their experience, the audience was allowed to make suggestions and comments. One interesting suggestion was that the chairs make available a “linting tool” (or maybe just a shell script) so that authors can themselves verify whether their papers meet the minimum anonymity standards (no references to “our”, GitHub pages, etc.). This is similar to what the IEEE Computer Society does for authors of accepted conference papers, but with the aim of checking adherence to formatting standards instead of anonymity ones.
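Such a tool would essentially package the chairs’ check for authors to run before submission. As a purely hypothetical sketch, since authors have access to their own sources, it could also scan the LaTeX files directly, where revealing markers are even easier to spot; the patterns below are illustrative, not an official checklist.

```python
#!/usr/bin/env python3
"""Hypothetical author-side anonymity lint: scan LaTeX sources for
markers that commonly reveal authorship. Patterns are illustrative."""

import re
import sys
from pathlib import Path

CHECKS = {
    "self-reference": re.compile(r"\bour (prior|previous|earlier) work\b", re.I),
    "author command": re.compile(r"\\author\b"),
    "acknowledgments": re.compile(r"\\section\*?\{acknowledg", re.I),
    "github link": re.compile(r"github\.com/[\w-]+", re.I),
    "funding": re.compile(r"\bfunding\b", re.I),
}

def lint(tex: Path) -> int:
    """Print every suspicious line in one .tex file; return the count."""
    problems = 0
    for lineno, line in enumerate(tex.read_text(errors="ignore").splitlines(), 1):
        for label, pattern in CHECKS.items():
            if pattern.search(line):
                print(f"{tex}:{lineno}: possible {label}: {line.strip()}")
                problems += 1
    return problems

if __name__ == "__main__":
    total = sum(lint(Path(p)) for p in sys.argv[1:])
    sys.exit(1 if total else 0)
```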

Much of the discussion in the plenary session revolved around the use not only of DBR, but also of Triple-Blind Reviewing (TBR). As I said before, in TBR a reviewer does not know who the other reviewers of the same paper are. It was argued that this may help discussions, because without it junior researchers can face senior researchers who may be involved in their promotion processes or who can impose their views based on seniority. A member of the audience argued that he did not support TBR: in his experience with the TBR process for ICSE 2016, TBR hindered the discussion process because he had to explain very basic concepts of his area in order to convince the other reviewers that he knew what he was talking about. Another member of the audience replied that this is actually a positive aspect of TBR, because junior professors, women, and minorities in general have to do that all the time; the use of TBR levels the playing field in paper reviewing discussions. Someone else argued that triple-blind reviewing removes the incentive to write good reviews; one can write a poor, lazy review and walk away with one’s reputation unscathed. Yet another audience member replied that this problem can be mitigated by revealing the names of the reviewers at the end of the process, since everyone will then know who wrote the poor reviews, while the discussion itself will still have been fairer.

At the end of the session, another well-known issue with DBR was raised: that there are no conclusive results on how effective DBR is in making the reviewing process fairer. To this, Tan answered that it is very difficult to evaluate the fairness of DBR in a way that is rigorous and reproducible. I’d add that conducting such an evaluation in a way that is still realistic is another obstacle. Tan also added that, based on the post-reviewing survey conducted with both reviewers and authors, there seems to be a general belief that DBR succeeded in making the reviewing process fairer, and that she believes this is by itself a good thing. I wholeheartedly agree with that viewpoint.