Image Credit: Pexels CC0

Peer review is the gold standard for research quality assurance.

No, peer review is broken.

As Jonathan P. Tennant and Tony Ross-Hellauer discussed in their 2020 Research Integrity and Peer Review article, “The limitations to our understanding of peer review,” generalizations like these are common in conversations about the peer review process. Yet, as Tennant and Ross-Hellauer argued, across academic disciplines “the reality is that there remain major gaps in our theoretical and empirical understanding of [peer review].” So, how much do we actually know about peer review today, and how can we fill the remaining gaps in our understanding to define what a good peer review should look like?

EASE (the European Association of Science Editors) has released a new set of guidelines on “How to Assess Peer Review Quality” to help journals get closer to an answer. The guidelines, which incorporate input from a public request for information (RFI), offer a high-level framework for evaluating the validity and reliability of peer review reports, helping journals determine which aspects of their processes are working well and which need improvement.

Before the RFI closed, I spoke with Dr. Mario Malički, Chair of the EASE Peer Review Committee that is leading the initiative, to learn more about the aims and scope of the guidelines. Dr. Malički is Associate Director of the Stanford Program on Research Rigor and Reproducibility and Co-Editor-in-Chief of Research Integrity and Peer Review. Check out our full conversation below!

Interview with Dr. Mario Malički

Before we dive into the draft guidelines, can you share what led you to focus your career on research integrity and peer review?

MM: I started learning about science properly after I finished medical school and was working at the University of Split School of Medicine, where I had begun my PhD. I was drawn to the ethical and regulatory side of research. At the time, I was working in two departments, the Department of Medical Humanities and the Department of Research in Biomedicine and Health, where I was learning about research methodology, statistics, research design, and so on.

I was lucky because my mentors were editors of a Croatian medical journal, so they had experience evaluating and publishing research. They were doing what was then known as “editorial research,” that is, research on everything related to publishing, such as misconduct, authorship issues, and peer review. Today, this would often be called “meta-research” or “meta-science.” I thought it was a fascinating way to learn how science works, so my mentors introduced me to a Europe-wide project called PEERE, a COST project on peer review supported by the European Commission. I hadn’t even defined my PhD topic when I joined the group, and I really just fell in love with research on everything related to peer review.

I’d been told since day one in science that all science relies on peer review, but evidence that peer review was effective was almost nowhere to be found. I was curious why that was and how we could improve peer review, and that’s how I got started. Through PEERE, I also became familiar with the EQUATOR Network, COPE, the ICMJE, and other organizations that deal with publishing ethics. Eventually, I got involved with EASE through Bahar Mehmani, Reviewer Experience Lead at Elsevier, who was leading the EASE Peer Review Committee at the time. We became good friends and collaborators. When Bahar stepped down as leader of the committee to become a member of the EASE Council, I was nominated to take over as Chair. The committee is now leading the quality assessment guidelines initiative.

What was the impetus for creating the draft guidelines for assessing peer review quality?

MM: One thing we always need to be aware of is that international laws regulating scientific publishing rarely exist. There are around 70,000 academic journals today, but we don’t know exactly how many adhere to COPE guidelines. I believe it’s only a small share of the total, perhaps a third of the journals that claim to adhere to COPE. So, research ethics policies and processes are not standardized across journals, publishers, or institutions. If we look at how misconduct and questionable research practices are defined and handled in Europe, the United States, and China, for example, the approaches differ greatly. We are working toward more standardized international regulations, but practices currently vary widely.

However, one thing that has happened since the 1950s and 1960s is that most journals have adopted peer review as a standard quality control check. Today, there seems to be a need to emphasize that a study has been peer reviewed to show that it is sound. But enough studies have shown that peer review itself is far from ideal. The EASE draft quality assessment guidelines are meant to help journals gain a better understanding of what is and isn’t working in their peer review.

How does the development of these guidelines compare to past EASE initiatives?

MM: Around ten years ago, EASE released its first set of guidelines, The Science Editors’ Handbook. It was similar to the ICMJE recommendations for manuscripts, which began in the 1970s and address issues such as authorship. The handbook was tailored to EASE members to help editors run their journals. Since then, EASE has introduced many other guidelines. The EASE Peer Review Committee maintains what we call the Peer Review Toolkit, which contains recommendations on how to select and invite peer reviewers, review manuscripts, train reviewers, and reward them. So we’ve been releasing recommendations for a while, and from 2021 to 2023, we revamped our peer review guides to reflect the latest recommendations.

What’s new about the peer review quality guidelines we’re working on now is that we’re involving the community in their development. In the past, we didn’t put out an RFI or call for feedback as we’re doing now; we developed guidelines internally and released them. But EASE has around 500 editors as members, and we collaborate closely with other editorial groups like JEDI (the Journal Editors Discussion Interface). There are a lot of editors out there who want to do a better job and have great ideas, so we’re trying to get their feedback and involve them.

What are the main challenges to developing standards for peer review quality?

MM: Peer review quality is arguably the most challenging aspect of scholarship to assess. We don’t have a way to rate papers in scientific databases to know whether paper A is better than paper B, because that’s an incredibly complicated question. In the same way, it’s difficult to say when one peer review is better than another. For example, if a peer review missed a significant problem in the paper but everything else in the report was excellent, how much should that one omission lower the overall score?

As a scientific community, we have not agreed on whether we should even try to assign a score to a peer review. Some scholarly research exists on this topic, but none of the tools and questionnaires developed to give review reports a quality score have become standard practice, because they all have problems. It is difficult to devise a systematic classification of peer review quality that we could use, for example, to classify all peer review reports with AI.

Traditional peer review involves two or three reviewers per manuscript and, like a lot of science, is based on consensus. If two reviewers say the statistics in a paper are correct, you’re more likely to believe it than if one says they are sound and the other says they aren’t. In that case, there needs to be a discussion about who is right. If you involved five or ten people, you could count how many agreed on what is and isn’t correct and might get closer to an answer. But we have rarely done tests where we put something like a title in front of ten experts and asked whether they agree it is good. And, of course, the title is among the least significant parts of a paper, because it mainly serves to attract attention; it is not a core component of the science. But the same principle applies to other areas. For example, how many experts would agree that the specific statistical test the authors used to compare two groups was the best way to do that analysis? Such questions are hard to quantify, which is why applying a quality score to a peer review is so difficult.

People often think that peer review is a solution for everything and that it means scientists have conclusively agreed that a version of a paper is good. But we’ve seen in post-publication comments on PubPeer and other platforms that two reviewers are often unable to see what thousands of readers see. And the more people look at a paper, the more likely someone is to detect issues and start broader discussions about them.

It is also hard to define what quality peer review is and how to measure it, because it depends on the quality of the initial submission as well as the expertise of the reviewer. It isn’t as simple as saying a great peer reviewer will produce a quality peer review. Scientific publishing today involves so many steps that we cannot expect individuals to catch everything. We frequently see papers retracted because of mistakes peer reviewers did not detect. Of course, a large portion of those retractions happen because of data falsification.

In the past, we never had access to the raw data to check results in more detail, so peer review often consisted of checking the soundness of the reporting and the argument based on the data presented. Now, we may have access to the underlying data thanks to open data initiatives. However, many peer reviewers are still unable to spot fabricated data. It took years for some of these cases to come to light, through data forensics experts examining the raw data to see who edited it, when they edited it, and what they had done. Data forensics is a completely different field that requires specialized expertise peer reviewers don’t necessarily have, and that shouldn’t be their job.

What’s interesting is that many scholars complain about cases like this, asking how the paper could have passed peer review. Yet when you look at author satisfaction surveys, almost all authors say they are happy with the peer reviews they’ve received. That’s likely because if your paper is published, you’re glad about it: everyone has agreed that what you’ve done is good. So there’s a discrepancy. Many scholars see that the research in their field is not as reliable as they want it to be, yet report that they’re happy with how their own peer reviews have gone.

How are you attempting to address those challenges with the EASE peer review quality guidelines?

MM: As editors, we always want to ensure the papers we’re considering go through rigorous expert peer review and that we can be proud of what is published. So, when things go wrong, we need to ask ourselves what needs to change. We have to be open to discussing where there are issues in peer review so we can do better, and we need to start by acknowledging that we will not always get it right. In the new draft guidelines, we propose inviting authors, editors, and reviewers to rate and evaluate review reports, so journals can get a better sense of how everyone perceives the peer review process.

Another thing we recommend is for journals to use structured peer review. By that, we mean clearly defining the questions a peer reviewer is supposed to answer. We often assume that if a reviewer hasn’t commented on something, they are okay with it. But we can only know that if they state it explicitly.

For example, when we require reviewers to tell us whether they stand behind the statistics of a paper, we see more responses along the lines of: “Maybe this needs another set of eyes. My expertise isn’t in this particular area. I’m able to evaluate the literature review, the discussion, and so forth, but I’m not as familiar with all the little details of the statistics as others would be.” So we need structure around evaluating things like the soundness of the methods, possible limitations of the study, the validity of the data, and so forth, to ensure all areas are considered and commented on.

Often, as journal editors, we don’t change manuscripts very much, yet we take the praise for what we publish in our journal. If we can’t show what work the journal did during the peer review process, I think that’s where we need to start having these conversations. For example, some researchers have recently called for something akin to nutrition labels for academic articles, and I think that’s a good direction to move in. We want to make clear what checks are happening. What does it mean that our journal has conducted a quality peer review? Does it mean that our authors were happy with it? The reviewers? The editors? The readers? When we start having these kinds of conversations, we will see what needs to be improved, and we can start working on how to improve it. This RFI is our attempt to raise awareness and get people thinking about these things.

While the guidelines are intended for editorial use, we included a comprehensive overview of how we think about peer review quality that is accessible to researchers. Since researchers often help editors conduct studies on the quality of peer review, we wanted to help them understand why this is such a challenging topic.

If authors read about everything that goes into quality, we suspect they may not be comfortable rating something from 1 to 5 stars, because they will realize how much more complex it is. But if you read between the lines of the document, you’ll quickly see that we’re saying 1 to 5 stars is an initial proxy for something very complex, one we can use as a jumping-off point. Then we can talk about the levels of complexity and whether to divide quality into different components, like good language, good reporting, and the many other aspects that could be part of a peer review.

If you think about how we look at peer review quality right now, we don’t have sufficient data from journals. There are examples of author and reviewer surveys about the process, but they aren’t standard practice the way annual student satisfaction surveys are at many universities. We don’t see yearly surveys published on journal websites saying: this is our authors’ satisfaction with our services, this is the satisfaction of scholars who contributed peer review reports, and this is the editorial team’s satisfaction with the process. So the point is, how can we raise the bar if we don’t know what level we’re at right now?

How can the academic community share feedback on the draft guidelines?

MM: We put out an RFI for this resource to gather as much feedback as possible by June 1. Most of our documents are also updated regularly at the end of the year, when we do a bit of an audit and make changes if new research or recommendations have emerged, so we will be accepting comments throughout the year as well.

Many thanks to Dr. Mario Malički for taking the time for this interview! You can read the draft EASE guidelines for assessing peer review quality and submit your feedback here.

Update note: This blog post was originally published on May 21, 2024, and updated on June 18, 2024.

Danielle Padula
This post was written by Danielle Padula, Head of Marketing and Community Development