Late last week, Verum Analytics rolled out a new online tool for calculating a published finding’s “R-factor,” a citation-based metric they have introduced to quantify a study’s replicability. The R-factor is the proportion of a study’s citations that come from successful replications of that study. Dalmeet Singh Chawla, a science journalist, asked me to test the new online calculator, which automatically generates R-factors from PubMed IDs. I sent him my thoughts and feedback for inclusion in an article in Science Magazine. My responses were relatively critical of the R-factor, and he selected representative quotations to include in the piece.
I then received a very polite and thoughtful email from the Verum Analytics team: Yuri Lazebnik, Peter Grabitz, Josh Nicholson, and Sean Rife. They wanted to better understand my criticisms of the R-factor. After several rounds of emails, we decided that our discussion presented a great opportunity to publicly share a bit of our “back-and-forth.” Here are four criticisms that I shared with the Verum team, followed by their responses.
CRC: The published literature is heavily biased towards positive, statistically significant results. Any metric that is calculated directly and automatically from the published literature will share this bias and present an unrealistic estimate of a study’s replicability.
VA: First, we would like to emphasize that the R-factor does not measure the replicability of a study, but rather indicates whether the claim(s) it reported have been confirmed. For example, testing a claim in a different experimental system, which is a common practice, is by definition not a replication, while a reproducible study can make a wrong claim by misinterpreting its results. The R-factor indicates the chance that a scientific claim is correct, irrespective of how that claim was derived.
Second, the R-factor indeed only reflects what scientists have published, including the bias towards publishing “positive” results. However, the R-factor has the potential to decrease this bias by making negative results consequential, thus providing an incentive to report them. To make reporting negative results easier, we will include preprints, dissertations, and theses in the literature we scan.
CRC: There are many features of a study that can increase its evidence value: pre-registration, high statistical power, a representative sample, etc. Since the R-factor relies on a “vote-counting” method, the relative strengths and weaknesses of the methodologies of the counted studies are not considered. This means that small, unregistered studies carry the same weight as massive, registered studies.
VA: We use vote counting because it can be implemented on a large scale, which is required to make any measure of veracity widely used. We are aware that the robustness of vote counting can vary with the scientific field and would like to emphasize that the R-factor should be used as one of the tools to evaluate the trustworthiness of a scientific claim.
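To make the vote-counting idea concrete, here is a minimal sketch of the calculation as this post defines it (the proportion of citations that confirm the original claim). The classification labels and the exact formula are my assumptions for illustration; Verum’s actual classifier and scoring may differ.

```python
def r_factor(classifications):
    """Proportion of citing studies classified as confirming.

    `classifications` is a list of labels, one per citing study,
    e.g. "confirming", "refuting", or "mentioning" (hypothetical labels).
    """
    if not classifications:
        return None  # no citations, so no estimate is possible
    confirming = sum(1 for c in classifications if c == "confirming")
    return confirming / len(classifications)

# Four citing studies: two confirm, one refutes, one merely mentions.
citations = ["confirming", "refuting", "confirming", "mentioning"]
print(r_factor(citations))  # 2 of 4 citations confirm -> 0.5
```

Note that each citation carries equal weight here regardless of its sample size or methodology, which is exactly the limitation raised in the criticism above.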
CRC: I gather that one intended strength of the metric is that it is quite simple and easily understandable. This is also a weakness. It relies on an overly simplistic, bright-line classification of replication “successes” and “failures.” A focus on meta-analytically established estimates of effect size and effect size heterogeneity, preferably derived from pre-registered studies or registered reports (which are free of publication bias), would be much more informative. For a method that I personally find more productive, check out the replication database at Curate Science.
VA: The approach that Curate Science has taken is indeed excellent. The question is how widely such comprehensive approaches can be applied in practice, given the effort required. For example, cancer research alone produces just under 200K publications each year. We see the R-factor as a simple measure that can be deployed widely in the foreseeable future to help improve the trustworthiness of scientific research. We do envision a future capability to zoom into the details of the confirming and refuting evidence where representing this information is feasible. What our tool already provides is the list of studies that confirm and refute a claim. This feature helps the user focus on the relevant publications, which are otherwise hidden among the dozens or hundreds of citing reports.
VA: Our tool can return a high R-factor score for an irreproducible article for two reasons: if the refuting evidence is not published, or if the automatic classifier misidentifies a study as confirming. The latter is the problem in the example you provided. The accuracy of our classifier, its robustness, and the graphic representation of the results all need to be improved, which requires further work, time, and resources. We hope to attract these resources by showing the scientific community a prototype of what we want to create. Another limitation is that we can currently analyze only openly accessible publications. We are working on gaining access to others.
Meanwhile, the list of classified citing statements that the tool provides can be used to verify that statements are not misclassified as confirming. We will add a feature to report misclassified citations and other feedback, which will help make our tool more accurate, informative, and fun to use.
That’s all for now, folks. We agreed a priori that I would NOT respond to or rebut any of Verum’s replies to my criticisms (I requested this setup in my invitation to them). Please feel free to continue the conversation in the comments. This has been a really fun and informative experience for me. A summary:
- Verum Analytics is trying to develop and use a new analytic tool focused on an important issue in scientific publishing.
- I publicly criticized their approach in the press.
- They responded quite constructively, asking serious questions to seek elaboration on my criticisms.
- I provided additional reasoning to support my criticisms.
- They responded in kind.
- We have now shared our discussion publicly.
Critical scientific discourse!