Adversarial Co-blog-oration: the R-factor

Late last week, Verum Analytics rolled out a new online tool for calculating a published finding’s “R-factor,” a citation-based metric that they introduced to quantify a study’s replicability. The R-factor is the proportion of citations for any given study that come from successful replications of the study. I was asked by Dalmeet Singh Chawla, a science journalist, to test out the new online calculator. The website automatically generates R-factors from PubMed IDs. I sent him my thoughts and feedback for inclusion in an article in Science Magazine. I was relatively critical of the R-factor in my responses to Dalmeet, and he selected representative quotations to include in the piece.

I then received a very polite and thoughtful email from the Verum Analytics team: Yuri Lazebnik, Peter Grabitz, Josh Nicholson, and Sean Rife. They wanted to better understand my criticisms of the R-factor. After several rounds of emails, we decided that our discussion presented a great opportunity to publicly share a bit of our “back-and-forth.” Here are four criticisms that I shared with the Verum team, followed by their responses to those criticisms.

CRC: The published literature is heavily biased towards positive, statistically significant results. Any metric that is calculated directly and automatically from the published literature will share this bias and present an unrealistic estimate of a study’s replicability.

VA: First, we would like to emphasize that the R-factor does not measure the replicability of a study, but rather indicates whether the claim(s) it reported have been confirmed. For example, testing a claim in a different experimental system, which is a common practice, is not a replication by definition, while a reproducible study can make a wrong claim by misinterpreting its results. The R-factor indicates the chance that a scientific claim is correct, irrespective of how this claim has been derived.

Second, the R-factor indeed only reflects what scientists have published, including the bias towards publishing “positive” results. However, the R-factor has a potential to decrease this bias by making the negative results consequential, thus providing an incentive to report them. To make reporting negative results easier, we will include preprints, dissertations and theses in the literature we scan.

CRC: There are many features of a study that can increase its evidence value: pre-registration, high statistical power, a representative sample, etc. Since the R-factor relies on a “vote-counting” method, the relative strengths and weaknesses of the methodologies of the counted studies are not considered. This means that small, unregistered studies carry the same weight as massive, registered studies.
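To make the vote-counting concern concrete, here is a minimal sketch. The studies, their sample sizes, and the sample-size weighting are all illustrative assumptions; none of this comes from Verum's tool or from real data, and sample-size weighting is only a crude stand-in for the inverse-variance weights a meta-analysis would typically use.

```python
from dataclasses import dataclass


@dataclass
class CitingStudy:
    n: int          # hypothetical sample size
    confirms: bool  # True if the study confirms the original claim


def vote_count_share(studies):
    """Unweighted share of confirming studies: every study counts once."""
    return sum(s.confirms for s in studies) / len(studies)


def sample_size_weighted_share(studies):
    """Share of confirming evidence when each study is weighted by its n."""
    total_n = sum(s.n for s in studies)
    return sum(s.n for s in studies if s.confirms) / total_n


# Made-up example: three small confirming studies, one large refuting study.
studies = [
    CitingStudy(n=20, confirms=True),
    CitingStudy(n=25, confirms=True),
    CitingStudy(n=30, confirms=True),
    CitingStudy(n=2000, confirms=False),
]

vote = vote_count_share(studies)                # 3 of 4 studies confirm
weighted = sample_size_weighted_share(studies)  # 75 of 2075 participants

print(f"vote counting: {vote:.2f}, weighted by n: {weighted:.3f}")
```

Under these made-up numbers, vote counting reports that three quarters of the studies confirm the claim, while the single large study dominates once sample size is taken into account.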

VA: We use vote counting because it can be implemented on a large scale, which is required to make any measure of veracity widely used. We are aware that the robustness of vote counting can vary with the scientific field and would like to emphasize that the R-factor should be used as one of the tools to evaluate the trustworthiness of a scientific claim.

CRC: I gather that one intended strength of the metric is that it is quite simple and easily understandable. This is also a weakness. It relies on overly simplistic, bright-line criterion reasoning about replication “successes” and “failures.” A focus on meta-analytically established estimates of effect size and effect size heterogeneity, preferably derived from pre-registered studies or registered reports (which are free of publication bias), would be much more informative. For a method that I personally find more productive, check out the replication database at Curate Science.

VA: The approach that Curate Science has taken is indeed excellent. The question is how widely such comprehensive approaches can be applied in practice, given the effort required. For example, cancer research alone produces just under 200K publications each year. We see the R-factor as a simple measure that can be deployed widely in the foreseeable future to help improve the trustworthiness of scientific research. We do envision a future capability to zoom into the details of the confirming and refuting evidence where representing this information is feasible. What our tool already provides is the list of studies that confirm and refute a claim. This feature helps the user to focus on the relevant publications, which are otherwise hidden among the dozens or hundreds of citing reports.

CRC: Some findings that we have reason to believe are not exactly models of replicability receive quite high R-factor scores. Two examples: Ego Depletion and Professor Priming.

VA: Our tool can return a high R-factor score for an irreproducible article for two reasons: if the refuting evidence is not published, or if the automatic classifier misidentifies a study as confirming. The latter is the problem in the example you provided. The accuracy of our classifier, its robustness, and the graphic representation of the results all need to be improved, which requires further work, time and resources. We hope to attract these resources by showing the scientific community a prototype of what we want to create. Another limitation is that we can currently analyze only openly accessible publications. We are working on gaining access to others.

Meanwhile, the list of classified citing statements that the tool provides can be used to verify that statements are not misclassified as confirming. We will add a feature to report misclassified citations and other feedback, which will help make our tool more accurate, informative, and fun to use.

That’s all for now folks. We agreed a priori that I would NOT respond to or rebut any of Verum’s replies to my criticisms (I requested this setup in my invite to them). Please feel free to continue the conversation in the comments. This has been a really fun and informative experience for me. A summary:

Verum Analytics is trying to develop and use a new analytic tool focused on an important issue in scientific publishing.
I publicly criticised their approach in the press.
They responded quite constructively, with serious questions about my criticisms to seek elaboration from me.
I provided additional reasoning to support my criticisms.
They responded in kind.
We have now shared our discussion publicly.

Critical scientific discourse!

4 thoughts on “Adversarial Co-blog-oration: the R-factor”

zerotoscientist says:

January 28, 2018 at 10:09 am

I don’t necessarily mind that the interpretation of the tool is limited (all tools of this kind will be).

What seems to me a problem is that general citations are negative for the factor. That is, if two studies have an identical amount of confirmation, the most cited paper will appear less confirmed.

Another problem is that disconfirmation should be more impactful than confirmation, following the principle of falsification. If you can replicate 3/4 times, that’s not really impressive. However, the fact that most studies are low-powered hampers the ability to deal with this issue, as we might expect low-powered replications of true effects to fail quite often. Publication bias is also a problem because falsifications are not published.
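A rough sketch of the low-power point, assuming the effect is real, the replications are independent, and every replication has the same statistical power. The power values (0.5 and 0.8) are hypothetical choices for illustration, not estimates from any literature.

```python
from math import comb


def prob_at_least(k, n, power):
    """Probability that at least k of n independent replications succeed,
    when each one detects a true effect with probability `power`."""
    return sum(
        comb(n, i) * power**i * (1 - power) ** (n - i)
        for i in range(k, n + 1)
    )


# Chance of seeing 3 or more successes out of 4 replications of a TRUE effect.
p_low_power = prob_at_least(3, 4, 0.5)   # hypothetical 50% power
p_high_power = prob_at_least(3, 4, 0.8)  # hypothetical 80% power

print(f"power 0.5: {p_low_power:.4f}, power 0.8: {p_high_power:.4f}")
```

Under these assumptions, a true effect studied at 50% power would reach a 3/4 record or better less than a third of the time, so failed replications on their own are hard to interpret when power is low.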

1. Yuri says:
  
  January 29, 2018 at 3:13 am
  
  The R-factor is the number of confirming citations divided by the sum of confirming and refuting citations. Hence, the R-factor is independent of the total number of citations.
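A minimal sketch of the formula as stated here, not Verum's actual implementation. The function name, the `mentioning` parameter, and the citation counts are hypothetical, and returning `None` when a claim has never been tested is an assumption of this sketch.

```python
def r_factor(confirming, refuting, mentioning=0):
    """Confirming citations divided by confirming plus refuting citations.

    `mentioning` (citations that neither confirm nor refute) is accepted
    only to show that it plays no role in the result.
    """
    tested = confirming + refuting
    if tested == 0:
        return None  # assumption: undefined when no citing study tests the claim
    return confirming / tested


# Two hypothetical papers with the same tested record but very
# different total citation counts.
paper_a = r_factor(confirming=9, refuting=3, mentioning=20)
paper_b = r_factor(confirming=9, refuting=3, mentioning=500)

print(paper_a, paper_b)
```

Both hypothetical papers get the same value, because only the confirming and refuting citations enter the ratio.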
  
  1. zerotoscientist says:
    
    January 29, 2018 at 1:04 pm
    
    Thank you for clarifying!
    
Carolyn Meinel says:

January 28, 2018 at 1:51 pm

Highly relevant to the proposed DARPA SCALE program to research means to establish confidence level for the reproducibility of social and behavioral sciences research findings. Thank you also for alerting me to the database at Curate Science http://curatescience.org/.
