StudySwap is Celebrating Its First Birthday!

It is amazing to think that we launched just over one year ago (March 2nd, 2017) with just a couple of example posts, a Twitter account, and some bare-bones information on our OSF page. It has been a fun and rewarding year for us. Here, we list some of our personal highlights.


Successful collaborations completed or in progress:

Year 1 saw a nice initial set of collaborations facilitated via StudySwap, demonstrating the efficacy of the site for finding like-minded researchers with complementary resources and data collection capacity. A few examples:

  • Martin Schweinsberg recruited labs for Pipeline Project #2
  • Katie Corker recruited labs for Many Labs 5
  • Jaya Karunagharan found a collaborator in the Netherlands to extend findings from a study conducted in Malaysia
  • Savannah Lewis (an undergrad in my lab) collected data for Liam Satchell, while he switched institutions and got his lab up and running. She had this to say about the experience: “Through StudySwap I had the opportunity to work with Liam Satchell on a different type of research than I would normally be able to with my current research mentors. This collaboration has helped me become a more well-rounded researcher and allowed me to have a wider range of experiences on my CV for my graduate school applications.”

More posts awaiting collaborators:

There are also several quite interesting posts that still await collaborators:

  • Harry Manley posted a standing offer to collaborate with anyone who would like to attempt a replication of their study outside of a WEIRD sample (he can collect data in Thailand).
  • Dan Simons is willing to collect data in his lab in Illinois.
  • Xenia Schmalz is seeking a collaborator with expertise in computational modeling.

Getting the word out:

This first year has also been chock full of great opportunities to share information about StudySwap with a wide audience:

Eyeballs on StudySwap:

Posts on StudySwap are now being seen by a large number of researchers, increasing the likelihood that suitable collaborators are found:


  • We have over 1,000 followers on Twitter and we share every post on our feed.
  • StudySwap was visited 6,837 times this year!
  • Our OSF page with supplementary documents has been visited over 2,500 times.
  • Posts were downloaded over 500 times.

New, welcoming publication outlet:

Collabra: Psychology, the official journal of SIPS, is supporting a Nexus that will be an outlet for multi-site collaborative research projects. This is exciting for a few reasons. First, because there is an outlet specifically for the publication of these projects, we are hoping that this Nexus encourages some researchers to coordinate or contribute to one of these projects. Second, this Nexus is offering a Registered Reports submission format where a proposing researcher can get an in-principle acceptance of their multi-site study before data collection. In addition to vetting the data collection protocol via the Registered Reports process, an in-principle acceptance is believed to make it easier to recruit other researchers to join a multi-site collaborative project.

If you are interested in proposing a multi-site collaborative research project, you can submit a Stage 1 Registered Report proposal to Collabra: Psychology. Be sure to indicate you are submitting to the special Nexus on crowdsourced research. If you are interested in contributing to one of these projects, you can follow StudySwap on Twitter and Facebook. We will be using social media to recruit labs to contribute to those projects.

New team member:

Finally, we closed out year 1 by expanding our team. We are thrilled to welcome Amy Riegelman, a Social Sciences Librarian at the University of Minnesota, as the third official member of the StudySwap crew.

Here’s hoping our second year is even better than our first!

BBBRRRs: “Brick by Brick” Registered Replication Reports

A recent Twitter discussion on the merits (or seeming lack thereof) of small N studies led to an interesting idea for a new tweak on the RRR model. Check out the convo, but here’s the tl;dr version: I think that even small N studies can make valuable contributions to the literature because many of them can be combined into an unbiased meta-analysis of results from the individual Registered Reports (which Chris Chambers places at the top of his “evidence pyramid” for controversial research…but the pyramid seems applicable to all research in my opinion). We should create spaces in peer-reviewed journals for these small N studies to be published as individual bricks that can eventually be combined to build a solid house.


How can these small N studies ever build a house? A hypothetical example:

Dr. C is interested in replicating Experiment 2 from Loftus and Palmer (1974), a classic study on memory reconstruction. Dr. C works at a small university (we’ll call it Grandpa’s Cheesebarn University) and can only collect 100 participants per year for any given study. Instead of toiling away for years to reach an N large enough to make a stand-alone contribution on the replicability of this seminal experiment, Dr. C would like to publish his own modest brick, in the hopes that others will come along and add their bricks at a later date.

Enter the BBBRRR (the power of the name is in its awfulness). Dr. C submits a Stage 1 Registered Report manuscript for review at the BBBRRR Journal with a plan to collect data from only 100 participants. The BBBRRR Journal sends the paper out for review. The reviewers and editor all agree that Dr. C has provided a strong rationale for replicating this particular experiment, he has planned reasonable and rigorous experimental procedures and data analysis plans, and he has provided a well-justified number of similarly sized bricks (let’s say 12 total studies in this case) that would be needed to build a “house” of replication evidence for this experiment. The BBBRRR Journal extends an in-principle acceptance (IPA) to Dr. C. He can proceed with data collection with the assurance that the BBBRRR Journal will publish his paper if he follows his plan in good faith. There is one final catch. Dr. C also must agree to conduct a meta-analysis of all 12 studies, and write a summary report of this analysis at the end of the process. Dr. C agrees, conducts his study, writes his final report, and it is published in the BBBRRR Journal.

This now opens the door for Researchers A, B, D, E, F, G, H, I, J, K, and L to submit their own “mini RRs” specifying their plans to collect data from 100 participants following the same procedures and analysis plans laid out in Dr. C’s now published replication report. Once all 12 total replications have been conducted and published, Dr. C conducts the meta-analysis, including the results from all studies, and writes a summary report to “close out” the BBBRRR. All contributors to the independent replication reports are listed as authors on this final summary.
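The arithmetic behind combining 12 bricks into a house can be sketched briefly. This is a minimal illustration with invented effect sizes, using standard inverse-variance (fixed-effect) pooling of Cohen's d; the actual BBBRRR meta-analysis would follow whatever model was pre-registered in Dr. C's Stage 1 plan:

```python
import math

# Hypothetical effect sizes (Cohen's d) from 12 replication "bricks",
# each with n = 50 per group (N = 100 total). Values are invented for illustration.
bricks = [0.42, 0.31, 0.55, 0.18, 0.47, 0.29, 0.38, 0.51, 0.22, 0.44, 0.35, 0.40]
n1 = n2 = 50

def d_variance(d, n1, n2):
    # Approximate sampling variance of Cohen's d
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

# Inverse-variance weighting: more precise studies count for more
weights = [1 / d_variance(d, n1, n2) for d in bricks]
pooled_d = sum(w * d for w, d in zip(weights, bricks)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"Pooled d = {pooled_d:.3f}, 95% CI half-width = {1.96 * pooled_se:.3f}")
```

Note how the pooled standard error shrinks as bricks accumulate: no single N = 100 study is decisive, but twelve of them together yield a precise estimate, and because every brick was accepted before its results were known, the pooled estimate is free of publication bias.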

Key Benefits

  • Researchers can publish results from relatively small samples, if these samples are considered meaningful contributions by reviewers and editors. In some areas of research, for example those involving hard-to-reach populations or rare diseases, even a small N can represent a serious investment of research resources. These contributions should be valued when they can contribute to bias-free meta-analyses over the long-term.
  • No individual researcher is burdened with the considerable effort of coordinating a simultaneous, large-scale replication effort across many labs.
  • All researchers can proceed with data collection having an IPA in hand.
  • The final meta-analysis is free of publication bias.
  • Replication results can be evaluated as they come in.
  • We slowly publish many houses comprised of many solid bricks.

What do you think? Worth a shot?

Adversarial Co-blog-oration: the R-factor

Late last week, Verum Analytics rolled out a new online tool for calculating a published finding’s “R-factor,” a citation-based metric that they have introduced to quantify a study’s replicability. The R-factor is the proportion of citations for any given study that come from successful replications of the study. I was asked by Dalmeet Singh Chawla, a science journalist, to test out the new online calculator. The website automatically generates R-factors from PubMed IDs. I sent him my thoughts and feedback for inclusion in an article in Science Magazine. I was relatively critical of the R-factor in my responses to Dalmeet, and he selected representative quotations to include in the piece.

I then received a very polite and thoughtful email from the Verum Analytics team: Yuri Lazebnik, Peter Grabitz, Josh Nicholson, and Sean Rife. They wanted to better understand my criticisms of the R-factor. After several rounds of emails, we decided that our discussion presented a great opportunity to publicly share a bit of our “back-and-forth.” Here are 4 criticisms that I shared with the Verum team, followed by their responses to those criticisms.

CRC: The published literature is heavily biased towards positive, statistically significant results. Any metric that is calculated directly and automatically from the published literature will share this bias and present an unrealistic estimate of a study’s replicability.

VA: First, we would like to emphasize that the R-factor does not measure the replicability of a study, but rather indicates whether the claim(s) it reported has been confirmed. For example, testing a claim in a different experimental system, which is a common practice, is not a replication by definition, while a reproducible study can make a wrong claim by misinterpreting its results. The R-factor indicates the chance that a scientific claim is correct, irrespective of how this claim has been derived.

Second, the R-factor indeed only reflects what scientists have published, including the bias towards publishing “positive” results. However, the R-factor has a potential to decrease this bias by making the negative results consequential, thus providing an incentive to report them. To make reporting negative results easier, we will include preprints, dissertations and theses in the literature we scan.

CRC: There are many features of a study that can increase its evidence value: pre-registration, high statistical power, a representative sample, etc. Since the R-factor relies on a “vote-counting” method, the relative strengths and weaknesses of the methodologies of the counted studies are not considered. This means that small, unregistered studies carry the same weight as massive, registered studies.

VA: We use vote counting because it can be implemented on a large scale, which is required to make any measure of veracity widely used. We are aware that the robustness of vote counting can vary with the scientific field and would like to emphasize that the R-factor should be used as one of the tools to evaluate the trustworthiness of a scientific claim.

CRC: I gather that one intended strength of the metric is that it is quite simple and easily understandable. This is also a weakness. It relies on overly simplistic, bright-line criterion reasoning about replication “successes” and “failures.” A focus on meta-analytically established estimates of effect size and effect size heterogeneity, preferably derived from pre-registered studies or registered reports (which are free of publication bias), would be much more informative. For a method that I personally find more productive, check out the replication database at Curate Science.

VA: The approach that Curate Science has taken is indeed excellent. The question is how widely such comprehensive approaches can be applied in practice, given the effort required. For example, cancer research alone produces just under 200K publications each year. We see the R-factor as a simple measure that can be deployed widely in the foreseeable future to help improve the trustworthiness of scientific research. We do envision a future capability to zoom into the details of the confirming and refuting evidence where representing this information is feasible. What our tool already provides is the list of studies that confirm and refute a claim. This feature helps the user to focus on the relevant publications, which are otherwise hidden among the dozens or hundreds of citing reports.

CRC: Some findings that we have reason to believe are not exactly models of replicability receive quite high R-factor scores. Two examples: Ego Depletion and Professor Priming.

VA: Our tool can return a high R-factor score for an irreproducible article for two reasons: if the refuting evidence is not published, or if the automatic classifier misidentifies a study as confirming. The latter is the problem in the example you provided. The accuracy of our classifier, its robustness, and the graphic representation of the results all need to be improved, which requires further work, time and resources. We hope to attract these resources by showing the scientific community a prototype of what we want to create. Another limitation is that we can currently analyze only openly accessible publications. We are working on gaining access to others.

Meanwhile, the list of classified citing statements that the tool provides can be used to verify that statements are not misclassified as confirming. We will add a feature to report misclassified citations and other feedback, which will help make our tool more accurate, informative, and fun to use.

That’s all for now folks. We agreed a priori that I would NOT respond to or rebut any of Verum’s replies to my criticisms (I requested this setup in my invite to them). Please feel free to continue the conversation in the comments. This has been a really fun and informative experience for me. A summary:

  • Verum Analytics is trying to develop and use a new analytic tool focused on an important issue in scientific publishing.
  • I publicly criticized their approach in the press.
  • They responded quite constructively, with serious questions about my criticisms to seek elaboration from me.
  • I provided additional reasoning to support my criticisms.
  • They responded in kind.
  • We have now shared our discussion publicly.

Critical scientific discourse!


On the Verge of Acceleration: The PSA has Received its First Submissions

The Psychological Science Accelerator received 7 very interesting submissions in response to our first call for studies. Submissions will be reviewed blind, and proposing researchers will remain confidential. So, while they are nameless for now, I am extremely grateful that these 7 teams were willing to take the brave step of submitting their ideas to this new and unproven project. A few facts and figures regarding submissions jumped out to me, and demonstrate some of the promising elements of this initiative:

  • Submissions came from 6 different countries on 3 continents
  • The career stage of the researchers spanned well-established senior faculty to graduate students
  • Desired N ranged from 300 to 6,000
  • The studies seek to investigate a variety of social psychological, cognitive psychological, methodological, and meta-scientific questions
  • Some are novel, some are replications, and some look to extend published findings across cultures
  • All have provided solid justifications for large-scale data collection across many sites

Our Study Selection Committee now has the exciting (but also slightly unenviable) task of evaluating these submissions based on their feasibility and merit. In three weeks we hope to have a set of approved studies for labs in our distributed network to join. If you would like to contribute to our decision making process, or the PSA more generally, please sign up at the link below, or email me at

Recruiting Additional Labs

It is quite possible we decide that data collection for many or all of the submissions would be both feasible and would make a substantial contribution to the field. With that in mind, we are still actively recruiting additional labs, for 2018 and beyond. If you would like to join us, please sign up here. While 165 labs have joined our network, not all can commit to data collection in 2018, and expansion of the network’s data collection capacity will directly impact the number of studies we can conduct and the magnitude and quality of our contributions to psychological science.



The Psychological Science Accelerator. Rapid Progress. More Help Needed.

The Psychological Science Accelerator is rapidly expanding! We now have 160 labs on 6 continents (come on, Antarctica!) signed up for the distributed laboratory network. I believe we are building the necessary collective expertise and data collection capacity to achieve our goal of accelerating the accumulation of evidence in psychological science.

Check out this cool interactive map of the network. You can zoom in and click on all sites to see the institutions for each lab involved. My research assistants are still populating it (it’s hard to keep up with all the sign-ups!) so it will continue to expand and evolve.


Introducing the Interim Leadership Team

I am extremely excited to announce our ILT. This group of researchers has been extremely engaged in the PSA since it was little more than a general idea of mine. They have already made substantial contributions to this project, and I’m thrilled that they will be leading the charge as we clarify and formalize the principles and processes underlying the PSA.

Sau-Chin Chen, Tzu-Chi University
Lisa DeBruine, University of Glasgow
Charlie Ebersole, University of Virginia
Hans IJzerman, Université Grenoble Alpes
Steve Janssen, University of Nottingham – Malaysia Campus
Melissa Kline, MIT
Darko Lončarić, University of Rijeka
Heather Urry, Tufts University

The PSA Needs Your Study Submissions

We need researchers to submit their study ideas to the PSA (replication or novel)! We have received a considerable number of “pre-submission inquiries” but no concrete submissions thus far. The due date is fast approaching (October 11th). Please see our call for studies documents here and here and submit your ideas. We want to direct the network’s resources towards the best possible research questions in 2018 and beyond.

We Need Your Lab in the Network

We will always actively recruit additional labs. The power of this project will be in the breadth of the network. Please either email me ( or provide your info here to join us.

We Need Your Input

Our team is currently making decisions on many important issues that will shape the direction of the PSA. We want as big and as diverse a discussion as possible. Please join us even if you cannot commit to collecting data in 2018. All meaningful contributions are welcomed and appreciated! Debates are currently happening (via very fun and active google docs) on the following topics:

-The study selection/voting process
-Defining criteria for authorship (and author order) on resulting manuscripts
-Adoption (or not) of open science recommendations/requirements
-Translation of materials
-IRB/Ethics approval
-Data analytic strategies
-Measurement validity
-Publication outlets
-Dreams and wild ideas for the PSA!

Please join us and help shape the future of the Psychological Science Accelerator!

Dr. Christopher R. Chartier
Associate Professor of Psychology
Ashland University International Collaboration Research Center

The Psychological Science Accelerator: A Distributed Laboratory Network


I recently suggested that the time was right to begin building a “CERN for psychological science.” My hope was that like-minded researchers would join me in a collaborative initiative to increase multi-site data collection with the ultimate goal of increasing the pace and quality of evidence accumulation in the field. The response has been immediate, positive, enthusiastic, and a bit overwhelming (in a good way!).  We have quickly assembled a global (and constantly expanding) team of psychological science laboratories. By vote of the team, we have renamed our “CERN for Psych” project to avoid direct comparisons to physics.

We are The Psychological Science Accelerator:  A Distributed Laboratory Network.

27 days

In just 27 days, 106 labs from over 30 countries and 5 continents have expressed interest in the network and have signed up to “stay in the loop.”  Even more promising is the fact that 58 of those labs (more than 2 per day!) have already committed to contributing to our initial data collection projects in 2018. We have divided these labs into 2 data collection teams, with one collecting data in North America and the other collecting data globally for initial projects.


We have built it, will they come?

We now turn this data collection capacity over to researchers around the world and welcome submissions for studies to include in these initial data collection projects. These call for studies documents (North American doc here and global doc here) have been posted as HAVES on StudySwap for others to review and download.

We will accept proposals until October 11th. The team will then discuss the strengths, weaknesses, and feasibility concerns of the proposed studies. Ultimately, we will vote as a team on which studies to include and make our final selections by November 1st.

We welcome submissions from all areas of psychological science. A few key points to consider:

  • The proposed studies can test novel hypotheses or focus on the replication of previous findings.
  • We may collect data for multiple “bundled” studies if all parties deem them to be compatible in a single laboratory session, and a mutually agreeable study order can be found.
  • We will include at least one positive control effect at the end of the data collection session.
  • Feasibility of data collection will be a primary component of our evaluation for these first projects.
  • Studies will be pre-registered, even if the research is exploratory in nature.

How to Get Involved

The two projects listed above for 2018 are just the beginning for the Psychological Science Accelerator (PSA). We hope to build a general purpose network where researchers can propose exciting new ideas and important replication studies, where the network laboratories can then democratically decide which studies are most worthy of data collection resources, and where we then collect large amounts of data in labs all around the world.

To join the network or just stay informed on our activities, fill out this 3-item google form.

To add your lab to those collecting data on the initial projects in 2018, please email me at

To propose a study for these initial projects, please review the call for studies, complete the submission form, and email your submission to me at

If the first 27 days of the Psychological Science Accelerator are any indication of things to come, we have initiated a project that can have a meaningful and lasting impact on psychological science. Please join us!

Dr. Christopher R. Chartier

Associate Professor of Psychology

Ashland University International Collaboration Research Center


Update: Building a CERN for Psychological Science

A Big Week

Things have developed rapidly since we initially proposed that now is the time to begin building a CERN for psychological science. Seventy-two labs from twenty-nine countries have signed up for the network (see the google map below). Furthermore, 31 labs have already committed to our first data collection projects in 2018, taking the generous step of agreeing in principle to collect data for yet-to-be-determined studies. Clearly there is strong grassroots support for such an initiative. What an exciting time to be working on the improvement of psychological science through large-scale collaboration!


What’s next?

Here are our next steps, including ways for you to get involved if you aren’t already:

-We will continue recruitment for the CERN network indefinitely. We need many more labs in many more locations with many more resources to make this a truly transformative project. You can still sign up here.

-Specifically, for the two Collections2 that we will coordinate in 2018, we would love to recruit additional labs, even though we have already surpassed our minimum goal of 10 labs devoted to each. We could particularly use more North American labs with diverse student subject populations. Again, fill out the form linked above or contact me at to get involved. This specific recruitment effort will continue until September 15th.

-We will release an open “call for studies” on October 1st to select the studies to be included in these initial Collections2.

-Collecting labs will then decide as a group which studies we will collect data for in 2018. Our decisions will be made by October 15th.

-We will then work with the researchers whose proposals were selected for these initial Collections2 to finalize detailed data collection protocols. This work will wrap up by November 15th. During this month, we will also recruit additional data collection labs in case other researchers become interested in the Collections2 once the specific studies are announced.

-On November 16th we will distribute finalized protocols to all data collection labs so they can begin making logistical arrangements and can initiate their IRB review process.

-Data collection will take place between January 1st 2018 and December 31st 2018.

-Manuscripts will be prepared and submitted in 2019. Proposing researchers and all data collection laboratories for each Collection2 will help draft, review, approve of, and be listed as authors on the resulting manuscripts.

What Should We Call Ourselves?

Another open issue is what we should call this distributed laboratory network. My initial title drawing comparisons to CERN in physics was for metaphorical purposes, and we may wish to proceed under a different title. What do you think we should call ourselves? Stick with CERN for psychological science? I’m open to ideas and feedback on this matter. Shoot me an email ( or tweet at me ( with your thoughts.


Thank you so much for your continued support of this project. I have been overwhelmed with the response and am filled with enthusiasm to continue building a CERN for psychological science.

Building a CERN for Psychological Science

In response to the reproducibility crisis, some in the field have called for a “CERN for Psychology.” I believe the time is right to build just such a tool in psychological science by expanding on current efforts to increase the use of multi-site collaborations.

What would a CERN for Psych look like? It certainly would not be a massive, centralized facility housing multi-billion dollar equipment. It would instead consist of a distributed network of hundreds, perhaps thousands, of individual data collection laboratories around the world working collaboratively on shared projects. These projects would not just be replication efforts, but also tests of the most exciting and promising hypotheses in the field. With StudySwap, an online platform for research resource exchange, we have taken small steps to begin building this network.

Ideally, a CERN for Psych would also have a democratic and decentralized process for the selection of projects to devote collective resources to. Researchers could publicly post exciting study proposals, and the community of laboratories comprising the distributed network would decide autonomously and freely which projects to devote their resources to. This could be seen as a new form of peer review. Only those projects that a collection of one’s peers deem worthy of time and energy will be supported with large-scale data collection. The most exciting and methodologically sound ideas, as determined by the community, would be those receiving the greatest amount of resources. Again, StudySwap already provides a basic starting point for this feature and can eventually fulfill this requirement more fully with changes to the site.

Finally, a CERN for Psych should involve projects that are open and transparent for their full research life cycle. Using the Open Science Framework, projects would be open from idea proposal to methods development to data collection to eventual dissemination. Any interested party could fully review, criticize, praise, build upon, or reanalyze any component of the projects, their data, and their disseminated summaries.

This is not a pipe dream. The basic constituent parts are already in place, but there is much work to be done. What do we need to build a CERN for Psych?

-We need a large distributed laboratory network. If even 10% of psychological scientists devote a small portion of their lab resources to the CERN for Psych, we would be able to harness a massive amount of data collection capacity. This work has already begun, and dozens of labs have signed on for these efforts. Please join the network by filling out this 3-item form.

-We also need researchers who want to use the network to test important hypotheses and who are brave enough to take an innovative approach to their data collection practices. I believe that if we build it, they will come. We are currently recruiting 10 labs to each collect data from 100 participants in 2018 (Total N = 1,000) for just such a proposed study from someone not on the collection team. We will release a call in the near future soliciting study proposals from researchers without access to large samples at their home institutions and who can demonstrate that they would particularly benefit from a geographically dispersed and relatively diverse sample. Email me ( if you have ideas you’d already like to propose. This will be a small demonstration of the feasibility of such projects.

-We also know we need a better online platform for StudySwap. The current page, using the OSF for Meetings structure, was a short-term hack that we are already outgrowing after just 6 months. The new site will need much more sophisticated searching, tagging, and categorizing capabilities. We are working with OSF on these improvements.

-We need funding. For now, we can build the beginnings of a CERN for Psych without big money, but eventually, this endeavor will be much more successful with financial resources at our disposal. We are actively seeking funding to support early adopters of this system.

Please join us in building a CERN for Psych. Eventually, this project could involve data collection from millions of participants, conducted by thousands of research assistants, supervised in hundreds of labs, coordinated by a democratically selected and constantly changing set of dozens of leaders in the field.

Dr. Christopher R. Chartier

Associate Professor of Psychology

Ashland University International Collaboration Research Center

Reacting to Replication Attempts

This is the first post in a three-part mini-series on replication research, to include posts on:

  • Why we should welcome replication attempts of our work
  • My own experience selecting and conducting replication studies
  • The case for offering up our own studies for replication, and how to do it via StudySwap

We should enthusiastically welcome replication attempts

How should we feel and how should we react when we learn that an independent research team either plans to conduct or has conducted a replication attempt of a finding we originally reported? I’ve prepared this flowchart to guide our reactions and elaborated a bit below.


Replication attempts are often perceived as and labeled as “tear down” missions. This response is counterproductive and we need to reframe the discussion surrounding replication attempts. To hear an excellent example of how we can do this, do yourself a favor and listen to this episode of the Black Goat. Sanjay Srivastava, Alexa Tullett, Simine Vazire, and Rich Lucas had a very interesting conversation about replication research and Rich shared some of his actual motivations for conducting replications (spoiler alert, it isn’t to crush souls and destroy careers).

As a starting point for my take on more productive responses to replication attempts of your work, let us assume that you are confident in the finding in question. If you are not, well, that’s another discussion for another time.

If you are confident in the finding, a replication attempt should be taken as a form of flattery and a chance to enhance the visibility of your work. It suggests that someone in the field thinks the finding is important enough that we should have an accurate understanding of it, or an accurate estimate of the size of the effect. If the replication attempt is ultimately published, then other members of the field agree on its importance.

The attempt “succeeds”

For example, the replication study finds an effect size very similar to your originally published effect size. Yay! An independent research team has supported the original finding and your confidence in the effect has grown with very little work on your part. You have been cited and received a big pat on the back from the data.

The attempt “fails”

For example, the replication study finds no effect or a much smaller effect size than you did originally. Of course, this will be initially frustrating. BUT, remember, you are confident in the finding. You have essentially been invited to a low-effort publication. Why? The journal will now almost certainly welcome a submission from you showing that you can, in fact, still get the finding. Heck, perhaps you and the replicating team can even work together to figure out what’s going on! This was exactly the positive and productive cycle that developed after we failed to replicate part of the Elaboration Likelihood Model’s predictions in Many Labs 3.

Original -> ML3 -> Response w/ data -> Response to the response w/ data

Charlie Ebersole has even provided some empirical evidence on how responses to “failed” replications are perceived. tl;dr: if one operates as a scientist should, by earnestly pursuing the truth and collaborating with replicators, such behavior will win you friends and enhance your scientific reputation.

So, buy your replicators a beer. You owe them one!

My next two posts will focus on my own experience selecting effects for replication attempts and how to offer up one’s own effects for independent replication.

SURE THING Hypothesis Testing

Studies Until Results Expected, Thinks Hypothesis Is Now Golden

My sons watch a cartoon called Daniel Tiger’s Neighborhood. In one episode, which they (and by extension I) have watched at least one hundred times, Daniel and co. sing a little song that I imagine will repeat in my head for decades. The chorus goes:

“Keep trying, you’ll get better.”

The episode and song have a really nice message. Daniel is struggling to hit a baseball, but his friends encourage him to work at it until he improves.

What does this song have to do with experimental psychology? One interpretation of the lyric could be that of a researcher refining her craft to improve the research she conducts and strengthen the quality of evidence her studies produce. I can’t help but hear it another way.

“Keep trying, you’ll get better…results.”

As in, if at first your hypothesis is not supported, dust yourself off and try again. I think many of us have done too much SURE THING hypothesis testing.

A Twist on an Excellent Cartoon


“Bullseyes” by Charlie Hankin

This cartoon elegantly captures the concept of HARKing: Hypothesizing After Results are Known. SURE THING hypothesis testing definitely isn’t HARKing. The hypothesis in question is often established well before any results, and certainly before the supporting results, are known. The researcher simply tries and tries and tries, all the while making “improvements” or “tweaks” with the best of intentions, until the target is struck.

It also isn’t really p-hacking, a practice in which we exercise myriad researcher degrees of freedom, typically within a single study, until our results reach statistical significance. I think that both p-hacking and SURE THING hypothesis testing deserve their own cartoons. I am not a cartoonist, nor do I know Charlie Hankin, so allow me to simply describe the needed cartoons. The artistically inclined reader is invited to produce these cartoons in exchange for fame and glory.

  • The “p-hacking bullseyes” cartoon: Targets are drawn beforehand, but they cover approximately 67% (drawn from Simmons & Simonsohn’s simulations of how bad it can get if we really go off the p-hacking rails) of the possible-arrow-landing surface.
    • The King’s shot has landed on one of the targets, and the assistant exclaims, “Excellent shot, my lord.”
  • The “SURE THING bullseyes” cartoon: This one will need multiple panes, as SURE THING hypothesis testing is more episodic than HARKing or p-hacking. The target is drawn beforehand.
    • The King shoots and misses. “No worries, my lord. The arrow must be faulty. Allow me to retrieve and refine it.”
    • The King shoots again and misses again. “Ah, I know the problem. Let us quickly tighten your bowstring.”
    • The King shoots again and misses again. “Perhaps we shall try again in better lighting and wind conditions tonight.”
    • At night. The King shoots again and hits! “Excellent shot, my lord!”
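For readers who prefer simulation to archery metaphors, the within-study flavor is easy to demonstrate. The sketch below is my own illustration (not Simmons & Simonsohn’s actual simulation code): it p-hacks a two-group study of a true null effect by testing two correlated dependent variables and their average, then keeping whichever p-value is best. The sample size and DV correlation are arbitrary choices for the example.

```python
import math
import random

def z_p(group_a, group_b):
    """Two-sided p-value for a difference in means, treating the
    population SD as known (= 1) to keep the example dependency-free."""
    n = len(group_a)
    z = (sum(group_a) / n - sum(group_b) / n) / math.sqrt(2 / n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def p_hacked_study(rng, n=20, r=0.5):
    """One null study with two correlated DVs; the 'hacked' p-value is
    the best of: DV1 alone, DV2 alone, or the average of the two."""
    def person():
        dv1 = rng.gauss(0, 1)
        dv2 = r * dv1 + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        return dv1, dv2
    a = [person() for _ in range(n)]
    b = [person() for _ in range(n)]
    a1, a2 = [x[0] for x in a], [x[1] for x in a]
    b1, b2 = [x[0] for x in b], [x[1] for x in b]
    avg_a = [(x + y) / 2 for x, y in zip(a1, a2)]
    avg_b = [(x + y) / 2 for x, y in zip(b1, b2)]
    # The averaged DV has SD < 1, so its test is slightly conservative
    # under our known-SD assumption; good enough for an illustration.
    return min(z_p(a1, b1), z_p(a2, b2), z_p(avg_a, avg_b))

rng = random.Random(1)
sims = 2000
fp = sum(p_hacked_study(rng) < 0.05 for _ in range(sims)) / sims
print(f"false-positive rate with p-hacking: {fp:.2f}")
```

Even with just these two degrees of freedom (choice of DV, plus their composite), the nominal 5% false-positive rate roughly doubles; pile on more flexible choices and the rate climbs toward the figures Simmons and colleagues reported.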

If you shoot until you hit, then success is a SURE THING.


Of course, others have described this process in scientific experimentation. Perhaps my favorite description comes from the Planet Money podcast episode on the replication crisis. They describe flipping coins over and over until one of them hits an unlikely sequence of results. What I think hasn’t yet been discussed adequately is that many of the proposals of the open science movement (pre-registration, open data, open materials) provide only weak defenses against SURE THING hypothesis testing.

An Illustrative Hypothetical Scenario

In my last post, I discussed Comprehensive Public Dissemination (CPD) of empirical research. This and the following hypothetical scenario will help outline why I think it can be so powerful.

One researcher pre-registers and runs attempt after attempt at essentially the same study, “tweaking and refining” with the best of intentions as he goes. Eventually,


p < .05.


How do we feel about this?

An Alternative Hypothetical Scenario

A different researcher has a hypothesis about a potentially cool new effect. She engages in CPD. She clearly identifies on her CPD log a series of studies intended to pilot methods and to establish the necessary conditions for the effect to occur. Once she thinks she has established solid methods, she runs a pre-registered confirmatory study and


p < .05.


How do we feel about this?

If We Are SURE THING Hypothesis Testing, We Aren’t Hypothesis Testing at All