A recent scientific paper gave the results of a large project designed to test how well cancer studies replicate. Entitled "Reproducibility in Cancer Biology: Challenges for assessing replicability in preclinical cancer biology," the paper is a shocking portrait of a massive degree of malfunction within the world of experimental biology.
The authors attempted to replicate 193 experiments from 53 widely cited cancer research papers. They were shocked to find that not a single one of the 193 experiments was described with enough methodological detail for them to reproduce it without asking the original scientists for more information. They state, "None of the 193 experiments were described in sufficient detail in the original paper to enable us to design protocols to repeat the experiments, so we had to seek clarifications from the original authors."
Upon asking the scientists who ran the experiments for additional information, the authors found that 41% of those scientists were very helpful in providing information, 9% were only minimally helpful, and 32% were not helpful at all or did not respond to requests for information. Such a result suggests that a large fraction of all cancer experiments (a quarter or more) are either junk science procedures the scientists are ashamed to discuss, or fraudulent experiments the scientists refuse to talk about any further, for fear of having their fraud discovered.
Imagine if you were a scientist who had pulled some shenanigans or skulduggery when doing an experiment. Years later, you get an email from someone saying, "I am trying to replicate your experiment, but your paper has not given me enough information -- can you please answer this list of questions?" What would you do? In such a case you would probably just not answer the email. The last thing you would want is for someone to discover the sleazy shortcuts you had used, or to discover that you had fudged the results. Conversely, if you did the experiment using best-practices methods, and proceeded in an entirely honest and commendable manner, you would not be troubled by such an email, and would probably answer it in a helpful way.
The paper's authors tried to reproduce the cancer studies in a way that produced a statistical power of at least .80 (roughly, at least an 80% chance of detecting a real effect of the expected size, if such an effect exists). They found that the studies they were trying to reproduce typically used sample sizes too small to reach that standard. We read this: "As an illustration, the average sample size of animal experiments in the replication protocols (average = 30; SD = 16; median = 26; IQR = 18–41) were 25% higher than the sample size of the original experiments (average = 24; SD = 14; median = 22; IQR = 16–30)."
This is quite interesting from the standpoint of experimental neuroscience, where the great majority of experiments use way-too-small sample sizes. Most experimental neuroscience papers use a sample size smaller than 15 for at least some of their study groups. I have often cited this "15 subjects per study group" as a minimal quality standard that neuroscientists typically fail to meet in their experiments. But according to the figures quoted above, the quality shortfall is even greater than I have described. The paper suggests that adequate sample sizes average not just 15 but 30. If that is correct, then the failure of neuroscientists to use adequate sample sizes in their experiments is far greater than I have suggested.
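To make the sample-size point concrete, here is a minimal sketch (my own illustration, not taken from either paper) of how statistical power depends on group size. It assumes a simple two-group comparison and a "medium" standardized effect size of 0.5; under those assumptions, neither 15 nor 30 subjects per group comes close to the .80 power standard the replication project aimed for.

```python
# A minimal sketch of the relationship between sample size and statistical power.
# Assumes a two-sample t-test and an illustrative effect size of d = 0.5;
# the exact numbers depend heavily on the effect size you assume.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power achieved with 15 subjects per group (a common neuroscience sample size)
power_15 = analysis.power(effect_size=0.5, nobs1=15, alpha=0.05)

# Power achieved with 30 subjects per group
power_30 = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05)

# Subjects per group needed to reach the 0.80 power standard
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)

print(f"power with n=15 per group: {power_15:.2f}")   # roughly 0.26
print(f"power with n=30 per group: {power_30:.2f}")   # roughly 0.47
print(f"n per group for power 0.80: {n_needed:.0f}")  # roughly 64
```

The point of the sketch is not the exact numbers but the shape of the problem: with small groups, even a genuine medium-sized effect will usually fail to reach statistical significance, and the "significant" results that do emerge tend to be inflated.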
In the paragraph below, the authors discuss some of the rot they have discovered within experimental biology:
"The present evidence suggests that we should be concerned. As reported in Errington et al., 2021b, replication efforts frequently produced evidence that was weaker or inconsistent with original studies. These results corroborate similar efforts by pharmaceutical companies to replicate findings in cancer biology (Begley and Ellis, 2012; Prinz et al., 2011), efforts by a non-profit biotech to replicate findings of potential drugs in a mouse model of amyotrophic lateral sclerosis (Perrin, 2014), and systematic replication efforts in other disciplines (Camerer et al., 2016; Camerer et al., 2018; Cova et al., 2018; Ebersole et al., 2016; Ebersole et al., 2019; Klein et al., 2014; Klein et al., 2018; Open Science Collaboration, 2015; Steward et al., 2012). Moreover, the evidence for self-corrective processes in the scientific literature is underwhelming: extremely few replication studies are published (Makel et al., 2012; Makel and Plucker, 2014); preclinical findings are often advanced to clinical trials before they have been verified and replicated by other laboratories (Chalmers et al., 2014; Drucker, 2016; Ramirez et al., 2017); and many papers continue to be cited even after they have been retracted (Budd et al., 1999;Lu et al., 2013; Madlock-Brown and Eichmann, 2015; Pfeifer and Snodgrass, 1990)...Fundamentally, the problem with practical barriers to assessing replicability and reproducibility is that it increases uncertainty in the credibility of scientific claims. Are we building on solid foundations? Do we know what we think we know?"
A separate paper by the same authors ("Investigating the replicability of preclinical cancer biology") reports what degree of success was achieved in trying to reproduce the selected experiments. Getting little or no help from such a large fraction of the scientists, and finding that the original papers failed to give enough information for replication, the authors were able to re-run only 50 of the 193 experiments they had originally chosen. Of those 50, only 46% were successfully replicated, in the sense of producing results like those reported in the original papers.
So after setting the goal of replicating 193 experiments, and doing their best to replicate all of them whenever possible, the authors were able to successfully replicate only about 23 of the experiments (46% of the 50 they could re-run). That is a pitiful overall replication rate of only about 12%. The authors also report this: "One method compared effect sizes: for positive effects, the median effect size in the replications was 85% smaller than the median effect size in the original experiments, and 92% of replication effect sizes were smaller than the original." What this suggests is that the effects reported in experimental biology papers tend to be massively overstated.
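The arithmetic behind that 12% figure is simple enough to check. The short calculation below uses only the numbers reported above (193 experiments planned, 50 re-run, 46% of those succeeding).

```python
# Back-of-the-envelope check of the overall replication rate,
# using only the figures quoted above from the two papers.
experiments_planned = 193      # experiments the project set out to replicate
experiments_rerun = 50         # experiments that could actually be re-run
success_rate_of_rerun = 0.46   # fraction of re-run experiments that replicated

successes = experiments_rerun * success_rate_of_rerun   # about 23 experiments
overall_rate = successes / experiments_planned          # about 0.12

print(f"successful replications: {successes:.0f}")      # about 23
print(f"overall replication rate: {overall_rate:.0%}")  # about 12%
```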
What all these numbers give us is a vivid portrait of massive decay, rot, malfunction and arrogance within experimental biology. Another scientific study surveying animal researchers (discussed here) gives similar results. Very clearly, junk experimental results are being produced on a massive scale by experimenters who are very often guilty of Questionable Research Practices. From the fact that such a large fraction of the experimenters refuse to respond to questions from those attempting to reproduce their experiments, and the fact that only a small fraction of the studies can be successfully replicated, we may assume that either a very large amount of fraud or a massive degree of incompetence is occurring within experimental biology -- probably both. Therefore, a good general principle to follow is this: assume that any novel experimental biology result you read about in the science news is bogus or junk science, unless the result has been very well replicated, with many other experimenters getting the same result. (Vaccine results have been massively replicated, because when millions of people have taken a vaccine without harm, that is equivalent to massive replication.)
I have long discussed the poor practices and shabby standards of experimental neuroscience. When poor research practices occur in neuroscience, the damage is mainly intellectual. Junk neuroscience experiments cause people to wrongly think that scientists are on the right track in their assumptions about minds and brains, when they are very much on the wrong track, betting the farm on false assumptions. But at least such misleading junk science experiments don't lead to physical human suffering. It is a different situation when so many cancer research studies are unreliable. We can only guess how great the physical toll on human beings is when so many cancer studies cannot be trusted.
Yesterday a jury found Elizabeth Holmes guilty of wire fraud. Her company Theranos had bilked investors out of countless millions, long making grand biology-related promises but producing only feeble results. There is many an Elizabeth Holmes (male and female) in the world of experimental biology. They victimize not wealthy investors but the federal government. Every year the US government doles out billions for scientific research, and a large fraction of this goes to fund junk experimental science that cannot be replicated because poor experimental procedures were used, or because the scientists were trying to prove something that is untrue.
You can see endless cases of wasted money by using the National Science Foundation's query tool. Below is an example, searching for grants given on the topic of synapses:
https://www.nsf.gov/awardsearch/simpleSearchResult?queryText=synapses
Very often, when you click on the rows of your search results and carefully analyze both the original research proposal and the resulting scientific papers, you will find that the grant proposal promised some grand result, but that the papers produced were merely junk science describing experiments that used Questionable Research Practices such as the lack of a blinding protocol or way-too-small sample sizes. The paper titles and abstracts often claim to have found something not actually shown by the research.
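For those who prefer scripting to clicking through result rows one by one, the same kind of search can in principle be done programmatically. The sketch below is hypothetical: the endpoint and field names are my assumptions based on NSF's publicly documented Awards API, and may need adjusting; the simple search URL above is the route I have actually used.

```python
# Hypothetical sketch of querying NSF award data programmatically instead of
# through the web form above. Endpoint, parameter names, and response layout
# are assumptions based on NSF's public Awards API documentation.
import requests

API_URL = "https://api.nsf.gov/services/v1/awards.json"   # assumed endpoint
params = {
    "keyword": "synapses",                                 # same search term as above
    "printFields": "id,title,fundsObligatedAmt",           # assumed field names
}

response = requests.get(API_URL, params=params, timeout=30)
response.raise_for_status()

# Assumed response shape: {"response": {"award": [ ... ]}}
for award in response.json().get("response", {}).get("award", []):
    print(award.get("id"), award.get("fundsObligatedAmt"), award.get("title"))
```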
The US government has a whole big agency (the IRS) dedicated to tracking down and punishing people who file false tax returns. But the government seems to have no agency dedicated to tracking down and penalizing "sham, scam, thank you Sam" researchers who bilk Uncle Sam out of millions by getting lavish government research grants and then producing junk experimental results incapable of being successfully replicated. We may presume that in the labs they sometimes whisper that Uncle Sam is an easy mark.
In an unsparing essay entitled "The Intellectual and Moral Decline in Academic Research," Edward Archer, PhD, states the following:
"Universities and federal funding agencies lack accountability and often ignore fraud and misconduct. There are numerous examples in which universities refused to hold their faculty accountable until elected officials intervened, and even when found guilty, faculty researchers continued to receive tens of millions of taxpayers’ dollars. Those facts are an open secret: When anonymously surveyed, over 14 percent of researchers report that their colleagues commit fraud and 72 percent report other questionable practices....Retractions, misconduct, and harassment are only part of the decline. Incompetence is another....The widespread inability of publicly funded researchers to generate valid, reproducible findings is a testament to the failure of universities to properly train scientists and instill intellectual and methodologic rigor. That failure means taxpayers are being misled by results that are non-reproducible or demonstrably false."