A very naive assumption often made about research papers is that they must be good because they were published in a major science journal. The truth is that leading science journals such as Cell very often publish research papers of extremely low quality, written by authors guilty of multiple types of Questionable Research Practices. Typically the authors of a paper and the referees judging whether it should be published are members of the same research community, where bad research habits may predominate. The referees are unlikely to reject papers for sins that the referees themselves have committed in their own papers.
Below is a list of 50 Questionable Research Practices that may be committed by researchers.
- Not publishing (or writing up and submitting for publication) a study producing negative or null results.
- Not publishing (or writing up and submitting for publication) a study producing results conflicting with common beliefs or assumptions or the author's personal beliefs.
- Asking for authorship credit on a study or paper which you did not participate in.
- Allowing some other person to appear as an author of a paper to which he or she did not substantially contribute.
- Fabrication of data, such as reporting observations that never occurred.
- Selectively deleting data to help reach some desired conclusion or a positive result, perhaps while using "outlier removal" or "qualification criteria" to try to justify such arbitrary exclusions, particularly when no such exclusions were agreed on before gathering data, or no such exclusions are justifiable.
- Selectively reclassifying data to help reach some desired conclusion or a positive result.
- Concealing results that contradict your previous research results or your beliefs or assumptions.
- Modifying results or conclusions after being pressured to do so by a sponsor.
- Failing to keep adequate record of all data observations relevant to a study.
- Failing to keep adequate notes of a research process.
- Failing to describe in a paper the "trial and error" nature of some exploratory inquiry, and instead writing as if a research plan that arose only late in the process had existed before any data was gathered.
- Creating some hypothesis after data has been collected, and making it sound as if data was collected to confirm such a hypothesis (Hypothesizing After Results are Known, or HARKing).
- "Slicing and dicing" data by various analytical permutations, until some some "statistical significance" can be found (defined as p < .05), a practice sometimes called p-hacking.
- Requesting from a statistician some analysis that produces "statistical significance," so that a positive result can be reported.
- Using concepts, hypothetical ideas and theories you know came from other scholars, without mentioning them in a paper.
- Deliberately stopping the collection of data at some interval not previously selected as the end of data collection, because the data collected thus far met the criteria for a positive or desired finding, out of a desire not to have that result "spoiled" by collecting more data.
- Failing to perform a sample size calculation to figure out how many subjects are needed for good statistical power in a study claiming some association or correlation (a sketch of such a calculation appears after this list).
- Using study group sizes that are too small to produce robust results in a study attempting to produce evidence of correlation or causation rather than mere evidence of occasional occurrence.
- Attempting to justify too-small study group sizes by appealing to typical study group sizes used by some group of researchers doing similar work, as if some standard was met, when it is widely known that such study group sizes are inadequate.
- Use of unreliable and subjective techniques for measuring or recording data rather than more reliable and objective techniques (for example, using sketches rather than photographs, or attempting to measure animal fear by using subjective and unreliable judgments of "freezing behavior" rather than objective and reliable measurements of heart rate spikes).
- Failing to publicly publish a hypothesis to be tested and a detailed research plan for gathering and interpreting data prior to the gathering of data, or the use of "make up the process as you go along" techniques that are never described as such.
- Failure to follow a detailed blinding protocol designed to minimize the subjective recording and interpretation of data.
- Failing to use known observed facts and instead using speculative numbers (for example, using astronomical positions projected far into the future rather than known astronomical positions, or "projected future body weight" rather than known current body weight).
- Making claims about research described in a paper that are not justified by any observations or work appearing in the paper.
- Giving a paper a title that is not justified by any observations or work appearing in the paper.
- "Math spraying": the heavy use of poorly documented equations involving mathematics that is basically impossible to validate because of its obscurity.
- Making improper claims of scientific agreement on debatable topics, often with unjustified phrases such as "scientists agree" or "no serious scientist doubts" or claims using the ambiguous word "consensus"; or making unsupported assertions that some particular claim or theory is "well-established" or the "leading" explanation for some phenomenon.
- Faulty quotation: writing as if some claim was established by some previous paper cited with a reference, when the paper failed to establish such a claim (the paper here, for example, found that 1 in 4 paper citations in marine biology are inappropriate).
- Lazy quotation: writing as if some claim was established by some previous paper cited with a reference, when the paper was not read or understood by those making the citation.
- Including some chart or image or part of an image that did not naturally arise from your experimental or observational activities, but was instead copied from some other paper that is not mentioned, or taken from data arising from some different study you did.
- Altering particular pixels of a chart or image to make the chart or image more suggestive of some research finding you are claiming.
- Failing to use control subjects in an experimental study attempting to show correlation or causal relation, or failure to have subjects perform control tasks. In some cases separate control subjects are needed. For example, if I am testing whether some drug improves health, my experiment should include both subjects given the drug, and subjects not given the drug. In other cases mere "control tasks" may be sufficient. For example, if I am using brain scanning to test whether recalling a memory causes a particular region of the brain to have greater activation, I should test both tasks in which recalling memory is performed, and also "control tasks" in which subjects are asked to think of nothing without recalling anything.
- Using misleading region colorization in a visual that suggests a much greater difference than the actual difference (such as showing in bright red some region of a brain where there was only a 1 part in 200 difference in a BOLD signal, thereby suggesting a "lighting up" effect much stronger than the data indicate).
- Failing to accurately list conflicts of interest of researchers, such as compensation by corporations standing to benefit from particular research findings, or ownership of shares or stock options in such corporations.
- Failing to mention (in the text of a paper or a chart) that a subset of subjects were used for some particular part of an experiment or observation, giving the impression that some larger group of subjects was used.
- Using misleading language suggesting to casual readers that the main study group sizes were much larger than the smallest study group sizes used (for example, claiming in an abstract that 50 subjects were tested, and failing to mention that the subjects were divided up into several different study groups, with most study groups being smaller than 10).
- Mixing real data produced from observations with one or more artificially created datasets, in a way that may lead readers to assume that your artificially created data was something other than a purely fictional creation.
- The error discussed in the scientific paper "Erroneous analyses of interactions in neuroscience: a problem of significance," described as "an incorrect procedure involving two separate tests in which researchers conclude that effects differ when one effect is significant (P < 0.05) but the other is not (P > 0.05)." The authors found this "incorrect procedure" in 79 of the neuroscience papers they analyzed, with the correct procedure occurring in only 78 papers (a sketch contrasting the two procedures appears after this list).
- Making vague references in the main body of a paper to the number of subjects used (such as merely referring to "mice" rather than listing the number of mice), while only giving in some "supplemental information" document an exact statement of the number of subjects used.
- Using in a matter-of-fact way extremely speculative statements when describing items observed, such as using the extremely speculative term "engram cells" when referring to some cells being observed, or calling certain human subjects "fantasy-prone."
- Exposing human research participants to significant risks (such as lengthy, medically unnecessary brain scans) without honestly and fully discussing the possible risks and obtaining informed consent from the subjects to be exposed to such risks.
- Providing inaccurate information to human subjects (for example, telling them "they must continue" to perform some act when subjects actually have the freedom to not perform the act), or telling them inaccurate information about some medicine human subjects are given (such as telling subjects given a placebo that the pill will help with some medical problem).
- Failing to treat human subjects in need of medical treatment for the sake of some double-blind trial in which half of the sick subjects are given placebos.
- Assuming without verification that some human group instructed to do something (such as taking some pill every day) performed the instructions exactly.
- Turning numerically continuous variables into discrete non-continuous categories (such as turning temperature readings spanning 50 degrees F into four categories of cold, cool, warm and hot).
- Speaking as if changes in some cells or body chemicals or biological units such as synapses are evidence of a change produced by some experimentally induced experience, while ignoring that such cells or biological units or chemicals undergo types of constant change or remodeling that can plausibly explain the observed changes without assuming any causal relation to the experimentally induced experience.
- Selecting some untypical tiny subset of a much larger set, and overgeneralizing what is found in that tiny subset, suggesting that the larger set has whatever characteristics were found in the tiny subset (a paper refers to "the fact that overgeneralizations from, for example, small or WEIRD [Western, Educated, Industrialized, Rich, and Democratic] samples are pervasive in many top science journals").
- Inaccurately calculating or overestimating statistical significance (a paper tells us "a systematic replication project in psychology found that while 97% of the original studies assessed had statistically significant effects, only 36% of the replications yielded significant findings," suggesting that statistical significance is being massively overestimated).
- Inaccurately calculating or overestimating effect size.
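To make the p-hacking item above more concrete, here is a minimal sketch in Python (not from any study discussed here) showing why trying many arbitrary analyses on data with no real effect so often yields a "significant" result. The group sizes, number of data splits, and variable names are all invented for illustration.

```python
# Hypothetical simulation: how often does "slicing and dicing" null data
# produce at least one p < .05 result?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 1000    # simulated studies with NO real effect
n_subjects = 40         # per group
n_subgroup_splits = 20  # arbitrary ways of re-analyzing the data

hits = 0
for _ in range(n_experiments):
    found = False
    for _ in range(n_subgroup_splits):
        # Both "treatment" and "control" come from the same distribution,
        # so any "significant" difference is a false positive.
        a = rng.normal(size=n_subjects)
        b = rng.normal(size=n_subjects)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            found = True
            break
    hits += found

print(f"Null studies able to report p < .05: {hits / n_experiments:.0%}")
# With 20 independent looks, roughly 1 - 0.95**20 (about 64%) of studies
# with no real effect can still report a "significant" finding.
```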
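As an illustration of the sample size calculation mentioned above, here is a minimal sketch using the statsmodels power module; the effect size of 0.5 and the 80% power target are conventional illustrative choices, not values taken from the post or any particular study.

```python
# Hypothetical power analysis for a two-group comparison.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # assumed medium effect (Cohen's d)
                                    alpha=0.05,      # significance threshold
                                    power=0.80)      # desired statistical power
print(f"Subjects needed per group: {n_per_group:.0f}")  # roughly 64 per group
```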
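And for the "significant versus not significant" error described in the interactions item above, here is a minimal sketch contrasting the incorrect procedure (comparing two p-values) with the correct one (directly testing whether the effects differ); all numbers below are invented for illustration.

```python
# Hypothetical example: two groups with the SAME true effect, small samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.4, scale=1.0, size=20)
group_b = rng.normal(loc=0.4, scale=1.0, size=20)

# Separate tests of each effect against zero: with small samples it can
# easily happen that one is "significant" and the other is not, by chance.
_, p_a = stats.ttest_1samp(group_a, 0.0)
_, p_b = stats.ttest_1samp(group_b, 0.0)

# Incorrect procedure: "A is significant, B is not, therefore A and B differ."
# Correct procedure: directly test whether the two effects differ.
_, p_diff = stats.ttest_ind(group_a, group_b)
print(f"p_a={p_a:.3f}, p_b={p_b:.3f}, p for the difference={p_diff:.3f}")
```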
In some fields such as cognitive neuroscience, most papers are guilty of several of these Questionable Research Practices, often more than five or ten of them. In compiling this list I got some items from the paper "Ranking major and minor research misbehaviors: results from a survey among participants of four World Conferences on Research Integrity."