
Our future, our universe, and other weighty topics


Wednesday, August 13, 2025

There's Very Much Hard Fraud in Scientific Research, But Even More Soft Fraud

 Reading the Science News page on Google News today, I read a shocking story at the mainstream "Inside Higher Ed" site. It's an article entitled "The Growing Problem of Scientific Research Fraud." It starts out by saying, "When a group of researchers at Northwestern University uncovered evidence of widespread—and growing—research fraud in scientific publishing, editors at some academic journals weren’t exactly rushing to publish the findings."  We then hear a little about what sounds like a "censor the bad news" affair.

But the researchers' paper did eventually get published. We read this:

"Last week Amaral and his colleagues published their findings in the Proceedings of the National Academy of Sciences of the United States of America. They estimate that they were able to detect anywhere between 1 and 10 percent of fraudulent papers circulating in the literature and that the actual rate of fraud may be 10 to 100 times more."

The article quotes one researcher as saying, "If this trend goes unchecked, science will be ruined and misinformation is going to dominate the literature." Figure 5 in the paper includes this graph:

[Image: fraud in scientific research]

The "paper mill products" line shows fraudulent papers. The "PubPeer commented" line shows papers that are suspected of fraud and have been mentioned on a site where scientists can anonymously discuss suspicions of fraud. The "retracted" line shows papers retracted because of their low quality or because of problems discovered in them. The great majority of junk science papers are not retracted. Notice the trend lines. A larger and larger fraction of scientific papers are fraudulent or junk.

The graph above is from the newly published paper "The entities enabling scientific fraud at scale are large, resilient, and growing rapidly." The link for the paper's PDF file is here.

In previous posts I discussed the issue of fraud in biology research. The posts were these:

An article in the journal Nature asks "How big is science's fake-paper problem?"  We read this:

"An unpublished analysis shared with Nature suggests that over the past two decades, more than 400,000 research articles have been published that show strong textual similarities to known studies produced by paper mills. Around 70,000 of these were published last year alone (see ‘The paper-mill problem’). The analysis estimates that 1.5–2% of all scientific papers published in 2022 closely resemble paper-mill works. Among biology and medicine papers, the rate rises to 3%."

What's so bad if a scientific paper resembles the product of a paper mill? The article gives us a bit of a clue, without explaining it very well. It says, "Paper-mill studies are produced in large batches at speed, and they often follow specific templates, with the occasional word or image swapped." The average reader will have no idea of what this refers to, so let me explain. 

In computer programming a template is some body of text containing placeholders. The template can be used to make many different versions of a narrative, by simply replacing the placeholders with specific examples.  For example, the page here gives us a template for producing a press release announcing some scientific research. The template starts out like this:

"Scientists today announced that they are the first to successfully demonstrate SCIENTIFIC FINDING. This has long been one of the holy grails of SCIENTIFIC FIELD. 'This finding radically alters our understanding of the field, to say the least,' says FIRST AUTHOR, a SCIENTIFIC FIELDologist from INSTITUTION who led the research. 'We were stunned when we made the discovery. For a few minutes we just didn’t believe what we were seeing,'  says FIRST AUTHOR, then SECOND AUTHOR (a student of FIRST AUTHOR) yelled "We’ve done it!" and we started dancing around the LAB/OBSERVATORY/FIELD SITE. It was very exciting.”

If you are writing a scientific press release, you could manually replace the capitalized phrases to match some new research.  But templates such as these can also be inputs to computer programs. Computer programs can generate countless different versions of the narratives in a template, by doing search and replace of the capitalized words. 

So, for example, imagine you want 10,000 different versions of the story below:

"MALE HUMAN ONE had a good life, but he knew that something was missing. He tried using dating apps to meet Miss Right, but somehow it never worked out. But one day MALE HUMAN ONE had a stroke of luck.  He was at the BUSINESS PLACE ONE where he was a regular customer. He looked to his left, and was stunned by the beauty of a female he had never met before: FEMALE HUMAN ONE. MALE HUMAN ONE felt sure that he wanted to strike up a conversation with the beautiful stranger, but he couldn't think of what to say. He thought of saying TRITE OVERUSED PICKUP LINE, but thought that would never work.  Suddenly, he had a good idea. Walking up to the stranger he said, ORIGINAL WITTY ICE-BREAKING LINE." 

It would be very easy to write a computer program that generated 10,000 different versions of this story. The program could just run in a loop, each time replacing the phrases MALE HUMAN ONE, FEMALE HUMAN ONE and BUSINESS PLACE ONE with items randomly extracted from a list, or randomly generated. Similarly, on each pass the program could replace TRITE OVERUSED PICKUP LINE with an item randomly chosen from a list of such lines, and replace ORIGINAL WITTY ICE-BREAKING LINE with an item randomly chosen from a list of such lines.
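To show how little work this takes, here is a minimal Python sketch of such a program. It uses a shortened version of the story, and the replacement lists are made up by me purely for illustration. This is only a guess at the general idea, not a claim about how any actual paper mill software works.

```python
import random

# Shortened version of the story template; the capitalized phrases are placeholders.
TEMPLATE = (
    "MALE HUMAN ONE was at the BUSINESS PLACE ONE where he was a regular "
    "customer. He was stunned by the beauty of FEMALE HUMAN ONE. He thought "
    "of saying TRITE OVERUSED PICKUP LINE, but thought that would never work. "
    "Walking up to the stranger he said, ORIGINAL WITTY ICE-BREAKING LINE."
)

# Hypothetical replacement lists, invented for this illustration.
FILLERS = {
    "MALE HUMAN ONE": ["Frank", "Carlos", "Dmitri"],
    "FEMALE HUMAN ONE": ["Alice", "Mei", "Priya"],
    "BUSINESS PLACE ONE": ["coffee shop", "bookstore", "diner"],
    "TRITE OVERUSED PICKUP LINE": [
        '"Do you come here often?"',
        '"Have we met before?"',
    ],
    "ORIGINAL WITTY ICE-BREAKING LINE": [
        '"I am polling strangers: what is the best thing on the menu?"',
        '"I need a second opinion on something very unimportant."',
    ],
}


def fill_template(template, fillers, rng):
    """Replace every placeholder with a randomly chosen item from its list."""
    story = template
    # Longest placeholders first: "MALE HUMAN ONE" is a substring of
    # "FEMALE HUMAN ONE", so the longer phrase must be replaced before it.
    for placeholder in sorted(fillers, key=len, reverse=True):
        story = story.replace(placeholder, rng.choice(fillers[placeholder]))
    return story


def generate_stories(count, seed=0):
    """Run the search-and-replace loop `count` times."""
    rng = random.Random(seed)
    return [fill_template(TEMPLATE, FILLERS, rng) for _ in range(count)]


if __name__ == "__main__":
    stories = generate_stories(10_000)
    print(stories[0])
    print(len(set(stories)), "distinct stories out of", len(stories))
```

With lists this tiny there are only 3 x 3 x 3 x 2 x 2 = 108 possible combinations, so most of the 10,000 outputs repeat. To get 10,000 truly different versions you would simply use longer lists.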

It seems that paper mills are doing something similar, to generate phony scientific papers, which amount to phony narratives. We hear in the Nature article that some machine-learning software is being used to look for papers that are suspected products of paper mills. An estimate has been produced that 3% of the biology and medicine papers from recent years are fake papers produced by paper mills. This 3% figure is higher than for any of the other fields mentioned. We read this: "June 2022 report by the Committee on Publication Ethics, based in Eastleigh, UK, said that for most journals, 2% of submitted papers are likely to have come from paper mills, and the figure could be higher than 40% for some."

Why would such wrongdoing occur? If you are a scientist living in a "publish or perish" culture, it may be expected that you will author a certain number of papers each year. There is an effect called publication bias, in which scientific journals prefer to publish papers reporting positive results. If you are a scientist doing experiments that have recently produced only null results, you may resort to paying some paper mill to get some result that will have a higher chance of getting published. The paper mill companies are typically in foreign countries, and have discreet names such as Suichow Editorial Services. 

A researcher named Bernhard A. Sabel has developed what he thinks is a pretty simple way to spot paper mill papers in biology and medicine: look for papers whose author email addresses are private emails or hospital emails rather than college or university emails such as joesmith@harvard.edu. Sabel's technique is entirely different from the machine-learning technique mentioned earlier in this post.
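A red flag of this kind is easy to check by computer. Below is a rough Python sketch of the idea. The list of academic domain endings is my own short, incomplete list, and the rule of flagging a paper only when none of its authors has an academic address is my assumption, not something taken from Sabel's paper.

```python
# Illustrative, non-exhaustive list of academic domain suffixes (an assumption).
ACADEMIC_SUFFIXES = (".edu", ".ac.uk", ".edu.cn", ".ac.jp", ".edu.au")


def is_academic_email(address):
    """Return True if the address's domain ends with a known academic suffix."""
    if "@" not in address:
        return False
    domain = address.rsplit("@", 1)[1].strip().lower()
    return domain.endswith(ACADEMIC_SUFFIXES)


def has_email_red_flag(author_emails):
    """Flag a paper when none of its author addresses looks academic.

    Treating "no academic address at all" as the trigger is this sketch's
    assumption, not a rule taken from Sabel's paper.
    """
    return not any(is_academic_email(a) for a in author_emails)


# Hypothetical addresses, used only to exercise the check.
print(has_email_red_flag(["jane.doe@example.edu", "j.roe@gmail.com"]))  # False
print(has_email_red_flag(["j.roe@gmail.com", "dr.lee@cityhospital.example.org"]))  # True
```

A rule this crude will also flag honest authors who simply use a personal or hospital address, so at best it is a first-pass filter that tells you which papers deserve a closer look.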

The latest version of a paper by Sabel describes the paper mill industry:

"The major source of fake publications are 1,000+ 'academic support' agencies – so-called 'paper mills' – located mainly in China, India, Russia, UK, and USA (Abalkina, 2021Else, 2021PĂ©rez-Neri et al., 2022). Paper mills advertise writing and editing services via the internet and charge hefty fees to produce and publish fake articles in journals listed in the Science Citation Index (SCI) (Christopher, 2021Else, 2022). Their services include manuscript production based on fabricated data, figures, tables, and text semi-automatically generated using artificial intelligence (AI). Manuscripts are subsequently edited by an army of scientifically trained professionals and ghostwriters." 

Sabel mentions a case of a paper mill that emailed a scientific journal offering a sum of $1000 if the journal published one of the papers the paper mill (calling itself an editorial services firm) helped to produce. 

A paper by Sabel states this:

"More than 1,000 paper mills openly advertise their services on Baidu and Google to 'help prepare' academic term papers, dissertations, and articles intended for SCI publications. Most paper mills are located in China, India, UK, and USA, and some are multinational. They use sophisticated, state-of-the-art AI-supported text generation, data and statistical manipulation and fabrication technologies, image and text pirating, and gift or purchased authorships. Paper mills fully prepare – and some guarantee –publication in an SCI journal and charge hefty fees ($1,000-$25,000; in Russia: $5,000) (Chawla, 2022) depending on the specific services ordered (topic, impact factor of target journal, with/without faking data by fake 'experimentation')" 

Sabel estimates that paper mills are a major business, earning a revenue of about a billion dollars per year.  He estimates that close to 150,000 papers are questionable papers with red flags indicating possible paper mill authorship.  

[Image: paper mill]

Publicly available AI programs such as ChatGPT are making this kind of hard fraud easier. Such programs can do a million and one things. Ask such a program to generate some type of information on some topic, and you might get some largely fictional or largely inaccurate output (sometimes called "AI slop") that can be pasted into a scientific paper. 

The discussion above is largely about what we might call hard fraud. Hard fraud may be defined as something involving data that is fake or made up. But we should not limit a discussion of scientific fraud to hard fraud alone. There is also what we can call soft fraud. Soft fraud in scientific research may be defined as the use of extremely misleading techniques of analysis, data gathering and data presentation to give the impression that something was discovered, when no such thing occurred. Soft fraud is extremely abundant in scientific research. To read about some of the things going on when soft fraud occurs in scientific literature, read my posts here:

The Building Blocks of Bad Science Literature

50 Types of Questionable Research Practices

To understand the financial factors that drive such hard and soft fraud, you need to "follow the money" by considering factors like those diagrammed below. Read here for an explanation of the diagram. 

[Image: Factors Driving Scientific Fraud]


