Future and Cosmos: What You Read Would Be So Different If Hi-Tech Companies Were Better at Judging Reliability

Tuesday, September 5, 2023

What You Read Would Be So Different If Hi-Tech Companies Were Better at Judging Reliability

A small class of literature gatekeepers has enormous control over what information and opinions end up being viewed by the masses. This class includes publishers, editors, peer reviewers, site moderators, bookstore owners, librarians, and people at high-tech companies who control what types of posts and articles end up in search results and news feeds. We often fail to realize how much power such people once had, and still have, over what we read. One often-overlooked point is that much of this control is exerted by people who do not prevent the publication of anything, but merely control the likelihood that it will be read. For example, a librarian at a public library exerts enormous control by deciding which books end up on the library's bookshelves, and also by putting certain books on a "New Books" shelf or a "Recommended Books" shelf where they will be far more likely to be found.

One obvious way in which this class of literature gatekeepers exerts control is by approving and rejecting proposed articles, papers and manuscripts. Scientists who act as anonymous peer reviewers help to enforce prevailing groupthink and reigning dogmas by rejecting papers that defy the assumptions that supposedly reign in their fields of study. The people who perform such censorship often justify it as "quality control." Book editors and newspaper editors once exerted the greatest control over what the public read, by controlling what ended up on the printed page. With the rise of the Internet, the power of this class has been greatly reduced, because of the ease with which people can publish content online.

But as this class of literature gatekeepers lost a good deal of its once enormous power, another class of literature gatekeepers saw an enormous rise in its power: the gatekeepers controlling Internet search results and the content of digital feeds such as Facebook feeds and the Apple News app feed. Consider the average person today. He will probably not visit a public library very often, and he will not visit a bookstore very often. But almost every day such a person will search for information using a search engine such as Google or Bing. And almost every day such a person will access various feeds that provide a stream of stories, articles and posts. Among the most popular are the Google News feed, the Apple News feed, and the Facebook feed, which is typically a mixture of posts and stories from people you don't know and posts from people you do know (some of your Facebook friends).

The greatest power is held by the people who control the search results of search engines and news feeds such as those mentioned above. There are strong reasons for suspecting that such persons are not using this power very skillfully. One reason is the abundance of low-quality items appearing in search engine results and in news feeds such as the Google News feed and the Apple News feed. Below are some of the main types of these low-quality items:

(1) Anonymous clickbait results appearing in news feeds such as science news feeds. Clickbait is a gigantic problem that mars the reliability of stories appearing in science news feeds. There currently exists an economic ecosystem that strongly incentivizes the appearance of interesting-sounding but misleading science stories. This ecosystem is described in my post "Why the Academia Cyberspace Profit Complex Keeps Giving Misleading Brain Research Reports." What I say in that post about brain research holds true for many other types of science.

After a scientific paper has been written up and published, it is announced with a press release issued by the main academic institution involved in the research. Nowadays the press releases of universities and colleges are notorious for making sensationalized claims that are not warranted by anything discovered in the research being described. Often a tentative claim made in a scientific paper (basically a "perhaps" or a "maybe") will be stated as if it were the discovery of a definite fact. At other times a university press release will make some important-sounding claim that was never made in the scientific paper writing up the research.

Authorship anonymity is a large factor that facilitates misleading university and college press releases. Nowadays such press releases typically appear without any person listed as the author. So when a lie or misleading statement occurs (as it very often does), you can never point the finger at one particular person who was lying. When PR men at universities think to themselves "no one will blame me specifically if the press release has an error," they feel freer to say misleading and untrue things that make unimpressive research sound important.

There are complex economic reasons why such erroneous press releases appear so often, and why they are passed on in clickbait Internet stories that lead to pages containing revenue-generating ads. To understand those reasons you have to "follow the money" and look at which parties profit from such unreliable but interesting-sounding stories. The reasons are explained in this post, and sketched in the diagram below.

Ads on web pages are a pretty good indicator that clickbait is occurring, although not all clickbait involves pages with ads. For example, a university press office may publish a press release web page with no ads, but with a clickbait headline that does not match anything actually discovered in the described research. The purpose in this case is not to generate ad revenue, but largely to glorify the university or its researchers.

(2) Anonymous web pages lacking a statement of authorship by one particular person, often containing ads. When a page lacks a statement of authorship by one particular person, the person or people writing the page may feel free to make dubious or false statements, having no sense that they will be held personally accountable. Many pages containing errors and distortions appear at the top of Internet search results when you search for a particular topic. Such pages often include the anonymously written pages of wikipedia.org, which often have errors, particularly when they deal with controversial topics. Apparently the algorithms of companies such as Google fail to properly penalize authorship anonymity.

Often such pages contain ads, and the presence of such ads compromises the credibility of such pages. The ads make us wonder whether the primary purpose of the page was to provide accurate information, or instead to serve as a vehicle for generating revenue by the display of the ads.

An example of a rather poor page appearing near the top of Google search results is the page here. The page appears as the third result when I search for "memory recall" using Google. The page has no listed author. Near its beginning are two ads, one very large. One is a dubious ad hawking CBD oil "for brain and memory." There is little evidence that CBD oil does anything to benefit memory. Also near its beginning is a column of "product reviews" that are basically ads. The page includes several important claims that are very dubious and not backed up by any evidence. The page claims, "Increased activity in Globus pallidus, anterior cingulate gyrus, thalamus, and cerebellum is seen during recall." There is no robust evidence for such a thing. See my page here for a discussion of the relevant evidence, which fails to show any clear evidence of some part of the brain working harder during recall or recollection. The references the page gives are mostly to very old papers or low-quality papers, such as a study of monkeys using a sample size of only one.

(3) Pages on very subtle and complex topics requiring very many hours of thought or scholarship, written by people who have neither written much about nor extensively studied the topic they are writing about.

We see such pages showing up extremely often in the top search results returned by search engines such as Google and Bing. For example, when I search for "memory recall" on Google, I get as the third item in the search results an article written by Emilie Le Beau Lucchesi. A page describing the author mentions a bachelor's degree in journalism and a PhD in communication, "with an emphasis on media framing, message construction and stigma communication." This does not suggest the author is a scholar of the brain. Lucchesi's article contains important statements that are unfounded. Lucchesi makes the incorrect claim that "Scientists continue to learn how the brain stores and retrieves memories using brain mapping technology." No such thing has happened, and scientists are not at all learning how a brain could either store or retrieve memories. Scientists lack any credible theory of how a brain could do either of these things. Lucchesi writes this:

"Scientists use the term engram to describe the physical process the brain uses for holding a specific memory. In the past, scientists identified memory engrams in the hippocampus, amygdala and cortex."

To the contrary, no robust evidence has ever been found of engrams in the human brain. No claims to have discovered such engrams will hold up to diligent scrutiny. Scientists have never found the slightest solid evidence of learned information by microscopically examining human brain tissue, living or dead. Lucchesi's article refers at length to a mouse study, but neither names the study nor links to it, merely mentioning its lead author. Lucchesi is probably referring to the study here, which is a Questionable Research Practices study involving way-too-small study group sizes such as only 10 mice and only 12 mice, with study group sizes sometimes dropping to ridiculously small sizes such as only 4 mice per group. The poorly designed study used no pre-registration and no blinding protocol, and it confesses, "No statistical methods were used to predetermine sample sizes." A decently designed experimental paper will use statistical methods to calculate adequate sample sizes (in other words, study group sizes), and then use such sample sizes.
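For readers who have never seen such a calculation, here is a minimal Python sketch of one common textbook approach: the normal-approximation formula for the number of subjects needed per group when comparing the means of two groups. The effect sizes (Cohen's d), the 0.05 significance level and the 80% power used here are illustrative assumptions, not values taken from the mouse study.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects needed per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile corresponding to the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

for d in (0.5, 0.8, 1.0):
    print(f"effect size {d}: {n_per_group(d)} subjects per group")
```

Under these assumptions the sketch gives 63, 25 and 16 subjects per group. Even for a large effect (d = 0.8), that is more than twice the group sizes of 10 or 12 mice mentioned above; exact calculations based on the t distribution give slightly larger numbers still.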

We can excuse Lucchesi for her misstatements in this article and for discussing this poor mouse study, because brains and neuroscience are not areas she has written much about, and none of her three books deals with such a topic or with any scientific topic. If search engines were better, her erring article on a topic she rarely writes about would not have appeared on the first page of search results when I searched for information on "memory recall."

What steps could hi-tech companies take to keep so much junk from appearing at the top of their search results, and to reduce the number of misleading stories on their news feeds? Among the steps they could take would be these:

(1) Severely penalize all anonymous authorship, including anonymously written pages that list some prestigious institution as their source. Algorithms and human judges should severely penalize all web pages, press releases, posts and articles written by anonymous authors, regardless of whether such literature lists some prestigious institution as its source. For example, a press release issued by Harvard University should be treated as probably unreliable junk unless it lists one particular person (or one or two persons) as its author. Since in recent years the nation's most prestigious institutions (such as leading universities and NASA) have routinely issued erroneous and unreliable press releases without listing specific human authors, anonymous authorship by some esteemed institution should not be regarded as any indication of reliability. If the major hi-tech companies were to announce such a policy, the colleges and universities that keep polluting our science news feeds with low-credibility, anonymously written press releases would notice that their press releases are not attracting attention, and would switch to press releases with a named author. Doing that would improve reliability, because a person is more likely to be truthful when writing a page that lists himself as the author.

Currently we seem to have the opposite of such a principle occurring. Search engines routinely display anonymously written pages on wikipedia.org at the top of search results. Such pages are often of very poor quality, particularly when controversial topics are discussed.

(2) Severely penalize all pages with ads, with the penalty proportional to the number and size of ads on a page. Ads on web pages generate revenue for the people publishing the site. The presence of even a single ad on a web page is a reason for doubting the credibility of the statements on that page. As soon as we see an ad on a web page, we should start asking: was the page written primarily to teach truth, or to generate revenue for the web site? The more ads we see on a page, the more we should suspect that the page was written and published primarily to generate ad revenue. Whether a page was written primarily to cause ad views is closely related to how credible it is. If you are writing a page mainly for the sake of generating ad views, you will tend to write enticing headlines or make sensational claims not justified by any evidence you cite, because such headlines and content work better as clickbait. For example, if you write a sensational hype headline of "Breakthrough in the search for extraterrestrials" (one not matching anything discovered), that clickbait headline can appear as a link on other pages, causing many people to click on the link to get to your page.
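The proportional penalty proposed in (2) could be sketched roughly as follows. This is a hypothetical illustration, not how any search engine actually ranks pages: the `Ad` type, the `penalty_per_ad` weight, and the choice to measure an ad's size as the fraction of the page area it covers are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    # Hypothetical measure: fraction of the page's area this ad occupies (0.0 to 1.0).
    area_fraction: float

def ad_penalized_score(base_score: float, ads: list[Ad],
                       penalty_per_ad: float = 0.05) -> float:
    """Reduce a page's ranking score in proportion to the number and size of its ads.

    Each ad costs a fixed amount for merely being present (penalty_per_ad),
    plus an amount equal to the fraction of the page it covers.
    The resulting score never drops below zero.
    """
    penalty = sum(penalty_per_ad + ad.area_fraction for ad in ads)
    return max(0.0, base_score * (1.0 - penalty))

# A page with one large ad and one small ad keeps a little over half its score.
print(ad_penalized_score(1.0, [Ad(0.30), Ad(0.05)]))
```

In this sketch a page with no ads keeps its full score, a page with one small banner loses only a little, and a page plastered with large ads sinks to zero.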

(3) Penalize in search results all pages written by authors writing on a complex topic that they have not written much about, and promote in search results pages written by authors writing on a complex topic that they have written much about.

Major search engine companies engage in constant "web crawling" in which their automated "spiders" search every corner of the internet, and analyze the results. It would not be very hard for "rich as Croesus" search engine companies to maintain databases that try to get a rough idea of authors and their expertise or what they have often written about. For example, if there are twenty pages scattered around the web in which Bob Worthington offers long reviews of military action games, then a search engine company should be able to detect such a pattern, and record Bob Worthington as something of an expert on military action games. Similarly, if such a search engine picks up a claim that Sally Jenkins has a PhD in cell biology, then it should be able to store that fact in some expertise database.

Probably the search engine companies already have something like such databases, but they don't seem to use them well. What should occur is something like this: when someone writes for the first time on some deep topic he has never written about, the page should be penalized in search results. But when someone writes about a complex topic he has written about very much, the writing should be promoted in search results, getting a higher rating.
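Here is a minimal Python sketch of the kind of expertise database described above, using the hypothetical Bob Worthington example. The topic labels, the thresholds (one page on a topic counts as a first-timer, ten or more as a prolific writer) and the penalty and boost multipliers are assumptions chosen only for illustration; a real system would need far more careful identification of authors and topics.

```python
from collections import defaultdict

class ExpertiseDatabase:
    """Hypothetical record of how many pages each author has written on each topic."""

    def __init__(self) -> None:
        self._counts: defaultdict[str, defaultdict[str, int]] = defaultdict(
            lambda: defaultdict(int))

    def record_page(self, author: str, topic: str) -> None:
        # Called each time a crawler finds a page by this author on this topic.
        self._counts[author][topic] += 1

    def pages_written(self, author: str, topic: str) -> int:
        # .get avoids creating empty entries for unknown authors or topics.
        return self._counts.get(author, {}).get(topic, 0)

    def ranking_multiplier(self, author: str, topic: str) -> float:
        """Penalize first-time writers on a topic and promote prolific ones."""
        n = self.pages_written(author, topic)
        if n <= 1:    # at most one page: a first attempt at this topic
            return 0.5
        if n >= 10:   # the author has written on this topic many times
            return 1.5
        return 1.0    # neutral in between

db = ExpertiseDatabase()
for _ in range(20):
    db.record_page("Bob Worthington", "military action games")
db.record_page("Bob Worthington", "memory recall")
print(db.ranking_multiplier("Bob Worthington", "military action games"))
print(db.ranking_multiplier("Bob Worthington", "memory recall"))
```

Here the same author's twenty-first page on military action games would be boosted, while his first page on memory recall would be penalized.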

If such a principle were followed, we would not get in our search results a result like the very poor new piece at the frequently-erring but slick site Quanta Magazine, a piece entitled "The Usefulness of a Memory Guides Where the Brain Saves It." The article repeats the myth that patient HM could not form new memories (a myth I debunk here), and repeats the groundless achievement legend that research on him "helped scientists discover that new memories first formed in the hippocampus and then were gradually transferred to the neocortex," a migration myth that multiplies the difficulties of explaining how a brain could store memories. The writer (identified as a "writing intern") also fails to see how bad it is that the "evidence" offered for the speculative theory he is promoting is imaginary-data evidence, consisting not of real tests with real subjects but merely of tests with simulated humans in a computer experiment (which is rather like trying to prove the effectiveness of your new gun model by showing it seems to work well inside the play world of a video game).

(4) Penalize scientific papers hidden behind paywalls, have a system for rating the quality and relevance of scientific papers and scientific articles that show up on the first page or two of search results, and use such a system to quality-check results appearing very early in search results.

Very often the first page of results from major search engines includes links to very poor scientific papers that are guilty of Questionable Research Practices. For example, when I search for "memory recall" on Google, I get on the first page of search results a link to the poorly designed study "Memory recall involves a transient break in excitatory-inhibitory balance." The study is guilty of the usual Questionable Research Practices so abundant in today's experimental neuroscience: lack of pre-registration, lack of any blinding protocol, the use of way-too-small study group sizes as small as only 12, and the lack of any sample size calculation to determine whether the sample sizes used were adequate. A study last year found that thousands of participants are needed for accurate correlation studies involving brain scanning, but the paper in question (relying on brain scanning) used only about 19 subjects.

Another example of a poor paper showing up on the first page of Google search results is the paper "Prefrontal feature representations drive memory recall." The paper is behind a paywall, and its abstract does not mention how many subjects it used; we may presume it used some way-too-small sample size (when scientists use decent-sized study groups, they almost always mention such sizes in their paper abstracts). The study involved scanning the hippocampus of mice (a region perhaps the size of a grain of sand) while they performed a memory activity. Given the very tiny size of the hippocampus in mouse brains, that is such a ridiculous and error-prone method that it is virtually never used. Although I can't judge the full paper, based on its abstract I can say there seems to be no sense at all in including a link to it on the first page of Google search results when someone searches for the phrase "memory recall." All scientific papers behind paywalls should be severely penalized in search results, so that they do not appear on the first three pages of search results.

But, you may object, it would be too hard for a search engine company to evaluate the quality of all of the countless thousands or millions of papers published each year. Such an effort would not be needed. A search engine company could simply rate the papers that show up on the first page or two of search results when someone searches for common phrases such as "exercise benefits," "memory recall," "COVID prevention," and so forth. Doing that would require evaluating the quality of only a few thousand papers.
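The proposals in (4) could be sketched as a re-ranking step for a single common query. This is a hypothetical illustration that assumes a small table of quality ratings (0.0 to 1.0) assigned by reviewers only to the papers that previously reached the top for that query, a `paywalled` flag on each result, and ten results per page, so that "not appearing on the first three pages" means a paywalled paper lands no higher than position 31. None of this reflects any real search engine's internals.

```python
from dataclasses import dataclass

RESULTS_PER_PAGE = 10                    # assumption: ten results per page
PAYWALL_CUTOFF = 3 * RESULTS_PER_PAGE    # keep paywalled papers off the first three pages

@dataclass
class Result:
    url: str
    score: float
    paywalled: bool = False

def rerank(results: list[Result], ratings: dict[str, float]) -> list[Result]:
    """Re-rank results for one common query.

    Only results that reviewers have rated are adjusted; unrated results
    keep their original score (a rating multiplier of 1.0).
    """
    def adjusted(r: Result) -> float:
        return r.score * ratings.get(r.url, 1.0)

    ranked = sorted(results, key=adjusted, reverse=True)
    open_access = [r for r in ranked if not r.paywalled]
    paywalled = [r for r in ranked if r.paywalled]
    # Fill the first three pages with open-access results before any paywalled one.
    return open_access[:PAYWALL_CUTOFF] + paywalled + open_access[PAYWALL_CUTOFF:]

results = [Result("a", 0.9, paywalled=True), Result("b", 0.8), Result("c", 0.7)]
print([r.url for r in rerank(results, {"b": 0.1})])
```

In this example the poorly rated paper "b" drops below "c", and the paywalled paper "a" is moved after all the open-access results. The reviewers' workload stays small because the ratings table only needs entries for papers that actually reach the first page or two for common queries.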

(5) Promote in search results lengthy "deep dive" articles sounding like someone had researched a topic for a long time, and penalize vacuous-sounding articles reading like college freshman efforts.

When I search for "memory recall" using Google, I get a vacuous-sounding article as the second search result. The anonymously-written article sounds like something a college freshman might have written while half-watching a movie on TV. It includes no references to other sources, no mention of any specific observations, and seems to tell us nothing useful. Some of its generalizations are untrue, such as its goofy generalization that "every time a memory is accessed for retrieval, that process modifies the memory itself," and its generalization that "the ability to access a given memory typically declines over time." No, I very clearly remember saluting the passing coffin of President John Kennedy, and there has been no decline or change in that memory over time. Why is an article like this appearing as the second result in Google's search results when I search for "memory recall"? Their algorithm for producing search results clearly needs more work.
