Monday, February 28, 2022

Mainstream Press Hails Two Very Dubious COVID-19 Origin Studies

Two very shaky new studies about the origin of COVID-19 appeared recently as mere preprints, not yet published in a scientific journal. Discarding the principle of "wait until peer review and journal publication," the preprints were hailed in news stories on CNN, Newsweek and the New York Times.  In the media's overenthusiastic coverage of these studies, we have fresh examples of how mainstream media sources are pushovers for any wobbly scientific work that fits in with whatever narratives the mainstream prefers to peddle.  

The first study is entitled "The Huanan market was the epicenter of SARS-CoV-2 emergence." The whole idea is to make us think that COVID-19 originated in the Huanan Market in Wuhan, China because most of the 2019 and January 2020 cases originated near there. Remarkably, the paper makes no mention of the two large Chinese virus study centers in Wuhan: the Wuhan Institute of Virology, less than nine miles away from the Huanan Market, and the Wuhan Center for Disease Control, less than three miles away. Both are places where a lab leak could have been the source of COVID-19. 

The Yangtze River flows between the east and west parts of Wuhan, with the Huanan Market on the west side and the Wuhan Institute of Virology on the east side. Supposedly there were more COVID-19 cases reported in 2019 and January 2020 on the west side of Wuhan (where the Huanan Market was) than on the east side of Wuhan (where the Wuhan Institute of Virology was). Does this suggest that the COVID-19 virus probably originated naturally in the Huanan Market rather than through some lab leak involving the Wuhan Institute of Virology? There are several reasons it does not:

(1) The area around the Huanan Market is far more populous, and the number of old people living there is about two or three times greater than in the area around the Wuhan Institute of Virology.  In a highly populated city, a very contagious virus rapidly spreads throughout the city. If the virus tends to cause hospitalizations many times higher among older people, then the first hospitalizations will tend to come from whatever part of the city has the most old people, regardless of where in the city the first people were infected.  

(2) Any data about exactly where the first cases arose in 2019 or January 2020 is highly suspect. Reports about where the first cases arose came from the Chinese government. If COVID-19 had originated as a lab leak, we can expect that the Chinese government would have maximized the reporting of cases around the Huanan Market and minimized the reporting of cases around the Wuhan Institute of Virology, as part of an effort to lead people to think that the virus had not originated as a lab leak.  In fact, such a thing might have been done even if there was no lab leak, purely to minimize public suspicions that there was a lab leak.  

(3) Most of the earliest cases in Wuhan were not causally linked to the Huanan Market.  Below is part of a visual published in an article in the journal Science, one entitled "Dissecting the Early COVID-19 Cases in Wuhan." In that article the green dots are identified as cases with "no link to Huanan Market."

Earliest COVID Cases

(4) The Huanan market did not sell bats, and bats were the animals that had a virus with the closest match to COVID-19. No plausible story has yet been written as to a path of natural transmission by which COVID-19 could have come from animals to humans.  A great deal of scientific effort has been spent on looking for some intermediate animal link, but none has been found. 

(5) Referring to the virus that causes COVID-19 (SARS-CoV-2), page 13 of the paper confesses that "some 457 samples from 188 individual animals corresponding to 18 mammals species were screened for active" COVID-19 infection "from 'within and outside Huanan Market', and no positive SARS-CoV-2 samples were identified," and that "on the order of 80,000 samples from mammals across China were tested for SARS-CoV-2, yielding no positive findings."  

In light of such facts, it is quite ridiculous for the paper authors to be claiming in their abstract that "these analyses provide dispositive [i.e. definitive] evidence for the emergence of SARS-CoV-2 via the live wildlife trade and identify the Huanan market as the unambiguous epicenter of the COVID-19 pandemic." This is just another glaring example of what goes on in countless scientific papers these days: scientists boasting in their paper abstracts or paper titles that they have demonstrated some dramatic result, when no such thing has been demonstrated. I could provide a thousand other examples as glaring as the example here.  

The second paper is a very speculative affair providing nothing in the way of solid evidence.  The abstract of the paper does not claim terribly much, but the lead author of the paper (an evolutionary biologist) goes on record in a  CNN story as claiming the paper proves some grand result it does not at all prove.  We should always question news stories quoting the authors of studies saying grandiose things about their own work.  We should also always remember that the very bad habit of most science journalists is to unthinkingly and uncritically parrot whatever grandiose claims the authors of a paper may make about their own work.  The second paper gives us a "just so" story, of the type that evolutionary biologists love to tell.  Many such "just so" stories told by evolutionary biologists are hard-to-believe tall tales.  We should not be particularly impressed by the paper's use of some genomic data, as evolutionary biologists have a long history of using fragmentary or greatly insufficient genome data to prop up their speculative flights of fancy. 

When considering stories such as these, we should always ask: who are the vested interests here, and what kind of biases might they have that might have clouded or distorted their judgments or statements or data processing? It is rather clear that the Chinese government had a motive to be pushing the "natural origin of COVID-19 at the Huanan Market" story line, so that people would not suspect that scientist error at some Wuhan virology lab caused COVID-19 to emerge.  It is also rather obvious that scientists outside of China had a strong motive to be pushing such a story line.  If people thought that COVID-19 had arisen from a lab leak, then distrust of scientists would increase, and experimental scientists involved in gene work all over the world would be subjected to much greater scrutiny and regulations, a hassle they would prefer to avoid for their convenience. 

The origin of COVID-19 remains very much an unsolved problem, and the theory of purely natural origins of the virus is still very much in doubt. In the West we do not know how COVID-19 originated, and should not pretend to know something we do not know. 

Postscript: When I published the post above, I had not read an article in Nature, which states the following (consistent with my thoughts above): 

"Nevertheless, some virologists say that the new evidence pointing to the Huanan market doesn’t rule out an alternative hypothesis. They say that the market could just have been the location of a massive amplifying event, in which an infected person spread the virus to many other people, rather than the site of the original spillover."

The article makes clear that the "The Huanan market was the epicenter of SARS-CoV-2 emergence" paper discussed above relies on a speculation that "raccoon dogs" were the source of COVID-19, using the words "speculate" or "speculation" three times to refer to this wild idea. COVID-19 was not found in any such animals, and the idea that COVID-19 came from such animals is groundless guessing. 

Referring to the two studies mentioned in the title of this post, and writing in the Bulletin of the Atomic Scientists, physician Laura H. Kahn states this:

"Two recent papers, Worobey et al. and Pekar et al., present geospacial analysis of animal stalls in the Huanan market and viral phylogenetic analysis but do not provide convincing evidence of natural spillover. The data and analyses discussed by Worobey are equally consistent with both hypotheses: (1) that SARS-CoV-2 first entered humans at the Huanan Seafood Market in Wuhan, and (2) that SARS-CoV-2 first entered humans at another location and was subsequently brought to the market and then amplified in the market by humans. The authors’ assertion that the data and analyses support only the natural spillover hypothesis is false. Gao et al. reached a conclusion opposite to the claims of Worobey et al. and Peckar et al. Gao et al. reported that there were no positive animal samples at the Huanan market. They further reported that there was no correlation between the locations of the animal sellers in the market or the locations with the highest densities of humans and the locations of the positive environmental samples in the market. Based on these findings, Gao et al suggested that the market 'acted as an amplifier,' with infections being brought into the market by humans infected elsewhere. The hypothesis that SARS-CoV-2 originated from a laboratory-related spillover—for example, from a laboratory-acquired infection—remains a viable possibility....Premature, false declarations of 'dispositive evidence'  or “proof” does not generate public trust in science and does not protect public health."

Thursday, February 24, 2022

The Myth About the Book "Fashionable Nonsense"

Scientists have always tended to tell us idealized tales about the origin of scientific theories. We may be told that some brilliant thinker had a great "Eureka!" insight, and then formulated a theory; that the theory then started to catch on and become popular because it so closely matched the truth and passed observational tests; and that generations later the theory still reigns because of its great success in predicting reality and describing reality. 

But in the latter half of the twentieth century quite a few sociologists started to critically analyze the grand triumphal narratives of scientific achievement told by scientists. They concluded that some or many of these narratives can be explained largely by social factors rather than the discovery of objective truth.  One of the most important texts by such thinkers was the 1966 book The Social Construction of Reality: A Treatise on the Sociology of Knowledge by Peter L. Berger and Thomas Luckmann. 

In the 1990's physicist Alan D. Sokal started to engage in literary combat against such work in the sociology of science.  His first step was a very dubious act.  He produced what he later called a hoax paper, entitled Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity. He submitted it to a sociology journal that published it. After the publication, Sokal tried to create the impression that the paper was obvious nonsense, and insinuated that its publication showed that sociology journals would publish any nonsense.  This affair was later called the Sokal Hoax. The paper can be read here. Sokal claimed that "In sum, I intentionally wrote the article so that any competent physicist or mathematician (or undergraduate physics or math major) would realize that it is a spoof." 

This claim about the paper was not correct. The paper was not an obvious parody. Including many quotes from other writers (none of them made up), the paper contains many paragraphs that are quite defensible from a particular philosophical perspective.  An analysis of the paper states that five paragraphs of it provide a "superficial, but essentially correct, overview of physicists' attempt to construct a theory of quantum gravity."  The same analysis says that one section of the paper "contains some ideas -- on the link between scientists and the military, on ideological bias in science, on the pedagogy of science -- with which we partly agree." 

So there were two forms of chicanery going on here. First was the act of submitting to a sociology journal a paper that was supposedly designed as a hoax and a spoof. The second bit of chicanery involved the many untrue claims that the paper was pure nonsense or an obvious parody, and that it showed that sociology journals might publish things that very obviously should have been rejected.  Quantum gravity (a purely speculative family of theories) is one of the most obscure, impenetrable and speculative areas of physics.  So we should not expect any reviewer of a sociology journal to critically analyze statements made about quantum gravity.  Submitting a paper about the sociology of quantum gravity to a journal with editors and reviewers that could not possibly understand anything about quantum gravity was a very shady type of trick. 

Also, since the submitted paper ("Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity") was authored by a university physics professor, and dealt with an extremely specialized topic in theoretical physics, it made no sense to claim or insinuate that some reviewer or editor at a sociology journal should have discovered flaws in the paper.  When experts in some highly specialized topic write on that topic, the normal procedure is to assume that such authorities are speaking rather intelligibly and credibly on their field of expertise.   

Sokal's claim that the paper was an obvious parody was exaggerated and distorted by numerous writers, who claimed that the paper was obvious nonsense, something it was not. The paper was something largely sensible, but merely sprinkled with nonsense here and there. 

Sokal's next move against the sociology of science was to co-write a book entitled Fashionable Nonsense: Postmodern Intellectuals' Abuse of Science.  A myth has arisen about this book: that it delivered some great blow against thinkers in the sociology of science.  The book did not actually present any strong case against such thinkers. 

The book is mainly devoted to pursuing a very strange method: the method of quoting the most poorly written and unintelligible passages in books written by thinkers in the sociology of science, particularly passages making a poor use of mathematical or scientific jargon.  The thinkers quoted from are not very well known figures in the United States. They include Jacques Lacan, Julia Kristeva, Luce Irigaray, Bruno Latour, Jean Baudrillard, Gilles Deleuze, Felix Guattari, and Paul Virilio.  The passages include:

(1) some passages that are just a kind of conceptual mishmash because they mix up jargon or buzzwords or abstruse ideas in a way that does not end up making sense;

(2) some passages that are inadvisable because they involve some misunderstanding of some fine point of mathematics or science;

(3) some passages that involve faulty generalization.

In general, if you were to hear someone speaking the passages Sokal quotes, you wouldn't say "that's wrong" or "that's not true"; you would merely say something like, "I don't get what you mean." 

The technique Sokal used was a very strange one indeed. It was a very superficial form of "drive-by scholarship." To intelligently rebut some thinker, you must study his work, understand what his main arguments are, and then rebut such arguments. Sokal failed to do that.  He seemed to think you can discredit some thinker if you show that he wrote some passage that is unintelligible or failed to use terms from science or mathematics in a convincing and intelligible way. That is not true. A professor can write a book that is largely convincing, but which occasionally lapses into cringe-worthy pointy-headed professor-speak that leaves you saying: "Huh?" That happens very abundantly in Sokal's own world of physics and mathematics. 

The title "Fashionable Nonsense" seems like a misleading one, as the book did not even identify any fashionable claims that were nonsensical. A more accurate title of the book would have been "Some Poorly Written Paragraphs I Have Found in the Writings of Intellectuals." In the preface of the book, the authors (Sokal and Jean Bricmont) actually confess that their book does very little to rebut the authors they discuss. We read this:

"We show that famous intellectuals...have repeatedly abused scientific concepts and terminology; either using scientific ideas totally out of context...or throwing around scientific jargon in front of their non-scientist readers without any regard for its relevance. We make no claim that this invalidates the rest of their work, on which we suspend judgment." 

Despite this confession that their book really doesn't do much of anything to rebut the authors who are being quoted, a myth somehow arose that the book Fashionable Nonsense had somehow dismantled the idea that the spread of scientific theories is sometimes largely caused by social factors.  The people advancing such a myth would commonly commit the sin of defamatory definition. They would define a postmodernist as someone believing that there is no objective truth, and that all scientific theories are just social constructs.  That is not a correct claim about the typical postmodernist thinker.

There were some thinkers in the sociology of science who argued for what they called a "strong program" in the sociology of science. They defined this as the approach that a sociological account or sociological explanation can be made of the origin, spread and perpetuation of scientific theories, regardless of whether they are true or false.  It is a great misrepresentation of such an idea to describe it as the claim that there are no objective scientific truths. 

It is a reasonable claim that sociological explanations can be given for why some scientific theories become popular. It is a gross misrepresentation of such an idea to portray it as the claim that all scientific theories are just social constructs, and that there is no objective truth.  The people who advance such misrepresentations are using "straw man" tactics.  Neither Sokal's hoax quantum gravity paper stunt nor his Fashionable Nonsense book did anything to discredit the reasonable idea that when trying to explain the popularity of any disputed scientific theory, we should always be asking: what kind of social factors and vested interest factors and bandwagon factors and psychological benefit factors may have played a role in the rise of such a theory?

academia authoritarianism
Social factors help explain why some shaky theories triumph 

Sunday, February 20, 2022

The Top 27 Numbers to Ponder When Judging the Chance of Accidental Biological Origins

Below are some definitions of "accidental":
  • "happening by chance, unintentionally, or unexpectedly" -- first result from a Google search for "accidental definition"
  • "occurring unexpectedly or by chance" -- Merriam Webster
  • "happening or existing by chance" -- Cambridge Dictionary
  • "happening by chance or accident; not planned; unexpected" -- Dictionary.com
How likely is it that biological innovations such as tiny microbes or large mammals could have first appeared on our planet by accidental  biological processes, meaning processes that did not involve any intention or purpose? Perhaps the best way to intelligently consider this question is to make a list of all the numbers relevant to such a consideration. Then you can make a probability judgment based on these numbers.

I will now present a list of the numbers that are most relevant when judging the likelihood of natural biological origins. Every one of these numbers should be carefully studied and pondered by anyone claiming that earthly biological organisms originated by accidental  processes, for each of these numbers directly affects the likelihood of such a thing. 

Relevant number 1: the median number of amino acids in a protein molecule

Proteins are the building blocks of cells, which are the building blocks of multi-cellular biological organisms. The building blocks of proteins are amino acids. The number of amino acids in a protein is a key measure of the complexity of life, and also a key measure of the difficulty or probability of any type of life accidentally originating.  For example, were it true that a functional protein typically required only a few amino acids, then it would be relatively easy for a protein to naturally form from random combinations of amino acids. The higher the average number of amino acids in a protein, the more unlikely that a protein could arise by accidental processes.

Proteins have been very well studied, so we know the average number of amino acids in proteins.  According to this page, the average number of amino acids in a human protein is 480. The number of amino acids in a human protein varies from about 50 to more than 800. The scientific paper here refers to "some 50,000 enzymes (of average length of 380 amino acids)." On the page here, we read that the median number of amino acids in a human protein is 375, according to a scientific paper. 

You cannot make any relevant probability calculation from such a number until you also consider the next item on our list.

Relevant number 2: the total number of possible amino acids that can exist in any one position in a protein (or in any one spot in the sequence of amino acids corresponding to the protein)

In considering proteins, we are interested in the size of the combinatorial space relevant to the protein. Before it folds into a three dimensional shape, a protein molecule consists of a chain of amino acids that may be compared to a string of beads (with each amino acid being like a bead on this chain). How many possible amino acids might exist at one spot on this chain? The answer is: using the genetic code used by all life, there are twenty possible amino acids.

genes, polypeptide chains and proteins

We can use this number and the previous number to make a rough calculation of the chance of a random string of amino acids being exactly like the sequence of amino acids corresponding to a protein. If we use the previously cited figure of a median of 375 amino acids in a human protein, this gives us a probability of 1 in 20 to the three-hundred-seventy-fifth power, or 1 in 20^375. That is a probability of only about 1 in 10^487.

We get such a number when we calculate the chance of some protein having exactly the sequence of amino acids that it has, by chance. But it might be that a particular protein might have many minor variations, and still be functional. A reasonable way of allowing for such possible variations might be to assume that a protein molecule requires that at least half of its amino acids be exactly as they are. But even if we make such an assumption, we are still left with probabilities that are incredibly small. For example, suppose we assume that only half of the amino acids in an average protein must be exactly as they are, and then try to calculate the chance of the protein molecule forming by chance combinations of amino acids. We are still left with a probability of only 1 in 20 to the one-hundred-eighty-seventh power, which is a probability of only about 1 in ten to the two-hundred-forty-third power, or 1 in 10^243.
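The exponents in the two calculations above can be checked with a short Python sketch. It only reproduces the arithmetic under the simplifying assumption used in the text, namely that each position is filled independently and uniformly from 20 amino acids; the function and constant names are illustrative and not from any source.

```python
import math

# Figures taken from the text above.
AMINO_ACID_TYPES = 20   # possible amino acids per position
MEDIAN_LENGTH = 375     # median amino acids in a human protein


def log10_sequence_count(length, alphabet=AMINO_ACID_TYPES):
    """Return log10 of alphabet**length, the number of possible sequences.

    Assumption: every position is independent and equally likely to hold
    any of the `alphabet` amino acids.
    """
    return length * math.log10(alphabet)


# Exact sequence: all 375 positions must match.
full_exponent = log10_sequence_count(MEDIAN_LENGTH)

# Relaxed case from the text: only half of the positions (187) must match.
half_exponent = log10_sequence_count(MEDIAN_LENGTH // 2)

print(f"20^375 is about 10^{full_exponent:.1f}")
print(f"20^187 is about 10^{half_exponent:.1f}")
```

Working with logarithms avoids handling the enormous integers directly, although Python's arbitrary-precision integers could also be used to count the digits of 20**375.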

Relevant number 3: the minimum number of types of proteins in the simplest living thing

We have just done some calculations that seem to show that it is incredibly unlikely that a chance combination of amino acids could produce a functional protein. So it seems like a miracle of luck is required for a functional protein to originate. A very relevant question: how many such “miracles” would be required for the simplest living thing to exist? In other words, what is the minimum number of different types of proteins needed for a self-reproducing organism?

Scientists have studied this question by tinkering with the genes of extremely simple microorganisms, trying to take out as many genes as they could without crippling the microorganism so badly that it cannot reproduce. A team of 9 scientists wrote a scientific paper entitled, “Essential genes of a minimal bacterium.” It analyzed a type of bacteria (Mycoplasma genitalium) that has “the smallest genome of any organism that can be grown in pure culture.” According to Wikipedia's article, this bacterium has 525 genes consisting of 580,070 base pairs. The paper concluded that 382 of this bacterium's protein-coding genes (72 percent) are essential.  Similarly, scientists who have long attempted to estimate the simplest possible microbe recently reported that such a microbe would have 473 genes with 531,000 base pairs. This is information that all has to be exactly or almost exactly right for a cell to function properly and reproduce.  

So we can conclude that the simplest living thing requires at least 300 proteins.  The amount of fine-tuned functional information involved is roughly the same as the amount of fine-tuned functional information in a well-written 300-page instruction manual.  Just as we would never expect a well-written 300-page instruction manual to arise by chance processes (even given a billion trillion planets for such accidental processes to occur), we would never expect all the required information in a self-reproducing cell to appear by chance. 

Each gene in the DNA of a microbe specifies the amino acid sequence of a particular protein. So by saying the simplest living thing would need about 300 types of proteins, we are saying roughly the same thing as saying that the genome or DNA of the simplest living thing would need to have at least 300 genes, each of which corresponded to a particular type of protein.

This requirement (that the simplest living thing would require at least 300 different types of proteins) would seem to make the accidental origin of life an impossibility. If you need to have 300 different types of proteins for a living thing, and the probability of each protein arising from a chance combination of amino acids is less than 1 in 10 to the hundredth power, then the chance of getting all the proteins appearing to produce a living thing is much less than 1 in 10 to the hundredth power to the hundredth power. This gives a probability of much less than about 1 in 10 to the ten-thousandth power, or 1 in 10^10,000. But there are some considerations that might make the situation a tiny bit more hopeful, such as the possibility of protein domains (or pieces of proteins) that are functional. I will later discuss that.
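The way the exponents combine can be sketched in Python. The sketch assumes, as the paragraph above does, that each protein arises independently, so the per-protein probabilities multiply and their base-10 exponents add. The helper name is hypothetical. The 10^10,000 figure corresponds to raising 10^100 to the hundredth power; using all 300 proteins gives an exponent of 30,000, which fits the "much less than" wording.

```python
def combined_exponent(per_protein_exponent, protein_count):
    """Return E such that (1 in 10**per_protein_exponent) ** protein_count
    equals 1 in 10**E.

    Assumption: the proteins arise independently, so probabilities multiply
    and base-10 exponents add.
    """
    return per_protein_exponent * protein_count


# Bound stated in the text: 10^100 raised to the hundredth power.
text_bound_exponent = combined_exponent(100, 100)

# Same per-protein bound applied to all 300 required proteins.
all_300_exponent = combined_exponent(100, 300)

print(f"100 proteins: 1 in 10^{text_bound_exponent:,}")
print(f"300 proteins: 1 in 10^{all_300_exponent:,}")
```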

Relevant number 4: the total number of different protein molecule types in the biological world

When judging the likelihood of earthly life naturally arising, a supremely relevant number to consider is: how many different types of protein molecules are known to exist in the biological world? Each different type of protein molecule is a different invention that needs to be explained. We have seen some reasons why tremendous luck would be required for even one such invention to accidentally  occur.

Let's imagine that in the entire world there were one million types of protein molecules used by living things. Then the probability of Earth's protein molecules arising naturally would be vastly smaller than it would be if there were only 40 different types of protein molecules in the natural world. Similarly, if millions of people prayed to a deity and all had their cancer suddenly disappear, the chance of such a thing happening due to sheer luck would be vastly smaller than it would be if only 40 people prayed to a deity and had their cancer disappear due to sheer luck.

The number of types of protein molecules used by earthly life is actually in the billions.  The source here estimates the number as being between ten billion and ten trillion:

"For example, assuming there is 10^7–10^8 species on Earth and the genome of each species consists of 10^3–10^5 genes, there are 10^10–10^13 unique protein sequences, a speck compared to the vast sequence space, but still several orders of magnitude more than contained in today's databases."

Each different type of protein molecule is its own separate complex invention. If the accidental origin of a protein molecule can be described as a miracle, then we must believe in between ten billion and ten trillion such miracles if we are to believe that all earthly life naturally originated. You could put this another way by saying that if we believe that earthly life has accidentally originated, we must believe that the miracle of a new type of protein molecule originating has occurred at an average rate of at least once per month for the past billion years, and perhaps as often as once every hour. 
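The "once per month" and "once every hour" rates can be checked with a small Python sketch. It assumes the figures given in the text (ten billion to ten trillion protein types, spread over one billion years) and a uniform rate over that period, which is a simplification. The names are illustrative only.

```python
HOURS_PER_YEAR = 365.25 * 24   # 8,766 hours in an average year
YEARS = 10 ** 9                # "the past billion years" from the text
LOW_ESTIMATE = 10 ** 10        # ten billion protein types
HIGH_ESTIMATE = 10 ** 13       # ten trillion protein types


def hours_between_new_proteins(protein_types, years=YEARS):
    """Average hours between new protein types, assuming a uniform rate."""
    return years * HOURS_PER_YEAR / protein_types


low_interval_days = hours_between_new_proteins(LOW_ESTIMATE) / 24
high_interval_hours = hours_between_new_proteins(HIGH_ESTIMATE)

print(f"Low estimate: one new protein type every {low_interval_days:.1f} days")
print(f"High estimate: one new protein type every {high_interval_hours:.2f} hours")
```

Under these assumptions the low estimate works out to roughly one new protein type every five weeks, and the high estimate to slightly more than one per hour.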

Relevant number 5: the percentage of protein domains that have been shown to be units that can exist and function independently

Scientists have specified that certain protein molecules have more than one section, what is called a domain. Exactly what constitutes a protein domain is unclear, and the classification of protein domains varies between different protein databases that store information about proteins. When a scientist thinks that a protein molecule consists of two or more sections, he may designate different parts of the protein molecules as different domains. This is often as arbitrary as someone declaring what is the first part of a movie, or someone stating what he thinks is the upper part of a human being.  The concept of protein domains seems largely like an arbitrary theoretical construct. 

The concept of protein domains would seem to offer a little glimmer of hope for someone discouraged by the immense improbability of a protein molecule forming by chance. Such a person might argue that it would not be so hard for a protein molecule to form, because you would only need to form the domains of that protein, and then those domains could combine to make the larger protein.

But is it generally true that protein domains have been shown to be something that can exist independently with their own function? Such a thing has not at all been generally shown. Do not be fooled by the definition that is sometimes given of a protein domain. Wishing to have a clearer definition of the nebulous idea of a protein domain, some writers have defined a protein domain as a part of a protein molecule with its own function that can exist independently. But countless thousands of protein domains have been classified, and it has never been shown that even 5 percent of these can exist independently while performing a function. To show such a thing, you have to do very specific and hard-to-perform experiments that have very rarely been done.

The passage below (from this source) makes clear that all kinds of different criteria have been used when classifying different domains of a protein:

“Primarily, the term domain means the distinct structural block of a protein, but quite different criteria are presently used to identify this block. Identification can be based on observation of independent folding (2), sequence motifs (16), presence of a distinct hydrophobic core (17), functional activity (17,18), contact classification (19), topology (20), structural homology (21), independent mobility (22–25), and other properties. Since domain-domain interactions can occur in a broad range, varying from almost complete structural and dynamic independence to their complete integrity, the application of these criteria may lead to quite different results.”

In the paper here, the authors did experiments suggesting that the domains of multi-domain proteins may be too unstable to exist and function independently, noting that such domains are “significantly less stable” than single-domain proteins.

I cannot give an exact number for this “percentage of protein domains that have been shown to be units that can exist and function independently,” but it would seem that this number is not greater than ten percent. This is the first of several numbers I will cite to show that the concept of protein domains does little to reduce the gigantic improbability involved in any accidental origin of a protein molecule.

Relevant number 6: the percentage of protein molecules that have been classified as single-domain proteins

If you are a person hoping that protein domains can help reduce the improbability of new protein molecules originating, you will hope that all protein molecules are multi-domain proteins. Unfortunately for such a person, a very large fraction of protein molecules are single-domain proteins.

It has been estimated that 35% of the proteins in eukaryotic cells are single-domain proteins, and that 60% of the proteins in prokaryotic cells are single-domain proteins.

This means that protein domains are of little use in helping to reduce the vast improbability of the formation of a protein molecule (or the gene that codes for that molecule). At the very best,  protein domains might cause about a 65% reduction in the improbability of there being many functional protein molecules that originated accidentally. But such a reduction is trivial when you are dealing with the gigantic improbabilities involved.

To use an analogy, suppose some observers saw a person throwing a hundred decks of cards into the wind, and the observers counted that 100 times all of the cards formed into an elegant house of cards. The chance of this happening naturally is for all practical purposes zero. Now, suppose it was shown that the tally was in error, and that only 33 of the decks of cards thrown into the air had formed into houses of cards. That would still leave you with something that had a probability that was for all practical purposes equal to zero. Similarly, if only about half of proteins are multi-domain proteins, this means that 50% of proteins are single-domain proteins. Such single-domain proteins cannot be explained through any claim that they formed from protein domains that were functional intermediates.

Relevant number 7: the average size of a protein domain

If you are a person hoping that protein domains can help reduce the improbability of new proteins forming, you would hope that the average size of a protein domain is small. For example, you might hope that the average size of a protein domain might be merely about twenty amino acids.  If it were then true that protein molecules could form from "building blocks" of protein domains,  it would not be too hard for such "building blocks" to arise if you only needed about twenty amino acids in each of them. 

But the facts are otherwise. The scientific paper here states that "the average length of a protein domain is approx. 120 amino acids."  Since there are twenty possible amino acids that can exist in a protein, the probability of getting a random sequence of amino acids matching the exact amino acid sequence of a protein domain would be about 1 in 20 to the 120th power, which equals about 1 in 10 to the one-hundred-fifty-sixth power.  This means that by chance we would never expect functional protein domains to ever appear. 
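
The conversion from a power of 20 to a power of 10 can be checked with a minimal Python sketch. It assumes the same simple model used in the paragraph above: each position is filled uniformly at random from 20 amino acid types, and only one exact sequence counts as a match. The function name and constants are illustrative choices of mine, not taken from the cited paper.

```python
import math

AMINO_ACID_TYPES = 20   # number of amino acid types used in proteins (from the text)
DOMAIN_LENGTH = 120     # average domain length quoted in the text


def log10_sequence_count(length: int, alphabet: int = AMINO_ACID_TYPES) -> float:
    """Return log10 of the number of possible sequences of the given length.

    Working in logarithms avoids handling a 157-digit integer directly.
    """
    return length * math.log10(alphabet)


exponent = log10_sequence_count(DOMAIN_LENGTH)
print(f"20^{DOMAIN_LENGTH} is about 10^{exponent:.1f}")
```

Under these assumptions the exponent comes out slightly above 156, which agrees with the figure of about 1 in 10 to the one-hundred-fifty-sixth power.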

Relevant number 8: the average number of domains in a protein

If you are a person hoping that protein domains can help reduce the improbability of new proteins forming, you will hope that the average number of domains in a protein is a number such as 10 or 20. If that were true, it might significantly reduce the improbability that proteins could have naturally originated. For example, suppose that without considering protein domains we calculate that there is only 1 chance in 10 to the two-hundredth power of a particular protein forming from its component amino acids. If that protein can be built from ten different protein domains, then we might calculate that there is only 1 chance in 10 to the twentieth power of each domain forming. In such a case the improbability of the protein forming naturally would be greatly reduced.

Unfortunately, the average number of domains in a protein is small. The table below (from this paper) gives us an indication of the average number of domains in proteins. For example, in the Metazoa column the numbers tell us what percent of proteins consist of two domains, what percent of proteins consist of three domains, and so forth.   The average protein has only about 1.3 domains. This means that protein domains do very little to reduce the vast improbability of functional protein molecules accidentally originating.

[Image: table of protein domains]

Relevant number 9: the “average promiscuity” of protein domains

When scientists think that a protein domain is used by more than one protein, they call such a protein domain a “promiscuous” domain. The subject of promiscuous domains offers a potential way to find some evidence that the accidental evolution of proteins is not as gigantically improbable as it initially seems. Conceivably it could be discovered that there is a great deal of “code re-use” going on in proteins, which might make it easier to explain their origins. The promiscuity of a particular protein domain can be defined as the number of proteins that use that domain.

Scientific studies have been done searching for promiscuous domains used by more than one protein. But such studies have found very little evidence that domains are widely re-used by proteins. The study here found that only 147 protein domains are used by more than one protein. It also found that when protein domains are promiscuous, they are usually used by only between 1 and 5 different proteins. Only a handful of protein domains are used by more than 10 proteins.

Similarly, the science textbook here concludes "there are few common folds" in the universe of all proteins. 

A relevant number to consider here is the average promiscuity of protein domains. If that number were 4, it would mean that the average protein domain is used by four different proteins. It seems from what I discussed in the previous paragraph that the average promiscuity of protein domains is very close to 1.0. Since there are billions of different types of proteins, we find almost no evidence of “code re-use” and promiscuity if we can only find 147 protein domains used by more than one protein (with most of these being used by only between 1 and 5 proteins). It would seem that the average promiscuity of protein domains is less than 1.1.

What this means is that there is virtually no code-reuse in proteins. A high degree of code reuse in proteins would slightly alleviate the problem of explaining the accidental origin of proteins, but there is no such high degree of code reuse. 

Relevant number 10: the percentage of proteins requiring helper molecules called chaperone proteins

One of the greatest mysteries of science is how proteins fold into their characteristic three-dimensional shapes, which are required for them to be functional. Part of the answer is to be found in the fact that a protein molecule may be helped by some other protein molecule that helps it fold into the right shape. These helper molecules are called chaperone proteins.

The dependency of many protein molecules on other additional molecules is an additional factor that should make us doubt that protein molecules could have accidentally originated. Let's imagine a hypothetical case. Suppose there is a protein molecule consisting of 200 amino acids arranged in just the right way to achieve a functional effect. Suppose that this protein molecule cannot fold properly unless it is helped by some other protein molecule (a chaperone protein) that itself consists of 200 amino acids. This means that 400 amino acids had to become arranged in just the right way for the functionality to work. The chance of that happening would be vastly smaller than the case in which we only had to have 200 amino acids arranged in the right way.
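
To put rough numbers on the hypothetical case above, here is a minimal Python sketch. It assumes, purely for illustration, that only one exact sequence works in each case and that each position is chosen uniformly from 20 amino acid types. Real proteins tolerate some variation, so the absolute figures are only indicative. The function name is my own.

```python
import math


def log10_exact_match_probability(length: int, alphabet: int = 20) -> float:
    """Return log10 of the chance that a random sequence matches one exact target."""
    return -length * math.log10(alphabet)


# Protein alone: 200 amino acids must be arranged correctly.
protein_only = log10_exact_match_probability(200)

# Protein plus its 200-residue chaperone: 400 amino acids must be arranged correctly.
protein_plus_chaperone = log10_exact_match_probability(400)

print(f"protein alone:          about 1 in 10^{-protein_only:.0f}")
print(f"protein plus chaperone: about 1 in 10^{-protein_plus_chaperone:.0f}")
```

Under these assumptions, doubling the number of required amino acids doubles the exponent rather than merely doubling the odds against.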

So if you believe that earthly life originated by purely accidental processes, you should hope that very few protein molecules require chaperone proteins for their proper function. But according to the source here, twenty to thirty percent of protein molecules require chaperone proteins. So twenty to thirty percent of protein molecules cannot even exist independently in a functional state, and have a dependency on other protein molecules in order to fold properly. This would seem to be another huge example of fine-tuning in biology, fine-tuning we would not expect to exist by chance.

Relevant number 11: the probability of a random mutation breaking the functionality of a protein molecule

It is easy to ruin a protein molecule by making minor changes in its sequence of amino acids. Such changes will typically “break” the protein so that it will no longer fold in the right way to achieve the function that it performs. A biology textbook tells us, "Proteins are so precisely built that the change of even a few atoms in one amino acid can sometimes disrupt the structure of the whole molecule so severely that all function is lost." And we read on a science site, "Folded proteins are actually fragile structures, which can easily denature, or unfold." Another science site tells us, "Proteins are fragile molecules that are remarkably sensitive to changes in structure." A paper describing a database of protein mutations tells us that "two thirds of mutations within the database are destabilising." Evolutionary biologist Richard Lewontin stated, "It seems clear that even the smallest change in the sequence of amino acids of proteins usually has a deleterious effect on the physiology and metabolism of organisms."

In a recent paper we read this: "For example, an analysis of 8,653 proteins based on single mutations (Xavier et al., 2021) shows the following results: ~68% are destabilizing, ~24% are stabilizing, and ~8,0% are neutral mutations...while a similar analysis from the observed free-energy distribution from 328,691 out of 341,860 mutations (Tsuboyama et al., 2023)...indicates that ~71% are destabilizing, ~16% are stabilizing, and ~13% are neutral mutations, respectively."

A very relevant scientific paper is the paper "Protein tolerance to random amino acid change." The authors describe an "x factor" which they define as "the probability that a random amino acid change will lead to a protein's inactivation." Based on their data and experimental work, they estimate this "x factor" to be 34%. It would be a big mistake to confuse this "x factor" with what percentage of a protein's amino acids could be changed without making the protein non-functional.  An "x factor" of 34% actually suggests that almost all of a protein's amino acid sequence must exist in its current form for the protein to be functional.  

Consider a protein with 375 amino acids (the median number of amino acids in humans).  If you were to randomly substitute 4% of those amino acids (15 amino acids) with random amino acids, then (assuming this "x factor" is 34% as the scientists estimated), there would be only about 2 chances in 1000 that such replacements would not make the protein non-functional.  The calculation is shown below (I used the Stat Trek binomial probability calculator). 
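
The binomial figure can be reproduced with a minimal Python sketch. It assumes that each of the 15 substitutions independently inactivates the protein with probability 0.34 (the "x factor"), so the protein stays functional only if zero of the 15 substitutions are inactivating. The function and variable names are illustrative choices of mine.

```python
from math import comb

X_FACTOR = 0.34      # probability that one random amino acid change inactivates the protein
SUBSTITUTIONS = 15   # 4% of a 375-residue protein


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k 'successes' in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# The protein survives only if none of the substitutions is inactivating (k = 0).
p_still_functional = binomial_pmf(0, SUBSTITUTIONS, X_FACTOR)
print(f"Chance of remaining functional: {p_still_functional:.4f}")
```

With k = 0 the formula reduces to 0.66 raised to the 15th power, which is a little under 0.002, or about 2 chances in 1000.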




So the paper in question suggests protein molecules are extremely fine-tuned, fragile and sensitive to changes, and that more than 90% of a protein's amino acid sequence has to be in place before the molecule is functional.  

Relevant number 12: the number of protein complexes in organisms

Just as it is not true at all that each employee in a company can do his job working all by himself, it is not true at all that every protein molecule needs nothing beside itself to do its job within a cell. A large fraction of all protein molecules cannot do any useful function unless they are part of some team of protein molecules. Such teams are called protein complexes. 

Roughly speaking, the accidental appearance of a protein complex containing a total of x amino acids is about as unlikely as the accidental appearance of a single protein consisting of x amino acids.  For example, some particular function might be performed by a single protein that required 900 amino acids to be arranged in just the right way, or it might be performed by a protein complex consisting of one protein molecule with 200 amino acids, another protein molecule consisting of 350 amino acids, and a third protein molecule consisting of 350 amino acids. Either way, we have a situation where 900 amino acids have to be arranged in just the right way. 

The more protein complexes there are in a particular organism, the more carefully the biochemistry of that organism has to be organized, and the lower the likelihood of the accidental origination of such an organism. Figure 1 of the paper here suggests that there are many thousands of protein complexes in the human body.  The paper here (attempting to map only "soluble" protein complexes) claims to have mapped 600+ protein complexes in the human genome.  The 2023 paper here says that the CORUM database now includes 5204 protein complexes, 70% of which are human (meaning there must be thousands of different types of protein complexes in the human body). The paper also says, "Recent proteomic experiments discovered a human protein complex map consisting of 6965 different complexes." 

The paper here notes that "a general theoretical framework to understand protein complex formation and usage is still lacking."  The very formation of protein complexes (which happens very rapidly) is a miracle of organization beyond the understanding of science. Since humans have 20,000+ different proteins, we should not expect such complexes to arise by chance combinations of proteins; and DNA does not specify which proteins belong to particular protein complexes. 

Relevant number 13: the average number of proteins in a protein complex

Besides the total number of protein complexes in an organism (each requiring multiple proteins), a number of similar relevance is the average number of proteins in a protein complex. Figure 1 of the paper here suggests that the average protein complex in the human body requires about seven different proteins. The greater the average number of proteins in protein complexes, the more carefully the biochemistry of that organism has to be organized, and the lower the likelihood of the accidental origination of such an organism.

Figure 3 of the 2023 paper here ("Identification of Protein Complexes by Integrating Protein Abundance and Interaction Features Using a Deep Learning Strategy") gives us these figures for protein complexes in the human body:

Protein complexes with 2 proteins: 2188 
Protein complexes with 3 proteins: 1160
Protein complexes with 4 proteins: 671
Protein complexes with 5 proteins: 328
Protein complexes with 6 proteins: 209
Protein complexes with 7 proteins: 138
Protein complexes with 8 proteins: 85
Protein complexes with 9 proteins: 51
Protein complexes with 10 proteins: 30

In the figures above it is unclear whether references to protein complexes with a certain number of proteins are actually references to protein complexes with a certain number of types of proteins.  A particular protein complex that uses, say, three types of proteins may actually consist of more than three proteins. 

Relevant number 14: the number of proteins used in only one protein complex

Whenever a protein is part of a protein complex, and that protein is used only in a single protein complex, it is very much harder to explain the origin of such a protein. If the protein is useful only within a single protein complex, the requirement for having the other proteins in the complex is a case of additional very hard-to-achieve requirements for the protein's usefulness, decreasing the chance that such a protein would ever accidentally appear. According to figure 2 of the paper here, about 3500 proteins are used in only a single protein complex, and not reused in any other protein complex. 

Relevant number 15: the number of protein molecules in a cell

A good indicator of the complexity and functional intricacy of a cell is the number of protein molecules inside the cell.  This has recently been estimated as being 42 million. 

Relevant number 16: the number of cell types in the human body

It has been estimated that there are about 200 different cell types in the human body. None of these can be explained by random mutations that occurred in DNA, because DNA does not specify the plan or blueprint for a particular cell.

Relevant number 17: the number of organelles in the most complex cell types

An organelle is a structural unit inside a cell. The more organelles that exist in cells, the more complex such cells are. The origin of a cell with many organelles is much harder to explain naturally than the origin of a cell with few organelles. Similarly, it would be much harder to explain the natural origin of a house with 100 rooms (by something like falling trees accidentally arranging themselves into a house) than it would be to explain the natural origin of a house with only one or two rooms.

How many organelles do cells have? Schematic diagrams of cells are constantly misleading us by depicting cells with only a few organelles. Specifically:

  • A cell diagram will typically depict a cell as having only a few mitochondria, but cells typically have many thousands of mitochondria, as many as a million.
  • A cell diagram will typically depict a cell as having only a few lysosomes, but cells typically have hundreds of lysosomes.
  • A cell diagram will typically depict a cell as having only a few ribosomes, but a cell may have up to 10 million ribosomes.
  • A cell diagram will typically depict one or a few stacks of a Golgi apparatus, each with only a few cisternae. But a cell will typically have between 10 and 20 stacks, each having as many as 60 cisternae.
The source here tells us that a typical large organism such as a mammal "has hundreds or thousands of these organelles in each of their cells." So most cells are incredibly complicated things, contrary to the impression you might get from looking at a cell diagram. 

Relevant number 18: the number of mammalian body structures specified in genomes

An extremely important number relevant to claims of accidental biological origins is the number of mammalian body structures specified in genomes (the same as DNA molecules). If a genome typically stores a body structure for an organism, then it is conceivable that we can explain a transition from one type of animal species to another by imagining that there were gradual changes in such a genomic body plan. But if no genome stores a body plan, we have no explanation as to how one species could have evolved into some other species with a vastly different body plan.

The actual number of body structures that are stored in the genomes of mammals is zero. The idea that DNA is a blueprint for making a human body or a recipe for making a human body is a myth without any basis in fact. Genomes or DNA only store low-level chemical information such as the amino acid sequence of a protein, not high-level structural information.  Genomes are neither blueprints for building organisms nor recipes for making organisms nor programs for making organisms (something confessed by the dozens of biology authorities I quote at the end of this post). So how can we explain the origin of different animal body structures by imagining some gradual change in DNA or genomes? We can't.  

Relevant number 19: the number of organs, limbs or appendages  specified in genomes

As relevant as number 18 is the number of organs, limbs or appendages that are specified in genomes or DNA. Just as number 18 is zero, this number is also zero. Genomes or DNA only store low-level chemical information such as the amino acid sequence of a protein, not high-level structural information such as a specification of an organ, a limb or a body appendage. So how can we explain the origin of different organs and limbs and appendages by imagining some gradual change in DNA or genomes? We can't.

Relevant number 20: the number of cell types specified in genomes

As relevant as number 18 and number 19 is the number of cell types that are specified in genomes or DNA. Just as number 18 and number 19 are zero, this number is also zero. Genomes or DNA only store low-level chemical information such as the amino acid sequence of a protein, not high-level structural information such as the structure of a cell. Humans have about two hundred types of cells, and DNA does not specify how to make any one of them. So how can we explain the origin of different cell types by imagining some gradual change in DNA or genomes? We can't.

Relevant number 21: the number of natural protein molecules or natural genes that have been proven to have originated from random mutations, natural selection, or any combination of the two 

An important number to consider is: of the total number of different types of protein molecules in the animal kingdom (estimated to be between 10 billion and 10 trillion), how many have been proven to have originated from random mutations, natural selection, or any combination of the two? The answer is zero. 

We can imagine a hypothetical long-term experiment by which a scientist might substantiate the idea that a useful new type of protein molecule can originate through random mutations or natural selection. Some organisms could have their DNA thoroughly mapped, and the organisms could be placed in a zoo or a lab. The descendants of the organisms could be tracked over multiple generations, with organisms from each generation having their DNA thoroughly mapped.  It might be determined that some new type of protein molecule was being formed, from some accumulation of random mutations occurring over multiple generations. No such experiment has ever succeeded in showing that any new type of protein molecule can originate from natural selection, random mutations, or any combination of the two. 

Relevant number 22:  the number of living things that have originated in experiments realistically simulating the early Earth

If some experiment realistically simulating early Earth conditions could produce a living self-reproducing organism from non-life, that would do a great deal to bolster claims that life first originated through some accidental process.  No living self-reproducing organism has ever been produced by such an experiment. 

Relevant number 23:  the number of functional proteins that have originated in experiments realistically simulating the early Earth

One-celled living things require many types of protein molecules. We might call such protein molecules the "building blocks" of one-celled microbes, except that such a term would misleadingly suggest protein molecules are simple (and as discussed above, most protein molecules are not simple, but consist of hundreds of amino acids arranged in just the right way to produce a functional effect).  A number relevant to the credibility of claims of accidental biological origins is the number of functional protein molecules that have originated in experiments attempting to simulate the early Earth.  That number is zero. 

Relevant number 24: the number of amino acid types that have originated in experiments realistically simulating the early Earth

Often called the building blocks of life, amino acids are more correctly described as the building blocks of the building blocks of cells.  Living things use twenty different types of amino acids. The more types of amino acids that can be produced in experiments realistically simulating early Earth conditions, the more credible is the idea that life might have accidentally originated from non-life long ago on Earth. 

The most famous experiment attempting to produce amino acids while simulating early Earth conditions was the Miller-Urey experiment, which produced several types of amino acids used by living things. Unfortunately the Miller-Urey experiment never was a realistic simulation of early Earth conditions, for reasons explained below and in the post here.

[Image: Miller-Urey experiment]

The number of types of amino acids that have been produced through experiments realistically simulating early Earth conditions is either 0, 1 or 2. Most experiments claiming to simulate early Earth conditions have failed to do so. 18 out of the 20 amino acids used by living things have never been produced in experiments realistically simulating the early Earth.  Some experiments which were arguably realistic simulations of early Earth have produced either alanine or glycine (the two simplest amino acids) in very tiny trace amounts such as 40 parts per million. 

Relevant number 25:  the number of nucleotide types that have originated in experiments realistically simulating the early Earth

Life requires not just many types of proteins, but also nucleic acids such as DNA and RNA. The building blocks of DNA are four nucleotides:   adenine (A), cytosine (C), guanine (G), and thymine (T). The building blocks of RNA are three of these nucleotides and another one, uracil.  

The more types of nucleotides that can be produced in experiments realistically simulating early Earth conditions, the more credible is the idea that life might have accidentally originated from non-life long ago on Earth.  Unfortunately, none of these five nucleotides has ever been produced in an experiment realistically simulating early Earth conditions. 

There is no geological, astronomical or meteorological reason for thinking that amino acids or nucleotides existed in anything other than negligible amounts before life existed, and there is no evidential basis for believing that there ever existed any such thing as a prebiotic "primordial soup" that was rich in either amino acids or the building blocks of DNA (nucleotides).

Relevant number 26:  the number of different chiral forms that an amino acid can have

Amino acids have two different chiral forms: a left-handed form and a right-handed form. When they are artificially created, amino acids with a left-handed form are produced in the same numbers as amino acids with a right-handed form. Excluding glycine (which is essentially too simple to have either a left-handed or a right-handed form), all of the 20 amino acids used by living things occur only in the left-handed form.  

The fact that amino acids have two different chiral forms is a very great problem for all claims that life accidentally originated.  Because the simplest self-reproducing cell would require hundreds of different types of proteins, most using hundreds of amino acids, it would seem that more than 20,000 amino acids would be needed for the origin of life, all of them left-handed.  The probability of all of those amino acids being left-handed is like the probability of you flipping a coin 20,000 consecutive times, and always getting "tails" without ever getting "heads."  This problem (discussed at greater length here) is called the problem of homochirality.  Homochirality seems like an accidentally unachievable state. 
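
The coin-flip comparison can be expressed as a number with a minimal Python sketch. It assumes that each of the 20,000 amino acids is independently left-handed or right-handed with equal probability, which is the coin-flip model used above. The function name is my own illustrative choice.

```python
import math

AMINO_ACIDS_NEEDED = 20_000   # rough count used in the text


def log10_all_left_handed(count: int, p_left: float = 0.5) -> float:
    """Return log10 of the chance that every one of `count` amino acids is left-handed."""
    return count * math.log10(p_left)


exponent = log10_all_left_handed(AMINO_ACIDS_NEEDED)
print(f"Chance of {AMINO_ACIDS_NEEDED} left-handed amino acids in a row: about 10^{exponent:.0f}")
```

Under this assumption the chance is roughly 1 in 10 to the six-thousandth power. The logarithm is used because the probability itself is far too small to be represented as an ordinary floating-point number.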

Relevant number 27:  the number of different types of protein molecules in the human body

Previously I gave a hard-to-remember estimate of the total number of different types of protein molecules in the animal kingdom (between 10^10 and 10^13 unique protein sequences). An easier-to-remember number is the total number of different types of protein molecules in the human body. That number is roughly 20,000. Each of these is its own separate complex invention, most with hundreds of different amino acid parts arranged in just the right way to produce a particular function. Our biology teachers do such a poor job of teaching the realities of biological complexity that if you were to tell someone that inside his body there are 20,000 different types of complex inventions, he might think you are joking. Such a reality should be as familiar to biology students as the fact that humans are made of carbon compounds.

Conclusion

The numbers I have discussed here collectively argue with overwhelming force that we have no understanding of how the wonders of biology could have originated through accidental processes.  Many or most of the numbers I have discussed are very strong reasons for thinking it is incredibly improbable that the earth's wonders of biology could have originated through any such random or accidental process. But our biologists typically tell us the opposite.  How can this occur?

It's simple. When biologists tell us things such as that life first originated by accidental combinations of chemicals, and that new species and dramatic biological innovations arose from mere random mutations and survival of the fittest, they are not engaging in numerical reasoning.  Such claims have never been based on numerical reasoning.  When they attempt to convince us that the earth's organisms accidentally originated, biologists today follow the approach of Darwin, who made no appreciable use of numerical reasoning in any of his published works.  Today's biologists do not pay attention to most of the numbers I have discussed, which are all numbers that anyone should pay very close attention to before making a judgment about the origin of earthly organisms.  

We may compare such biologists to someone who becomes convinced he's going to make a huge profit from buying a particular house, but who pays no attention to relevant numbers such as the house's current sales price, the house's price compared to similar houses on the street, the house's sale price when previously sold, the mortgage rate,  the probability of flooding at the house location, and the current rate of increase or decrease in house prices in that house's location.  Just as fundamentalists do not use numerical reasoning to reach the belief that the Bible is infallible, biologists do not use numerical reasoning to reach the belief that the earth's species accidentally originated. Both of these beliefs are articles of faith rather than conclusions reached through numerical reasoning. 

The lack of relevant probability calculations by Darwinists bothered the eminent physicist Wolfgang Pauli, discoverer of the subatomic Pauli Exclusion Principle on which our existence depends. Pauli stated the following:

"I should like to critically object that this model has not been supported by an affirmative estimate of probabilities so far. Such an estimate of the theoretical time scale of evolution as implied by the model should be compared with the empirical time scale. One would need to show that, according to the assumed model, the probability of de facto existing purposeful features to evolve was sufficiently high on the empirically known time scale. Such an estimate has nowhere been attempted though."

Pauli also stated the following about Darwinist biologists:

“In discussions with biologists I met large difficulties when they apply the concept of ‘natural selection’ in a rather wide field, without being able to estimate the probability of the occurrence in a empirically given time of just those events, which have been important for the biological evolution. Treating the empirical time scale of the evolution theoretically as infinity they have then an easy game, apparently to avoid the concept of purposesiveness. While they pretend to stay in this way completely ‘scientific’ and ‘rational’, they become actually very irrational, particularly because they use the word ‘chance’, not any longer combined with estimations of a mathematically defined probability, in its application to very rare single events more or less synonymous with the old word ‘miracle’.”