
Our future, our universe, and other weighty topics


Wednesday, July 3, 2024

Gelernter Greatly Underestimated the Unlikelihood of Darwinian Protein Origination

 In 2019 computer scientist David Gelernter published a widely discussed book review entitled "Giving Up Darwin" that enraged many a biologist. Gelernter stated, "The origin of species is exactly what Darwin cannot explain." To back up this claim, he starts by discussing the Cambrian Explosion, a relatively short period during which almost all animal phyla originated. He describes the Cambrian Explosion as spanning 70 million years, but a more common estimate is only about 30 million years. Gelernter states this:

"Darwin’s theory predicts that new life forms evolve gradually from old ones in a constantly branching, spreading tree of life. Those brave new Cambrian creatures must therefore have had Precambrian predecessors, similar but not quite as fancy and sophisticated. They could not have all blown out suddenly, like a bunch of geysers. Each must have had a closely related predecessor, which must have had its own predecessors: Darwinian evolution is gradual, step-by-step. All those predecessors must have come together, further back, into a series of branches leading down to the (long ago) trunk. But those predecessors of the Cambrian creatures are missing...In fact, the fossil record as a whole lacked the upward-branching structure Darwin predicted. The trunk was supposed to branch into many different species, each species giving rise to many genera, and towards the top of the tree you would find so much diversity that you could distinguish separate phyla—the large divisions (sponges, mosses, mollusks, chordates, and so on) that comprise the kingdoms of animals, plants, and several others—take your pick. But, as Berlinski points out, the fossil record shows the opposite: 'representatives of separate phyla appearing first followed by lower-level diversification on those basic themes.' In general, 'most species enter the evolutionary order fully formed and then depart unchanged.' The incremental development of new species is largely not there. Those missing pre-Cambrian organisms have still not turned up."

But this is not the main part of Gelernter's case against Darwinian explanatory boasts. The main part of his case is based on the complexity of living things, particularly the complexity of protein molecules. Citing an average number of amino acids that is much smaller than the actual average, Gelernter tells us, "A protein molecule is based on a chain of amino acids; 150 elements is a 'modest-sized' chain; the average is 250." He tells us that a functional protein requires a very special arrangement of amino acids. He states this:

"Now at last we are ready to take Darwin out for a test drive. Starting with 150 links of gibberish, what are the chances that we can mutate our way to a useful new shape of protein? We can ask basically the same question in a more manageable way: what are the chances that a random 150-link sequence will create such a protein? Nonsense sequences are essentially random. Mutations are random. Make random changes to a random sequence and you get another random sequence. So, close your eyes, make 150 random choices from your 20 bead boxes and string up your beads in the order in which you chose them. What are the odds that you will come up with a useful new protein?...The total count of possible 150-link chains, where each link is chosen separately from 20 amino acids, is $20^{150}$. In other words, many. $20^{150}$ roughly equals $10^{195}$, and there are only $10^{80}$ atoms in the universe. What proportion of these many polypeptides are useful proteins?"

Gelernter tells us that the fraction of long amino acid sequences that are useful (as opposed to useless sequences that cannot serve as the basis of functional proteins) is incredibly small. He cites a paper by Douglas Axe estimating that the ratio is something like 1 in ten to the seventy-fourth power, or about 1 in $10^{74}$.

Gelernter states this:

"Try to mutate your way from 150 links of gibberish to a working, useful protein and you are guaranteed to fail. Try it with ten mutations, a thousand, a million—you fail. The odds bury you. It can’t be done."

The phrasing of the middle sentence is a great understatement. What it should be is something like "Try it with a million mutations, a billion, a trillion, a quadrillion, a quintillion—you fail." If you have some result that you can get only about once in $10^{74}$ attempts, then you can try 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 times, and you still very probably will not succeed. According to the paper here, "we arrive at a figure of $4\times10^{21}$ different protein sequences tested since the origin of life." The problem is that this is not nearly enough tries to get even one success, if you're talking about proteins of average length. If you have some result that you can get only about once in $10^{74}$ attempts, then $4\times10^{21}$ tries will not give you even a 1 in 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 chance of a single success.

For some reason Gelernter uses a figure for the total number of mutations (roughly the same as the total number of amino acid sequences tried) that is vastly higher than the figure quoted in the paper above. He states this:

"In any case, there have evidently been, in the whole history of life, around $10^{40}$ bacteria—yielding around $10^{40}$ mutations under Axe’s assumptions. That is a very large number of chances at any game. But given that the odds each time are 1 to $10^{77}$ against, it is not large enough. The odds against blind Darwinian chance having turned up even one mutation with the potential to push evolution forward are $10^{40} \times (1/10^{77})$—$10^{40}$ tries, where your odds of success each time are 1 in $10^{77}$—which equals 1 in $10^{37}$. In practical terms, those odds are still zero. Zero odds of producing a single promising mutation in the whole history of life. Darwin loses...Neo-Darwinianism says that nature simply rolls the dice, and if something useful emerges, great. Otherwise, try again. But useful sequences are so gigantically rare that this answer simply won’t work."
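The tries-versus-odds arithmetic in the last few paragraphs can be checked with a short Python sketch. It treats every try as an independent attempt with the same tiny success probability (a simplifying assumption; the function name is hypothetical) and computes the chance of at least one success, both for the paper's $4\times10^{21}$ sequences at 1 in $10^{74}$ and for Gelernter's $10^{40}$ mutations at 1 in $10^{77}$.

```python
import math

def chance_of_at_least_one_success(log10_p: float, tries: float) -> float:
    """Probability of at least one success in `tries` independent attempts,
    each succeeding with probability p = 10**log10_p.

    Computed as 1 - (1 - p)**tries using log1p/expm1, because for p as small
    as 1e-74 the naive expression 1 - p rounds to exactly 1.0 in floating point.
    """
    p = 10.0 ** log10_p
    return -math.expm1(tries * math.log1p(-p))

# Figures quoted in the post, under the independence assumption above.
paper = chance_of_at_least_one_success(-74, 4e21)      # 4x10^21 tries, 1 in 10^74
gelernter = chance_of_at_least_one_success(-77, 1e40)  # 10^40 tries, 1 in 10^77
print(f"{paper:.1e}")      # 4.0e-53
print(f"{gelernter:.1e}")  # 1.0e-37
```

For probabilities this small the result is essentially tries × probability, which is exactly the multiplication Gelernter performs; the log1p/expm1 form simply avoids floating-point round-off.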

Overall, Gelernter's bold article was a fine piece of work, and he did a good job of explaining some of the reasons why Darwinism fails to explain the origin of any of the protein molecules in our bodies, and therefore fails to explain the origin of species. But the case against a Darwinian origin of proteins is very much stronger than he suggested, and one reason is that Gelernter misstated the average number of amino acids in a protein. He states, "A protein molecule is based on a chain of amino acids; 150 elements is a 'modest-sized' chain; the average is 250." No: according to the 2012 scientific paper here, "Eukaryotic proteins have an average size of 472 aa [amino acids], whereas bacterial (320 aa) and archaeal (283 aa) proteins are significantly smaller (33-40% on average)." Mammals like us are eukaryotes, so by this estimate the average human protein has about 472 amino acids, almost twice as many as the number Gelernter cited. The 2021 paper here lists the median amino acid length of a eukaryotic protein as 353, and has visuals showing that organisms such as humans have very many proteins with more than 500 amino acids each (Figure 1). A median is a 50th-percentile value. In many sets the median and the average are similar, but the two can differ substantially when a few very large values pull the average upward. The 2005 paper here gives numbers of 375 and 416 as the median number of amino acids in human proteins.
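To see how an average and a median can diverge, here is a tiny Python illustration with made-up numbers (hypothetical lengths, not data from any of the papers cited): a single very long protein pulls the average far up while barely affecting the median.

```python
from statistics import mean, median

# Hypothetical protein lengths in amino acids, invented for illustration only:
# four moderate lengths plus one very long outlier.
lengths = [200, 300, 350, 400, 2000]

print(mean(lengths))    # 650
print(median(lengths))  # 350
```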

Getting this number right is very important, because if humans have proteins with an average of 472 amino acids, then getting human-type proteins isn't merely about twice as hard as getting proteins of 250 amino acids, but exponentially or geometrically harder (as in more than 1,000,000,000,000,000,000,000,000,000,000,000 times harder). An incredibly important point of probability calculation is that the difficulty of getting meaningful, useful results from random combinations rises exponentially (geometrically) when the number of parts that must be well-arranged increases only linearly. It isn't twice as hard to get from a random character generator a grammatical, useful, well-spelled sentence of 200 characters as a grammatical, useful, well-spelled sentence of 100 characters; it's more than a million billion trillion times harder. Similarly, getting functional folding protein molecules with a length of nearly 500 amino acids is exponentially harder (very, very many times harder) than getting functional folding protein molecules with a length of about 250 amino acids.
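The linear-versus-exponential point can be made concrete with a small Python sketch. It assumes a 27-symbol alphabet (26 letters plus a space, ignoring punctuation) and counts every possible string, not just grammatical ones, so it only shows how fast the space of possibilities grows when the length doubles.

```python
import math

ALPHABET = 27  # assumption: 26 letters plus a space; punctuation ignored

def log10_strings(length: int, alphabet: int = ALPHABET) -> float:
    """log10 of the number of possible strings of the given length."""
    return length * math.log10(alphabet)

# Doubling the length squares the number of possible strings, so the
# 200-character space is larger than the 100-character space by a factor
# of 27^100.
ratio_exponent = log10_strings(200) - log10_strings(100)
print(f"about 10^{ratio_exponent:.0f} times more 200-character strings")
```

The printed factor (about $10^{143}$) comes from doubling the length, not from making any single character harder to choose.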

Let's do some simple math to show how much difference the correct numbers make. A reasonable assumption is that every functional protein needs to have at least half of its amino acid sequence just as it is, or the molecule will not perform its function. (There are reasons for thinking that the fraction is actually much larger than 50%, which I give at the end of this post.) Given that there are twenty amino acids used by living things, the probability of getting a random amino acid sequence serving the purpose of a particular protein can be very roughly estimated as 1 in $20^n$, where $n$ is half the length of the protein's amino acid sequence. If we have a protein with a sequence of 250 amino acids, this equals a probability of about 1 in $20^{125}$, which is the same as about 1 in $10^{162}$. But if we have a protein with a sequence of 472 amino acids, this equals a probability of roughly 1 in $20^{236}$, which is the same as about 1 in $10^{307}$. So in estimating a chance of something like 1 in $10^{37}$, Gelernter has vastly underestimated the difficulties of a Darwinian origination of a functional protein. The odds are almost infinitely worse than he suggests.
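The conversion from powers of 20 to powers of 10 uses $20^n = 10^{n \log_{10} 20} \approx 10^{1.301 n}$. A minimal Python sketch of the estimate above, under the post's stated assumption that half of the residues must be exactly right (the function name is hypothetical):

```python
import math

def log10_odds_against(protein_length: int, fixed_fraction: float = 0.5) -> float:
    """Exponent k in the rough estimate '1 in 10^k'.

    Assumes, as the post does, that a fraction `fixed_fraction` of the
    residues must be exactly right, each chosen from 20 amino acids, so the
    odds are 1 in 20^n with n = fixed_fraction * length. Because
    20^n = 10^(n * log10(20)), the base-10 exponent is n * log10(20).
    """
    n = fixed_fraction * protein_length
    return n * math.log10(20)

for length in (250, 472):
    print(length, f"1 in 10^{log10_odds_against(length):.1f}")
```

This prints exponents of about 162.6 for 250 amino acids and about 307.0 for 472, matching the "about 1 in $10^{162}$" and "about 1 in $10^{307}$" figures.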

Another reason why Gelernter has vastly underestimated the difficulties of a Darwinian origination of protein molecules is that his estimates revolve around a protein of average length. In estimating the improbability of unlikely events, we should pay attention not just to average cases but to above-average ones. The fact is that there are hundreds of types of human proteins with amino acid lengths greater than 2000, as I discuss in the appendix of this post. There are also more than 700 types of human proteins with amino acid lengths greater than 1000. A reasonable calculation of the likelihood of a Darwinian origination of any human protein molecule with an amino acid length greater than 1000 would leave you with numbers gigantically smaller than the 1 in $10^{37}$ figure Gelernter used. You would get likelihood figures something like 1 in 10 to the 500th power, very definitely a "never in the history of the universe" kind of improbability.
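Applying the same half-the-residues assumption used earlier in the post to the 1000- and 2000-residue lengths mentioned here, a short Python sketch can compute the odds exactly, since Python integers have arbitrary precision. Counting the digits of $20^{n}$ gives its order of magnitude directly.

```python
# Exact big-integer arithmetic under the post's assumption that half of
# the residues must match exactly, each chosen from 20 amino acids.
digits_by_length = {}
for length in (1000, 2000):
    fixed = length // 2           # residues assumed to need an exact match
    odds = 20 ** fixed            # odds are 1 in 20^fixed, computed exactly
    digits = len(str(odds))       # a d-digit number lies in [10^(d-1), 10^d)
    digits_by_length[length] = digits
    print(f"{length} aa: 1 in a {digits}-digit number (about 10^{digits - 1})")
```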

Another reason why Gelernter has vastly underestimated the difficulties of a Darwinian origination of protein molecules is that he failed to mention the extremely important point that a large fraction of all protein molecules (and quite possibly a majority of them) are individually useless, because they function only as team members within protein complexes consisting of multiple types of proteins (often five or more). This fact makes a Darwinian origination of protein molecules gigantically and exponentially more improbable. Very roughly, we can think of such a situation as multiplying several-fold the number of well-arranged parts that must exist before functionality occurs. So instead of an average of about 400 well-arranged amino acid parts being needed for functionality, in a large fraction of all cases involving protein function thousands of well-arranged amino acid parts (spread across several types of proteins in a protein complex) must exist before there is any function. Calculating the odds of getting such protein complexes by Darwinian effects gives you probabilities very, very many orders of magnitude smaller than the probabilities Gelernter mentioned (it would be an understatement to say something like 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 times smaller).
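A minimal Python sketch of why protein complexes compound the odds: if each member protein must arise independently (a simplifying assumption), the individual probabilities multiply, so their base-10 exponents add. The five-member complex of 400-residue proteins below is hypothetical, with its numbers taken from the "about 400" and "five or more" figures in the paragraph above, and it keeps the half-the-residues assumption.

```python
import math

def log10_odds_against_complex(lengths, fixed_fraction=0.5):
    """Rough exponent k in '1 in 10^k' for a complex whose member proteins
    must all arise by chance. Assumes independent events, so probabilities
    multiply and their log10 exponents add."""
    return sum(fixed_fraction * n * math.log10(20) for n in lengths)

single = log10_odds_against_complex([400])
complex_of_five = log10_odds_against_complex([400] * 5)  # hypothetical complex
print(f"one 400-aa protein:   1 in 10^{single:.0f}")
print(f"five 400-aa proteins: 1 in 10^{complex_of_five:.0f}")
```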

Any attempt to escape such improbabilities by invoking the idea of proteins being functional in fractional states can be discredited by discussing the extreme fragility of protein molecules and their extreme sensitivity to small random changes: small random changes in protein molecules tend to destroy their functionality. I will leave a discussion of that to the appendix of this post. As four Harvard scientists said, "A wide variety of protein structures exist in nature, however the evolutionary origins of this panoply of proteins remain unknown."

The mathematics above may make the reader's head spin. So let's look at the issue in a different way, one that uses a simple visual. The visual is below. It depicts the Darwinian story of the evolution of humans.

Darwinism problem


We see a pyramid. Each brick in the pyramid represents roughly a million human lifetimes. Humans are not believed to have existed before about 200,000 years ago. The defining characteristic of humans is the use of symbols, and there is no robust evidence of any symbol manipulation before about 100,000 years ago. The human population has increased vastly since 200,000 years ago; it is believed that at that time the number of humans was very small, only about 20,000.

We know that there has been no major evolution of human beings since about 8000 BC. The main such evolution claimed is some change in lactose digestion, which does not qualify as major evolution. The earliest human artwork depicts humans looking as they now look. We know that humans before the time of Jesus were about as smart as humans now are. Anyone reading the works of Plato (which survive in great number) will read the work of a mind as brilliant and subtle as any that exists today. It is rather obvious that the minds who worked out how to build structures such as the Great Pyramid of Giza (which has survived for 4500 years) with the primitive tools of the time must have been as ingenious as almost anyone living today.

Now, the visual above depicts a story that makes no sense. According to the story, during only a relatively small number of human lifetimes (between about 200,000 BC and 8000 BC) there occurred some vast evolutionary leap allowing previously uncivilized creatures to become language-using, reasoning and civilized beings. But, according to the same story, during a vastly greater number of lifetimes (occurring after 8000 BC) there has occurred no major evolution of humans. The story makes no sense, and is as implausible as claiming that a million people died from lightning in one year, with fewer than 1000 dying from lightning in the next century. Why would there have been such a vast evolutionary leap during a relatively small number of human lifetimes, and no major human evolution during a number of human lifetimes very many times larger?

Gelernter was haughtily denounced by Darwinism devotees, on the grounds that he was not a biologist, and that his critique was not peer-reviewed. But biologists sometimes make similar critiques. In late 2023 we had a peer-reviewed paper by several biologists stating the following:

"There is a growing sense of unease among biologists that there are serious shortcomings in the Neo-Darwinian framework, in particular that several of its central assumptions are wrong and that, as a result, it lacks explanatory power. The problems are many and likely fatal."

accidents do not invent

Appendix: I will now tell you how to get an authoritative answer about how many human protein molecules have more than 2000 well-arranged amino acid parts. Using the UniProt protein database, which anyone can use without a login, go to www.uniprot.org and type in the following search phrase:

(length:[2000 TO 50000]) AND (organism_name:"Homo sapiens")


This gives you a results screen like the one below.


You will see more than 1000 rows in the result set. The results will first show the simplest proteins with more than 2000 amino acids. Click on the Length column header, and the results will be sorted like we see above, with the most complex proteins shown first. 

There seem to be some duplicates in the results, or cases of proteins that are minor variations of the same protein.  But scrolling through the results, you will be able to see two things:

(1) There are at least hundreds of types of proteins in the human body that each have thousands of amino acids.
(2) The most complex proteins in the human body have more than 10,000 well-arranged amino acids. For example, the Titin protein consists of more than 30,000 well-arranged amino acids. 

Using a variation of the search string above, you can get an idea of how many types of human protein molecules have more than 1000 amino acids each. For example, suppose you change the lower bound of the length range in the www.uniprot.org search string above from 2000 to 1000.


You will get a result set of more than 8000 rows. Allowing for many duplicates, we can assume that human bodies contain more than 1000 types of "highest complexity" protein molecules, where "highest complexity" means having more than 1000 amino acids. 
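For readers who prefer a script to the web interface, here is a Python sketch that builds the same kind of query for UniProt's REST search endpoint. The endpoint URL, the `format`/`fields`/`size` parameters, and the `x-total-results` response header reflect my understanding of UniProt's public REST API and should be checked against its current documentation; the helper names are hypothetical. Counts will change as the database is updated, and they include the same duplicates and near-variants mentioned above.

```python
import urllib.parse
import urllib.request

BASE = "https://rest.uniprot.org/uniprotkb/search"  # assumed REST endpoint

def build_query(min_length: int, max_length: int = 50000) -> str:
    """UniProt query in the same form as the search phrase above."""
    return (f"(length:[{min_length} TO {max_length}]) "
            'AND (organism_name:"Homo sapiens")')

def build_search_url(min_length: int) -> str:
    """Search URL asking for a single row; only the total count is wanted."""
    params = {"query": build_query(min_length), "format": "tsv",
              "fields": "accession,length", "size": "1"}
    return BASE + "?" + urllib.parse.urlencode(params)

def count_results(min_length: int) -> int:
    """Requires network access. Assumes the total hit count is reported in
    the 'x-total-results' response header."""
    with urllib.request.urlopen(build_search_url(min_length)) as resp:
        return int(resp.headers["x-total-results"])
```

Calling `count_results(2000)` and `count_results(1000)` would correspond to the two searches described in this appendix.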

Below (as promised above) are some quotes establishing the extreme fragility of protein molecules, and how small random changes destroy their functionality (an issue of great relevance to whether such molecules can originate by Darwinian processes):

  • "It seems clear that even the smallest change in the sequence of amino acids of proteins usually has a deleterious effect on the physiology and metabolism of organisms." -- Evolutionary biologist Richard Lewontin, "The Triple Helix: Gene, Organism, and Environment," page 123.
  • "Proteins are so precisely built that the change of even a few atoms in one amino acid can sometimes disrupt the structure of the whole molecule so severely that all function is lost." -- Science textbook "Molecular Biology of the Cell."
  • "To quantitate protein tolerance to random change, it is vital to understand the probability that a random amino acid replacement will lead to a protein's functional inactivation. We define this probability as the 'x factor.' ...The x factor was found to be 34% ± 6%."  -- 3 scientists, "Protein tolerance to random amino acid change." 
  • "Once again we see that proteins are fragile, are often only on the brink of stability." -- Columbia University scientists  Lawrence Chasin and Deborah Mowshowitz, "Introduction to Molecular and Cellular Biology," Lecture 5.
  • "We predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30–42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%).” -- "Assessing the Evolutionary Impact of Amino Acid Mutations in the Human Genome," a scientific paper by 14 scientists. 
  • "An analysis of 8,653 proteins based on single mutations (Xavier et al., 2021) shows the following results: ~68% are destabilizing, ~24% are stabilizing, and ~8,0% are neutral mutations...while a similar analysis from the observed free-energy distribution from 328,691 out of 341,860 mutations (Tsuboyama et al., 2023)...indicates that ~71% are destabilizing, ~16% are stabilizing, and ~13% are neutral mutations, respectively." -- scientist Jorge A. Villa, "Analysis of proteins in the light of mutations." 2023.
  • "Proteins are intricate, dynamic structures, and small changes in their amino acid sequences can lead to large effects on their folding, stability and dynamics. To facilitate the further development and evaluation of methods to predict these changes, we have developed ThermoMutDB, a manually curated database containing >14,669 experimental data of thermodynamic parameters for wild type and mutant proteins... Two thirds of mutations within the database are destabilising." -- Eight scientists, "ThermoMutDB: a thermodynamic database for missense mutations," 2020. 
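The "x factor" quotation above lends itself to a quick calculation. The Python sketch below assumes, as a simplification (not necessarily the full model of the paper quoted), that successive random substitutions act independently, each inactivating the protein with probability x = 0.34. The fraction still active after k substitutions is then $(1 - x)^k$.

```python
# Fraction of proteins expected to remain functional after k random amino
# acid substitutions, assuming each substitution independently inactivates
# the protein with probability x (the 34% "x factor" quoted above).
X_FACTOR = 0.34

def fraction_still_active(k: int, x: float = X_FACTOR) -> float:
    return (1 - x) ** k

for k in (1, 5, 10):
    print(k, round(fraction_still_active(k), 3))
```

Under this assumption, about 12.5% survive five random substitutions and under 2% survive ten.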