Future and Cosmos: origin of proteins


Wednesday, April 9, 2025

Scientists Almost Seem to Have Given Up on Trying to Explain the Origin of Proteins and Genes

Scientists like to make various types of big boasts about their knowledge of things, such as the boast that they understand the basics of how the human species appeared. But various types of unsolved problems act as antagonists to such boasts. It's kind of like this:

Scientist: I understand how life originated.

Boast antagonist problems: No, you sure as hell do not.

Scientist: I understand how a human mind arises.

Boast antagonist problems: No, you sure as hell do not.

Scientist: I understand how the human species arose.

Boast antagonist problems: No, you sure as hell do not.

In regard to the origin of life, the boast antagonist problems include the problems of the origin of DNA, the origin of the genetic code, the origin of genes, the origin of the first protein molecules, and the origin of homochirality. In regard to the origin of large organisms, the boast antagonist problems include the problems of the origin of eukaryotic cells, the origin of most of the proteins used by mammals, the origin of protein complexes, the origin of multicellular life and the origin of bipedalism. In regard to the origin of man, the boast antagonist problems include the problems of explaining morphogenesis, the origin of language, the origin of higher abstract reasoning, and the problems of explaining instantaneous memory creation, instant memory recall and the persistence of memories for 50 years.

There are different ways scientists can act when faced with such boast antagonist problems. A healthy response to such problems is to spend great amounts of time trying to resolve them. Another healthy response to such problems is to modify and restrain your boasts of knowing grand things, on the basis that there are too many related unsolved problems for you to make such boasts. An unhealthy response to such boast antagonist problems is to pretty much ignore them, to mention them as little as possible, and to hope that people don't pay attention to them. There is a reason for thinking that scientists are largely guilty of this type of unhealthy response. The reason I refer to is that when we search for US federal funding for research on these boast antagonist problems, we find that some of the biggest of these problems are getting scant research.

The web page here allows you to search grants that have been approved by the National Science Foundation:

If you type in "cancer" as the search string, and press the Search button, you will get 750 results. The number of results is not directly listed. But by multiplying the part of the page showing results per page by the part of the page showing how many pages of results were returned, you can figure out the total number of results. For example, in the search result below, we have 30 results per page, and 25 pages of results. So apparently there are about 750 National Science Foundation projects that have some involvement with cancer:

You can also find a huge number of research results searching for topics that have no practical value. Below is what the search term "dark matter" produces:

We get 55 pages of results, with 30 results per page, giving a total of something like 1650 federally funded projects relating to dark matter. That's an amazing result, given that dark matter has never even been directly observed; and we don't even know if it exists.

Now, let's try a different search. We will look for funded research projects relating to the origin of protein molecules. The problem of the origin of protein molecules is one of the boast antagonist problems I referred to above. Inside each human body there are more than 20,000 different types of protein molecules, each a different type of complex invention requiring a very special arrangement of thousands of atoms. Scientists lack any credible theory for the origin of protein molecules.

Protein molecules require very special arrangements of amino acids, arrangements as hard to achieve by chance as it is for ink splashes to produce useful functional paragraphs. Because protein molecules are in general very sensitive to small changes, with their functionality typically being broken if you change only 10% or less of their amino acids, protein molecules have very high organization thresholds for them to be functional, meaning that they are not credibly explained by gradualist ideas such as Darwinism. The difficulty of explaining the origin of protein molecules is one of the biggest reasons for rejecting boasts that biological origins are successfully explained by ideas of Darwinian evolution. The issue is discussed at much greater length here and here.

Below is the result I get when I search for the phrases "origin of protein molecules" and "origin of proteins" and "origin of protein" using the National Science Foundation grant query tool (the visual combines three different search results):

The search for research projects using the term "origin of protein molecules" produced no results. The search for research projects using the term "origin of proteins" produced no results. The search for research projects using the term "origin of protein" produced only two results. Both of those results were projects related to Alzheimer's disease, neither of which had anything to do with explaining the origin of any protein molecule.

There's another way we can search for research projects related to the origin of protein molecules. We can search using terms such as "origin of genes" and "origin of gene." Each type of protein molecule has its amino acid sequence specified by a particular type of gene. So research into the origin of genes is pretty much equivalent to research on the origin of proteins.

Below is the result I get when I search for the phrases "origin of genes" and "origin of gene" using the NSF grant query tool:

The results are extremely scanty. A search for the phrase "origin of genes" produced only one result, and it is a project completed in the year 2000. A search for the phrase "origin of gene" produced only four results. The first result is a project that ended in 2023. The other three results are all projects that ended in the year 2007.

Another topic related to the origin of genes and proteins is the origin of the genetic code. The genetic code is the system of representation used by genes, in which certain combinations of nucleotide base pairs stand for certain types of amino acids. This system of representations is shown below:

A search for the phrase "origin of the genetic code" on the NSF grant query tool produces only the three results shown below:

All of these projects have already been completed.

Another of the boast antagonist problems I mentioned was the origin of protein complexes. Most or a large fraction of all proteins seem to be useless when acting alone. Most or a large fraction of all proteins only become functional when they act as team members within groups of proteins called protein complexes. Why such protein complexes arise so conveniently in the body is a major unsolved problem of biology. Below are some relevant quotes:

"The majority of cellular proteins function as subunits in larger protein complexes. However, very little is known about how protein complexes form in vivo." Duncan and Mata, "Widespread Cotranslational Formation of Protein Complexes," 2011.
"While the occurrence of multiprotein assemblies is ubiquitous, the understanding of pathways that dictate the formation of quaternary structure remains enigmatic." -- Two scientists (link).
"A general theoretical framework to understand protein complex formation and usage is still lacking." -- Two scientists, 2019 (link).
"Most proteins associate into multimeric complexes with specific architectures, which often have functional properties like cooperative ligand binding or allosteric regulation. No detailed knowledge is available about how any multimer and its functions arose during historical evolution." -- Ten scientists, 2020 (link).
"Protein assemblies are at the basis of numerous biological machines by performing actions that none of the individual proteins would be able to do. There are thousands, perhaps millions of different types and states of proteins in a living organism, and the number of possible interactions between them is enormous...The strong synergy within the protein complex makes it irreducible to an incremental process. They are rather to be acknowledged as fine-tuned initial conditions of the constituting protein sequences. These structures are biological examples of nano-engineering that surpass anything human engineers have created. Such systems pose a serious challenge to a Darwinian account of evolution, since irreducibly complex systems have no direct series of selectable intermediates, and in addition, as we saw in Section 4.1, each module (protein) is of low probability by itself." -- Steinar Thorvaldsen and Ola Hössjerm, "Using statistical methods to model the fine-tuning of molecular machines and systems," Journal of Theoretical Biology.

Below is the result we get using the phrase "origin of protein complexes" on the NSF grant query tool:

The query produces no results. If you change the query to "formation of protein complexes," you will get only five results, all referring to projects already completed. None of those projects generally addressed the problem of how protein complexes form.

The queries above suggest that scientists have almost given up on trying to explain the origin of genes, protein molecules and the genetic code, and that scientists have almost given up on trying to explain the formation of protein complexes. The problems of trying to explain the origin of such things are some of the biggest unsolved problems in science. But as long as you stay chained to the ball and chain of Darwinism and materialism, there is basically no hope of making progress on such problems. So rather than giving us continued demonstrations of how bad a job Darwinism does at explaining the origin of genes, proteins and the genetic code, scientists seem to be taking a kind of "hands off" approach to such problems, hoping that people won't notice their gigantic failure to credibly explain such things.

By their failure to "put two and two together" in realizing the implications of their failure to credibly explain the origin of protein molecules, genes, protein complexes, the genetic code and homochirality, today's biologists remind me of Lois Lane in the Superman comic books, TV shows and movies. I am currently watching on HBO Max reruns of the TV series "Lois and Clark: The New Adventures of Superman." The series has excellent romantic chemistry between Superman/Clark Kent (very well-played by Dean Cain) and Lois Lane (very well played with comic flair by Teri Hatcher).

In the series Lois Lane is a bright woman, but when it comes to figuring out Superman's secret identity (that Superman is really Clark Kent), Lois just cannot put two and two together (to use an English expression meaning to reach a very obvious conclusion). Lois frequently sees Superman right next to her, and every day she sees Clark Kent, who looks and talks exactly like Superman, the only difference being that Clark Kent wears glasses. Also, Lois never sees Superman and Clark Kent together. And it seems that in half of the episodes, whenever some danger arises when Lois and Clark are together, Clark suddenly disappears and Superman suddenly appears to save the day. Figuring out that Superman must be Clark Kent is just a matter of putting two and two together, but Lois just cannot bring herself to do that. Similarly, faced with a biosphere in which all the big organisms look as well-designed and precisely fine-tuned and information-rich and well-organized as anything could look, our biologists just cannot bring themselves to put two and two together and reach the obvious conclusion that follows from such realities.

Saturday, February 15, 2025

Why Accidents Cannot Produce Very Complex and Useful Instruction Information

Darwinist materialism is built upon the idea that accidents of nature can produce dazzling works of biological construction. In this post I will explain a small part of the reason why this idea is irrational and utterly unbelievable. Part of the reason is that accidents cannot produce very complex instruction information. By very complex instruction information I mean the type of information you would need to construct some complex thing such as a house, a car, a cell or even a large protein molecule with a specific biological function.

Let us start with a simple case of probability calculation. What is the chance that a random string of five English characters would produce a five-letter word in the English language? To calculate this, you need to answer two questions:

(1) How many random combinations of five English characters would result in a word in the English language?

(2) How many possible five-character strings of letters could you produce from random combinations of characters?

A Google query of "number of five letter words in English" will give you the answer to the first question. The answer is that there are roughly 100,000 to 120,000 five-letter words in the English language.

The second question can be answered using the mathematical rule that the number of possible combinations of a sequence of characters or digits is roughly equal to the number of possible values in each position of the sequence multiplied by itself a number of times equal to the length of the sequence. So, for example:

There are 10 possible digits between 0 and 9, so the total number of possible decimal digit sequences with a length of 5 is roughly equal to 10 multiplied by itself 5 times, which equals 100,000. (I say "roughly equal" because if you count only the five-digit numbers between 10,000 and 99,999, leaving out sequences that start with zero, you get 90,000 numbers.)
Counting only lowercase letters and digits (a to z and 0 to 9), there are 36 possible characters that can exist in any position in a five-character sequence. The total number of possible five-character character sequences is roughly 36 multiplied by itself five times, which is 60,466,176.

So what is the chance of a random set of five characters being a word in the English language? The answer is roughly 120,000 divided by 60,466,176, which is 0.00198. This is roughly about 1 chance in 500.
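The arithmetic above can be checked with a few lines of Python. This is only a sketch that reuses the figures already given in this post (36 possible characters, a length of five, and the upper estimate of 120,000 words); the variable names are my own.

```python
# Sketch of the five-character estimate, using the figures stated in the post.
ALPHABET_SIZE = 36      # lowercase a-z plus digits 0-9, as assumed in the post
WORD_LENGTH = 5         # characters per random string
WORD_COUNT = 120_000    # the post's upper estimate of five-letter English words

# Number of possible five-character strings: 36 multiplied by itself five times.
total_strings = ALPHABET_SIZE ** WORD_LENGTH

# Chance that one random string is one of the counted words.
p_word = WORD_COUNT / total_strings

print(total_strings)       # 60466176
print(round(p_word, 5))    # 0.00198
print(round(1 / p_word))   # 504, i.e. roughly 1 chance in 500
```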

Now, imagine we want to calculate the chance of a long series of randomly typed characters producing nothing but words in the English language. To keep things simple, we can treat this random typing as a series of groups of five random characters, each followed by a space. If we want to calculate the chance of a series of x randomly typed groups of five characters all being words in the English language, we will have to multiply .00198 by itself x times.

So, for example, the probability of typing 100 consecutive random five-letter sequences and having them all be words in the English language is roughly 1 in 500 (or .00198) to the hundredth power. You can calculate things like this using what is called a large exponents calculator. Using such a calculator, we find that 1 in 500 to the hundredth power is roughly equal to 1 in 10 to the 269th power.

The large exponents calculator above for some reason prefers to work with integer numbers rather than decimal numbers such as .00198. But since .00198 is very close to 1 in 500, and we are only interested in getting an answer roughly correct, we can simply type 500 in the first input slot above, and remember to divide 1 by the total produced.
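If you do not have a large exponents calculator handy, the same power can be worked out in Python, which handles integers of any size. The sketch below assumes only the two numbers used above (odds of 1 in 500, and 100 groups); the variable names are my own.

```python
import math

ODDS = 500     # "1 chance in 500" per five-character group, from the post
GROUPS = 100   # number of consecutive five-character groups

# Python integers have arbitrary precision, so this is the exact value of 500**100.
exact = ODDS ** GROUPS

# Base-10 exponent of the same number, computed with logarithms.
exponent = GROUPS * math.log10(ODDS)

print(len(str(exact)))      # 270 digits, so the number is between 10**269 and 10**270
print(round(exponent, 1))   # 269.9
```

So 500 to the hundredth power is about 8 followed by 269 zeros, which is why the result can be stated as roughly 1 in 10 to the 269th power.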

We see that the probability of you typing 100 random five-character sequences and having them all be words in the English language is roughly 1 in ten to the 269th power. This is a probability so low that it is prohibitive. Things with this improbability would never occur in the entire history of the universe.

But what about the probability of you randomly typing 100 random five-character sequences and producing some complex and useful instruction information such as how to build a complex and useful building or invention, or at least one of its parts? Would it be less or greater than the incredibly low probability calculated above? It would surely be very much less, because the calculation above does not even take into account the need to arrange the words in a meaningful order. It is much, much more improbable for randomly generated output to produce a useful instruction sentence such as "use a hammer and nails to hammer together all the wood two-by-fours" than it is for random words to produce a meaningless but correctly spelled sentence such as "house smart green taste works south quick." So given the previous calculations we are safe in assuming that typing 100 five-character sequences of randomly typed text will produce a useful instruction sentence with a probability very much less than 1 in ten to the 269th power.

Now, let us consider the instruction information in biology. In biology we have the most gigantic "missing specifications" problem described in detail here. This is because contrary to the false claims that have so often been made, nowhere in DNA or its genes has anyone discovered anything like the instructions needed to build a body or any of its organ systems or any of its organs or any of its cells. DNA and its genes do not even specify how to build any of the organelles that are the main building components of cells. But we do know that DNA does contain a huge repository of instruction information. The DNA in humans contains more than 20,000 genes. Each of those genes tells how to make a particular polypeptide chain that is the starting point for a particular protein molecule. Such a polypeptide chain is a sequence of amino acids.

In terms of complexity and functional usefulness, there is a great deal of similarity between a gene and the 100-word instruction sequence I previously imagined. Simplifying things, I previously imagined 36 possible values at each position in the random sequences I was imagining. For a gene we have a similar situation. A gene specifies a sequence of amino acids, usually hundreds and sometimes thousands. There are twenty amino acids that are used by living things. Any position in the sequence specified by a gene can be any of the twenty amino acids.

So the math we have with genes is similar to the math previously imagined. I was previously imagining a sequence of 500 random characters (100 words each consisting of five random characters). The average length of a human gene is a size needed to specify about 450 amino acids. Human protein molecules are in eukaryotic cells, and the scientific paper here says, "Eukaryotic proteins have an average size of 472 aa [amino acids]." And just as the chance of you making a usable English instruction sentence from about 500 random characters is incredibly low (less than 1 chance in 10 to the 269th power), the chance of you getting a useful gene from a random sequence of nucleotide base pairs specifying a random sequence of amino acids is incredibly low. To be functional, a protein molecule half-specified by a gene requires a very special three-dimensional structure, one that depends on a very hard-to-achieve effect called folding. A functional gene and a corresponding functional protein molecule require a very special arrangement of amino acids, as special as the arrangement of characters in a functional instruction sentence.
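To make the comparison concrete, here is a rough sketch that counts how many sequences are possible in each case, using the numbers mentioned above (36 characters over 500 positions, and 20 amino acids over 472 positions). The function name is my own. The sketch only measures the size of each space of possibilities; it does not say what fraction of those sequences would be useful.

```python
import math

def log10_sequence_count(alphabet_size: int, length: int) -> float:
    """Base-10 exponent of alphabet_size ** length (the number of possible sequences)."""
    return length * math.log10(alphabet_size)

# 500 random characters drawn from 36 possibilities (the text example in the post).
text_exponent = log10_sequence_count(36, 500)

# 472 amino acids drawn from 20 possibilities (the average eukaryotic protein size quoted above).
protein_exponent = log10_sequence_count(20, 472)

print(round(text_exponent))      # 778, i.e. about 10**778 possible character strings
print(round(protein_exponent))   # 614, i.e. about 10**614 possible amino acid sequences
```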

The fact that protein molecules require very rare and special sequences of amino acids is shown by how sensitive protein molecules are to small changes. Below are some relevant quotes by scientists:

"It seems clear that even the smallest change in the sequence of amino acids of proteins usually has a deleterious effect on the physiology and metabolism of organisms." -- Evolutionary biologist Richard Lewontin, "The triple helix : gene, organism, and environment," page 123.
"Proteins are so precisely built that the change of even a few atoms in one amino acid can sometimes disrupt the structure of the whole molecule so severely that all function is lost." -- Science textbook "Molecular Biology of the Cell."
"To quantitate protein tolerance to random change, it is vital to understand the probability that a random amino acid replacement will lead to a protein's functional inactivation. We define this probability as the 'x factor.' ...The x factor was found to be 34% ± 6%." -- 3 scientists, "Protein tolerance to random amino acid change."
"Once again we see that proteins are fragile, are often only on the brink of stability." -- Columbia University scientists Lawrence Chasin and Deborah Mowshowitz, "Introduction to Molecular and Cellular Biology," Lecture 5.
"We predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30–42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%).” -- "Assessing the Evolutionary Impact of Amino Acid Mutations in the Human Genome," a scientific paper by 14 scientists.
"An analysis of 8,653 proteins based on single mutations (Xavier et al., 2021) shows the following results: ~68% are destabilizing, ~24% are stabilizing, and ~8,0% are neutral mutations...while a similar analysis from the observed free-energy distribution from 328,691 out of 341,860 mutations (Tsuboyama et al., 2023)...indicates that ~71% are destabilizing, ~16% are stabilizing, and ~13% are neutral mutations, respectively." -- scientist Jorge A. Villa, "Analysis of proteins in the light of mutations." 2023.
"Proteins are intricate, dynamic structures, and small changes in their amino acid sequences can lead to large effects on their folding, stability and dynamics. To facilitate the further development and evaluation of methods to predict these changes, we have developed ThermoMutDB, a manually curated database containing >14,669 experimental data of thermodynamic parameters for wild type and mutant proteins... Two thirds of mutations within the database are destabilising." -- Eight scientists, "ThermoMutDB: a thermodynamic database for missense mutations," 2020.

Genes contain very complex and useful instruction information. But getting such information by chance is very roughly as improbable as getting useful instruction information from randomly generated text. A gene tells much of what is needed to construct a particular type of complex invention: a protein molecule. A protein molecule is a very special arrangement of hundreds or thousands of amino acids, which have to be just right for that particular type of protein molecule to perform its function. Human DNA has roughly 20,000 genes, each of which largely tells how to make a different type of complex invention in your body: a particular type of protein molecule with hundreds or thousands of well-arranged parts. The extreme sensitivity and fragility of protein molecules (discussed in the bullet list above) tells us how very special is the required arrangement that must occur in every human gene.

The likelihood of random mutations producing a novel type of gene that could serve as instructions for how to make a new type of functional protein is roughly the same as the likelihood of 500 randomly typed characters producing a useful and very complex instruction. The chance of both of these things is so very low as to be prohibitive. The chance of some accident or series of accidents producing from scratch either a new useful type of gene or protein or a new useful 100-word instruction is basically zero, so low that we would never expect it to ever happen in the history of the universe.

From such realities we can derive the very general principle that accidents cannot produce very complex and useful instruction information. This principle matches our intuitions. If someone ever claimed that he spilled a big box of 500 Scrabble letters, and that they fell on the floor and accidentally produced a 100-word complex instruction that was very useful, you would never believe such a tale.

But what about all the biologists who tell us that all of the millions of types of genes and millions of types of protein molecules in the animal kingdom are the result of accidents of nature, mere random mutations? They are believing the worst type of nonsense. Believing in such a thing is as illogical as believing that all of the books in a huge public library were written by mere ink splashes, rather than the purposeful intention of authors.

What happened was that between 1875 and 1925 Darwinism became a sacred dogma of the conformist belief communities that reside in the biology departments of universities. In the next decades scientists discovered how mountainous is the organization and information richness of living things. Biologists discovered around the middle of the 20th century that humans require an information richness and level of hierarchical organization vastly beyond anything that had ever been imagined. At that time all claims of understanding the origin of species and the origin of humans should have been abandoned.

But by then biologists had already made Darwin their Jesus or Buddha, and had made Darwin's boasts of explaining biological origins some sacred dogma that was not to be questioned. So the groundless boast that biologists had explained human origins and the origin of other species continued to be taught, just like some religious dogma that continues to be taught even after facts have discredited it. The biologists made it clear they despised the fundamentalists who clung to the idea that mankind was only about 6000 years old. But by clinging to the discredited explanation boasts of Darwinism, such biologists were acting in the same way as such fundamentalists, clinging to a discredited belief tradition rather than updating their claims to fit the observed facts.

And what if you somehow had an explanation for the accidental origin of all the genes in the human body, despite all the reasons discussed above for thinking such a thing is impossible? Then you still wouldn't have a tenth of an explanation for how there arose human bodies, because DNA and its genes do not specify how to make human bodies or any organs or any cells or even any of the organelles that are the building components of such cells. And you also would not have an explanation for human minds and their capabilities, because neither genes nor brains explain such capabilities, for reasons discussed at great length in the posts on my site here.

You might try to defeat some of the reasoning above by appealing to possibilities such as lower functional thresholds (such as rare types of protein molecules that might be functional in half form). Such attempts could easily be demolished by a discussion of facts arguing far more strongly in the opposite direction, such as the fact that most types of protein molecules produce no survival benefit or reproduction benefit by themselves, but only are beneficial when they act as team members in biological components of far greater complexity, such as protein complexes requiring many types of proteins to be useful. A proper study of functional thresholds and interdependent components always undermines the explanatory boasts of biologists rather than supporting them.

Wednesday, July 3, 2024

Gelernter Greatly Underestimated the Unlikelihood of Darwinian Protein Origination

In 2019 computer scientist David Gelernter published a widely discussed book review entitled "Giving Up Darwin" that enraged many a biologist. Gelernter stated, "The origin of species is exactly what Darwin cannot explain." To back up this claim, he starts out by discussing the Cambrian Explosion, a relatively short period during which almost every animal phylum originated. He lists the Cambrian Explosion as occurring over 70 million years, but a more common estimate is only about 30 million years. Gelernter states this:

"Darwin’s theory predicts that new life forms evolve gradually from old ones in a constantly branching, spreading tree of life. Those brave new Cambrian creatures must therefore have had Precambrian predecessors, similar but not quite as fancy and sophisticated. They could not have all blown out suddenly, like a bunch of geysers. Each must have had a closely related predecessor, which must have had its own predecessors: Darwinian evolution is gradual, step-by-step. All those predecessors must have come together, further back, into a series of branches leading down to the (long ago) trunk. But those predecessors of the Cambrian creatures are missing...In fact, the fossil record as a whole lacked the upward-branching structure Darwin predicted. The trunk was supposed to branch into many different species, each species giving rise to many genera, and towards the top of the tree you would find so much diversity that you could distinguish separate phyla—the large divisions (sponges, mosses, mollusks, chordates, and so on) that comprise the kingdoms of animals, plants, and several others—take your pick. But, as Berlinski points out, the fossil record shows the opposite: 'representatives of separate phyla appearing first followed by lower-level diversification on those basic themes.' In general, 'most species enter the evolutionary order fully formed and then depart unchanged.' The incremental development of new species is largely not there. Those missing pre-Cambrian organisms have still not turned up."

But this is not the main part of Gelernter's case against Darwinian explanatory boasts. The main part of his case is based on the complexity of living things, particularly the complexity of protein molecules. Citing an average number of amino acids that is much smaller than the actual average, Gelernter tells us, "A protein molecule is based on a chain of amino acids; 150 elements is a 'modest-sized' chain; the average is 250." He tells us that for you to get a functional protein you need to get a very special arrangement of amino acids. He states this:

"Now at last we are ready to take Darwin out for a test drive. Starting with 150 links of gibberish, what are the chances that we can mutate our way to a useful new shape of protein? We can ask basically the same question in a more manageable way: what are the chances that a random 150-link sequence will create such a protein? Nonsense sequences are essentially random. Mutations are random. Make random changes to a random sequence and you get another random sequence. So, close your eyes, make 150 random choices from your 20 bead boxes and string up your beads in the order in which you chose them. What are the odds that you will come up with a useful new protein?...The total count of possible 150-link chains, where each link is chosen separately from 20 amino acids, is 20¹⁵⁰. In other words, many. 20¹⁵⁰ roughly equals 10¹⁹⁵, and there are only 10⁸⁰ atoms in the universe. What proportion of these many polypeptides are useful proteins?"

Gelernter tells us that the ratio of long useful amino acid sequences (compared to useless amino acid sequences that will not be the basis of functional proteins) is incredibly small. He cites a paper by Douglas Axe estimating that the ratio is something like 1 in ten to the seventy-fourth power, or about 1 in 10⁷⁴.
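The figure in Gelernter's quote above, that 20¹⁵⁰ roughly equals 10¹⁹⁵, is easy to confirm. The short sketch below uses only the numbers in the quote (20 amino acids, 150 links, and 10⁸⁰ atoms); the variable names are my own.

```python
import math

AMINO_ACIDS = 20    # choices per link, from the quote
CHAIN_LENGTH = 150  # links in the chain, from the quote

# Exact count of possible chains, as an arbitrary-precision integer.
chains = AMINO_ACIDS ** CHAIN_LENGTH

# Base-10 exponent of that count.
exponent = CHAIN_LENGTH * math.log10(AMINO_ACIDS)

print(round(exponent, 2))    # 195.15, so 20**150 is a little over 10**195
print(len(str(chains)))      # 196 digits
print(chains > 10 ** 80)     # True: far more chains than the quoted number of atoms
```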

Gelernter states this:

"Try to mutate your way from 150 links of gibberish to a working, useful protein and you are guaranteed to fail. Try it with ten mutations, a thousand, a million—you fail. The odds bury you. It can’t be done."

The phrasing of the middle sentence is a great understatement. What it should be is something like "Try it with a million mutations, a billion, a trillion, a quadrillion, a quintillion—you fail." If you have some result that you can only get in about 1 in 10⁷⁴ attempts, then you can try 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 times, and you still very probably do not succeed. According to the paper here, "we arrive at a figure of 4×10²¹ different protein sequences tested since the origin of life." The problem is that this isn't enough tries to get even one success, if you're talking about proteins of average length. If you have some result that you can only get in about 1 in 10⁷⁴ attempts, then 4×10²¹ tries will not give you even a 1 in 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 chance of a single success.
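The chance of at least one success can be worked out directly. The sketch below is my own illustration, and it assumes that every try is independent and has the same success rate, which is a simplification. The two inputs are the numbers used above: a success rate of 1 in 10⁷⁴ and 4×10²¹ tries.

```python
import math

def chance_of_any_success(p: float, tries: float) -> float:
    """Probability of at least one success in `tries` independent attempts."""
    # This is 1 - (1 - p)**tries, written with log1p and expm1 so that
    # a very tiny p does not get rounded away to zero.
    return -math.expm1(tries * math.log1p(-p))

p = 1e-74       # success rate per try (1 in 10**74), the figure cited above
tries = 4e21    # sequences tested since the origin of life, per the quoted paper

chance = chance_of_any_success(p, tries)

print(chance)                        # about 4e-53
print(round(-math.log10(chance)))    # 52, i.e. roughly 1 chance in 10**52
```

Under those assumptions the chance of even one success is smaller than 1 in 10⁵², which is far below the 1 in 10⁴⁵ bound mentioned above.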

For some reason Gelernter uses some figure for the total number of mutations (roughly the same as the total number of tried amino acid sequences) vastly higher than the figure quoted in the paper above. He states this:

"In any case, there have evidently been, in the whole history of life, around 1040 bacteria—yielding around 1040 mutations under Axe’s assumptions. That is a very large number of chances at any game. But given that the odds each time are 1 to 1077 against, it is not large enough. The odds against blind Darwinian chance having turned up even one mutation with the potential to push evolution forward are 1040x(1/1077)—1040 tries, where your odds of success each time are 1 in 1077—which equals 1 in 1037. In practical terms, those odds are still zero. Zero odds of producing a single promising mutation in the whole history of life. Darwin loses...Neo-Darwinianism says that nature simply rolls the dice, and if something useful emerges, great. Otherwise, try again. But useful sequences are so gigantically rare that this answer simply won’t work."

Overall Gelernter's bold article was a fine piece of work, and he did a good job of explaining some of the reasons why Darwinism fails to explain the origin of any of the protein molecules in our bodies, and therefore fails to explain the origin of species. But there is a reason why the case against a Darwinian origin of proteins is very much stronger than he suggested. Gelernter misstated the average number of amino acids in a protein. He states, "A protein molecule is based on a chain of amino acids; 150 elements is a 'modest-sized' chain; the average is 250." No, according to the 2012 scientific paper here, "Eukaryotic proteins have an average size of 472 aa [amino acids], whereas bacterial (320 aa) and archaeal (283 aa) proteins are significantly smaller (33-40% on average)." Mammals like us have eukaryotic proteins, so the average human protein has about 472 amino acids, almost twice as many as the number Gelernter cited. The 2021 paper here lists the median amino acid length of a eukaryotic protein as 353, and has visuals showing that organisms such as humans have very many proteins with more than 500 amino acids each (Figure 1). A median is a 50th percentile value. In most sets the median and the average are similar, but the median can be substantially different from an average. The 2005 paper here gives numbers of 375 and 416 as the median number of amino acids in human proteins.

Getting this number right is very important, because if humans have proteins with an average of 472 amino acids, then it isn't merely about twice as hard to get human-type proteins as it is to get proteins with a size of 250 amino acids, but exponentially harder or geometrically harder (as in more than 1,000,000,000,000,000,000,000,000,000,000,000 times harder). An incredibly important point of probability calculation is that the difficulty of getting meaningful useful results from random combinations rises exponentially or geometrically when there occurs a simple linear increase in the number of parts that must be well-arranged. It isn't twice as hard to get from a random character generator a grammatical, useful, well-spelled sentence of 200 characters as it is to get a grammatical, useful, well-spelled sentence of 100 characters -- it's more than a million billion trillion times harder. Similarly, getting functional folding protein molecules with a length of nearly 500 amino acids ends up being exponentially harder (very, very many times harder) than getting functional folding protein molecules with a length of about 250 amino acids.

Let's do some simple math to show the difference here between the right numbers. A reasonable assumption is that every functional protein needs to have at least half of its amino acid sequence just as it is, or the molecule will not perform its function. (There are reasons for thinking that the fraction is actually much larger than 50%, which I give at the end of this post.) So given that there are twenty amino acids used by living things, the probability of getting a random amino acid sequence serving the purpose of a particular protein can be very roughly estimated as 1 in 20ⁿ, where n is half the length of a protein's amino acid sequence. If we have a protein with a sequence of 250 amino acids, this equals a probability of about 1 in 20¹²⁵, which is the same as about 1 in 10¹⁶². But if we have a protein with a sequence of 472 amino acids, this equals a probability of roughly 1 in 20²³⁶, which is the same as about 1 in 10³⁰⁷. So estimating a chance of something like 1 in 10³⁷, Gelernter has vastly underestimated the difficulties of a Darwinian origination of a functional protein. The odds are almost infinitely worse than he suggests.
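Anyone who wants to check the powers of ten in the rough estimate above can do so with the short Python sketch below. It rests entirely on the simplifying assumption just stated (half of the positions must each hold one specific amino acid out of twenty), and the names `log10_odds` and `fixed_fraction` are hypothetical ones chosen only for this illustration.

```python
import math

AMINO_ACIDS = 20  # number of amino acids used by living things

def log10_odds(length, fixed_fraction=0.5):
    """Return x such that the odds are about 1 in 10^x, assuming that
    `fixed_fraction` of the `length` positions must each hold one
    specific amino acid (the rough 1 in 20^n estimate)."""
    n = length * fixed_fraction
    return n * math.log10(AMINO_ACIDS)

for length in (250, 472):
    power = math.floor(log10_odds(length))
    print(f"{length} amino acids: about 1 in 10^{power}")

# How many powers of ten separate the two cases
gap = log10_odds(472) - log10_odds(250)
print(f"Gap between the two cases: about 10^{math.floor(gap)}")
```

Changing `fixed_fraction` shows how strongly the result depends on that one assumption. Raising it above 0.5 makes the exponents larger, and lowering it makes them smaller.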

Another reason why Gelernter has vastly underestimated the difficulties of a Darwinian origination of protein molecules is that his estimates revolve around a protein of average length. In estimating the improbability of unlikely events, we should be paying attention to not just average results but above-average results. The fact is that there are hundreds of types of human proteins with amino acid lengths greater than 2000, as I discuss in the appendix of this post. There are also more than 700 types of human proteins with amino acid lengths greater than 1000. A reasonable calculation of the likelihood of a Darwinian origination of any human protein molecule with an amino acid length greater than 1000 would leave you with numbers gigantically smaller than the 1 in 10³⁷ figure Gelernter used. You would get likelihood figures something like 1 in 10 to the 500th power, very definitely a "never in the history of the universe" kind of improbability.

Another reason why Gelernter has vastly underestimated the difficulties of a Darwinian origination of protein molecules is that he failed to mention the extremely important point that a large fraction of all protein molecules (and quite possibly a majority of them) are individually useless, because the protein molecules only function when they act as team members within protein complexes consisting of multiple types of proteins (often five or more). This fact makes a Darwinian origination of protein molecules gigantically and exponentially more improbable. Very roughly, we can think of such a situation as multiplying by several times the number of well-arranged parts that must exist before functionality occurs. So instead of just having an average of about 400 well-arranged amino acid parts for functionality to be reached, we have in a large fraction of all cases involving protein function a case in which thousands of well-arranged amino acid parts (existing across several types of proteins in a protein complex) must exist before there is any function. Calculating the odds of getting such protein complexes by Darwinian effects gives you probabilities very, very many orders of magnitude smaller than the probabilities Gelernter mentioned (I would understate if I said something like 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 smaller).

Any attempt to escape such improbabilities by invoking the idea of proteins being functional in fractional states can be discredited by discussing the extreme fragility of protein molecules and their extreme sensitivity to small random changes, the fact that small random changes in protein molecules tend to destroy their functionality. I will leave a discussion of that to the appendix of this post. As four Harvard scientists said, "A wide variety of protein structures exist in nature, however the evolutionary origins of this panoply of proteins remain unknown."

The mathematics above may make the reader's head spin. So let's look at the issue in a different way, a way that will use a simple visual. The visual is below. It depicts the Darwinian story of the evolution of humans.

We see a pyramid. Each brick in the pyramid represents roughly a million human lifetimes. It is not believed that humans existed prior to about 200,000 years ago. The defining characteristic of humans is the use of symbols, and there is no robust evidence of any symbol manipulation prior to about 100,000 years ago. The human population has increased vastly since 200,000 years ago. It is believed that 200,000 years ago the number of humans was very small, being only about 20,000.

We know that there has been no major evolution of human beings since about 8000 BC. The main such evolution claimed is some change in lactose digestion, which does not qualify as major evolution. The earliest artwork of humans depicts humans looking as they now look. We know that humans before the time of Jesus were about as smart as humans now are. Anyone reading the works of Plato (which survive in great number) will read the work of a mind as brilliant and subtle as any that exists today. It is rather obvious that the minds who designed how to build structures such as the Great Pyramid of Giza (which survive after 4500 years) with the primitive tools of the time must have been as ingenious as almost anyone living today.

Now, the visual above depicts a story that makes no sense. According to the story, during only a relatively small number of human lifetimes (between about 200,000 BC and 8000 BC) there occurred some vast evolutionary leap allowing previously uncivilized creatures to become language-using, reasoning and civilized beings. But, according to the same story, during a vastly greater number of lifetimes (occurring after 8000 BC) there has occurred no major evolution of humans. The story makes no sense, and is as implausible as claiming that a million people died from lightning in one year, with fewer than 1000 dying from lightning in the next century. Why would there have been such a vast evolutionary leap during a relatively small number of human lifetimes, and no major human evolution during a number of human lifetimes very many times larger?

Gelernter was haughtily denounced by Darwinism devotees, on the grounds that he was not a biologist, and that his critique was not peer-reviewed. But biologists sometimes make similar critiques. In late 2023 we had a peer-reviewed paper by several biologists stating the following:

"There is a growing sense of unease among biologists that there are serious shortcomings in the Neo-Darwinian framework, in particular that several of its central assumptions are wrong and that, as a result, it lacks explanatory power. The problems are many and likely fatal."

Appendix: I will now tell you how to get an authoritative answer about how many human protein molecules have more than 2000 well-arranged amino acid parts. Using the UniProt protein database that anyone can use without a login, you go to www.uniprot.org, and type in the following search phrase (or, using less effort, just click on the link below):

(length:[2000 TO 50000]) AND (organism_name:"Homo sapiens")

This gives you a results screen like the one below.

You will see more than 1000 rows in the result set. The results will first show the simplest proteins with more than 2000 amino acids. Click on the Length column header, and the results will be sorted like we see above, with the most complex proteins shown first.

There seem to be some duplicates in the results, or cases of proteins that are minor variations of the same protein. But scrolling through the results, you will be able to see two things:

(1) There are at least hundreds of types of proteins in the human body that each have thousands of amino acids.

(2) The most complex proteins in the human body have more than 10,000 well-arranged amino acids. For example, the Titin protein consists of more than 30,000 well-arranged amino acids.

Using a variation of the search string above, you can get an idea of how many types of human protein molecules have more than 1000 amino acids each. For example, suppose you change the www.uniprot.org search string to be the one below (or just click on the link below):

(length:[1000 TO 50000]) AND (organism_name:"Homo sapiens")

You will get a result set of more than 8000 rows. Allowing for many duplicates, we can assume that human bodies contain more than 1000 types of "highest complexity" protein molecules, where "highest complexity" means having more than 1000 amino acids.
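For readers who would rather run such searches from a script than from the website, here is a minimal Python sketch that builds the same two search strings and wraps them in a web address. The address in `UNIPROT_SEARCH` and the `format` and `size` parameters are my assumptions about UniProt's public REST search service, and should be checked against UniProt's own documentation before being relied on. The function names are hypothetical. The sketch only builds the address and does not contact UniProt.

```python
from urllib.parse import urlencode

# Assumed address of UniProt's REST search service (verify before use).
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

def build_query(min_length, max_length=50000, organism="Homo sapiens"):
    """Build the same search string typed into www.uniprot.org above."""
    return f'(length:[{min_length} TO {max_length}]) AND (organism_name:"{organism}")'

def length_query_url(min_length, max_length=50000, organism="Homo sapiens"):
    """Wrap the search string in a URL with percent-encoding applied.
    The `format` and `size` parameters are assumptions about the service."""
    params = {
        "query": build_query(min_length, max_length, organism),
        "format": "tsv",  # assumed: tab-separated rows
        "size": 25,       # assumed: number of rows per page
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"

for minimum in (2000, 1000):
    print(build_query(minimum))
    print(length_query_url(minimum))
```

The printed search strings can be pasted straight into the search box at www.uniprot.org, which is the method described above.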

Below (as promised above) are some quotes establishing the extreme fragility of protein molecules, and how small random changes destroy their functionality (an issue of great relevance to whether such molecules can originate by Darwinian processes):

"It seems clear that even the smallest change in the sequence of amino acids of proteins usually has a deleterious effect on the physiology and metabolism of organisms." -- Evolutionary biologist Richard Lewontin, "The triple helix : gene, organism, and environment," page 123.

"Proteins are so precisely built that the change of even a few atoms in one amino acid can sometimes disrupt the structure of the whole molecule so severely that all function is lost." -- Science textbook "Molecular Biology of the Cell."

"To quantitate protein tolerance to random change, it is vital to understand the probability that a random amino acid replacement will lead to a protein's functional inactivation. We define this probability as the 'x factor.' ...The x factor was found to be 34% ± 6%." -- 3 scientists, "Protein tolerance to random amino acid change."

"Once again we see that proteins are fragile, are often only on the brink of stability." -- Columbia University scientists Lawrence Chasin and Deborah Mowshowitz, "Introduction to Molecular and Cellular Biology," Lecture 5.

"We predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30–42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%).” -- "Assessing the Evolutionary Impact of Amino Acid Mutations in the Human Genome," a scientific paper by 14 scientists.

"An analysis of 8,653 proteins based on single mutations (Xavier et al., 2021) shows the following results: ~68% are destabilizing, ~24% are stabilizing, and ~8,0% are neutral mutations...while a similar analysis from the observed free-energy distribution from 328,691 out of 341,860 mutations (Tsuboyama et al., 2023)...indicates that ~71% are destabilizing, ~16% are stabilizing, and ~13% are neutral mutations, respectively." -- scientist Jorge A. Villa, "Analysis of proteins in the light of mutations." 2023.

"Proteins are intricate, dynamic structures, and small changes in their amino acid sequences can lead to large effects on their folding, stability and dynamics. To facilitate the further development and evaluation of methods to predict these changes, we have developed ThermoMutDB, a manually curated database containing >14,669 experimental data of thermodynamic parameters for wild type and mutant proteins... Two thirds of mutations within the database are destabilising." -- Eight scientists, "ThermoMutDB: a thermodynamic database for missense mutations," 2020.
