Let us look at a scientific paper which sheds light on the very high functional thresholds of protein complexes. The concept of a functional threshold is a supremely important topic for anyone seriously studying biological organization and biological complexity. A functional threshold is the minimum number of parts, and the arrangement of those parts, required for some particular biological function to be performed. Enormously complex organisms such as human beings have innumerable functions, most of which have very high functional thresholds.
Most of the scientific papers in biology literature do a very bad job of shedding light on just how high functional thresholds are. But the paper "A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality" by G Traver Hart, Insuk Lee and Edward R Marcotte (which you can read here) has a table that helps to show how high such functional thresholds are.
The scientists studied protein complexes in yeast, a one-celled organism sometimes called simple. But even the simplest one-celled organism is really a marvel of organization and fine-tuned complexity. Early in the paper the scientists begin to give us a clue about the underlying complexity, stating this:
"The molecular machines that carry out basic cellular processes are typically not individual proteins but protein complexes. Even in the relatively simple model organism Saccharomyces cerevisiae, most machines that process and store biological information are in fact large protein complexes comprised of many subunits."
For those unfamiliar with the hierarchical organization of the human body, I can give a sentence that describes it. The sentence is: a human body consists of a skeletal system and organ systems; organ systems are made up of one or more organs and other components; organs are made up of tissues; tissues are made up of cells; cells are made up of many organelles of many types; organelles are made up of protein complexes and proteins; protein complexes are made up of many different types of proteins; proteins consist of special well-arranged sequences of hundreds or thousands of amino acids, folded into 3D shapes; amino acids consist of atoms; and atoms consist of protons, neutrons and electrons.
Let's look at these units called protein complexes. Typically a protein is of no use by itself, and the protein only becomes useful when it becomes part of a protein complex. A protein complex is a group of proteins connected together to perform a particular function. How many amino acids do you have to have specially arranged for you to have a useful protein complex? The answer is: many thousands. Similarly, it takes a special arrangement of thousands of letters or characters for you to get a page full of text in a technical manual using fine print.
(Image credit: Wikimedia Commons, derived from Yuan et al. 2010, Structure of an apoptosome-procaspase-9 CARD complex)
We can start to realize what the answer to that question is ("How many amino acids do you have to have specially arranged for you to have a useful protein complex?") by examining Table 1 in the paper mentioned above. Below is an annotated photo of the beginning of that table.
The text in blue gives my comments elaborating on some of the data in the table. The table lists various protein complexes used by yeast. The "Size" column in the table lists the number of types of proteins used in the protein complex. The "% Essential" column lists what percent of these proteins are needed for the protein complex to function. The numbers shown under the "% Essential" column are high.
So, for example, when we see a number of 74% as the "% Essential" entry for the C1 complex, what that means is that this protein complex needs 74% of its proteins to do its job. And when we see a number of 93% as the "% Essential" entry for the C4 complex, what that means is that this C4 protein complex needs 93% of its proteins to do its job.
No one should be very surprised to be reading the type of numbers we see under the "% Essential" column. It is a very common characteristic of complex functional things that they need most of their parts to be functional. Consider an object such as a bicycle. It requires most of its parts to be functional, but not all of them. You can actually ride a bicycle without its seat, although it is very uncomfortable to do so. I once discovered this fact when I rode home on a bicycle without a seat, after a thief had stolen my bicycle's seat.
Or consider an object like an automobile. It requires most of its parts to be functional, but not all of them. You can remove the windshield and tear out most of the seats and tear out the radio, and the automobile will still be useful as a vehicle for travel. But the automobile requires most of its parts to function, and its engine requires a special arrangement of very many parts.
Now, an important question is: how many parts have to be assembled just right for one of these protein complexes to be functional? We can make a rough estimate using the data in the table. Let's start with the C1 protein complex.
According to the table, that complex consists of 35 types of proteins. One of those proteins is the RPA135 protein, which has a special sequence of 1203 amino acids, as you can see on the page here. Another of those proteins is the RPA190 protein, which has a special sequence of 1644 amino acids, as you can see on the page here. Another of those proteins is the RPO31 protein, which has a special sequence of 1460 amino acids, as you can see on the page here.
The proteins I just mentioned are some of the most complex proteins that make up this C1 protein complex, and most of the proteins are less complex than the ones I've mentioned. But even most of the less complex proteins each require a special sequence of hundreds of amino acids.
The main facts here are:
(1) This C1 protein complex requires 35 types of protein molecules, 74% of which are "essential" for its function.
(2) Several of these proteins each require a special amino acid sequence consisting of well over a thousand amino acids.
(3) Almost all of these proteins require a special amino acid sequence consisting of at least hundreds of amino acids.
From these facts, we can roughly estimate that all in all this C1 protein complex requires the special arrangement of more than 10,000 amino acids: 74% of 35 is about 26 essential proteins, and if those average even 400 amino acids each, the total exceeds 10,000. The functional threshold here is roughly as high as the amount of arrangement of letters you need to produce a blog post or essay of about five pages or about 2000 words, consisting of about 10,000 well-arranged characters. Just as a five-page essay does not need all of its words and all of its sentences to be functional, this C1 protein complex apparently does not need all of its proteins. Just as it is impossible that chance (such as a monkey typing for an hour) might produce a readable, functional five-page essay or a coherent, useful 2000-word blog post, it is impossible that chance or random mutations or random combinations might produce the special arrangement of 10,000 amino acids needed to produce this protein complex.
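The rough estimate above can be written out as a short calculation. This is only a sketch under stated assumptions: the complex size, the "% Essential" figure and the three protein lengths are the ones quoted in the text, while the average length of 300 amino acids for the remaining essential proteins is a hypothetical placeholder, not a value taken from the paper. The sketch also assumes the three named proteins are among the essential ones.

```python
# Figures quoted in the text for the C1 complex.
COMPLEX_SIZE = 35                    # protein types in the complex (Table 1)
PERCENT_ESSENTIAL = 74               # "% Essential" entry (Table 1)
KNOWN_LENGTHS = [1203, 1644, 1460]   # the three lengths quoted in the text

# ASSUMPTION (not from the paper): placeholder average length for every
# essential protein whose length is not listed in the text.
ASSUMED_OTHER_LENGTH = 300


def estimate_residues(size, percent_essential, known_lengths, assumed_other_length):
    """Return (essential protein count, rough total of amino acids)."""
    essential = round(size * percent_essential / 100)
    # ASSUMPTION: the proteins with known lengths are counted as essential.
    unlisted = max(essential - len(known_lengths), 0)
    total = sum(known_lengths) + unlisted * assumed_other_length
    return essential, total


essential_count, total_residues = estimate_residues(
    COMPLEX_SIZE, PERCENT_ESSENTIAL, KNOWN_LENGTHS, ASSUMED_OTHER_LENGTH
)
print(f"Essential proteins: {essential_count}")
print(f"Rough amino acid total: {total_residues}")
```

Changing ASSUMED_OTHER_LENGTH shows how sensitive the total is to that single guess; the real figure would require looking up the length of every protein in the complex.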
It is very appropriate to compare the difficulty of getting just-right arrangements of amino acids and the difficulty of getting just-right arrangements of characters in the English alphabet, partially because in both cases the likelihood of chance combinations being functional is similar. Just as there are 26 letters used in the English alphabet, there are 20 amino acids used by living things; and a random mutation in a gene may produce nucleotide base pairs corresponding to any of 20 amino acids.
We have a similar situation in regard to the second protein complex mentioned in the image above. The protein complex is named as the C4 protein complex, and the paper says it uses 27 types of proteins, and that 93% of them are "essential." One of those proteins is the BMS1 protein, which has a special sequence of 1183 amino acids, as you can see on the page here. Another of those proteins is the ECM16 protein, which has a special sequence of 1267 amino acids, as you can see on the page here.
The main facts here are:
(1) This C4 protein complex requires 27 types of protein molecules, 93% of which are "essential" for its function.
(2) Several of these proteins each require a special amino acid sequence consisting of well over a thousand amino acids.
(3) Almost all of these proteins require a special amino acid sequence consisting of at least hundreds of amino acids.
From these facts, we can roughly estimate that all in all this C4 protein complex requires the special arrangement of more than 10,000 amino acids. The functional threshold here is roughly as high as the amount of arrangement of letters you need to produce a blog post or essay of about five pages or about 2000 words, consisting of about 10,000 well-arranged characters. Just as it is impossible that chance (such as a monkey typing for an hour) might produce a readable, functional, useful five-page essay or coherent 2000-word blog post, it is impossible that chance or random mutations or random combinations might produce a special arrangement of 10,000 amino acids needed to produce this protein complex.
My image above shows only the first two protein complexes listed in Table 1 of the scientific paper. There are other protein complexes listed with similar levels of complexity and similar functional thresholds. All of this accidentally unachievable complexity is involved in a mere microscopic yeast.
In the human body there are thousands of types of protein complexes that are mysteriously assembling every day. Their assembly is not explained by DNA. DNA and its genes specify which amino acids make up a particular protein. But DNA and its genes do not specify the structure of any protein complex. The visual below shows what is and is not specified by DNA.
There are six main reasons why we must regard the protein complexes in the human body as accidentally unachievable, things that cannot be explained by unguided natural processes.
Reason #1: Chance processes such as Darwinian evolution could never produce the genes needed to make the proteins that make up such protein complexes (the gene origination problem). To perform the task a particular protein molecule performs, a type of protein molecule typically requires some specific fine-tuned gene, an amino acid sequence with most or nearly all of the protein's actual amino acid sequence, a chain of hundreds or thousands of amino acids specially arranged to produce a functional effect. Evolutionary biologist Richard Lewontin stated, "It seems clear that even the smallest change in the sequence of amino acids of proteins usually has a deleterious effect on the physiology and metabolism of organisms." A biology textbook tells us, "Proteins are so precisely built that the change of even a few atoms in one amino acid can sometimes disrupt the structure of the whole molecule so severely that all function is lost." And we read on a science site, "Folded proteins are actually fragile structures, which can easily denature, or unfold." Another science site tells us, "Proteins are fragile molecules that are remarkably sensitive to changes in structure." A paper describing a database of protein mutations tells us that "two thirds of mutations within the database are destabilising." Those who think that functional folded protein molecules could gradually arise (getting longer and longer from a small size) will be dismayed to read this statement in a 900+ page textbook on protein chemistry: "Polypeptides less than about 70 amino acids in length should not fold because they should not be able to bury a large enough number of hydrophobic amino acids to overcome the configurational entropy of their random coils." Folding is required for most functional protein molecules.
Accordingly, we cannot explain the origin of genes through some gradualism approach that imagines that first there was one tenth of the gene that was useful for one purpose, and then there was two tenths of the gene that were useful for some other purpose, and then finally we got the version of the gene that humans now have. Human genes with only half of their base pairs or a third of their base pairs are not useful, and their corresponding protein molecules are not useful with half of their amino acids.
But how hard would it be to get by chance or random mutations an amino acid sequence that would be the core of a useful protein molecule? That depends on the number of amino acids in the protein. Here we run into a simple principle that is the bane of all theories of accidental biological origins: the principle that a simple linear increase in the number of parts that must be well-arranged results in an exponential or geometric increase in the unlikelihood of such an arrangement occurring by chance. A small increase in the number of parts quickly results in what is called a combinatorial explosion, in which the number of possible combinations skyrockets. This is why computer security experts often tell you to use at least 14 characters for the password of any financial account. If you change your password from 7 characters to 14 characters, that doesn't make it merely twice as hard for a hacker trying all combinations to break into your account; instead it is at least roughly 10,000,000,000 times harder (26 to the 7th power is about 8 billion, and larger character sets give far bigger factors).
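The password comparison can be checked with a few lines of arithmetic. This is a minimal sketch that assumes an attacker trying every possible combination; the two character sets used (26 lowercase letters, and 62 letters and digits) are illustrative choices, not something the text specifies.

```python
def search_space(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length


def hardness_ratio(alphabet_size: int, short_len: int, long_len: int) -> int:
    """How many times larger the long search space is than the short one."""
    return search_space(alphabet_size, long_len) // search_space(alphabet_size, short_len)


# Illustrative character sets (assumptions for this example).
for name, size in [("lowercase letters", 26), ("letters and digits", 62)]:
    ratio = hardness_ratio(size, 7, 14)
    print(f"{name}: 14 characters is about {ratio:.3e} times harder than 7")
```

Doubling the length multiplies the search space by the alphabet size raised to the number of added characters, which is why the factor is in the billions rather than two.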
The chart below shows some of the relevant mathematics. If you doubt these numbers, you can verify them using the Large Exponents Calculator here. Since there are 20 different amino acids used in proteins, you use 20 in the first row of such a calculator. Numbers such as E+6 refer to powers of ten. So 3.2 E+6 means 3,200,000; 1.024 E+13 means 10,240,000,000,000; and E+26 means 1 followed by 26 zeros. The bottom of the chart is a number of combinations equal to about 1 followed by more than 2600 zeros.
Number of amino acids in a molecule | Number of possible combinations of the molecule's amino acids |
5 | 3.2 E+6 |
10 | 1.024 E+13 |
20 | 1.048576 E+26 |
40 | 1.099511627 E+52 |
80 | 1.208925819 E+104 |
160 | 1.461501637 E+208 |
320 | 2.135987035 E+416 |
640 | 4.562440617 E+832 |
1280 | 2.081586438 E+1665 |
2000 | 1.148130695 E+2602 |
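The figures in the chart can also be reproduced without an online calculator, since Python integers have arbitrary precision. The sketch below assumes only what the text states (20 possible amino acids at each position); the helper names are hypothetical, and the mantissa is truncated rather than rounded.

```python
def sequence_count(length: int, alphabet_size: int = 20) -> int:
    """Number of possible sequences of the given length."""
    return alphabet_size ** length


def scientific(n: int, digits: int = 9) -> str:
    """Format a large positive integer as 'd.ddd E+exp' (mantissa truncated)."""
    text = str(n)
    exponent = len(text) - 1
    mantissa = text[0] + "." + text[1:digits + 1]
    return f"{mantissa} E+{exponent}"


# Same chain lengths as the rows of the chart.
for length in [5, 10, 20, 40, 80, 160, 320, 640, 1280, 2000]:
    print(f"{length:>5} | {scientific(sequence_count(length))}")
```

Each doubling of the chain length squares the number of possible sequences, which is the exponential growth the chart illustrates.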
We can see from the chart above that the odds become utterly prohibitive once you start to get amino acid lengths much longer than about 160. Even if you very generously assume that a particular protein molecule only needs to have half of its amino acid sequence matching its actual sequence (an assumption too generous because of what we know about the sensitivity of protein molecules to small changes), you still have a case where we should never expect chance processes to produce successful amino acid sequences (corresponding to functional protein molecules) as long as 320 amino acids.
In innumerable protein complexes in the human body, we have some very complex proteins consisting of very long amino acid chains that we should never expect to have arisen by chance or Darwinian processes, never in the entire visible universe even given billions of years. I will give some numbers (you can find the specifics in my post here which names which proteins I am talking about):
- One of the protein complexes (the spliceosome) has a protein consisting of 2335 well-arranged amino acids.
- Another of the complexes (the apoptosome) has a protein consisting of 1248 well-arranged amino acids.
- The nuclear pore protein complex has one protein requiring 2090 well-arranged amino acids, and another protein requiring 2012 well-arranged amino acids, along with three other types of proteins each requiring more than 1000 well-arranged amino acids.
- The origin recognition complex/replicative helicase complex requires 7 types of proteins that each require more than 700 well-arranged amino acids.
- The RNA polymerase II protein complex has five types of proteins each requiring more than 2000 well-arranged amino acids, and three other types of proteins each requiring more than 1000 well-arranged amino acids.
Gene transcription occurs quickly. The source here lists a time of ten minutes for a gene to be transcribed in a mammal, but another source lists a time of only about a minute. The great majority of that time is used up by the reading of base pairs from the gene, with typically more than 1000 base pairs being read each time a gene is transcribed. Finding the correct gene to read in DNA seems to take only seconds, or at most a few minutes.
Descriptions of DNA transcription fail to explain a huge issue: how does a cell find the right gene in DNA so quickly? Human DNA contains more than 20,000 genes, each of which is just a section of the DNA. The DNA is like an extremely long necklace of many thousands of beads, and a typical gene is like a group of several hundred of those beads. We should actually imagine multiple such necklaces, because DNA is scattered across 23 different chromosome pairs. Now if genes had gene numbers, and DNA was a set of numbered genes in numerical order, it might be easy to find a particular gene. So if a cell knew that it was trying to find gene number 4,233, it could use a binary search method that would allow it to find that gene pretty quickly.
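To make the hypothetical concrete, here is a minimal sketch of the binary search mentioned above. It assumes exactly the imaginary setup described in the text: genes that carry numbers and are stored in numerical order. The list of 20,000 numbered genes and the function name are illustrative only.

```python
def find_gene(sorted_gene_numbers, target):
    """Binary search over a sorted list; returns (index or None, steps taken)."""
    low, high = 0, len(sorted_gene_numbers) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_gene_numbers[mid] == target:
            return mid, steps
        if sorted_gene_numbers[mid] < target:
            low = mid + 1      # target lies in the upper half
        else:
            high = mid - 1     # target lies in the lower half
    return None, steps


# Hypothetical numbered genome: genes 1 through 20,000 in order.
genes = list(range(1, 20001))
index, steps = find_gene(genes, 4233)
print(f"Found gene 4233 at position {index} after {steps} comparisons")
```

Because each comparison halves the remaining range, a sorted list of 20,000 entries never needs more than 15 comparisons. The method depends entirely on the numbering and the sorted order.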
But no such method can be used within the human body. Genes do not have gene numbers that can be accessed within the human body, and DNA is not numerically sorted. DNA has no indexes that might allow a cell to find some particular gene that it was trying to find within DNA. So we have an explanatory "needle in a haystack" problem. Or we might call it a "needle in the haystacks" problem, because human DNA is scattered across 23 different chromosome pairs, as shown in the diagram below:
A scientific text tells us some information that makes this explanatory problem seem more pressing:
"One might have predicted that the information present in genomes would be arranged in an orderly fashion, resembling a dictionary or a telephone directory. Although the genomes of some bacteria seem fairly well organized, the genomes of most multicellular organisms, such as our Drosophila example, are surprisingly disorderly. Small bits of coding DNA (that is, DNA that codes for protein) are interspersed with large blocks of seemingly meaningless DNA. Some sections of the genome contain many genes and others lack genes altogether. Proteins that work closely with one another in the cell often have their genes located on different chromosomes, and adjacent genes typically encode proteins that have little to do with each other in the cell. Decoding genomes is therefore no simple matter. Even with the aid of powerful computers, it is still difficult for researchers to locate definitively the beginning and end of genes in the DNA sequences of complex genomes, much less to predict when each gene is expressed in the life of the organism. Although the DNA sequence of the human genome is known, it will probably take at least a decade for humans to identify every gene and determine the precise amino acid sequence of the protein it produces. Yet the cells in our body do this thousands of times a second."
We have here a very severe navigation problem. A cell is somehow able to find the right gene in only seconds or a few minutes when a new protein is made, even though DNA and chromosomes seem to have no physical organization that could allow for such blazing fast access to the right information. In an article on Chemistry World, we read this:
"How does the machinery that turns genes into proteins know which part of the genome to read in any given cell type? ‘To me that is one of the most fundamental questions in biology,’ says biochemist Robert Tjian of the University of California at Berkeley in the US: ‘How does a cell know what it is supposed to be?’"
Biochemist Tjian has spoken just as if he had no idea how it is that a cell is able to navigate to the right place to read a particular gene in DNA. Later in the article we read this:
"For one thing, the regulatory machinery ‘is unbelievably complex’, says Tjian, comprising perhaps 60–100 proteins – mostly of a class called transcription factors (TFs) – that have to interact before anything happens. ....As well as promoters, mammalian genes are controlled by DNA segments called enhancers. Some proteins bind to the promoter site, others bind to the enhancer, and they have to communicate. ‘This is where things get bizarre, because the enhancer can sit miles away from the promoter,’ says Tjian – meaning, perhaps, millions of base pairs away, maybe with a whole gene or two in between. And the transcription machinery can’t just track along the DNA until it hits the enhancer, because the track is blocked. In eukaryotes, almost all of the genome is, at any given moment, packaged away by being wrapped around disk-shaped proteins called histones. These, says Tjian, ‘are like big boulders on the track’: you can’t get past them easily.... ‘Even after 40 years of studying this stuff, I don’t think we have a clear idea of how that looping happens,’ says Tjian. Until recently, the general idea was that the TFs and other components all fit together into a kind of jigsaw, via molecular recognition, that will bridge and bind a loop in place while transcription happens. ‘We molecular biologists love to draw nice model schemes of how TFs find their target genes and how enhancers can regulate promoters located millions of base pairs away,’ says Ralph Stadhouders of the Erasmus University Medical Centre in Rotterdam, the Netherlands. ‘But exactly how this is achieved in a timely and highly specific manner is still very much a mystery.’ "
Later in the article Tjian says he was shocked by the speed at which some of the process occurs. He expected it would take hours, but found something much different:
"‘The residence times of these proteins in vivo was not minutes or hours, but about six seconds!’, he says. ‘I was so shocked that it took me months to come to grips with my own data. How could a low-concentration protein ever get together with all its partners to trigger expression of a gene, when everything is moving at this unbelievably rapid pace?’ "
The rest of the article is just some speculation, which Tjian mostly knocks down and which the article itself calls "hand-wavy." We are left with the impression that no one understands how cells are able to instantly find the right gene.
- The one-dimensional organization of amino acids found in the sequence of amino acids that makes up a protein;
- the three-dimensional organization of such a sequence to make a complex folded three-dimensional shape needed for a particular protein molecule to function properly;
- the entirely different three-dimensional organization needed for the proteins of a protein complex to fit together in the right way to make a physical arrangement so complex that it may be called a "molecular machine."
- "The majority of cellular proteins function as subunits in larger protein complexes. However, very little is known about how protein complexes form in vivo." Duncan and Mata, "Widespread Cotranslational Formation of Protein Complexes," 2011.
- "While the occurrence of multiprotein assemblies is ubiquitous, the understanding of pathways that dictate the formation of quaternary structure remains enigmatic." -- Two scientists (link).
- "A general theoretical framework to understand protein complex formation and usage is still lacking." -- Two scientists, 2019 (link).
- "Protein assemblies are at the basis of numerous biological machines by performing actions that none of the individual proteins would be able to do. There are thousands, perhaps millions of different types and states of proteins in a living organism, and the number of possible interactions between them is enormous...The strong synergy within the protein complex makes it irreducible to an incremental process. They are rather to be acknowledged as fine-tuned initial conditions of the constituting protein sequences. These structures are biological examples of nano-engineering that surpass anything human engineers have created. Such systems pose a serious challenge to a Darwinian account of evolution, since irreducibly complex systems have no direct series of selectable intermediates, and in addition, as we saw in Section 4.1, each module (protein) is of low probability by itself." -- Steinar Thorvaldsen and Ola Hössjer, "Using statistical methods to model the fine-tuning of molecular machines and systems," Journal of Theoretical Biology.











