Future and Cosmos: 4 Reasons Why Very Much of Protein Complex Origination Is Physically Inexplicable

Humans have more than 20,000 types of protein molecules, partially specified by about 20,000 genes, each of which lists the amino acid sequence used by a protein. Each protein uses a different sequence of amino acids. Different types of proteins combine to form teams of proteins called protein complexes. Individual proteins might be called building blocks of protein complexes, although such a term might mislead you, because a building block such as a brick is very simple, while a protein typically consists of hundreds of well-arranged parts and thousands of well-arranged atoms.

In this post I will explain some reasons why the origination of protein complexes is in general inexplicable by material science. By "origination" I mean the first-ever appearance of such a protein complex, and also the most recent origination of such a complex. I will start with an explanation of why the origination of proteins is physically inexplicable. If there is any reason why the origin of a protein is physically inexplicable, that is also a reason why the origin of a protein complex using that protein is physically inexplicable.

Reason #1: We Have No Credible Physical Explanation for the Origin of Most of the Genes Corresponding to Particular Proteins

A gene is part of the DNA in the chromosome, a part that specifies the amino acid sequence of a particular type of protein molecule. Since a gene does not specify the three-dimensional shape of a protein molecule, one might say that a gene kind of half-specifies how to make a particular type of protein molecule. No type of protein molecule can exist until there first exists a gene specifying the amino acid sequence of the protein.

Let us look at rather straightforward calculations leading to the conclusion that there exists no credible explanation for the origin of genes. There are 20 amino acids used by living things. The median amino acid length of a human protein is 375 amino acids. So to calculate the chance of a set of amino acids randomly forming into the exact set of amino acids used by a functional protein such as an enzyme, the correct figure is 1 in 20 to the three-hundred-and-seventy-fifth power. This is a probability of about 1 in 10 to the four-hundred-eighty-seventh power. Very precisely, we can say that the chance of a random sequence of amino acids exactly matching that of a protein with 375 amino acids is a probability of 1 in 7.695704335 X 10⁴⁸⁷. That is a probability similar to the probability of you correctly guessing (with 100% accuracy) the ten-digit telephone numbers of 48 consecutive strangers. The calculation is shown in the visual below:
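The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `exact_match_odds` is hypothetical, and the calculation assumes (as the paragraph above does) that each of the 375 positions is filled independently and with equal likelihood from 20 amino acids.

```python
def exact_match_odds(alphabet_size: int, length: int) -> tuple[str, int]:
    """Return the leading digits and power-of-ten exponent of alphabet_size**length.

    Hypothetical helper for illustration only. It uses exact integer
    arithmetic, so no floating-point rounding is involved.
    """
    total_sequences = alphabet_size ** length   # number of possible sequences
    digits = str(total_sequences)
    exponent = len(digits) - 1                  # scientific-notation exponent
    leading = digits[0] + "." + digits[1:6]     # first six digits, truncated
    return leading, exponent


leading, exponent = exact_match_odds(20, 375)
print(f"1 in about {leading} x 10^{exponent}")
```

Because Python integers have arbitrary precision, the full 488-digit number is computed exactly before being shortened for display.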

Now, for a protein such as an enzyme to function properly, it must have a sequence of amino acids close to its actual sequence. Experiments have shown that it is easy to ruin a protein molecule by making minor changes in its sequence of amino acids. Such changes will typically “break” the protein so that it will no longer fold in the right way to achieve the function that it performs. A biology textbook tells us, "Proteins are so precisely built that the change of even a few atoms in one amino acid can sometimes disrupt the structure of the whole molecule so severely that all function is lost." And we read on a science site, "Folded proteins are actually fragile structures, which can easily denature, or unfold." Another science site tells us, "Proteins are fragile molecules that are remarkably sensitive to changes in structure." But we can imagine that a protein molecule might still be functional if some minor changes were made in its sequence of amino acids.

Let us assume that for a protein molecule to retain its function, at least half of the amino acids in a functional protein have to exist in the exact sequence found in the protein. Under such an assumption, to calculate the chance of the functional protein forming by chance, rather than calculating a probability of 1 in 20 to the three-hundred-and-seventy-fifth power, we might calculate a probability of 1 in 20 to the one-hundred-eighty-seventh power (187 being about half of 375). This would give us a probability equal to about 1 in 10 to the two hundred and forty-third power, a probability of about 1 in 10²⁴³. The calculation is shown below.

Very generously assuming that only half of the amino acid sequence of a gene is necessary for a functional protein to arise from the gene, we are still left with an utterly prohibitive probability: a probability so low that we should never expect an event with that probability to ever occur in the history of the universe. The probability above is about the probability of you correctly guessing the 10-digit phone numbers of 24 consecutive strangers. It would seem that some miracle of luck would be required for there ever to appear any gene specifying a functional protein molecule.
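The phone-number comparison can be reproduced with a short sketch. The function name `phone_guess_equivalent` is hypothetical, and the sketch assumes that every ten-digit number is equally likely, so that guessing one stranger's number has a chance of 1 in 10¹⁰. It converts the odds of matching a given number of amino acid positions into a power of ten, then counts how many whole ten-digit guesses that power of ten covers.

```python
from math import log10


def phone_guess_equivalent(alphabet_size: int, required_positions: int,
                           digits_per_number: int = 10) -> tuple[int, int]:
    """Return (power_of_ten, strangers) for odds of 1 in alphabet_size**required_positions.

    Hypothetical helper: 'strangers' is the number of consecutive phone
    numbers (each with digits_per_number digits) whose combined guessing
    odds are no worse than the sequence-matching odds.
    """
    log_odds = required_positions * log10(alphabet_size)  # log10 of the "1 in N" figure
    power_of_ten = int(log_odds)                          # whole-number exponent
    strangers = int(log_odds // digits_per_number)        # whole phone numbers covered
    return power_of_ten, strangers


half_sequence = phone_guess_equivalent(20, 187)   # half of the 375 positions
full_sequence = phone_guess_equivalent(20, 375)   # all 375 positions
print(half_sequence, full_sequence)
```

The "half of the sequence" threshold is the assumption made in the text, not a measured quantity; changing `required_positions` shows how the figures move under other assumptions.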

The difficulty involved here is one totally unknown to Darwin, who had no idea that the basic components of living things are protein molecules, most with hundreds of well-arranged amino acids, and thousands of well-arranged atoms. If you think that we can explain the origin of genes by imagining merely one gene arising from another, a look at the topic of what are called orphan genes should discourage such optimism. An orphan gene is one that seems to exist uniquely in the DNA or genome of some particular species. Apparently no other organism has that same gene. The scientist quoted previously explains it this way:

“Orphan genes are found every time a new genome is sequenced. Their ubiquity has been one of the biggest surprises of genomics over the last 20 years. Many researchers had hypothesised that the number of orphan genes found would steadily diminish as more and more genomes were sequenced – but this is not the case. Orphan genes continue to comprise a sizeable proportion of each new genome sequenced.”

In the quote above and the quote below, the scientist seems to suggest two things: (1) that the existence of so many orphan genes was not expected or predicted under orthodox assumptions; (2) that orphan genes are receiving little attention, as if they were some kind of embarrassment that is being swept under the rug. The scientist states this:

“Orphan genes are 'the hard problem' for evolutionary genomics. Because we can't find other genes similar to them in other species, we can't build family trees for them. We cannot hypothesise their gradual evolution; instead they seem to appear out of nowhere. Orphan genes receive comparatively little attention from the research community. I suspect this is partly because they are such a difficult problem.”

The paper here also alerts us that biologists are not paying proper attention to this issue. Referring to orphan genes, the paper states the following:

“It is surprising that orphans have been completely ignored by most comparative genomics studies...Balanced and careful consideration of both commonalities and differences is vital for any proper comparative approach. We believe that (as in case of taxonomy) comparative genomics requires balanced consideration of both commonalities and differences. Oddly enough, most attention at present has been paid to genes which are shared and highly conserved throughout evolution, and not to those which are unique or lineage-specific.”

It seems from the statement above that our evolutionary biologists are guilty of great confirmation bias, the type of bias in which one eagerly seeks out evidence for one's beliefs, while failing to look for evidence that contradicts such beliefs.

Reason #2: We Have No Credible Physical Explanation for How a Transcription Event Could Promptly Find the Right Gene to Make a Particular Protein (a "Needle From the Haystack" Type of Event)

Cells are constantly creating new proteins to replace proteins that disappeared because of the short lifetimes of proteins. The page here has a chart showing the lifetimes of human proteins, and we see a bar graph showing most of the proteins have a half-life between about 10 hours and 70 hours. A muscle protein might live for three weeks, but a liver protein might live for only a few days. To create new proteins, a cell uses a process called gene transcription. In this process a particular gene in DNA will be converted to a messenger RNA molecule that helps to build the new protein.

Gene transcription occurs quickly. The source here lists a time of ten minutes for a gene to be transcribed by a mammal, but another source lists a speed of only about a minute. The great majority of that time is used up by the reading of base pairs from the gene, with typically more than 1,000 base pairs being read each time a gene is transcribed. The finding of the correct gene to read in DNA seems to take only seconds, or at most a few minutes.

Descriptions of DNA transcription fail to explain a huge issue: how does a cell find the right gene in DNA so quickly? Human DNA contains more than 20,000 genes, each of which is just a section of the DNA. The DNA is like an extremely long necklace of many thousands of beads, and a typical gene is like a group of several hundred of those beads. We should actually imagine multiple such necklaces, because DNA is scattered across 23 different chromosome pairs. Now if genes had gene numbers, and DNA was a set of numbered genes in numerical order, it might be easy to find a particular gene. So if a cell knew that it was trying to find gene number 4,233, it could use a binary search method that would allow it to find that gene pretty quickly.
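To make the hypothetical concrete, here is a minimal sketch of the binary search imagined above. Everything in it is an assumption made for illustration: the numbered genes, the sorted order, and the function name `binary_search_steps`. It describes the imagined scenario only, not anything a cell is known to do.

```python
def binary_search_steps(sorted_ids: list[int], target: int) -> tuple[int, int]:
    """Find target in a sorted list; return (index, number_of_probes).

    Returns an index of -1 if the target is absent.
    """
    low, high = 0, len(sorted_ids) - 1
    probes = 0
    while low <= high:
        probes += 1
        middle = (low + high) // 2
        if sorted_ids[middle] == target:
            return middle, probes
        if sorted_ids[middle] < target:
            low = middle + 1      # target lies in the upper half
        else:
            high = middle - 1     # target lies in the lower half
    return -1, probes


# Imaginary genome: 20,000 genes numbered 1..20000 and stored in order.
gene_numbers = list(range(1, 20001))
index, probes = binary_search_steps(gene_numbers, 4233)
print(f"found at position {index} after {probes} probes")
```

With 20,000 sorted entries, a binary search never needs more than 15 probes, whereas checking genes one at a time could need up to 20,000. The speed-up depends entirely on the numbering and the sorted order.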

But no such method can be used within the human body. Genes do not have gene numbers that can be accessed within the human body, and DNA is not numerically sorted. DNA has no indexes that might allow a cell to find some particular gene that it was trying to find within DNA. So we have an explanatory "needle in a haystack" problem. Or we might call it a "needle in the haystacks" problem, because human DNA is scattered across 23 different chromosome pairs, as shown in the diagram below:

A scientific text tells us some information that makes this explanatory problem seem more pressing:

"One might have predicted that the information present in genomes would be arranged in an orderly fashion, resembling a dictionary or a telephone directory. Although the genomes of some bacteria seem fairly well organized, the genomes of most multicellular organisms, such as our Drosophila example, are surprisingly disorderly. Small bits of coding DNA (that is, DNA that codes for protein) are interspersed with large blocks of seemingly meaningless DNA. Some sections of the genome contain many genes and others lack genes altogether. Proteins that work closely with one another in the cell often have their genes located on different chromosomes, and adjacent genes typically encode proteins that have little to do with each other in the cell. Decoding genomes is therefore no simple matter. Even with the aid of powerful computers, it is still difficult for researchers to locate definitively the beginning and end of genes in the DNA sequences of complex genomes, much less to predict when each gene is expressed in the life of the organism. Although the DNA sequence of the human genome is known, it will probably take at least a decade for humans to identify every gene and determine the precise amino acid sequence of the protein it produces. Yet the cells in our body do this thousands of times a second."

We have here a very severe navigation problem. A cell is somehow able to find the right gene in only seconds or a few minutes when a new protein is made, even though DNA and chromosomes seem to have no physical organization that could allow for such blazing fast access to the right information. In an article on Chemistry World, we read this:

"How does the machinery that turns genes into proteins know which part of the genome to read in any given cell type? ‘To me that is one of the most fundamental questions in biology,’ says biochemist Robert Tjian of the University of California at Berkeley in the US: ‘How does a cell know what it is supposed to be?"

Biochemist Tjian has spoken just as if he had no idea how it is that a cell is able to navigate to the right place to read a particular gene in DNA. Later in the article we read this:

"For one thing, the regulatory machinery ‘is unbelievably complex’, says Tjian, comprising perhaps 60–100 proteins – mostly of a class called transcription factors (TFs) – that have to interact before anything happens. ....As well as promoters, mammalian genes are controlled by DNA segments called enhancers. Some proteins bind to the promoter site, others bind to the enhancer, and they have to communicate. ‘This is where things get bizarre, because the enhancer can sit miles away from the promoter,’ says Tjian – meaning, perhaps, millions of base pairs away, maybe with a whole gene or two in between. And the transcription machinery can’t just track along the DNA until it hits the enhancer, because the track is blocked. In eukaryotes, almost all of the genome is, at any given moment, packaged away by being wrapped around disk-shaped proteins called histones. These, says Tjian, ‘are like big boulders on the track’: you can’t get past them easily.... ‘Even after 40 years of studying this stuff, I don’t think we have a clear idea of how that looping happens,’ says Tjian. Until recently, the general idea was that the TFs and other components all fit together into a kind of jigsaw, via molecular recognition, that will bridge and bind a loop in place while transcription happens. ‘We molecular biologists love to draw nice model schemes of how TFs find their target genes and how enhancers can regulate promoters located millions of base pairs away,’ says Ralph Stadhouders of the Erasmus University Medical Centre in Rotterdam, the Netherlands. ‘But exactly how this is achieved in a timely and highly specific manner is still very much a mystery.’ "

Later in the article Tjian says he was shocked by the speed at which some of the process occurs. He expected it would take hours, but found something much different:

"The residence times of these proteins in vivo was not minutes or hours, but about six seconds!’, he says. ‘I was so shocked that it took me months to come to grips with my own data. How could a low-concentration protein ever get together with all its partners to trigger expression of a gene, when everything is moving at this unbelievably rapid pace?’ "

The rest of the article is just some speculation, which Tjian mostly knocks down, and the article itself calls "hand-wavy." We are left with the impression that no one understands how cells are able to instantly find the right gene.

Reason #3: We Have No Credible Explanation of How Any Protein Molecule Could Reach the Three-Dimensional Folded State Needed for Its Function

A fundamental question is: how do protein molecules get their three-dimensional shapes? This problem is known as the protein folding problem. We might have an answer for this if it happened that each amino acid stored three numbers specifying the 3D position that it should go to. We can imagine a setup in which an amino acid would store three different numbers: one representing the X-axis coordinate that the amino acid should exist at, another representing the Y-axis coordinate the amino acid should go to, and a third representing the Z-axis coordinate the amino acid should go to. We can imagine some complicated molecular machinery that would read such numbers, and drag each amino acid to the appropriate X, Y and Z coordinates (a particular point in 3D space) that the amino acid should go to. Under such a system, a 3D protein molecule like the one below might be constructed from a one-dimensional string-like chain of amino acids.

But that is not at all the way nature works. An amino acid does not store any numbers. An amino acid stores neither 3D coordinate numbers, nor any other type of number. So how do the more than 20,000 types of protein molecules in our bodies get their intricate 3D shapes?

Around the year I was born, the question was a profoundly troubling one for materialist biologists. It seemed around then that nature was making very many thousands of intricate hard-to-achieve 3D molecular shapes, and no one knew how it was happening. The materialist biologist was therefore like some owner of a private island who kept seeing endless varieties of intricate sand castles being constructed on the beaches of the island, without any explanation of who was doing it.

Eventually an idea arose that helped make materialist biologists feel much better. The idea was that the three-dimensional shape of each protein molecule was somehow determined by its one-dimensional sequence of amino acids. The idea was originally presented under the name of the Thermodynamic Hypothesis. The idea was that there was one particular 3D shape under which some polypeptide chain would use the least amount of free energy, and that polypeptide chains migrated to this state, which corresponded to their folded 3D shapes. This Thermodynamic Hypothesis was stated like this by Christian B. Anfinsen in 1973: "This hypothesis states that the three-dimensional structure of a native protein...is the one in which the Gibbs free energy of the whole system is lowest; that is the native conformation is determined by the totality of inter-atomic interactions and hence by the amino acid sequence, in a given environment." Later the same idea was called Anfinsen's Dogma, and was stated as simply the idea that the three-dimensional structure of a protein molecule is determined by its one-dimensional amino acid sequence.

Anfinsen's Dogma is represented by the visual below:

There were some reasons why Anfinsen's Dogma never was plausible. In 1969 scientist Cyrus Levinthal calculated that a protein with about 100 amino acids could be folded into about 3 to the 198th power shapes. If a protein molecule were to try so many shape permutations looking for and finding some state in which "the free energy of the state is lowest," it would have to explore so many possibilities that it would take very many years – eons actually. But instead a particular protein will very rapidly form into a characteristic three-dimensional shape, in a very short time – seconds for small proteins, and minutes for large proteins. So it never made any sense to think that protein molecules reached their 3D shapes because they were finding some ridiculously hard-to-find state of minimum free energy. This discrepancy between the calculated ridiculously long time protein folding should take (under a "thermodynamic hypothesis" such as Anfinsen postulated) and the actual very short time it does take is known as Levinthal's paradox.
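The scale of Levinthal's figure can be illustrated with a rough sketch. The count of 3 to the 198th power shapes comes from the text above. The sampling rate of 10¹³ shapes per second is purely an assumption chosen for illustration, and the function name `exhaustive_search_years` is hypothetical.

```python
from math import log10

CONFORMATIONS = 3 ** 198                 # shape count given in the text
SAMPLES_PER_SECOND = 10 ** 13            # ASSUMED sampling rate, for illustration only
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # about 3.16e7 seconds


def exhaustive_search_years(conformations: int, samples_per_second: int) -> float:
    """Years needed to try every conformation once at a fixed sampling rate."""
    seconds = conformations / samples_per_second
    return seconds / SECONDS_PER_YEAR


years = exhaustive_search_years(CONFORMATIONS, SAMPLES_PER_SECOND)
print(f"shapes to try: about 10^{log10(CONFORMATIONS):.1f}")
print(f"time to try them all: about 10^{log10(years):.1f} years")
```

Under these assumptions the exhaustive search comes out at well over 10⁷⁰ years. Raising or lowering the assumed sampling rate by a few powers of ten changes the exponent by the same few units and leaves the conclusion unchanged.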

But Christian B. Anfinsen claimed to have done some experiments supporting his dogma. He did some experiments in which he took one of the simplest proteins (something called ribonuclease), and caused it to lose its folded shape, by a process called denaturation. Anfinsen claimed that he had observed ribonuclease revert back to its folded three-dimensional shape. He claimed that this was evidence that the three-dimensional shape of the protein was a mere function of the amino acid sequence. This was always weak evidence for a claim that protein molecules in general get their three-dimensional shapes solely as a consequence of their amino acid structure and the laws of chemistry and physics. One reason was that ribonuclease has only 124 amino acids, but most protein molecules have far more amino acids. The average number of amino acids in a human protein molecule is about 470, and many types of human protein molecules have much more than 500 amino acids (many types having nearly 1000 amino acids, and quite a few types of protein molecules having more than 1000 amino acids).

Although his experimental evidence for Anfinsen's Dogma was weak, Anfinsen won the Nobel Prize in Chemistry in 1972, along with two other scientists, specifically for his experiments with ribonuclease. We should not be too impressed by this fact. We must remember that when scientists really, really want to believe something, they may tend to award some prize for experimental or observational activity that claimed to back up the cherished belief. The awarding of the Nobel Prize to Anfinsen and his colleagues was part of the social construction of the triumphal legend that Anfinsen's Dogma had been backed up by experimental work.

A 2012 paper has a statement suggesting that scientists were lazy about trying to produce some other experiments that would support Anfinsen's Dogma. It states this: "In the half-century since the annunciation of the Anfinsen postulate, there has appeared no evidence which contradicts it, but neither, seemingly, has there been any systematic experimental work on other proteins which would have further established its validity." We should not take the first half of that statement too seriously, because scientists often claim that there is no evidence contradicting some beloved dogma, when there does exist very much such evidence.

A 2018 paper ("Modeling protein folding in vivo") suggests that the assumptions of Anfinsen were incorrect, and were derived from biased experiments dealing with a set of simpler-than-average proteins. The paper states the following, using the term "in vitro" to mean "in a lab setting," "native conformations" to refer to the 3D shapes of protein molecules, and "denatured" to refer to proteins that have lost their characteristic three-dimensional shape, and reverted to a simpler string-like or chain-like one-dimensional shape:

"These models arose from studies conducted in vitro on a biased sample of smaller, easier-to-isolate proteins, whose native structures appear to be thermodynamically stable. Meanwhile, the vast empirical data on the majority of larger proteins suggests that once these proteins are completely denatured in vitro, they cannot fold into native conformations without assistance. Moreover, they tend to lose their native conformations spontaneously and irreversibly in vitro, and therefore such conformations must be metastable."

Referring to "premature optimism," the paper discusses a kind of "rush to uncork champagne bottles" involved with the Anfinsen experiments:

"The most famous of these studies were the experiments by C. Anfinsen and colleagues, which observed that some small proteins, notably pancreatic ribonuclease (RNAse A), will fold spontaneously to their native conformations from an apparently completely denatured state after the restoration of favorable conditions in vitro; such an ability was postulated – in our opinion, with premature optimism – to be inherent to most proteins. These ideas gave rise to the 'thermodynamic hypothesis' stating that 'the three-dimensional structure of a native protein in its normal physiological milieu...is the one in which the Gibbs free energy of the whole system is the lowest' [17]. In other words, under physiological conditions all proteins were assumed to be able to fold spontaneously into their native conformation."

The paper goes on to state the following, using the term "in silico" to mean "in a computer simulation":

"Simple and elegant as these models are, they fail to adequately accommodate some common empirical observations. The first one is the widely observed protein physical instability in vitro: most protein preparations that are initially isolated from cells in an active native conformation are not stable in vitro and inevitably denature and lose such native conformation (reviewed in [13,14,15,16]). The second is the body of experimental observations that even seemingly stable proteins, once experimentally denatured in vitro in isolation from other cell components, are often unable to fold back into their native conformations upon return to physiological conditions [29,30,31,32,33,34,35]. This phenomenon is observed for all classes of proteins, though it becomes more obvious and almost universal for proteins of larger sizes. It has been shown that many such proteins require the assistance of molecular chaperones for successful folding (reviewed in [36])...We are now witnessing the emergence of a third observation that casts doubt on the applicability of the thermodynamic folding model to the majority of proteins: despite the tremendous intellectual and computational efforts invested into modeling of protein folding in silico, software based on the current thermodynamic theory of folding is able to model the folding paths of only very short proteins, and the process is slow [41,42,43]. In other words, the model in which a polypeptide with a random starting conformation slides down the energy funnel towards the thermodynamic minimum, reducing its free energy at every step in the process, does not appear to yield successful in silico recapitulation of the folding pathways for the majority of proteins."

The limited success of the AlphaFold software (in attempts at protein folding prediction) does not invalidate any of the statements above. The AlphaFold software is able to predict the shape of many proteins not by any thermodynamic calculation process that tends to validate Anfinsen's Dogma, but instead by a frequentist "pattern matching" approach that relies on some vast database of known 3D protein shapes and their corresponding amino acid sequences. In discussions of the protein folding problem, it is very important to not mix up two very different problems:

(1) The protein folding problem, which is the problem of how it is that one-dimensional polypeptide sequences (chains of amino acids) very quickly within organisms fold into the three-dimensional shapes needed for their function.

(2) The protein folding prediction problem, which is the problem of what computer techniques can be used to accurately predict the three-dimensional shape of a protein molecule, given its one-dimensional polypeptide sequence.

The AlphaFold software has made progress on the second of these problems, not the first. News reports about the AlphaFold software will often inaccurately describe it as having made progress on the "protein folding problem" (the first of these problems), but such reports should report only that progress has been made on the second of these problems (the protein folding prediction problem).

Later attempts to replicate Anfinsen's work with ribonuclease have raised grave doubts about how valid his research was. A very interesting paper published in the year 2022 was entitled "The Anfinsen Dogma: Intriguing Details Sixty-Five Years Later." In it a team of scientists reported many a problem in trying to replicate Anfinsen's work with ribonuclease. They seemed to get only a small fraction of the success that is generally claimed in accounts of Anfinsen's experiments.

Referring to what have been called metamorphic or "moonlighting" proteins which seem to be able to assume different 3D shapes, a paper states this about Anfinsen's "one sequence, one structure" dogma:

"Moreover, nuclear magnetic resonance spectroscopy (NMR)-based and computational studies have demonstrated that each protein sequence can have considerable structural plasticity, such that the 'one sequence, one structure' dogma does not capture the complex nature of a protein’s structure. In fact, this flexibility is an intrinsic feature that contributes directly to the biological function of many proteins."

At the www.researchgate.net site (an "expert answers" site similar to Quora.com), there is a page handling the question "Are Anfinsen and Levinthal still considered valid in protein folding?" The question is basically asking whether Anfinsen's Dogma is any kind of explanation for the biologically vital process of protein folding. A Michael Crabtree of Oxford University claims "Anfinsen's conclusion - that protein structures are encoded within their sequence - is still the main hypothesis for how proteins fold." Be suspicious when a scientist does not claim that something is proven, but merely claims that it is "well-established" or "not controversial," for scientists often use such phrases to describe dubious claims that are not actually well-established. And when a scientist does not claim that something is well-established, but merely says that it is the "main hypothesis" to explain something, that means very little, because his "main hypothesis" to explain something may be a very bad one. Crabtree's response is then vigorously disputed at length on the page by Boguslaw Stec PhD. He states this:

"As you see there are significant developments that no longer support a simplistic notion of sequence-folding-function direct relationship. The best proof is an entire career of Baker who is the most prominent protein modeler in the world now. He showed a complete failure of the energy based optimization schemes for protein modeling."

Stec makes this sobering observation:

"This is mostly in line with a sobering recent realization of NIH in the US that around 90% all biology science results are NOT repeatable. Scientist publish what worked not a majority of experiments that do not, even if this is the same experiment."

After describing at some length why Anfinsen's Dogma does not hold up well in experiments, Stec offers this idea as an alternative:

"It looks like life is tinkering on the edge between stable and unstable world. What it practically means is that proteins are self organized systems that do not have any uniform organizing principle. The only universal principle is a utilitarian need for life (function)."

Self-organization is a phrase that is routinely used by people lacking a theory of organization explaining how some very organized thing got organized. Stec makes it sound rather like proteins are little minds seeking out biological functions, but that cannot explain why sequences of amino acids (polypeptide chains) are able to form so very quickly into the correct three-dimensional shapes needed for biological function. Claiming self-organization in this case is no more credible than trying to explain the origin of well-written functional paragraphs by claiming that the letters self-organized into paragraphs.

Very much undermining Anfinsen's Dogma is the fact that a large fraction of all protein molecules require other protein molecules (called chaperones) in order for them to achieve their folded state. Such an idea discredits the simplistic "amino acid sequence determines 3D folded shape" idea. A Stanford University press release states this:

"Scientists have determined that TRiC chaperones are common in people and other mammals. Estimates are that 10 percent of all mammalian proteins need TRiC in order to fold properly. Another 20 percent bind to the smaller chaperone, Hsp70."

That already gives you 30% of protein molecules requiring other protein molecules for them to fold properly, undermining Anfinsen's idea that all you need is the amino acid sequence to get the proper folding for a protein molecule. An encyclopedia page concurs, stating that "20 to 30 percent of polypeptide chains require the assistance of a chaperone for correct folding under normal growth conditions."

Further evidence against Anfinsen's Dogma comes in the fact that a large fraction of all human proteins are what are called "Intrinsically Disordered Proteins," a poor name for a large class of proteins that can each assume many different shapes. A much better name would be "shape-shifting proteins" or "morphologically plastic proteins." Besides such shape-shifting proteins (called IDPs), a protein with a characteristic 3D shape may have some particular part of itself that takes on different shapes, such a part being called an "Intrinsically Disordered Protein Region" or IDPR. A rough analogy of proteins with such IDPRs might be a person with a magically shape-shifting face, who always looks the same below the neck, but whose face can shift between different faces. It has been estimated that up to 40% of human proteins are either such shape-shifting proteins (IDPs) or proteins that have shape-shifting regions (IDPRs). A scientific paper tells us this about such IDPs and IDPRs:

"IDPs/IDPRs, which are characterized by remarkable conformational flexibility and structural plasticity, break multiple rules established over the years to explain structure, folding, and functionality of well-folded proteins with unique structures. Despite the general belief that unique biological functions of proteins require unique 3D-structures (which dominated protein science for more than a century), structure-less IDPs/IDPRs are functional, being able to engage in biological activities and perform impossible tricks that are highly unlikely for ordered proteins. With their exceptional spatio-temporal heterogeneity and high conformational flexibility, IDPs/IDPRs represent complex systems that act at the edge of chaos and are specifically tunable by various means....Overall, IDPs/IDPRs are complex systems with sophisticated structurally and functionally heterogeneous organization. They are uniquely placed at the core of the structure-function continuum concept, where instead of the classical (but heavily oversimplified) 'one gene–one protein–one structure–one function” view, the actual protein structure-function relationship is described by the more convoluted 'one-gene–many-proteins–many-functions' model [92, 93]."

What we have in the case of Anfinsen's Dogma is an example of what has repeatedly occurred in the history of modern biology: the social construction of a dubious achievement legend, one hoisted up triumphantly largely for ideological reasons, so that biologists could claim they understood some great mystery of nature they did not at all understand, and could avoid believing in something they did not want to believe in. It works like this:

(1) Biologists will make observations of some type of extremely impressive phenomenon in nature, or some class of phenomena.

(2) One or more biologists will come up with some simplistic half-baked hypothesis that purports to offer a naturalistic mechanistic explanation for the phenomenon or class of phenomena. Typically such a hypothesis is stated through the repetition of some "sound bite," slogan or catchphrase such as "energy minimization," "natural selection," or "synapse strengthening."

(3) It will be claimed that a few miscellaneous observations or experiments lend support to the hypothesis.

(4) Limitations or defects of the observations or experiments will be ignored, and a grand chorus of biologists will start proclaiming in unison that the hypothesis is a suitable explanation for the phenomenon or class of phenomena.

(5) Gigantic reasons for rejecting the hypothesis will be ignored or swept under the rug.

(6) Illogical aspects of the hypothesis (or aspects contrary to facts) will be ignored or swept under the rug.

(7) A triumphal legend will be socially constructed by the biologist community that the impressive phenomenon or class of phenomena has been explained, because of the hypothesis offered, and the weak cheesy evidence presented in favor of it.

This is exactly what happened in the case of Darwinism, which never offered a credible explanation for the more impressive wonders of biological innovation occurring in natural history, merely offering the cheesy sound-bite slogan of "natural selection" and an implausible appeal to random mutations. This is also what happened in the case of the main phenomena of the human mind, none of which are credibly explained by brain activity, for reasons I explain at great length in the posts of the blog here.

Do not be fooled by claims that Levinthal's Paradox or the protein folding problem has been solved. Such claims are merely additional examples of the countless times scientists have made triumphant declarations that they solved problems they did not actually solve. Each claim that Levinthal's Paradox or the protein folding problem has been solved typically involves appeals to dubious speculative physics, appeals that have not been substantiated by experiments. The different claims of this type all disagree with each other, each presenting a different speculative framework. Claims that Levinthal's Paradox or the protein folding problem has been solved are as dubious and speculative as when some scientist claims to have solved the origin of life, the origin of consciousness or the puzzle of what could have caused the origin of the universe.

A scientific paper states this, using "native conformation" to mean the characteristic 3D shape of a protein molecule:

"The problem of protein folding is one of the most important problems of molecular biology. A central problem (the so called Levinthal's paradox) is that the protein is first synthesized as a linear molecule that must reach its native conformation in a short time (on the order of seconds or less). The protein can only perform its functions in this (often single) conformation. The problem, however, is that the number of possible conformational states is exponentially large for a long protein molecule. Despite almost 30 years of attempts to resolve this paradox, a solution has not yet been found. A number of authors (see, e.g., Ben-Naim, 2013; Onuchic and Wolynes, 2004; Finkelstein et al., 2017) believe that there is a solution, but they disagree on the reasons. Other scientists (see, e.g., Berger and Leighton, 1998; Davies, 2004) believe that the paradox is not yet resolved."

The phenomenon of protein folding is one of the most important things that goes on in nature, and your biological persistence from day to day vitally depends on protein folding occurring each day. Most protein molecules are short-lived. For example, the proteins in brain synapses have an average life of less than two weeks. Your body requires protein folding to occur continuously, so that short-lived protein molecules can be continually replaced by newly created protein molecules, almost all of which require just-the-right protein folding to work right. The paper "Systematic study of the dynamics and half-lives of newly synthesized proteins in human cells" tells us this: "The majority of the proteins quantified have half-lives within the range of 4–14 hours. About 6% of all quantified proteins (49) have half-lives <4 hours, while 51 proteins have long half-lives (>14 hours); the median half-life is 8.7 hours."
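To get a feel for what a median half-life of 8.7 hours means, here is a minimal sketch. It assumes simple exponential (first-order) decay, which is my simplifying assumption and not something the quoted paper is being cited for; the function name `remaining_after` is hypothetical.

```python
# Fraction of a batch of protein molecules still present after some hours,
# assuming simple exponential decay (an assumption made for illustration).
MEDIAN_HALF_LIFE_HOURS = 8.7  # median half-life quoted from the paper above

def remaining_after(hours: float, half_life: float = MEDIAN_HALF_LIFE_HOURS) -> float:
    """Return the fraction of the original molecules left after `hours`."""
    return 0.5 ** (hours / half_life)

print(f"Left after one half-life: {remaining_after(8.7):.1%}")
print(f"Left after 24 hours:      {remaining_after(24):.1%}")
```

Under that assumption, only about 15% of a batch of typical protein molecules would remain after a single day, so the rest must be replaced by newly made and newly folded molecules.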

The long-winded discussion above leads to the conclusion that physical science is unable to explain how protein folding occurs. Since such folding occurs with most protein molecules, this counts as a gigantic reason for concluding that physical science lacks any explanation for the origination of protein molecules.

Reason #4: Physical Science Is Unable to Explain the Improbable Appearance of So Many Useful Protein Complexes, Often So Complex They Are Called "Molecular Machines"

Although scientists have identified most of the proteins that exist in the human body, the task of identifying all the protein complexes and which proteins they are made up of is very largely unfinished. The CORUM database of protein complexes lists 5204 protein complexes, but that number is only a small fraction of the total number of protein complexes that exist. Figure 2 of the document here has a graph showing that roughly 10,000 types of proteins are used to make up these roughly 5000 protein complexes that have been identified by the CORUM database. The same figure shows there is little reuse of protein types within protein complexes. Specifically:

About 5000 proteins are members of only one protein complex.
About 2000 proteins are members of two protein complexes.
About 750 proteins are members of three protein complexes.
About 600 proteins are members of four protein complexes.
About 450 proteins are members of five protein complexes.
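The rough counts above can be tallied in a few lines. This is only a sketch using the approximate figures listed in this post (my readings of the graph, not exact database values), and the variable names are hypothetical.

```python
# Approximate counts from the list above:
# key = number of complexes a protein belongs to, value = number of such proteins.
membership = {1: 5000, 2: 2000, 3: 750, 4: 600, 5: 450}

# Proteins that belong to between one and five complexes.
proteins_listed = sum(membership.values())

# Total protein-to-complex memberships contributed by those proteins.
memberships = sum(k * n for k, n in membership.items())

# Share of the listed proteins that appear in exactly one complex.
share_single = membership[1] / proteins_listed

print(proteins_listed)
print(memberships)
print(f"{share_single:.0%} of the listed proteins are in only one complex")
```

By this tally, more than half of the listed proteins are used in only a single protein complex.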

The link here (Figure 2 of the paper here) takes you to a very impressive diagram that gives a map of protein complexes in the humble fly Drosophila melanogaster. The groups of circles with the same color are particular protein complexes. While there are very many small protein complexes consisting of only a few proteins, there are also quite a few protein complexes that each consist of dozens of types of proteins, including the complexes that are labeled as "mediator complex," "Snap/SNARE complex," "nucleolus," "proteasome complex," "actin cytoskeleton complex," "histone acetyltransferase complex," and so forth.

Such a situation makes it pretty much impossible to explain the formation of the more complex protein complexes by any kind of random combination effect. I can explain why. Consider the likelihood of getting the word "cat" from a random combination of tokens. If you are dealing with only the 26 lowercase characters of the English alphabet, the chance of such a thing is equal to 1 in 26 to the third power, or 1 in 17576. But if you consider all of the alphanumeric characters, the chance is much smaller, namely 1 in 36 to the third power, or 1 in 46656.
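The "cat" arithmetic can be checked with a short sketch. It assumes each position is filled independently and uniformly at random, and it takes "alphanumeric" to mean the 26 lowercase letters plus the 10 digits (36 characters); the function name `one_in` is hypothetical.

```python
import string

def one_in(alphabet_size: int, length: int) -> int:
    """Return N such that the chance of one exact sequence is 1 in N."""
    return alphabet_size ** length

lowercase = string.ascii_lowercase                      # 26 characters
alphanumeric = string.ascii_lowercase + string.digits   # 36 characters

cat_lowercase = one_in(len(lowercase), len("cat"))
cat_alphanumeric = one_in(len(alphanumeric), len("cat"))

print(f"1 in {cat_lowercase}")
print(f"1 in {cat_alphanumeric}")
```

The larger the pool of possible tokens, the faster the odds shrink for the same target length.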

Now, consider some situation in which you need a particular arrangement of three particular proteins out of 20,000 types of proteins, to make a particular protein complex. The chance of getting that arrangement from a random combination of three of those proteins is roughly 1 in 20,000 to the third power, roughly 1 in 8 trillion or 1 in 8,000,000,000,000. When we start dealing with protein complexes consisting of many proteins that have to be arranged in the right way, the probability of getting the right arrangement from a random combination becomes much smaller. Consider some situation in which you need a particular arrangement of ten particular proteins out of 20,000 types of proteins. The chance of getting that arrangement from a random combination of ten of those proteins is roughly 1 in 20,000 to the tenth power, which equals about 1 in 10 to the 43rd power. You can do calculations to determine such numbers using the Large Exponents Calculator here.
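If you would rather not use an online calculator, the same numbers can be reproduced with a short sketch. It assumes each position in the arrangement is drawn independently and uniformly from 20,000 protein types, with order mattering; that is the simplified model used in this paragraph, and the names below are hypothetical.

```python
import math

PROTEIN_TYPES = 20_000  # rough number of protein types used in the text

def ordered_arrangements(types: int, slots: int) -> int:
    """Number of equally likely ordered picks of `slots` items from `types` types."""
    return types ** slots

three = ordered_arrangements(PROTEIN_TYPES, 3)
ten = ordered_arrangements(PROTEIN_TYPES, 10)

print(f"3 proteins: 1 in {three:,}")
print(f"10 proteins: 1 in {ten:.3e}")
print(f"log10 of the 10-protein figure: {math.log10(ten):.2f}")
```

Python integers have no size limit, so the exact values are computed without rounding; the base-10 logarithm gives the power of ten at a glance.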

A probability like that (1 in 10 to the 43rd power) is so small we would expect it to never happen in the lifetime of a human. The explanatory problem becomes much worse when you consider that it is not merely necessary for a human body to once or twice create one of the protein complexes needed for life. Such protein complexes must be continually created in very massive numbers for the human body to function properly. The average protein complex only lasts for a relatively short time. Most of the protein complexes needed for life exist in very massive numbers, typically with billions of copies of each type of protein complex.

In my post "Some Accidentally Unachievable Molecular Machines in Your Body" I give examples of incredibly complex protein complexes in the human body. These included the apoptosome. A page describes the action of the individually useless proteins of the apoptosome coming together to form a functional protein complex:

"The process of programmed cell death, also known as apoptosis, is highly regulated, and the decision to die is made through the coordinated action of many molecules. The apoptosome plays the role of gatekeeper in one of the major processes, termed the intrinsic pathway. It lies between the molecules that sense a problem and the molecules that disassemble the cell once the choice is made. Normally, the many subunits of the apoptosome are separated and inactive, circulating harmlessly through the cell. When trouble occurs, they assemble into a star-shaped complex, which activates protein-cutting caspases that get apoptosis started."

Another site that includes a 3D rotating animation of the apoptosome structure above says this:

"The apoptosome is revealed as a wheel-like complex with seven spokes. On top of the wheel is a spiral-shaped disk that allows for docking and subsequent activation of proteases, which then target cellular components. When active, the apoptosome is revealed to be a dynamic machine with three to five protease molecules tethered to the wheel at any given time."

The apoptosome protein complex is shown below:

(Image credit: Wikimedia Commons, derived from Yuan et al. 2010, Structure of an apoptosome-procaspase-9 CARD complex)

Shown above is the apoptosome protein complex involved in programmed cell death. Note the references in the chart to propellers, which remind us how much the complex resembles a product of engineering. There are very many other types of protein complexes in the human body with similar complexity. In my post here I discussed several other protein complexes of comparable complexity, including the spliceosome, the proteasome, and RNA polymerase II. How is it that these teams of proteins so organized and specialized arise? Scientists do not know. It is not true that they arise because some specification for making them is read from DNA. DNA does not specify which proteins belong to particular protein complexes. DNA has no specification of the structure of protein complexes. We cannot credibly imagine such protein complexes massively arising from chance combinations of proteins. The odds against that are prohibitive, for reasons discussed above.

Below are some quotes in which scientists confess their ignorance about how biologically necessary protein complexes arise:

"The majority of cellular proteins function as subunits in larger protein complexes. However, very little is known about how protein complexes form in vivo." Duncan and Mata, "Widespread Cotranslational Formation of Protein Complexes," 2011.
"While the occurrence of multiprotein assemblies is ubiquitous, the understanding of pathways that dictate the formation of quaternary structure remains enigmatic." -- Two scientists (link).
"A general theoretical framework to understand protein complex formation and usage is still lacking." -- Two scientists, 2019 (link).
"Protein assemblies are at the basis of numerous biological machines by performing actions that none of the individual proteins would be able to do. There are thousands, perhaps millions of different types and states of proteins in a living organism, and the number of possible interactions between them is enormous...The strong synergy within the protein complex makes it irreducible to an incremental process. They are rather to be acknowledged as fine-tuned initial conditions of the constituting protein sequences. These structures are biological examples of nano-engineering that surpass anything human engineers have created. Such systems pose a serious challenge to a Darwinian account of evolution, since irreducibly complex systems have no direct series of selectable intermediates, and in addition, as we saw in Section 4.1, each module (protein) is of low probability by itself." -- Steinar Thorvaldsen and Ola Hössjer, "Using statistical methods to model the fine-tuning of molecular machines and systems," Journal of Theoretical Biology.

I may note that you do not explain the appearance of biologically necessary protein complexes by simple phrases such as "bonding" or "protein-protein interaction." Part of the mystery is why proteins are continuously forming into biologically necessary protein complexes, special arrangements so unlikely to occur by chance. Something equally mysterious would be happening if little stones and sticks at the edge of the incoming tide kept forming into useful messages over and over again. You would not credibly explain such a wonder by merely saying that it occurs because of random arrangement, and you do not explain the continuous appearance of vast numbers of biologically necessary protein complexes in your body by merely appealing to random arrangements produced by the bonding of individual proteins.

We cannot at all explain the formation of protein complexes by imagining that they tend to form from proteins with corresponding genes that are nearby in DNA. In my post "Some Accidentally Unachievable Molecular Machines in Your Body" I give some examples of impressive protein complexes in the human body, and give tables showing that they are not made up of proteins corresponding to genes that are contiguous or nearby within DNA. In the examples listed in that post, it is quite the opposite situation, with the protein team members of the protein complexes being constructed from very widely scattered genes found on different chromosomes in the nucleus. For example, the spliceosome protein complex is constructed from more than a dozen types of proteins corresponding to very widely scattered genes in Chromosome 1, Chromosome 4, Chromosome 5, Chromosome 9, Chromosome 11, Chromosome 17, Chromosome 19 and Chromosome 22.

Scientists have no credible explanation for how it is that proteins form into very organized protein complexes that are so often necessary for the proteins to have any useful function. An honest and very thorough researcher into these matters will be left with an overwhelming impression that there is some unfathomable force of biological organization acting throughout the world of biology to achieve purposeful effects of magnificent biological engineering that are utterly beyond any low-level mechanistic explanation. I call such a force the Global Organizing Activity of a Life-force, or GOAL. For more on the rationale for believing in such a reality, see my post here. A sufficiently deep study of these topics will tend to lead a person towards suspecting how very insufficient are claims that there was merely some "long, long ago" design that helped produce the wonders of biology, and will tend to lead a person towards suspicions that there is some continuous biological dependency of human bodies on some ongoing purposeful agency beyond the mechanistic understanding of biologists.

Saturday, March 16, 2024
