Tuesday, November 14, 2023

The Errors of Assembly Theory

University press offices these days are notorious for their shameless hype, exaggerations and misstatements in announcing new research. A recent example of their work is a press release entitled "Assembly theory unifies physics and biology to explain evolution and complexity," one grandly announcing that "an international team of researchers has developed a new theoretical framework that bridges physics and biology to provide a unified approach for understanding how complexity and evolution emerge in nature." We read in the press release that "this new work on ‘Assembly Theory’, published today (Wednesday October 4) in Nature, represents a major advance in our fundamental comprehension of biological evolution and how it is governed by the physical laws of the universe." We hear lead author Sara Walker crowing that "with this theory, we can start to close the gap between reductionist physics and Darwinian evolution – it's a major step toward a fundamental theory unifying inert and living matter." The claim of a "new theoretical framework" is incorrect: the underlying "assembly theory" of Walker and Cronin was introduced three years ago, as shown by the paper here (from November 2020).

Co-author Leroy Cronin has recently written an article claiming that this "assembly theory" is some grand insight that may "help explain life." His November 1 article is entitled "A new theory of matter may help explain life." Again, a theory that is at least three years old is incorrectly called "new."

Below are some of the mistakes I can find in the writings of Cronin and Walker on this topic:

Error #1: Claiming something's assembly history is one of its properties

Very quickly in Cronin's IAI article we hear something that sounds very wrong: the claim that "the assembly history of material objects, how they came to have the complexity they do, is an essential property of all matter." A property of a material thing is a single attribute that can be measured and expressed as a single number. Properties of material things include height, depth, width, volume, weight, and mass. The assembly history of some complex object is not one of its properties.

Error #2: Abusing the word "selection," apparently using it to refer to any addition of a new part when something useful or organized assembles. 

In a paper with the grand title "Assembly Theory Explains and Quantifies the Emergence of Selection and Evolution," Walker, Cronin and other authors use the word "selection" 82 times. You may be very confused, because it is clear that the word "selection" is being used in a way unlike the dictionary definition (which means a choice made by a conscious agent), and also unlike the way biologists use the not-literally-correct term "natural selection," which does not refer to a choice made by a conscious agent. What's going on is that Cronin and his co-authors seem to be most confusingly using the word "selection" to mean any kind of addition of a new part to anything useful or organized. That isn't one of the dictionary definitions of selection, and the word should not be used in so confusing and misleading a way. Walker, Cronin and the other authors simply start using the word "selection" in this novel way, without ever defining what they mean by "selection," and without ever announcing that they will be using the word in a new way.

We can understand why chemists such as Cronin would want to start abusing the word "selection" in such a way. A great longstanding problem is that Darwinian theory is worthless in explaining the origin of life. Darwinism appeals to a so-called "natural selection" which requires life for it to happen, so it can't explain the origin of life. One sneaky and misleading way to try to get around that is to stretch the word "selection" so that it applies to any kind of combination or joining or assembly. Then you can kind of cover up the total absence of natural selection before the origin of life by trying to say that there was some selection, using your strange new definition of selection. But we should not abuse language in that way. There are already perfectly good words in the English language to refer to things getting more complicated when parts join together or bind. Such words include "accumulation," "concatenation," "bonding," and "binding," as well as the phrases "component addition" and "part addition."

Error #3: Selling as something scientific the imprecise, unscientific idea of "the minimal number of steps necessary to produce the object." 

Walker and Cronin's "assembly theory" includes something they call the assembly index, which they define as "the minimal number of steps necessary to produce the object (its size)."  This is very strange language. Since when is an object its size?  Maybe this is just very clumsy wording in which "its size" refers to "the minimal number of steps." 

There's nothing particularly scientific about a concept of a minimal number of steps needed to produce an object. That's because the word "step" is vague and imprecise. Here is an example showing what I mean. For a woman to produce a newly born child could be considered a single step. Or, if we consider the creation of each of the child's organs, we could think of constructing a new child as requiring about twenty steps. Or if we consider each new cell needed to make the new child, then we might consider the creation of a new child as involving many, many billions of steps. Or if we consider each new chemical reaction or each new molecule needed to make the new child, then we might think it requires trillions of steps. Similarly, if we consider how the United States separated from the British Empire, we can consider that as one step (the American Revolutionary War), or as about twenty steps (each battle in the war and each major event like the signing of the Declaration of Independence), or we can consider it as occurring by many thousands of events (each important pen stroke or firing of a bullet being considered a separate step).

"Steps" is a vague, imprecise word, so there's nothing scientific in the concept of "the minimal number of steps needed to produce an object."  It seems futile to try and unravel exactly what Cronin and Walker are talking about in their paper here, because it all hinges on frequent repetitions of the word "step," and the paper is never clear about what it means by a step. I looked through all 33 uses of the word "step" in that paper, and found that the paper never defines what it means by a step. It's the same thing in Walker and Cronin's paper "Quantifying the pathways to life using assembly spaces." That one has 17 uses of the word "step" without ever defining what it means by "step." 

In another paper Walker and Cronin say, "The assembly index of an object is the length of the shortest pathway to construct the object starting from its basic building blocks." That's also imprecise language, because "basic building blocks" is not a precise concept in talking about the origin of things such as life, and "shortest pathway" is also imprecise. What should we consider as building blocks of one-celled life? Is it the organelles that make up the cell? Is it protein complexes used to make up such organelles? Is it the proteins themselves? Is it the amino acids that make up the proteins? Or is it atoms that make up such amino acids? For any very complex object, there is a "combinatorial explosion" involving possible assembly pathways, meaning that the number of potential pathways to create that object is almost infinite, and we can neither enumerate all those pathways nor calculate which one is shortest. Under the assumption that the addition of each nucleotide base pair is a step, you could calculate the shortest pathway to get a DNA molecule. But the number of pathways by which a cell could be constructed from scratch is almost infinite, meaning the idea of trying to calculate a shortest pathway becomes for all practical purposes impossible once you get up to the complexity of a self-reproducing cell.
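To get a rough feel for that combinatorial explosion, here is a minimal Python sketch. It rests on a simplifying assumption of my own, not taken from the papers: an object has n distinct parts that could be attached one at a time in any order, so the number of possible orderings is n factorial. The function name is hypothetical.

```python
import math


def orderings(n_parts: int) -> int:
    """Number of orders in which n distinct parts could be added one at a time.

    Illustrative assumption only: every ordering counts as a separate pathway.
    """
    return math.factorial(n_parts)


for n in (5, 10, 20):
    print(n, orderings(n))
# 5 120
# 10 3628800
# 20 2432902008176640000
```

Even with only 20 parts, the count is already in the quintillions, which gives some idea of why listing every pathway for a cell is out of the question.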

Error #4: The incorrect idea that duplicating any previously reached object or sequence requires only one step.

In the paper here, Walker and Cronin state this:

"The ordering is important since we start from the atoms, and add bonds, in sequence. Once we have generated a given motif on the path, this motif remains available for reuse."

Later they make this strange claim: "The process of constructing new objects retains the memory of the past formation of objects." What this amounts to is the extremely erroneous principle that in calculating the steps needed to make something, you can count a duplication of any previously reached object or sequence as requiring only one step. 

In Figure 1 of the paper above, we have a little diagram giving a rather trivial example of how Walker and Cronin calculate the number of steps needed to make something, using this erroneous idea. They consider a sequence of blue squares and white squares which is the equivalent of BWBWBWBWBWBWBWBW, where B is a blue square and W is a white square. Walker and Cronin calculate that constructing this sequence requires only four steps:

Step 1: BW

Step 2: BWBW

Step 3: BWBWBWBW

Step 4:  BWBWBWBWBWBWBWBW

Even though this is an extremely simple example, Walker and Cronin have got things very wrong. The string BWBWBWBWBWBWBWBW should not be considered as something requiring only four steps. Instead, it requires 16 steps. Each addition of a new character requires a separate step.
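The two ways of counting can be put side by side in a minimal Python sketch. The function names are hypothetical, and the doubling routine only mimics the four-step listing above; it is not Walker and Cronin's published algorithm.

```python
def steps_char_by_char(target: str) -> int:
    """Count one step per character added (the count argued for in this post)."""
    return len(target)


def steps_with_doubling(unit: str, target: str) -> int:
    """Count steps when any block already built may be copied in one step.

    Mirrors the listing above: building the first unit counts as step 1,
    and each later step joins the current block to a copy of itself.
    Assumes target is the unit repeated a power-of-two number of times.
    """
    block, steps = unit, 1
    while len(block) < len(target):
        block += block  # reuse the whole block as a single step
        steps += 1
    if block != target:
        raise ValueError("target is not a power-of-two repetition of unit")
    return steps


target = "BW" * 8  # the 16-character sequence from Step 4
print(steps_char_by_char(target))         # 16
print(steps_with_doubling("BW", target))  # 4
```

The gap between the two counts widens with every doubling: the copy-based count grows by one each time the sequence doubles in length, while the character count doubles.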

It is true that if you are writing with a word processor, you can sometimes save yourself some time by copying a block of text. But when we are considering something such as the origin of life, such a power is irrelevant. Before the invention of computers, there did not exist any "copy and paste" feature. I may also note that even when people write original emails, original articles and original books, they almost never use copy and paste. For example, when I type the common word "where," I do not search for the word "where" in my previous written text, copy that word, and then paste it, using only 3 steps. Instead, I use five steps, typing five different characters.  

It would be a grave mistake to consider the difficulty of getting a multicellular organism, and to think that each cell can be duplicated in a single step. A cell is a fine-tuned arrangement of billions of atoms, and the reproduction of a cell requires billions of tiny chemical steps. Similarly, if you are considering the difficulty of building a brick house, it would be stupid to say that, once you have taken hundreds of brick-laying steps to build a brick wall, the wall can then be duplicated in a single step. The construction of each brick wall requires the individual placement of hundreds of bricks. Each one of those brick placements is its own separate step.

In the paper here Walker and Cronin calculate that constructing the word "ABRACADABRA" requires only 9 steps:

Step 1: A

Step 2: B

Step 3: AB

Step 4: ABR

Step 5: ABRA

Step 6: ABRAC

Step 7: ABRACA

Step 8: ABRACAD

Step 9: ABRACADABRA

This is not correct. Constructing the word ABRACADABRA requires 11 steps, the length of the sequence. The fact that ABRA already existed in Step 5 at the beginning does not entitle you to think that the same sequence can be added at the end as a single step. 
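For readers who want to check the listing, here is a minimal Python sketch that replays the nine steps under the reuse rule and compares the result with a letter-by-letter count. The function name and the rule as coded (each step is either a single letter or a join of two pieces already available) are my own reading of the listing above, not code from the paper.

```python
def check_reuse_path(path: list[str], target: str) -> int:
    """Replay a listed path under the reuse rule and return its step count.

    Each step must be either a single letter or the concatenation of two
    pieces that are already available. Single letters of the target are
    treated as always available, and earlier steps as reusable.
    """
    available = set(target)  # the basic letters
    for piece in path:
        ok = len(piece) == 1 or any(
            piece[:i] in available and piece[i:] in available
            for i in range(1, len(piece))
        )
        if not ok:
            raise ValueError(f"cannot build {piece!r} in one step")
        available.add(piece)
    if path[-1] != target:
        raise ValueError("path does not end at the target")
    return len(path)


path = ["A", "B", "AB", "ABR", "ABRA", "ABRAC", "ABRACA", "ABRACAD", "ABRACADABRA"]
print(check_reuse_path(path, "ABRACADABRA"))  # 9 under the reuse rule
print(len("ABRACADABRA"))                     # 11 counting one step per letter
```

The whole saving comes from the last step, where the four letters of ABRA are added at once because that block already exists.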

I suspect that there is a strategic rationale in Error #4.  The authors are probably hoping later to introduce some calculation under which the origin of life will be calculated as being very much easier, under some calculation method assuming that anything that already has been constructed can be added in a single step.  But such an assumption is incorrect. 

We can get an idea of how badly things go wrong under Error #4 if we try to calculate how many steps are needed to make a square brick house in which each side is 7 meters. The correct calculation would go like this:

Number of bricks in a row of 7 meters: About 30

Number of brick rows needed to make a one-story wall: About 30

Number of bricks needed for a 7-meter-long wall: about 900

Number of bricks needed for a house in which each side is 7 meters: about 3600

Number of steps needed to make a brick house in which each side is 7 meters: more than 3600

But under the faulty method of Walker and Cronin, and their incorrect idea that anything already built can be duplicated with a single step, you would drastically under-estimate the number of steps needed to build such a house, doing some calculation like this (in which each B stands for a brick):

Step 1: B

Step 2: BB

Step 3: BBBB

Step 4: BBBBBBBB

Step 5: BBBBBBBBBBBBBBBB

Step 6: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Step 7: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

            BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Step 8: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

            BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

            BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

            BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

You can see where this ends up, with the calculation telling us that the four brick walls can be constructed in only maybe twenty steps, rather than the actual number of steps needed, about 3600. Error #4 therefore can lead to vast underestimations of the difficulty of constructing things, which is just the kind of thing origin-of-life researchers like to get, allowing them to make groundless "it's not so hard" claims about an accidental origin of life. 
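The two brick counts can be reproduced with a minimal Python sketch. The brick figures are the rough ones given above. The reuse count uses the ordinary binary double-and-add method, which is my own stand-in for "anything already built can be duplicated in one step" and gives an upper bound on the shortest such path; it treats the bricks as one line of identical units and ignores how they are arranged into walls.

```python
BRICKS_PER_ROW = 30   # about 30 bricks in a 7-meter row (figure from the post)
ROWS_PER_WALL = 30    # about 30 rows in a one-story wall
WALLS = 4


def joins_with_reuse(n: int) -> int:
    """Joins needed to reach n identical bricks if built blocks can be reused.

    Uses the binary (double-and-add) method: double the current block for each
    further binary digit of n, and add one brick wherever that digit is 1.
    This is an upper bound on the shortest path, not necessarily the minimum.
    """
    if n < 1:
        raise ValueError("n must be positive")
    doublings = n.bit_length() - 1
    single_adds = bin(n).count("1") - 1
    return doublings + single_adds


bricks = BRICKS_PER_ROW * ROWS_PER_WALL * WALLS
print(bricks)                        # 3600 brick placements, one step each
print(1 + joins_with_reuse(bricks))  # 15: the first brick plus 14 joins
```

So the copy-based method turns about 3600 brick placements into roughly 15 steps, an underestimate by a factor of more than 200.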

In the paper here Walker and Cronin use their faulty methodology (including Error #4) to get some ridiculously wrong calculations, such as one seemingly calculating that E. coli can be produced in only about 32 steps (Figure 4). This is nonsense. Besides thousands of different protein types, each its own distinct complex invention requiring very many well-arranged parts, E. coli has more than 4 million base pairs in its genome, with very little sequence duplication. It would require millions of steps to produce E. coli from scratch. Walker and Cronin have got their wrong-as-wrong-can-be number by using the goofy methodology described in the ludicrous-looking byzantine flowcharts of Supplemental Figure 2, Supplemental Figure 3 and Supplemental Figure 4 of the paper's supplemental information, a methodology including many arbitrary and seemingly senseless or unjustifiable steps such as "Remove sets of duplicates where all substructures share one or more bonds." Study those three figures and you'll get a depiction of about as unnatural and unjustifiable a calculation method as you could hope to find in a scientific paper. What we have here is what goes on so often in scientific papers: the embarrassing "dirty linen" is buried in the supplemental information document that almost no one reads, as if the authors wanted to hide such things. Having worked as a software developer for decades, I know bad spaghetti code "Rube Goldberg" calculation algorithms when I see them, and such scrambled logic is what I see in the three flowcharts mentioned above.

Part of the shenanigans here seems to have involved analyzing mere "E. coli lysates," which seem to be mere fragments or remnants or products of E. coli. But Figure 4 of the paper here by Walker and Cronin simply lists these "E. coli lysates" as "E. coli," helping to give readers the crazy idea that E. coli can be produced in 32 steps.

This seems to have been the trail of error:

(1) An unjustifiable algorithm containing quite a few weird arbitrary steps and a silly "complex things can be duplicated in only one step" idea was created. 

(2) Such an algorithm was applied to "E. coli lysates" which are apparently mere fragments or remnants or products of the complex microbe E. coli. (The bottom of page 23 of the Supplemental Information document describes some complex process by which E. coli was destructively manipulated with centrifuges and bead heating to produce such lysates.) 

(3) A misguided calculation was then made that such  "E. coli lysates" can be produced in only 32 steps.

(4) A chart was produced in a little-read scientific paper (Figure 4 of the paper here), labeling the mere "E. coli lysates" as simply E. coli, thereby suggesting to the very small number of readers who studied the chart that  E. coli (something which would require millions of steps to make from scratch) can be produced in only about 32 steps. 

(5) Quanta magazine then produced an article with an easier-to-read chart, based on the paper's chart, giving its very many readers the false-as-false-can-be idea that the E. coli microbe can be produced from scratch in only about 32 steps, an idea as false as the claim that you can walk from Moscow to Shanghai by taking only 32 steps. Very many readers were thereby grossly misinformed about the important matter of the complexity of the simplest types of living thing, and led to the vastly erroneous idea that living microbes are easy to construct from scratch.

For an antidote to such baloney, one giving a little glimpse of the enormous functional complexity of E. coli, take a peek at this extremely complicated diagram labeled "The extreme complexity of the E. coli transcriptional regulatory network." 

Error #5: The incorrect idea that the complexity of something very complex can be accurately quantified by calculating a number of steps to make that thing.

We cannot reasonably calculate the difficulty of assembling something complex by calculating some mere number of assembly steps. One reason is that the difficulty of correctly accomplishing some step can vary by many orders of magnitude, with some later step being a thousand or a million times harder than an early step. Once you get into constructing large three-dimensional objects, a new part can be placed in any of innumerable different places. The greater the number of parts in the thing, the greater the difficulty in correctly placing a new part. If my new book manuscript has only the word "now," there are only four different places I can put a new letter. But once my book manuscript gets to be 100,000 characters, then there are about 100,000 different places I can add a new letter. Similarly, if I am building some hand-sized gizmo, there are only a few places I can drive a nail. But if I am hammering a nail in a house, there are 10,000 places I can hammer that nail, and perhaps in only 1 in a thousand of those places will it make any sense to hammer that nail. My point is that the complexity of something increases exponentially as it grows, and the difficulty of getting each new step right changes very dramatically; so you can't calculate the complexity or difficulty of constructing something by just tallying up a number of steps taken in construction.

Here's an example of why the number of steps needed to make something is a very poor measure of the difficulty of making something. Below are how many steps are needed to make two sequences:

101101101011010101000110001101011101011000:  about 42 steps

To be or not to be -- that is the question :  about 42 steps

Both of these require about 42 steps to make. But the first sequence is radically easier to make than the second. The first sequence can easily be made by some random mindless process, for example a sequence of days where each 1 is a day with rain, and each 0 is a day without rain (or an ink splash in which each 1 is an area hundredth with a little bit of the ink splash, and each 0 is an area hundredth without any bit of the ink splash). But the second sequence is so hard to make by any random process that it requires an intelligent agent to make it. We shed no real light on the difficulties here by saying both sequences require 42 steps.
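A minimal Python sketch makes the same point. It confirms that both sequences come out at 42 when counting one step per character, and it shows a mindless process (coin flips, standing in for the rain example) turning out a sequence of the first kind on demand. The helper name random_bits is hypothetical.

```python
import random

bits = "101101101011010101000110001101011101011000"
line = "To be or not to be -- that is the question"

# Counting one step per character gives the same number for both.
print(len(bits), len(line))  # 42 42


def random_bits(n, seed=None):
    """A mindless process: n independent coin flips written as 0s and 1s."""
    rng = random.Random(seed)
    return "".join(rng.choice("01") for _ in range(n))


print(random_bits(42))  # some 42-character string of 0s and 1s
```

The step count is identical for both sequences, so it tells us nothing about which one a mindless process could be expected to produce.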

The only reason I can think of why someone would want to try to quantify the complexity of something by using a "number of construction steps" method (one including Error #4) is that the person might be trying to pick a method that would make feats that are vastly difficult (with prohibitive odds against them) look like they are relatively easy. That has been a long-standing tendency of chemists dealing with the origin of life: a tendency to use various tricks and sleazy strategies to make the accidental origin of life look trillions of times easier than it would have been. Rather than shedding light on the origin of life, such tricks merely mislead.

Postscript: The recent paper "Design Patterns of Biological Cells" is a paper clarifying how cells and molecular machines implement a wide variety of design patterns, some very complex. Although the authors draw no implications from their analysis, a straightforward implication analysis leads one to think their results are more nails in the coffin of attempts to mechanistically explain cells. In Figure 4 of the paper we have an extremely complex diagram showing the metabolism of E. coli, one that helps to show how nonsensical are any insinuations that it could be assembled through a small number of steps.

A March 2024 paper is entitled "Assembly Theory is a weak version of algorithmic complexity based on LZ compression that does not explain or quantify selection or evolution." The blog post here by a PhD is entitled "The 8 Fallacies of Assembly Theory." 
