Our
scientists often give us visual displays designed to impress us
with their grasp of nature. Such visuals should often be taken with a large
grain of salt. An example is the type of “composition of the
universe” pie graph that claims the universe is about 72% dark
energy, 23% dark matter and 5% regular matter. As discussed here,
the case for dark matter is wobbly. Moreover a
2016 study has cast doubt on the research used to make the claim
that the universe is 72% dark energy, raising doubts about whether
dark energy even exists.
Another type of scientific visual we should have little trust in is the
“tree of life” diagram that supposedly shows how one
type of life evolved into another. Such visuals are generated using
what is called phylogenetics, which involves attempts to compute the
ancestry of living things from studying their genomes.
There
is a gigantic amount of data involved in the genome of a single
organism. Comparing the genomes of many different organisms for
similarities becomes a task too data-intensive for a person to do in
his own head or on paper. When you get into the task of estimating hypothetical
inheritance trees, the number of possibilities becomes so gigantic
that the task has to be handed over to
computers.
The
idea of doing computer analysis on genomes may sound very impressive,
but there are several reasons why this type of analysis does not in
general provide convincing evidence that some species had a
particular ancestry.
1.
Phylogenetic programs assume common descent rather than prove it.
The
computer programs used for phylogenetic analysis are not programmed
to analyze the likelihood that a particular set of species share a
common ancestor. Instead, such programs typically assume from the
beginning that such species do share a common ancestor, and the
programs busy themselves with trying to compute the most probable
inheritance tree that can link such species.
2.
Phylogenetic programs compute a “most likely” tree of evolution,
but such a tree is not a likely “tree of
evolution.”
One
must be careful to distinguish between the concept of “most likely”
and “likely.” “Likely” means having a probability of greater
than 50%. But “most likely” means more likely than any other
possibility. It is very common for a “most likely” possibility to
be unlikely, with a probability of less than 50%. For example, if you
choose a random word from a book, the “most likely” choice is the
word “the.” But such a choice is not a likely choice, and has a
likelihood of less than 10%.
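The distinction can be checked with a few lines of Python. This is only a minimal sketch: the sample text is a made-up passage (an assumption, not taken from any particular book), and the names are purely illustrative.

```python
from collections import Counter

# Hypothetical sample text; any ordinary English passage would do.
text = (
    "the cat sat on the mat and the dog lay by the door "
    "while a bird sang in a tree near the old stone wall"
)

words = text.split()
counts = Counter(words)

# The single most frequent word, and the chance of drawing it at random.
top_word, top_count = counts.most_common(1)[0]
top_share = top_count / len(words)

print(f"most likely word: {top_word!r}, probability {top_share:.1%}")
```

The word that wins the "most likely" contest here still has a probability far below 50%.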
In
the case of phylogenetic software, the program will not produce an
inheritance tree that is likely to be correct. It will merely produce
an inheritance tree that may be the “most likely” among many
different alternatives that the software explores. But such a tree
may still be very unlikely to be accurate.
3.
With
any complicated inheritance tree problem, there is a “combinatorial
explosion” that prevents phylogenetic programs from being able to
try all possibilities, so the software resorts to a fragmentary
exploration of the solution space.
Anyone
who has studied computer science knows that when there are many
variables or data points, the number of possible arrangements
grows explosively. A classic example is what is known as the
traveling salesman problem. If a salesman has to travel to 20 cities,
then the total number of possible travel routes is roughly 20
factorial, which is about 2.4 quintillion, far too many routes to check one by one.
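The count itself is easy to calculate. What is out of reach is checking every route. Here is a minimal sketch using Python's standard library, where the second figure assumes a fixed starting city and treats a route and its reverse as the same trip:

```python
import math

cities = 20  # the figure used in the example above

# Every possible ordering of the 20 cities.
orderings = math.factorial(cities)

# Distinct round trips from a fixed starting city, ignoring direction of travel.
round_trips = math.factorial(cities - 1) // 2

print(f"{orderings:,} orderings")
print(f"{round_trips:,} distinct round trips")
```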
Given
more than 200 species, the possible number of inheritance trees to be
considered becomes so great there is no possible way for any computer
program to compute all the possibilities. So phylogenetic programs
typically resort to a shortcut. They simply allow you to try a
certain number of possibilities, and rate each one for its
likelihood. The one with the best rating is singled out as the winner.
But that's not a method that should inspire confidence. The winner
is unlikely to be the actual inheritance tree for the set of species,
whenever there are many species being considered.
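To get a feel for the size of the problem, here is a rough sketch that counts the possible trees. It assumes rooted trees in which every branching splits in two, for which the standard count for n labeled species is (2n - 3)!!, the product of the odd numbers up to 2n - 3. The function name is hypothetical.

```python
def rooted_tree_count(n_species: int) -> int:
    """Number of distinct rooted, bifurcating trees for n labeled species: (2n - 3)!!"""
    if n_species < 2:
        raise ValueError("need at least two species")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_species - 2, 2):
        count *= k
    return count

for n in (4, 10, 50, 200):
    digits = len(str(rooted_tree_count(n)))
    print(f"{n:>3} species: a tree count with {digits} digits")
```

For comparison, the number of atoms in the observable universe is usually put at a figure with about 80 digits.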
4.
Too few living species have had their genomes analyzed for phylogenetic
programs to be very reliable.
According
to this government web site, “more than 250” animal species have
had their genomes analyzed. The problem for phylogenetic programs is
that this is but a tiny fragment of the total number of living species,
which has been estimated at about 8 million. Consequently, we don't have enough data to reliably calculate an inheritance tree from so few genomes. Perhaps
after very many thousands of genomes have been cataloged, such analysis
may be more reliable.
5.
We don't have any DNA data for even 1% of the species that previously existed.
The
reliability of phylogenetic programs is proportional to how much DNA
data we have for extinct species that lived long ago. But we have
very, very little DNA data for species that lived long ago. The
half-life of DNA has been estimated at only 521 years, meaning that every 521 years about half of
the remaining DNA information will disappear. So we have no DNA information
for species such as dinosaurs. There is no truth to the idea that dinosaur DNA has been preserved because insects that bit dinosaurs have been preserved in amber. That's a fantasy from the "Jurassic Park" movies. When phylogenetic programs try to
place dinosaurs in a phylogenetic “tree of life,” they must use
guesses about what the DNA of dinosaurs looked like. Similar guesses
must be made about almost all of the species being considered.
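The half-life arithmetic can be sketched as follows. This assumes simple exponential decay at a constant 521-year half-life, which is a simplification, and takes 66 million years as the approximate time since the last non-avian dinosaurs. The fractions are far too small for ordinary floating-point numbers, so the sketch works with base-10 logarithms.

```python
import math

HALF_LIFE_YEARS = 521  # figure quoted in the text above

def log10_fraction_remaining(age_years: float) -> float:
    """Base-10 logarithm of the fraction left after simple exponential decay."""
    return -(age_years / HALF_LIFE_YEARS) * math.log10(2)

for age in (521, 10_000, 1_000_000, 66_000_000):
    exponent = log10_fraction_remaining(age)
    print(f"{age:>12,} years: fraction remaining is about 10^{exponent:.1f}")
```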
6.
We should have little confidence in phylogenetic programs, given
their extremely complicated algorithms that are anything but
straightforward.
A
document on molecular phylogenetics says this: “The likelihood
calculations required for evolutionary trees are far from
straightforward and usually require complex
computations that must allow for all possible unobserved sequences at
the LCA nodes of hypothesized trees.” The same document shows an
equation for calculating likelihood, the type of equation used by
such a program. It looks as complicated as one of the more
complicated equations used in Einstein's theory of general
relativity. See here to look at some of the extremely
complicated math involved.
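To give a flavor of what "allowing for all possible unobserved sequences" means, here is a deliberately tiny sketch. It is not the equation from the document mentioned above. It assumes the simplest textbook substitution model (Jukes-Cantor, in which every base change is equally probable) and a tree with only two species and one unobserved ancestor. The sequences and branch lengths are made up, and all function names are hypothetical.

```python
import math

BASES = "ACGT"

def jc69_prob(from_base: str, to_base: str, branch_length: float) -> float:
    """Jukes-Cantor probability of seeing to_base at the end of a branch that
    started at from_base (branch_length in expected substitutions per site)."""
    decay = math.exp(-4.0 * branch_length / 3.0)
    if from_base == to_base:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

def site_likelihood(leaf_x: str, leaf_y: str, t_x: float, t_y: float) -> float:
    """Likelihood of one aligned site, summing over the unobserved ancestral base."""
    return sum(
        0.25 * jc69_prob(a, leaf_x, t_x) * jc69_prob(a, leaf_y, t_y)
        for a in BASES
    )

def alignment_log_likelihood(seq_x: str, seq_y: str, t_x: float, t_y: float) -> float:
    """Log-likelihood of two aligned sequences, treating sites as independent."""
    return sum(
        math.log(site_likelihood(x, y, t_x, t_y)) for x, y in zip(seq_x, seq_y)
    )

# Hypothetical aligned sequences that differ at two of ten sites.
seq_x = "ACGTACGTAC"
seq_y = "ACGTTCGTAA"

for t in (0.01, 0.1, 0.5):
    score = alignment_log_likelihood(seq_x, seq_y, t, t)
    print(f"branch length {t}: log-likelihood {score:.3f}")
```

Even in this toy case the program must sum over four possible ancestral bases at every site. With many species there are many unobserved ancestors, and the same kind of sum has to be carried out across every internal node of every candidate tree.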
When
computer programs are based on extremely complicated algorithms,
there will very often be bugs in the program – either because of an
error in the complicated algorithm or because of a failure in
accurately translating the complicated algorithm into computer code
such as Java. For example, a recent study found bugs in software used
to analyze brain scans, and estimated that thousands of scientific
studies using such software may be inaccurate. The more complicated
an algorithm, the greater the likelihood it will not be accurately
implemented in bug-free computer code.
A
paper entitled “The State of Software in Evolutionary Biology”
reviewed various computer programs used in phylogenetics, and
concluded “the software quality of the tools we analyzed is rather
mediocre.” A later paper entitled "The State of Software for Evolutionary Biology" stated, "The software engineering quality of the tools we analyzed is rather unsatisfying." It is a huge problem in science that software programs used for scientific analysis are often written by scientists who dabble in computer programming, and the quality of their work is often second-rate. We should no more expect high-quality code from a scientist dabbling in computer programming than we should expect to get high-quality house-building and plumbing from a professional musician who dabbles in making houses.
7.
We should have little confidence in phylogenetic programs, because
there is no way to test the output of such programs.
As
a general rule, our confidence in a type of software should be
proportional to the degree to which the software has passed tests.
For example, if some baseball prediction software were to predict
that a particular player would have a batting average next season of
.314, and the player did produce exactly such a batting average, and
the same type of prediction succeeded for other players, that would
be a good sign that the software was reliable. But in the case of
phylogenetic software, there is no way to test its outputs. Although certain types of consistency checks and statistical checks can be applied to the output of phylogenetic software, we have
no way of verifying that a "tree of life" or an inheritance tree produced by such
software is historically accurate. Anyone in the software industry knows that
untested software is not something you should have much confidence
in.
8.
Lateral gene transfers cast doubt on the reliability of phylogenetic estimates.
Here
is a quote from a 2016 scientific paper:
One
of the several ways in which microbiology puts the neo-Darwinian
synthesis in jeopardy is by the threatening to “uproot the Tree of
Life (TOL)” [1].
Lateral gene transfer (LGT) is much more frequent than most
biologists would have imagined up until about 20 years ago, so
phylogenetic trees based on sequences of different prokaryotic genes
are often different. How to tease out from such conflicting data
something that might correspond to a single, universal Tree of Life
becomes problematic. Moreover, since many important evolutionary
transitions involve lineage fusions at one level or another, the
aptness of a tree (a pattern of successive bifurcations) as a summary
of life’s history is uncertain.
The
paper then goes on to say this:
Students
of animals and plants have long accepted that incomplete lineage
sorting, introgression, and full-species hybridization pose
difficulties for the sorts of trees that Darwin might have had us
draw. But it is microbes, with their promiscuous willingness to
exchange genes between widely separated branches of any “tree,”
that have most seriously jeopardized the neo-Darwinian synthesis.
9.
Disagreement about mutation rates undermines the reliability of
phylogenetic estimates.
The
output of a phylogenetic program may rely on some estimate regarding
a rate of mutation. But there is great disagreement about the rate
of mutation in the past. A scientist quoted in Nature News says this
about the “DNA clock” used in phylogenetics:
“The
fact that the clock is so uncertain is very problematic for us,” he
says. “It means that the dates we get out of genetics are really
quite embarrassingly bad and uncertain.”
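A rough sketch shows how directly the dates depend on the assumed rate. It uses the simplest "strict clock" formula, in which the time since two lineages split equals their genetic divergence divided by twice the mutation rate. The divergence and the two rates below are hypothetical round numbers chosen for illustration, not measured values.

```python
def divergence_time_years(per_site_divergence: float, rate_per_site_per_year: float) -> float:
    """Strict molecular clock: differences pile up along both lineages,
    hence the factor of 2 in the denominator."""
    return per_site_divergence / (2.0 * rate_per_site_per_year)

divergence = 0.012            # hypothetical fraction of sites that differ
slow_rate = 0.5e-9            # hypothetical mutations per site per year
fast_rate = 1.0e-9            # hypothetical rate, twice as fast

slow_estimate = divergence_time_years(divergence, slow_rate)
fast_estimate = divergence_time_years(divergence, fast_rate)

print(f"slow clock: {slow_estimate:,.0f} years")
print(f"fast clock: {fast_estimate:,.0f} years")
```

Halving the assumed rate doubles the estimated date, so any uncertainty in the rate passes straight through to the dates.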
10. Phylogenetic estimates based on microRNAs or fossils conflict with other phylogenetic estimates.
A molecular palaeobiologist at nearby Dartmouth College, Peterson has been reshaping phylogenetic trees for the past few years, ever since he pioneered a technique that uses short molecules called microRNAs to work out evolutionary branchings. He has now sketched out a radically different diagram for mammals: one that aligns humans more closely with elephants than with rodents. “I've looked at thousands of microRNA genes, and I can't find a single example that would support the traditional tree,” he says. The technique “just changes everything about our understanding of mammal evolution.”
The mainstream scientific paper "How reliable are human phylogenetic hypotheses?" gives a troubling answer to such a question. It tells us that "phylogenetic hypotheses regarding humans and their fossil relatives" have "never been subjected to external validation." When the authors tried to do such a validation, they found that "phylogenetic hypotheses based on the craniodental data were incompatible with the molecular phylogenies." This led them to conclude that "existing phylogenetic hypotheses about human evolution are unlikely to be reliable."
Below is a visual from a 2016 paper "A new view of the tree of life." In this paper this visual comes underneath a headline "A current view of the tree of life." You may notice that the strange shape has no actual resemblance to a tree, although it looks a little like one of the erupting sparkler sticks that I would use as a young boy on the Fourth of July.
No
doubt computational phylogenetics will continue to be very popular.
Although such analysis seems to add little to our knowledge, it's a
nice easy way to make a living if you are an evolutionary biologist.
Rather than having to do the messy and frustrating work of trying
to dig up fossils, an evolutionary biologist can just comfortably
sit in an office and crunch genome data. It's a lot easier than
writing software, where there is typically the requirement that your computer
work must actually achieve some useful innovation. A scientist
specializing in phylogenetics can just grind out hypothetical “trees
of life” or “ancestry trees” year after year, with very little
disturbance from people objecting to his work or analyzing his
methods. So if you are an evolutionary biologist making a living
doing such comfortable work in a clean office, you will vigorously
defend the value of what you are doing. The last thing you want is to
have to go out in the mud and get your fingernails dirty.