Our future, our universe, and other weighty topics


Monday, September 8, 2014

The Questionable Task of Trying to Do Science With Computer Simulations

Scientific findings have usually been produced in two main ways. The first is observation, as when a biologist classifies a new deep-sea fish photographed on an underwater expedition, or when an astronomer uses a telescope to take new radiation readings from a distant galaxy. The second is physical experimentation, as when a chemist tries combining two chemicals, or when physicists smash particles together in a particle accelerator. But nowadays more and more scientists are trying to do science in a third way: by running computer simulations.

The idea is that a scientist can do a computer experiment rather like a physical experiment. A scientist might formulate a hypothesis, then use (or write) a computer program that simulates physical reality, and get results (in the form of computer output data) that can be used to test the validity of his hypothesis. Many millions of dollars, often taxpayer dollars, are now being spent on such computer simulations.

But while this approach sounds reasonable enough in theory, there are some reasons for being skeptical about it. The first reason has to do with the computer programs used for these simulations. Such programs are very often not written according to industry standards, but are instead “cowboy-coded” affairs written mainly by one scientist who is, in effect, moonlighting as a programmer.

The difference between software written according to software industry “best practices” and “cowboy-coded” software is discussed here. Software written according to industry “best practices” typically involves a team of programmers and a team of quality assurance experts. “Cowboy-coded” software, by contrast, is typically written mainly by a single programmer who often takes a “quick and dirty” approach, producing something full of details that only he understands. A program written in such a way often becomes what is called a “black box,” something that only the original programmer can fully understand. And if the code is poorly documented and the subject matter is very complex, it is often the case that after a few years even the original programmer no longer fully understands what is going on inside the program.

Many of the programs used in scientific computer simulations are written by scientists who decided to take up programming. The problem is that the art of writing bulletproof, reliable software is something that often takes a programmer many years to master: years of full-time software development, often involving 60-hour or 70-hour weeks. There is no reason to be optimistic that such a subtle art has been mastered by a scientist who writes computer programs part-time. I once downloaded a computer program currently used as part of astronomical computer simulations. It was written in an amateurish style that violated several basic rules of writing reliable software. I can only imagine how many other scientific computer simulations are based on similar coding written by scientists doing part-time programming.

This is not merely a purist's objection. Even when software is written according to industry “best practices” standards, the code almost always has errors. Code that is not written according to such standards is likely to have many errors, and the result may be that the software simply gives the wrong answers when it produces outputs. Faulty software cannot be a basis for reliable scientific conclusions.
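As a toy illustration of how code can quietly give wrong answers (a hypothetical example, not drawn from any actual simulation code), even something as simple as adding up numbers in a loop can drift from the exact answer because of accumulated rounding error:

```python
import math

# Each step adds 0.1; the exact total is 1.0.
values = [0.1] * 10

naive_total = 0.0
for v in values:
    naive_total += v  # a tiny rounding error is introduced at every addition

careful_total = math.fsum(values)  # compensated summation tracks the lost digits

print(naive_total == 1.0)    # False: the naive loop has drifted slightly
print(careful_total == 1.0)  # True
```

If a ten-line loop can be subtly wrong, a hundred-thousand-line simulation written without quality assurance can be wrong in ways nobody ever notices.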

Another reason for skepticism is that every computer program designed to simulate physical reality requires the user to select a variety of inputs before running it. It is never a simple matter of “just run the program and see what the results are.” Instead, it almost always works like this:

(1) Choose between 5 and 40 input arguments or model assumptions, in each case choosing a particular number from some range of possible values.
(2) Run the computer simulation.
(3) Interpret the results of the simulation outputs.

The problem is that Step 1 gives abundant opportunities for experimental bias. A scientist who wants to run a computer simulation supporting hypothesis X may make all kinds of choices in Step 1 that he would not have made if he were not interested in supporting hypothesis X. Whenever you hear about a scientist running a computer simulation, remember: there are almost always billions of different ways to run any particular simulation. So when a scientist talks about “the results of running the simulation,” what he really should be saying is “the results of running the simulation using my set of input choices, which is only one of billions of possible sets of input choices.”
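The point can be made concrete with a deliberately artificial sketch. The function below is a hypothetical stand-in for a real simulation, with a single tunable input; both parameter values might look “plausible,” yet they support opposite conclusions about the same hypothesis:

```python
import random

def toy_simulation(growth_rate, noise_seed):
    """Hypothetical stand-in for a physics simulation: returns a final 'density'."""
    random.seed(noise_seed)
    density = 1.0
    for _ in range(100):
        density *= (1.0 + growth_rate + random.gauss(0, 0.001))
    return density

# Two defensible-looking input choices for the same model.
low = toy_simulation(growth_rate=0.001, noise_seed=42)
high = toy_simulation(growth_rate=0.02, noise_seed=42)

print(low > 2.0)   # False: hypothesis "density doubles" looks refuted
print(high > 2.0)  # True: same code, different input, hypothesis "confirmed"
```

Nothing in the code itself tells you which growth rate is right; the conclusion is baked into the input choice, which is exactly the opportunity for bias described above.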

It often seems as if scientists who run these computer experiments try to hide the fact that their program allows billions of possible input combinations, of which they chose just one. The main way they do this is by not discussing the input possibilities the program allows, and not even listing the choices they made for those inputs. This is, of course, a pathetic way of doing science. I saw an example of this in a recent scientific paper in which a scientist did a simulation using a publicly available program he named, yet did not bother to specify the input parameters he chose for it, even though the program's online documentation makes clear that it takes many different input parameters.

Another reason why scientific computer simulations are doubtful is that there is ample opportunity for cherry-picking and bias when interpreting the results of a simulation. Very complicated simulations often produce enormous amounts of output data. A scientist might be able to make 10,000 different output graphs from that data, and he is free to choose whichever graph looks most like a verification of the hypothesis he is trying to prove, while ignoring other possible graphs that look incompatible with that hypothesis.
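Here is a small sketch of why that selection is so dangerous (the “observables” here are invented pure noise, not any real simulation output). If you generate enough output series and keep only the one that best fits the hypothesis, you can “confirm” a trend that does not exist:

```python
import random

random.seed(0)

# Hypothetical simulation output: 100 candidate "observables", each pure noise.
outputs = [[random.gauss(0, 1) for _ in range(20)] for _ in range(100)]

def upward_trend(series):
    """Crude trend score: mean of the second half minus mean of the first half."""
    half = len(series) // 2
    return sum(series[half:]) / half - sum(series[:half]) / half

# Cherry-picking: report only the observable that best "confirms" an upward trend.
best = max(outputs, key=upward_trend)
print(upward_trend(best) > 0)  # True, even though every series is random noise
```

With 10,000 graphs available instead of 100, the scope for this kind of self-deception only grows.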

Output of a typical computer cosmology simulation

Yet another reason why scientific computer simulations are doubtful is that it is often impractical or unrealistic to try to simulate a complicated physical reality in a computer program. If the thing being simulated is fairly simple, such as a particular set of chemical reactions, we might have a fairly high degree of confidence that the programmer has managed to capture physical reality pretty well in his computer program. But the more complicated the physical reality, and the greater the number of particles and forces involved, the less confidence we should have that the insanely complicated physical reality has been adequately captured in mere computer code. For example, we should be very skeptical that any computer program in existence is able to simulate very accurately things such as the dynamics or origin of galaxies.

Still another reason why scientific computer simulations are doubtful is that those who run such simulations typically fail to follow the “blindness” methodology used in other sciences. When testing new drugs, scientists follow a “double blind” methodology: the person dispensing a drug to patients may be kept in the dark as to whether he is dispensing the real drug or a “sugar pill” placebo, and the person collecting or interpreting the results data may likewise be kept in the dark. Scientists running a scientific computer simulation could follow similar policies, but they almost never do. For example, the choice of input parameters and the interpretation of the outputs could be made by a scientist who does not know which hypothesis is being tested, or who has no stake in whether the results confirm that hypothesis. But it is rare for any such “blindness” techniques to be followed when doing scientific computer simulations.
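A minimal sketch of what such a blinded protocol could look like (all names and labels here are hypothetical; no real simulation framework is assumed): a third party relabels the simulation runs before the analyst sees them, and the key is revealed only after the analysis is fixed.

```python
import random

def run_simulation(param):
    """Hypothetical stand-in for real model output under one input choice."""
    return [param * i for i in range(10)]

# Two runs, e.g. one testing the favored hypothesis and one a control.
runs = {"A": run_simulation(0.5), "B": run_simulation(2.0)}

# A third party shuffles the labels before handing the data to the analyst.
coded = list(runs.items())
random.shuffle(coded)
blinded = {f"run_{i}": data for i, (label, data) in enumerate(coded)}
key = {f"run_{i}": label for i, (label, data) in enumerate(coded)}  # kept sealed

# The analyst works only with 'blinded'; 'key' is opened after conclusions are drawn.
```

The mechanics are trivial; what is missing in practice is not the code but the institutional habit of using it.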

But despite their dubious validity, scientific computer simulations will continue to consume a great deal of scientists' time, while chewing up many a tax dollar. One reason is that they can be a lot of fun to create and run. Imagine creating a computer program that simulates the origin and evolution of the entire universe. The scientific worth of the program may be very questionable, but scientists may turn a blind eye to that when they are having so much fun playing God on their supercomputers.

Postscript: I am not suggesting here that science-related computer simulations are entirely worthless. I am merely suggesting that such simulations are a "distant second" to science produced through observations and physical experiments. Also, the points made here do not undermine the case for global warming, as that case rests not mainly on computer simulations but on many direct observations, such as observations of increased carbon dioxide levels and of rising temperatures.