
Our future, our universe, and other weighty topics


Monday, November 4, 2019

The Seven Sins of “Memory Engram” Experiments

There are some very good reasons for thinking that long-term memories cannot be stored in brains, which include:
  • the impossibility of credibly explaining how the instantaneous recall of some obscure and rarely accessed piece of information could occur as a neural effect, in a brain that is without any indexing system and subject to a variety of severe signal slowing effects;
  • the impossibility of explaining how reliable accurate recall could occur in a brain subject to many types of severe noise effects;
  • the short lifetimes of proteins in synapses, the place where scientists most often claim our memories are stored;
  • the lack of any credible theory explaining how memories could be translated into neural states;
  • the complete failure to ever find any brain cells containing any encoded information in neurons or synapses other than the genetic information in DNA;
  • the lack of any known read or write mechanism in a brain.
But scientists occasionally produce research papers trying to persuade us that memories are stored in a brain, in cells that are called "engram cells." In this post, I will discuss why such papers are not good examples of experimental science, and do not provide any real evidence that a memory was stored in a brain. I will discuss seven problems that we often see in such science papers. The "sins" I refer to are merely methodological sins rather than moral sins. 

Sin #1: assuming or acting as if a memory is stored in some exact speck-sized spot of a brain without any adequate basis for such a “shot in the dark” assumption.

Scientists never have a good basis for believing that a particular memory is stored in some exact tiny spot of the brain. But a memory experiment will often involve some assumption that a memory is stored in one exact spot of the brain (such as some exact spot of a cubic millimeter in width). For example, an experimental study may reach some conclusion (based on inadequate evidence) about a memory being stored in some exact tiny spot of the brain, and then attempt to reactivate that memory by electrically or optogenetically stimulating that exact tiny spot.

The type of reasoning that is used to justify such a “shot in the dark” assumption is invariably dubious. For example, an experiment may observe parts of a brain of an animal that is acquiring some memory, and look for some area that is “preferentially activated.” But such a technique is as unreliable as reading tea leaves. When brains are examined during learning activities, brain regions (outside of the visual cortex) do not actually show more than half of 1% signal variation. There is never any strong signal allowing anyone to say with even a 25% likelihood that some exact tiny part of the brain is where a memory is stored. If a scientist picks some tiny spot of the brain based on “preferential activation” criteria, it is very likely that he has not picked the correct location of a memory, even under the assumption that memories are stored in brains. Series of brain scans do not show that some particular tiny spot of the brain tends to repeatedly activate to a greater degree when some particular memory is recalled.

Sin #2: Either a lack of a blinding protocol, or no detailed discussion of how an effective technique for blinding was achieved.

Randomization and blinding techniques are a very important scientific technique for avoiding experimenter bias. For example, what is called the “gold standard” in experimental drug studies is a type of study called a double-blind, randomized experiment. In such a study, both the doctors or scientific staff handing out pills and the subjects taking the pills do not know whether the pills are the medicine being tested or a placebo with no effect.

If similar randomization and blinding techniques are not used in a memory experiment, there will be a high chance of experimenter bias. For example, let's suppose a scientist looks for memory behavior effects in two groups of animals, the first being a control group receiving no stimulus designed to affect memory, and the second group receiving a stimulus designed to affect memory. If the scientist knows which group is which when analyzing the behavior of the animals, he will be more likely to judge the animals' behavior in a biased way, so that the desired result is recorded.

A memory experiment can be very carefully designed to achieve this blind randomization ideal that minimizes the chance of experimenter bias. But such a thing is usually not done in memory experiments purporting to show evidence of a brain storage of memories. Scientists working for drug trials are very good about carefully designing experiments to meet the ideal of blind randomization, because they know the FDA will review their work very carefully, rejecting the drug for approval if the best experimental techniques were not used. But neuroscientists have no such incentive for experimental rigor.

Even in studies where some mention is made of a blinding protocol, there is very rarely any discussion of how an effective protocol was achieved. When dealing with small groups of animals, it is all too easy for a blinding protocol to be ineffective and worthless. For example, let us suppose there is one group of 10 mice that have something done to their brains, and some other control group that has had no such thing done. Both may be subjected to a stimulus, and their “freezing behavior” may be judged. The scientists judging such a thing may be supposedly “blind” to which experimental group is being tested. But if a scientist is able to recognize any physical characteristic of one of the mice, he may actually know which group the mouse belongs to. So it is very easy for a supposed blinding protocol to be ineffective and worthless. What is needed to have confidence in such studies is not a mere mention of a blinding protocol, but a detailed discussion of exactly how an effective blinding protocol was achieved. We almost never get such a thing in memory experiments. The minority of them that refer to a blinding protocol almost never discuss in detail how an effective blinding protocol was achieved, one that really prevented scientists from knowing something that might have biased their judgments.

For an experiment that judges "freezing behavior" in rodents, an effective blinding protocol would be one in which such freezing was judged by a person who never previously saw the rodents being tested. Such a protocol would guarantee that there would be no recognition of whether the animals were in an experimental group or a control group. But in "memory engram" papers we never read that such a thing was done.  To achieve an effective blinding protocol, it is not enough to use automated software for judging freezing, for such software can achieve biased results if it is run by an experimenter who knows whether or not an animal was in a control group. 
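The paperwork side of such a blinding protocol can be sketched in a few lines of code. This is only a toy illustration under my own assumptions (the function name, the two-group design, and the label format are invented for the example): each animal gets an anonymous code, and the key linking codes to groups is held by someone who never scores the behavior.

```python
import random

def blind_assignments(animal_ids, groups, seed=None):
    """Toy blinding scheme: return (codes, key).

    `codes` maps each animal to an anonymous label that the scorer sees;
    `key` maps labels back to experimental groups and is held only by
    someone who never scores the animals' behavior.
    """
    rng = random.Random(seed)
    ids = list(animal_ids)
    rng.shuffle(ids)  # randomize the order used for group assignment
    labels = rng.sample(range(1000, 9999), len(ids))  # unique anonymous codes
    codes, key = {}, {}
    for i, (animal, label) in enumerate(zip(ids, labels)):
        code = f"subject-{label}"
        codes[animal] = code
        key[code] = groups[i % len(groups)]  # balanced group assignment
    return codes, key

# Hypothetical usage: ten mice split between a control and a stimulated group.
codes, key = blind_assignments([f"mouse-{i}" for i in range(10)],
                               ["control", "stimulated"], seed=0)
```

Even this much only blinds the paperwork: as noted above, it fails the moment a scorer can physically recognize individual animals, which is why a scorer who has never seen the animals before is the safer arrangement.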

Sin #3: inadequate sample sizes, and a failure to do a sample size calculation to determine how large a sample size to test with.

Under ideal practice, as part of designing an experiment a scientist is supposed to perform what is called a sample size calculation. This is a calculation that is supposed to show how many subjects to use per study group to provide adequate evidence for the hypothesis being tested. Sample size calculations are included in rigorous experiments such as experimental drug trials.

The PLOS paper here reported that only one of the 410 memory-related neuroscience papers it studied had such a calculation. The PLOS paper reported that in order to achieve a moderately convincing statistical power of .80, an experiment typically needs to have 15 animals per group; but only 12% of the experiments had that many animals per group. Referring to statistical power (the probability that a study will detect an effect that really exists, rather than miss it), the PLOS paper states, “no correlation was observed between textual descriptions of results and power.” In plain English, that means that there's a whole lot of BS flying around when scientists describe their memory experiments, and that countless cases of very weak evidence have been described by scientists as if they were strong evidence.

The paper above seems to suggest that 15 animals per study group are needed. But in her post “Why Most Published Neuroscience Findings Are False,” Kelly Zalocusky PhD calculates (using Ioannidis’s data) that the median effect size of neuroscience studies is about .51. She then states the following, talking about statistical power:

"To get a power of 0.2, with an effect size of 0.51, the sample size needs to be 12 per group. This fits well with my intuition of sample sizes in (behavioral) neuroscience, and might actually be a little generous. To bump our power up to 0.5, we would need an n of 31 per group. A power of 0.8 would require 60 per group."

So the number of animals per study group needed for a moderately convincing result (one with a statistical power of .80) is at least 15 according to one source, and something like 60 according to another. But the vast majority of "memory engram" papers do not even use 15 animals per study group.
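The numbers quoted above can be roughly reproduced with the standard normal-approximation formula for a two-group comparison of means. This is my own back-of-the-envelope sketch, not a calculation taken from either source, and an exact t-test calculation (as in G*Power or statsmodels) gives slightly larger numbers at low power:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, power, alpha=0.05):
    """Approximate sample size per group for a two-sample comparison of
    means: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2, where d is
    Cohen's d and alpha is the two-sided significance threshold."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # quantile for the significance cutoff
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for power in (0.2, 0.5, 0.8):
    print(power, n_per_group(0.51, power))  # roughly 10, 30, and 61 per group
```

With a median effect size of 0.51, pushing power from the dismal 0.2 level up to 0.8 requires roughly six times as many animals per group, which helps explain why so few of these studies ever reach it.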

Sin #4: a high occurrence of low statistical significance near the minimum of .05, along with a frequent hiding of such unimpressive results, burying them outside of the main text of a paper rather than placing them in the abstract of the paper.

Another measure of the robustness of a research finding is the statistical significance reported in the paper. Memory research papers often have marginal statistical significance close to .05.

Nowadays you can publish a science paper claiming a discovery if you are able to report a statistical significance of only .05. But it has been argued by 72 experts that such a standard is way too loose, and that things should be changed so that a discovery can only be claimed if a statistical significance of .005 is reached, which is a level ten times harder to achieve.

It should be noted that it is a big misconception that when you have a result with a statistical significance (or P-value) of .05, this means there is a probability of only .05 that the result was a false alarm and that the null hypothesis is true. This paper calls such an idea “the most pervasive and pernicious of the many misconceptions about the P value.” 
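A toy simulation makes the misconception concrete. Suppose half of the tested hypotheses are actually null, and the real effects are of the underpowered size typical of the field; the specific numbers below (effect size, group size, the 50% prior) are my own illustrative assumptions, not figures from the cited paper. Among the results crossing the p < .05 line, far more than 5% turn out to be false alarms:

```python
import random
from statistics import NormalDist

def false_alarm_fraction(n_experiments=20000, n=10, effect=0.5,
                         p_real=0.5, alpha=0.05, seed=1):
    """Fraction of 'significant' toy z-test results that are false alarms,
    when only `p_real` of the tested hypotheses reflect a real effect."""
    rng = random.Random(seed)
    norm = NormalDist()
    se = (2 / n) ** 0.5          # std. error of a difference of two means
    false_alarms = hits = 0
    for _ in range(n_experiments):
        real = rng.random() < p_real
        # Simulated observed difference between group means (unit variance).
        diff = rng.gauss(effect if real else 0.0, se)
        p = 2 * (1 - norm.cdf(abs(diff / se)))  # two-sided p-value
        if p < alpha:
            hits += real
            false_alarms += not real
    return false_alarms / (false_alarms + hits)

print(false_alarm_fraction())  # around 0.2 -- four times the naive 5% reading
```

The lower the power and the less plausible the hypotheses being tested, the worse this gets, which is exactly the situation of small-sample "engram" experiments.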

When memory-related scientific papers report unimpressive results having a statistical significance such as only .03, they often make it hard for people to see this unimpressive number. An example is the recent paper “Artificially Enhancing and Suppressing Hippocampus-Mediated Memories.”  Three of the four statistical significance levels reported were only .03, but this was not reported in the summary of the paper, and was buried in hard-to-find places in the text.

Sin #5: using presumptuous or loaded language in the paper, such as referring in the paper to the non-movement of an animal as “freezing” and referring to some supposedly "preferentially activated" cell as an "engram cell." 

Papers claiming to find evidence of memory engrams are often guilty of using presumptuous language that presupposes what they are attempting to prove. For example,  the non-movement of a rodent in an experiment is referred to by the loaded term "freezing," which suggests an animal freezing in fear, even though we have no idea whether the non-movement actually corresponds to fear.  Also, some cell that is guessed to be a site of memory storage (because of some alleged "preferential activation" that is typically no more than a fraction of 1 percent) is referred to repeatedly in the papers as an "engram cell,"  which means a memory-storage cell, even though nothing has been done to establish that the cell actually stores a memory. 

We can imagine a psychology study using similar loaded language.  The study might make hidden camera observations of people waiting at a bus stop.  Whenever the people made unpleasant expressions, such expressions would be labeled in the study as "homicidal thoughts."  The people who had slightly more of these unpleasant expressions would be categorized as "murderers."   The study might say, "We identified two murderers at the bus stop from their increased display of homicidal expressions." Of course, such ridiculously loaded, presumptuous language has no place in a scientific paper.  It is almost as bad for "memory engram" papers to be referring so casually to "engram cells" and "freezing" when neither fear nor memory storage at a specific cell has been demonstrated.  We can only wonder whether the authors of such papers were thinking something like, "If we use the phrase engram cells as much as we can, maybe people will believe we found some evidence for engram cells." 

Sin #6: failing to mention or test alternate explanations for the non-movement of an animal (called “freezing”), explanations that have nothing to do with memory recall.

A large fraction of all "memory engram" papers hinge on judgments that some rodent engaged in increased "freezing behavior," perhaps while some imagined "engram cells" were electrically or optogenetically stimulated. But a science paper tells us that it is possible to induce freezing in rodents by stimulating a wide variety of brain regions: "It is possible to induce freezing by activating a variety of brain areas and projections, including the hippocampus (Liu et al., 2012), lateral, basal and central amygdala (Ciocchi et al., 2010; Johansen et al., 2010; Gore et al., 2015a), periaqueductal gray (Tovote et al., 2016), motor and primary sensory cortices (Kass et al., 2013), prefrontal projections (Rajasethupathy et al., 2015) and retrosplenial cortex (Cowansage et al., 2014)."

But we are not informed of such a reality in quite a few papers claiming to supply evidence for an engram. In such studies typically a rodent will be trained to fear some stimulus. Then some part of the rodent's brain will be stimulated when the stimulus is not present. If the rodent is nonmoving (described as "freezing") more often than a rodent whose brain is not being stimulated, this is hailed as evidence that the fearful memory is being recalled by stimulating some part of the brain.  But it is no such thing. For we have no idea whether the increased freezing or non-movement is being produced merely by the brain stimulation, without any fear memory, as so often occurs when different parts of the brain are stimulated.

If a scientist thinks that some tiny part of a brain stores a memory, there is an easy way to test whether there is something special about that part of the brain. The scientists could do the "stimulate cells and test fear" kind of test on multiple parts of the brain, only one of which was the area where the scientist thought the memory was stored. The results could then be compared, to see whether stimulating the imagined "engram cells" produced a higher level of freezing than stimulating other random cells in the brain. Such a test is rarely done. 

Sin #7: a dependency on arbitrarily analyzed brain scans or an uncorroborated judgment of "freezing behavior" which is not a reliable way of measuring fear.

A crucial element of a typical "memory engram" science paper is a judgment of what degree of "freezing behavior" a rodent displayed. The papers typically equate non-movement with fear coming from recall of a painful stimulus. This doesn't make much sense. Many times in my life I have seen a house mouse that caused me or someone else to shriek, and I never once saw a mouse freeze. Instead, they seem invariably to flee rather than to freeze. So what sense does it make to assume that the degree of non-movement ("freezing") of a rodent should be interpreted as a measurement of fear? Moreover, judgments of the degree of "freezing behavior" in mice are too subjective and unreliable.

Fear causes a sudden increase in heart rate in rodents, so measuring a rodent's heart rate is a simple and reliable way of corroborating a manual judgment that a rodent has engaged in increased "freezing behavior." A scientific study showed that heart rates of rodents dramatically shoot up instantly from 500 beats per minute to 700 beats per minute when the rodent is subjected to the fear-inducing stimuli of an air puff or a platform shaking. But rodent heart rate measurements seem to be never used in "memory engram" experiments. Why are the researchers relying on unreliable judgments of "freezing behavior" rather than a far-more-reliable measurement of heart rate, when determining whether fear is produced by recall? In this sense, it's as if the researchers wanted to follow a technique that would give them the highest chance of getting their papers published, rather than using a technique that would give them the most reliable answer as to whether a mouse is feeling fear. 



Another crucial element of many "memory engram" science papers is analysis of brain scans.  But there are 1001 ways to analyze the data from a particular brain scan.  Such flexibility almost allows a researcher to find whatever "preferential activation" result he is hoping to find.  

Page 68 of this paper discusses how brain scan analysis involves all kinds of arbitrary steps:

"The time series of voxel changes may be motion-corrected, coregistered, transformed to match a prototypical brain, resampled, detrended, normalized, smoothed, trimmed (temporally or spatially)...Furthermore, each of these steps can be done in a number of ways, each with many free parameters that experimenters set, often arbitrarily....The wholebrain analysis is often the first step in defining a region of interest in which the analyses may include exploration of time courses, voxelwise correlations, classification using support vector machines or other machine learning methods, across-subject correlations, and so on. Any one of these analyses requires making crucial decisions that determine the soundness of the conclusions."

The problem is that there is no standard way of doing such things. Each study arbitrarily uses some particular technique, and it is usually true that the results would have been much different if some other brain scan analysis technique had been used. 
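A toy example shows how much room such free parameters leave. Below, the same simulated "activation" series (pure noise, containing no real signal at all) is smoothed with two different window widths, an arbitrary choice every analysis pipeline must make, and the location of the apparent "hotspot" can shift with the choice. This is only an illustration of the principle, under invented numbers, and not a real neuroimaging pipeline:

```python
import random
from statistics import mean

def boxcar_smooth(xs, width):
    """Moving-average smoothing with the given window width
    (the window is truncated at the edges of the series)."""
    half = width // 2
    return [mean(xs[max(0, i - half):i + half + 1]) for i in range(len(xs))]

rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(100)]  # no real activation anywhere

for width in (3, 9):
    smoothed = boxcar_smooth(noise, width)
    hotspot = smoothed.index(max(smoothed))
    print(f"window {width}: apparent peak at position {hotspot}")
```

The point is not that smoothing is illegitimate, but that with many such knobs and no preregistered settings, some "preferential activation" can usually be found somewhere.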

Examples of Such Shortcomings

Let us look at a recent paper that claimed evidence for memory engrams. The paper stated, “Several studies have identified engram cells for different memories in many brain regions including the hippocampus (Liu et al., 2012; Ohkawa et al., 2015; Roy et al., 2016), amygdala (Han et al., 2009; Redondo et al., 2014), retrosplenial cortex (Cowansage et al., 2014), and prefrontal cortex (Kitamura et al., 2017).” But the close examination below will show that none of these studies are robust evidence for memory engrams in the brain. 

Let's take a look at some of these studies. The Kitamura study, which claimed to have “identified engram cells” in the prefrontal cortex, is the study “Engrams and circuits crucial for systems consolidation of a memory.” In Figure 1 (containing multiple graphs), we learn that the numbers of animals used in different study groups or experimental activities were 10, 10, 8, 10, 10, 12, 8, and 8, for an average of 9.5. In Figure 3 (also containing multiple subgraphs), we have even smaller numbers. The numbers of animals mentioned in that figure are 4, 4, 5, 5, 5, 10, 8, 5, 6, 5 and 5. None of these numbers are anything like what would be needed for a moderately convincing result, which would be a minimum of 15 animals per study group. So the study is very guilty of Sin #3. The study is also guilty of Sin #2, because no detailed description is given of an effective blinding protocol. The study is also guilty of Sin #4, because Figure 3 lists two statistical significance values of “< 0.05,” which is the least impressive result you can get published nowadays. Studies reaching a statistical significance of less than 0.01 will always report such a result as “< 0.01” rather than “< 0.05.” The study is also guilty of Sin #7, because it relies on judgments of freezing behavior of rodents, which were not corroborated by something such as heart rate measurements.

The Liu study, which claimed to have “identified engram cells” in the hippocampus of the brain, is the study “Optogenetic stimulation of a hippocampal engram activates fear memory recall.” We see in Figure 3 that inadequate sample sizes were used. The numbers of animals listed in that figure (during different parts of the experiments) are 12, 12, 12, 5, and 6, for an average of 9.4. That is not anything like what would be needed for a moderately convincing result, which would be a minimum of 15 animals per study group. So the study is guilty of Sin #3. The study is also guilty of Sin #7. The experiment relied crucially on judgments of fear produced by manual assessments of freezing behavior, which were not corroborated by any other technique such as heart-rate measurement. The study does not describe in detail any effective blinding protocol, so it is also guilty of Sin #2. The study is also guilty of Sin #6. The study involved stimulating certain cells in the brains of mice, using what is called optogenetic stimulation. The authors have assumed that when mice freeze after such stimulation, this is a sign that they are recalling some fear memory stored in the part of the brain being stimulated. What the authors neglect to tell us is that stimulation of quite a few regions of a rodent brain will produce freezing behavior. So there is actually no reason for assuming that a fear memory is being recalled when the stimulation occurs.

The Ohkawa study, which claimed to have “identified engram cells” in the hippocampus of the brain, is the study “Artificial Association of Pre-stored Information to Generate a Qualitatively New Memory.” In Figure 3 we learn that the animal study groups had a size of about 10 or 12, and in Figure 4 we learn that the animal study groups used were as small as 6 or 8 animals. So the study is guilty of Sin #3. Because the paper used a “zap their brains and look for freezing” approach, without discussing or testing alternate explanations for freezing behavior having nothing to do with memory, the Ohkawa study is also guilty of Sin #6. Judgment of fear is crucial to the experimental results, and it was done purely by judging "freezing behavior," without measurement of heart rate. So the study is also guilty of Sin #7. This particular study offers only a skimpy phrase claiming that a blinding protocol was used: “Freezing counting experiments were conducted double blind to experimental group.” But no detailed discussion is given of how an effective blinding protocol was achieved, so the study is also guilty of Sin #2.

The Roy study, which claimed to have “identified engram cells” in the hippocampus of the brain, is the study "Memory retrieval by activating engram cells in mouse models of early Alzheimer’s disease." Looking at Figure 1, we see that the study groups used sometimes consisted of only 3 or 4 animals, which is a joke from any kind of statistical power standpoint. Looking at Figure 3, we see the same type of problem. The text mentions study groups of only "3 mice per group," "4 mice per group," "9 mice per group," and "10 mice per group." So the study is guilty of Sin #3. Although a blinding protocol is mentioned in the skimpiest language, no detailed discussion is given of how an effective blinding protocol was achieved, so the study is also guilty of Sin #2. Some of the results reported have a statistical significance of only "< .05," so the study is guilty of Sin #4.

The Han study (also available here), which claimed to have “identified engram cells” in the amygdala, is the study "Selective Erasure of a Fear Memory." In Figure 1 we see that a larger-than-average sample size was used for two groups (17 and 24), but that a way-too-small sample size of only 4 was used for the corresponding control group. You need a sufficiently high number of animals in all study groups, including the control group, for a reliable result. The same figure tells us that in another experiment the number of animals in the study group was only 5 or 6, which is way too small. Figure 3 tells us that in other experiments only 8 or 9 mice were used, and Figure 4 tells us that in other experiments only 5 or 6 mice were used. So this paper is guilty of Sin #3. No mention is made in the paper of any blinding protocol, so this paper is guilty of Sin #2. Figure 4 refers to two results with a borderline statistical significance of only "< 0.05," so this paper is also guilty of Sin #4. The paper relies heavily on judgments of fear in rodents, but these were uncorroborated judgments based on "freezing behavior," without any measurement of heart rate to corroborate such judgments. So the paper is also guilty of Sin #7.

The Redondo study, which claimed to have “identified engram cells” in the amygdala, is the study "Bidirectional switch of the valence associated with a hippocampal contextual memory engram." We see 5 or 6 results reported with a borderline statistical significance of only "< 0.05," so this paper is guilty of Sin #4. No detailed description is given of how an effective blinding protocol was achieved, and only the skimpiest mention is made of blinding, so this paper is guilty of Sin #2. The study used only "freezing behavior" to try to measure fear, without corroborating such a thing by measuring heart rates. So the paper is guilty of Sin #7. The study involved stimulating certain cells in the brains of mice, using what is called optogenetic stimulation. The authors have assumed that when mice freeze after such stimulation, this is a sign that they are recalling some fear memory stored in the part of the brain being stimulated. What the authors neglect to tell us is that stimulation of quite a few regions of a rodent brain will produce freezing behavior. So there is actually no reason for assuming that a fear memory is being recalled when the stimulation occurs, and the study is also guilty of Sin #6.

The Cowansage study, which claimed to have “identified engram cells” in the retrosplenial cortex of the brain, is the study "Direct Reactivation of a Coherent Neocortical Memory of Context." Figure 2 tells us that only 12 mice were used for one experiment. Figure 4 tells us that only 3 and 5 animals were used for other experiments. So this paper is guilty of Sin #3. No detailed description is given of how an effective blinding protocol was achieved, and only the skimpiest mention is made of blinding, so this paper is guilty of Sin #2. It is a paper using the same old "zap rodent brains and look for some freezing behavior" methodology, without explaining that such results can occur for reasons having nothing to do with memory recall. So the study is guilty of Sin #6. Some of the results reported have a statistical significance of only "< .05," so the study is guilty of Sin #4.

So I have examined each of the papers that were claimed as evidence for memory traces or engrams in the brain. Serious problems have been found in every one of them. Not a single one of the studies gave a detailed description of how an effective blinding protocol was executed. All of the studies were guilty of Sin #7. Not a single one of the studies claims to have followed some standardized method of brain scan analysis; wherever there are brain scans, we can say that the experimenters merely chose one of 1001 possible ways to analyze the brain scan data. Not a single one of the studies corroborated "freezing behavior" judgments by measuring the heart rates of rodents to determine whether the animals suddenly became afraid. But all of the studies had a dependency on either brain scanning, uncorroborated freezing behavior judgments, or both. The studies all used sample sizes far too low to get a reliable result (although one of them used a decent sample size to get part of its results).

The papers I have discussed are full of problems, and do not provide robust evidence for any storage of memories in animal brains. There is no robust evidence that memories are stored in the brains of any animal, and no robust evidence that any such thing as an "engram cell" exists. 

The latest press report of a "memory wonder" produced by scientists is a claim that scientists implanted memories in the brains of songbirds. For example, The Scientist magazine has an article entitled "Researchers Implant Memories in Zebra Finch Brains." If you read the scientific paper in the journal Science, you will find that one of the crucial study groups used consisted of only seven birds, which is less than half of the fifteen animals per study group recommended for a moderately convincing result. The relevant scientific study is hidden behind a paywall of the journal Science. But by reading the article in The Scientist, we can get enough information to have the strongest suspicion that the headline is an unjustified brag.

Of course, the scientists didn't actually implant musical notes into the brains of birds. Nothing of the sort could ever occur, because no one has the slightest idea of how learned or episodic information could ever be represented as neural states. The scientists merely gave little bursts of energy to the brains of some birds. The scientists claimed that the birds who got shorter bursts of energy tended to sing shorter songs. "When these finches grew up, they sang adult courtship songs that corresponded to the duration of light they’d received," the story tells us. Of course, it would not be very improbable for such a mere "duration similarity" to occur by chance.

It is quite absurd to describe such a mere "duration similarity" as a memory implant. It was not at all true that the birds sang some melody that had been artificially implanted in their heads. The scientists in question have produced zero evidence that memories can be artificially implanted in animals. From an example like this, we get the impression that our science journalists will uncritically parrot any claim of success in brain experiments with memory, no matter how glaring the shortcomings of the relevant study.
