Statistical Analysis and Commentary

Goldfish behavioral response to magnetic field direction was measured for possible use in the pilot metacognition experiment. The evidence suggests goldfish detect magnetic field direction. However, the pilot metacognition experiment used behavioral response to different colors instead of field direction because (1) goldfish are already known to respond to color differences, and (2) it was easier to present a range of intermediate colors than a testable range of variation in magnetic field direction.

Note: The sample size of this preliminary study is too small to provide evidence. The purpose of the following analyses is to identify what may warrant further, more rigorous testing and what improvements the experimental design needs.

Initially, five trials were conducted, each repeated five times (the repetitions are referred to below as subtrials).

“Target” means the correct target (suction cup) for the fish to peck.

There were three targets (suction cups) on the inside bottom of the tank: left, right, and middle.

“Left hits” means the number of pecks on the left target, etc.

Relevant data is organized in the table below. Trial 1 is a control.

Three features of this data limit the choice of statistical test:

  • many subtrials recorded a value of zero hits

  • there is a small sample size

  • the total number of hits per subtrial varies dramatically.

Because of this, mean values would not accurately represent goldfish responsiveness. Therefore, the following statistical tests could not be used to analyze the data meaningfully:

  • A t-test, which compares a sample mean with a hypothesized population mean (or two sample means with each other), scaled by the standard error, to determine whether the observed difference is likely under random chance.

  • ANOVA (analysis of variance), which also tests means. It compares the variance between the means of two or more samples with the variance within the samples to determine whether the samples plausibly come from populations with the same mean.

  • Time series analyses, which could not be used because we lack reliable time data.

The chi-square test can be used, however, because it operates on counts (the sums of hits on each target) rather than on mean values.
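A minimal sketch of collapsing subtrial records into the per-target sums the chi-square test uses, assuming each subtrial is recorded as a (left, middle, right) tuple of hit counts. The helper name `target_totals` is hypothetical, and the counts are made-up placeholders, not data from this study.

```python
# Collapse per-subtrial hit counts into per-target totals.
# The counts below are HYPOTHETICAL placeholders, not experimental data.

def target_totals(subtrials):
    """Sum hits per target across subtrials; each row is (left, middle, right)."""
    return [sum(column) for column in zip(*subtrials)]

hypothetical_subtrials = [
    (3, 0, 1),
    (0, 0, 0),  # a zero-hit subtrial adds nothing to the totals
    (5, 2, 0),
    (1, 1, 4),
    (0, 0, 2),
]
print(target_totals(hypothetical_subtrials))  # [9, 3, 7]
```

Summing sidesteps the problem noted above: zero-hit subtrials simply contribute nothing, whereas they would pull a per-subtrial mean toward zero.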

The following data, from trials 6 through 8 (control and magnetic field experiments), was obtained after further training of the fish and will also be analyzed. However, its meaningfulness is undermined by an alternative explanation for why the fish hit the correct target so much more often: in this data collection, as in the preceding training, the field directions were alternated rather than randomized. The fish may therefore have anticipated the location of the next reward rather than (or in addition to) detecting the field direction and associating it with food. The lesson is to randomize the field direction, as was done in trials 1 through 5. This data is nonetheless interesting as evidence of pattern learning in goldfish (provided there were no other cues to the correct target).

Description of the chi-square test:

The chi-square goodness-of-fit test determines whether the observed distribution of goldfish hits across the targets differs significantly from the distribution expected if the fish had no directional cues (or no behavioral response to them).

  • The chi-square test statistic, χ², is the sum, over all categories, of the squared differences between observed and expected counts, each divided (normalized) by the expected count. (Squaring makes every difference positive and weights large differences more heavily than small, more plausibly stochastic ones.)

  • The degrees of freedom, df, is the number of categories whose counts are free to vary once the total is fixed: the number of categories minus one. With three targets and a known total number of hits, two counts determine the third, so df = 2. (If only one category were observed, there would be nothing to compare it to.)

The χ² value is compared to the chi-square distribution with df degrees of freedom, which gives a p-value: the probability, assuming the null hypothesis is true, of obtaining a χ² value at least as large as the one observed. I use 0.05 as the significance level for rejecting the null hypothesis.
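For reference, the quantities just described can be written compactly as follows, where o_i and e_i are the observed and expected hits on target i, k is the number of targets, and N is the total number of hits. The uniform expectation e_i = N/k corresponds to the random-choice null hypothesis used in this analysis.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},
\qquad e_i = \frac{N}{k},
\qquad \mathrm{df} = k - 1,
\qquad p = P\!\left(X \ge \chi^2\right), \quad X \sim \chi^2_{\mathrm{df}}
```

With three targets, k = 3 and df = 2, and for df = 2 the right-tail probability reduces to the closed form p = e^{-χ²/2}.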

The following are the steps for performing a chi-square test on the trial 1 (control) data. These are our observed values:

This indicates that with no magnetic field or color stimulus, the goldfish selected the left target 23 times, the middle target 11 times, and the right target 14 times.

If the null hypothesis were correct, namely that goldfish target selection is random (not influenced by any cue), we would expect a uniform distribution of target hits. Given 48 hits, we would therefore expect the goldfish to hit each target 16 times (the total number of hits, 48, divided by three targets).

To find the χ² statistic, we must compute the normalized squared difference between observed values and expected values. To do so, for each column, we take o, the observed number of target hits, and e, the expected number of hits, and compute (o − e)²/e. The sum of these values is the χ² statistic. This is the bottom row (the χ² statistic is highlighted):
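The per-target terms for trial 1 can be checked with a short sketch, assuming a uniform expectation of N/k hits per target. The function name `chi_square_terms` is hypothetical; the counts 23, 11, and 14 are the trial 1 totals given above.

```python
# Chi-square goodness-of-fit terms for the trial 1 (control) totals.
# Assumes a uniform null: each of the k targets expects N / k hits.

def chi_square_terms(observed):
    """Return the per-target terms (o - e)^2 / e under a uniform expectation."""
    total = sum(observed)
    expected = total / len(observed)  # 48 / 3 = 16 for trial 1
    return [(o - expected) ** 2 / expected for o in observed]

# Trial 1 hit totals from the text: left, middle, right.
trial1 = {"left": 23, "middle": 11, "right": 14}

terms = chi_square_terms(list(trial1.values()))
for target, term in zip(trial1, terms):
    print(f"{target}: {term:.4f}")  # left: 3.0625, middle: 1.5625, right: 0.2500
print(f"chi-square: {sum(terms):.4f}")  # chi-square: 4.8750
```

The left target contributes most of the statistic because its count deviates most from the expected 16.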

Finally, the p-value of this statistic is computed with 2 degrees of freedom. (This can be done with a variety of online calculators or with the spreadsheet function CHIDIST(χ², 2), which returns the right-tail probability.) The computed p-value is 0.087, which is greater than 0.05. This means that if the goldfish were choosing targets at random, a deviation from the uniform distribution at least this large would occur about 8.7% of the time. Because 0.087 is larger than 0.05, we fail to reject the null hypothesis, as expected: the goldfish were not given any indication of a correct target in this control trial, so they are expected to hit targets only at random.
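As an alternative to a spreadsheet, the right-tail probability can be computed directly. This sketch uses the exact closed form of the chi-square survival function for even degrees of freedom, which covers the df = 2 case here; `chi2_sf_even_df` is a hypothetical helper, and the input 4.875 is the trial 1 statistic (49/16 + 25/16 + 4/16) from the counts 23, 11, and 14.

```python
import math

def chi2_sf_even_df(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2m the survival function is exp(-x/2) * sum_{i<m} (x/2)^i / i!,
    which for df = 2 reduces to exp(-x/2).
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Trial 1: chi-square = 4.875 with 3 targets - 1 = 2 degrees of freedom.
p = chi2_sf_even_df(4.875, 2)
print(f"p = {p:.3f}")  # p = 0.087
```

If SciPy is available, `scipy.stats.chisquare([23, 11, 14])` should give the same statistic and p-value, since its default expected frequencies are uniform.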

Analyses of the remaining trials are below:


As mentioned, for trial 1, the p-value is greater than the significance level, so we fail to reject the null hypothesis, as expected because it is the control. (8.7% is somewhat surprisingly low, but, again, the small sample size means this data is not informative).

For trials 2 and 3, which tested goldfish detection of magnetic field direction (through learned association of field direction with the correct target), the p-value is less than the significance level, so we reject the null hypothesis. The result is consistent with the alternative hypothesis that goldfish can detect magnetic field direction. Because of the small sample size, however, these results only indicate that a more thorough experiment with more data is warranted to test whether goldfish do indeed sense the direction of magnetic fields. The same holds for trials 4 and 5, which tested goldfish discrimination between, and response to, blue and red; this result is expected because goldfish are already known to make that discrimination.

For trials 6 through 8, the p-value is less than the significance level, so we reject the null hypothesis in each case. This reveals an important challenge to our experimental design, because trial 6 was a control, where the null hypothesis should not have been rejected. What happened? The most plausible explanation is that the fish learned from previous training and testing that pecking the middle target never results in food, so they favored the left and right targets. This finding matters because it shows we cannot expect a random distribution of hits among targets after prior training and testing. For trials 7 and 8, rejection of the null hypothesis is consistent with goldfish detecting magnetic field direction. Here, the p-values round to zero (they are extremely small, not literally zero). However, this may be because the field direction was alternated and the fish anticipated the alternation from the preceding training.
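Why a p-value can display as zero: the right tail of the chi-square distribution shrinks exponentially, so a large statistic yields a p-value far below what a three-decimal display can show. This sketch uses hypothetical statistics (not the trial 7 and 8 values) and assumes, as in trial 1, three targets and therefore 2 degrees of freedom.

```python
import math

# HYPOTHETICAL chi-square statistics, not the trial 7 and 8 values.
# With three targets, df = 2, so the right-tail p-value is exactly exp(-x / 2).
for stat in (10.0, 30.0, 60.0):
    p = math.exp(-stat / 2)
    # Scientific notation shows the true size; three decimals hides it.
    print(f"chi-square = {stat:.1f}: p = {p:.2e} (to 3 decimals: {p:.3f})")
```

Reporting such values as "p < 0.001" avoids implying that the probability is exactly zero.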

Metacognition Experiment

Summing all the values in each column of each test gives the following table of counts for the chi-square tests:

The results of the chi-square tests are:


For the second through fourth trials, when the color was intermediate between clearly red and clearly blue, the p-value was greater than 0.05, though trial 2 was close, at 0.054. Thus, at the standard 0.05 significance level, we cannot rule out that these distributions arose by chance. A p-value above 0.05 means only that we cannot be suitably confident the fish favored one target over another beyond what chance would produce.

However, the experiment was designed to test whether the goldfish increasingly favor the lesser reward of the third (middle) target when the chance of hitting the correct target is reduced by the indeterminacy of the correct target (the color’s being a shade of purple rather than blue or red). This is what was observed: hits on the middle target increased from an average of 7.5 for the first and fifth (red and blue) trials to an average of 14 for the purple trials. The samples are too small and varied, however, to conclude that the goldfish was aware of its reduced chance of obtaining the preferred reward, even if the experimental design were sound.

Most importantly, there is an alternative to metacognition as an explanation in this case, due to a flaw in the experimental design. If the fish is “unsure” whether to go right or left because the color is intermediate between red and blue, it may swim back and forth, or “hesitate” between right and left, and so spend more time in the middle of the tank. Spending more time in the middle will produce more hits on the middle target simply because of the fish’s location, not necessarily because the fish was aware it might hit the wrong large-reward target and so chose the middle target to obtain at least the lesser reward. This design flaw is addressed in the “Experimental Design Challenges and Improvements” document.