How Well do Feature Visualizations Support Causal Understanding of CNN Activations?

Roland S. Zimmermann*
University of Tübingen & IMPRS-IS
Judy Borowski*
University of Tübingen & IMPRS-IS
Robert Geirhos
University of Tübingen & IMPRS-IS
Matthias Bethge
University of Tübingen
Thomas S. A. Wallis
Technical University of Darmstadt
Wieland Brendel
University of Tübingen

tl;dr: Using psychophysical experiments, we show that the widely used synthetic feature visualizations by Olah et al. (2017) support causal understanding only slightly better than no visualizations, and about as well as other visualizations such as natural dataset samples.


Sep '21 Our NeurIPS submission was accepted for a spotlight presentation!
July '21 We really enjoyed discussing this project with visitors at the ICML XAI workshop.
June '21 A shorter workshop version of the paper was accepted at the Theoretic Foundation, Criticism, and Application Trend of Explainable AI Workshop at ICML 2021.
June '21 The pre-print is now available on arXiv.


How useful are synthetic feature visualizations to interpret the effects of interventions? Given strongly activating reference images (either synthetic or natural), a human participant chooses which out of two manipulated images activates a unit more. Note that the presented trial is made up - real trials are often more difficult. Synthetic images are generated via feature visualization (Olah et al. (2017)).

Why we care

Feature visualizations via activation maximization are a popular explanation method for CNNs. They are believed to provide humans with precise information about the image features that cause a unit to be activated. A popular example is that they can distinguish whether a unit responds to a whole dog’s face or just an eye:

In this project, we test this intuition and investigate how well feature visualizations support causal understanding of CNN activations. Our assumption is that if these visualizations grant more causal insight, then they should allow humans to predict the effect of an intervention better.

What we did

In online experiments on Amazon Mechanical Turk (MTurk), we test how well participants understand the causal relation between manipulated images and a CNN unit’s activation. Here is an example trial:

Based on strongly activating reference images on the left, a participant chooses which manipulated image on the right elicits higher activation.

For a given CNN unit, participants see several strongly activating feature visualizations on the left. On the right-hand side, they see yet another strongly activating image, but this time a natural one. Below this natural image, two copies with square occlusions at different locations are shown. The question is: Which of these two manipulated images elicits higher activation? When we break this task down, what we’re really asking is which of the manipulated images retains more of whatever seems important given the reference images.
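The structure of such a trial can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in our experiments: `occlude`, `make_trial` and `toy_unit_activation` are hypothetical names, the "unit" is a toy stand-in for a real CNN unit, and the occlusion locations are picked by hand.

```python
import numpy as np

def occlude(image, top, left, size, fill=0.0):
    """Return a copy of `image` with a size x size square set to `fill`."""
    out = image.copy()
    out[top:top + size, left:left + size, ...] = fill
    return out

def toy_unit_activation(image):
    """Hypothetical stand-in for a CNN unit: mean intensity of the image center."""
    h, w = image.shape[:2]
    return float(image[h // 4: 3 * h // 4, w // 4: 3 * w // 4].mean())

def make_trial(image, loc_a, loc_b, size, unit=toy_unit_activation):
    """Build two occluded copies and record which one activates the unit more.

    Ties are resolved in favor of "B" in this sketch.
    """
    img_a = occlude(image, *loc_a, size)
    img_b = occlude(image, *loc_b, size)
    correct = "A" if unit(img_a) > unit(img_b) else "B"
    return img_a, img_b, correct

# Random image as a placeholder for a strongly activating natural image.
rng = np.random.default_rng(0)
image = rng.uniform(0.5, 1.0, size=(64, 64, 3))

# Copy A hides part of the center (which the toy unit cares about),
# copy B hides a corner (which the toy unit ignores).
img_a, img_b, correct = make_trial(image, (24, 24), (0, 0), size=16)
print(correct)
```

In a real trial, `unit` would be replaced by a forward pass through the network that reads out the unit of interest, and the participant's answer would be compared against `correct`.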

Moreover, we compare feature visualizations with other visualizations. For example, we test strongly activating dataset samples from ImageNet. This is what the trial looks like then:

What we found

Our main finding is that feature visualizations do not support causal understanding particularly well. At 67% correct, performance with these synthetic images is above chance level, which suggests that feature visualizations do provide some helpful information about the most important image patch. However, this performance is only slightly higher than when participants make their choices without any reference images (“None”). Finally, natural dataset samples as well as other combinations and types of visualizations are similarly helpful.
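To make "above chance level" concrete: with two answer options, guessing yields 50% correct. The sketch below shows one simple way to check whether an observed accuracy exceeds that, using an exact one-sided binomial test. It is not the analysis from our paper, the function name is hypothetical, and the trial counts are made-up placeholders rather than data from the study.

```python
from math import comb

def binomial_p_value_above_chance(n_correct, n_trials, chance=0.5):
    """Exact one-sided p-value: P(X >= n_correct) if every answer were a guess."""
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )

# Hypothetical counts for illustration only (NOT data from the study):
# 67 correct answers out of 100 two-alternative trials.
n_trials = 100
n_correct = 67

accuracy = n_correct / n_trials
p_value = binomial_p_value_above_chance(n_correct, n_trials)
print(f"accuracy = {accuracy:.2f}, one-sided p-value vs. chance = {p_value:.2g}")
```

A small p-value only says that performance differs from guessing. It says nothing about whether one visualization condition is better than another, which is the comparison that matters for our conclusions.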

On average, humans reach the same performance regime with any visualization method. This holds both for lay participants on MTurk (dark colors) and for expert measurements (light colors).

Because performance is very similar across conditions, we thoroughly investigate whether participants really understand the task and try their best. The good news is: We are confident that this is the case. While we describe five reasons for this in our paper, we only want to mention the most intuitive one here (see the paper for the other reasons, as well as more analyses): Measurements of the two first authors are similar to those of online participants - and we were certainly engaged during this experiment ;-)

What we take from this

In summary, we showed that the widely used visualization method by Olah et al. (2017) does not convey causal understanding of CNN activations as well as previously thought. There is no doubt that feature visualizations have an important place within the field of interpretability and that, with more and more societal applications of machine learning, this method will become even more widely used. Therefore, developing realistic expectations of what we can - and what we cannot - expect from explanation methods is crucial. We hope that our task will serve as a challenging test case to steer further development of visualization methods.

Acknowledgements & Funding


When citing our project, please use our pre-print:

@article{zimmermann2021well,
  author = {
    Zimmermann, Roland S. and
    Borowski, Judy and
    Geirhos, Robert and
    Bethge, Matthias and
    Wallis, Thomas S. A. and
    Brendel, Wieland
  },
  title = {
    How Well do Feature Visualizations
    Support Causal Understanding
    of CNN Activations?
  },
  journal = {CoRR},
  volume = {abs/2106.12447},
  year = {2021},
}