How Well do Feature Visualizations Support Causal Understanding of CNN Activations?
University of Tübingen
tl;dr: Using psychophysical experiments, we show that the widely used synthetic feature visualizations by Olah et al. (2017) support causal understanding only slightly better than no visualizations at all, and about as well as other visualizations such as natural dataset samples.
News
| Sep '21 | Our NeurIPS submission was accepted for a spotlight presentation! | 
| July '21 | We really enjoyed discussing this project with visitors at the ICML XAI workshop. | 
| June '21 | A shorter workshop version of the paper was accepted at the Theoretic Foundation, Criticism, and Application Trend of Explainable AI Workshop at ICML 2021. | 
| June '21 | The pre-print is now available on arXiv. | 
Abstract
One widely used approach towards understanding the inner workings of deep convolutional neural networks is to visualize unit responses via activation maximization. Feature visualizations via activation maximization are thought to provide humans with precise information about the image features that cause a unit to be activated. If this is indeed true, these synthetic images should enable humans to predict the effect of an intervention, such as whether occluding a certain patch of the image (say, a dog's head) changes a unit's activation. Here, we test this hypothesis by asking humans to predict which of two square occlusions causes a larger change to a unit's activation. Both a large-scale crowdsourced experiment and measurements with experts show that on average, the extremely activating feature visualizations by Olah et al. (2017) indeed help humans on this task (67 ± 4% accuracy; baseline performance without any visualizations is 60 ± 3%). However, they do not provide any significant advantage over other visualizations (such as dataset samples), which yield similar performance (66 ± 3% to 67 ± 3% accuracy). Taken together, we propose an objective psychophysical task to quantify the benefit of unit-level interpretability methods for humans, and find no evidence that feature visualizations provide humans with better “causal understanding” than simple alternative visualizations.
Why we care
Feature visualizations via activation maximization are a popular explanation method for CNNs. They are believed to provide humans with precise information about the image features that cause a unit to be activated. A popular example is that they can distinguish whether a unit responds to a whole dog’s face or just an eye:
In this project, we test this intuition and investigate how well feature visualizations support causal understanding of CNN activations. Our assumption is that if these visualizations grant more causal insight, then they should allow humans to predict the effect of an intervention better.
What we did
In online experiments on Amazon Mechanical Turk (MTurk), we test how well participants understand the causal relation between manipulated images and a CNN unit’s activation. Here is an example trial:
For a certain CNN unit, participants see several strongly activating feature visualizations on the left. On the right-hand side, they see yet another strongly activating image; this time, though, it is a natural one. Below this natural image, two copies with square occlusions at different locations are shown. The question is: Which of these two manipulated images elicits the higher activation? When we break this task down, what we’re really asking is which of the manipulated images retains as much as possible of whatever seems important given the reference images.
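As a rough illustration of the trial logic described above, here is a minimal Python sketch. It is not our experiment code: `toy_unit_activation` is a hypothetical stand-in for a real CNN unit, and the function names, image size, and patch positions are assumptions chosen purely for illustration.

```python
import numpy as np


def occlude(image, top, left, size, fill=0.0):
    """Return a copy of `image` with a square patch replaced by `fill`."""
    out = image.copy()
    out[top:top + size, left:left + size] = fill
    return out


def toy_unit_activation(image):
    """Hypothetical stand-in for a CNN unit (an assumption, not a real model):
    it responds to the mean brightness of the top-left quadrant."""
    h, w = image.shape[:2]
    return float(image[: h // 2, : w // 2].mean())


def correct_choice(image, patch_a, patch_b, size, unit=toy_unit_activation):
    """Return 'A' or 'B': the occluded copy with the higher unit activation.

    Ties are resolved in favour of 'B' in this sketch.
    """
    act_a = unit(occlude(image, *patch_a, size))
    act_b = unit(occlude(image, *patch_b, size))
    return "A" if act_a > act_b else "B"


# Toy "natural image": uniformly bright 8x8 array.
image = np.ones((8, 8))

# Patch A covers part of the region the toy unit cares about; patch B does not.
answer = correct_choice(image, patch_a=(0, 0), patch_b=(4, 4), size=3)
print(answer)
```

In this toy setup, occlusion A removes content the unit responds to while occlusion B leaves it untouched, so image B elicits the higher activation. In the actual task, participants have to infer this from the reference images alone, without access to the unit itself.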
Moreover, we compare feature visualizations with other visualizations. For example, we test strongly activating dataset samples from ImageNet. This is what the trial looks like then:
What we found
Our main finding is that feature visualizations do not support causal understanding particularly well. At 67% accuracy, performance with these synthetic images is above chance level, which suggests that feature visualizations do provide some helpful information about the most important image patch. However, this performance is only slightly higher than when participants make their choices without any reference images (“None”, 60% accuracy). Finally, natural dataset samples as well as other combinations and types of visualizations are similarly helpful (66% to 67% accuracy).
As performance is very similar across conditions, we thoroughly investigate whether participants really understand the task and try their best. The good news is: We are confident that this is the case. While we describe five reasons for this in our paper, we only want to mention the most intuitive one here (see the paper for the other reasons, as well as more analyses): The measurements of the two first authors are similar to those of the online participants, and we were certainly engaged during this experiment ;-)
What we take from this
In summary, we showed that the widely used visualization method by Olah et al. (2017) does not convey causal understanding of CNN activations as well as previously thought. There is no doubt that feature visualizations have an important place within the field of interpretability and that, with more and more societal applications of machine learning, this method will become even more widely used. Therefore, developing realistic expectations of what we can, and what we cannot, expect from explanation methods is crucial. We hope that our task will serve as a challenging test case to steer further development of visualization methods.
Acknowledgements & Funding
We thank the reviewers and commenters (e.g. Chris Olah) on our previous paper for stimulating this further work. We thank Felix A. Wichmann and Isabel Valera for a helpful discussion. We further thank Ludwig Schubert for information on technical details via Slack. In addition, we thank our colleagues for helpful discussions, and especially Matthias Kümmerer, Dylan Paiton, Wolfram Barfuss, and Matthias Tangemann for valuable feedback on our task and/or technical support. And finally, we thank all our participants for taking part in our experiments.
We thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting JB, RZ and RG.
We acknowledge support from the German Federal Ministry of Education and Research (BMBF) through the Competence Center for Machine Learning (TUE.AI, FKZ 01IS18039A) and the Bernstein Computational Neuroscience Program Tübingen (FKZ: 01GQ1002), the Cluster of Excellence Machine Learning: New Perspectives for Sciences (EXC2064/1), and the German Research Foundation (DFG; SFB 1233, Robust Vision: Inference Principles and Neural Mechanisms, TP3, project number 276693517).
MB and WB acknowledge funding from the MICrONS program of the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DoI/IBC) contract number D16PC00003.
BibTeX
When citing our project, please use our pre-print:
author = {
Zimmermann, Roland S. and
Borowski, Judy and
Geirhos, Robert and
Bethge, Matthias and
Wallis, Thomas S. A. and
Brendel, Wieland
},
title = {
How Well do Feature Visualizations
Support Causal Understanding
of CNN Activations?
},
journal = {CoRR},
volume = {abs/2106.12447},
year = {2021},
}