Frontiers of Information Technology & Electronic Engineering  2018 Vol.19 No.1 P.27-39


Visual interpretability for deep learning: a survey

Author(s):  Quan-shi Zhang, Song-chun Zhu

Affiliation(s):  University of California, Los Angeles, California 90095, USA

Corresponding email(s):   zhangqs@ucla.edu, sczhu@stat.ucla.edu

Key Words:  Artificial intelligence, Deep learning, Interpretable model

This paper reviews recent studies on understanding neural-network representations and on learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, interpretability has always been their Achilles' heel. At present, deep neural networks obtain high discrimination power at the cost of low interpretability of their black-box representations. We believe that high model interpretability may help people break several bottlenecks of deep learning, e.g., learning from few annotations, learning via human–computer communication at the semantic level, and semantically debugging network representations. We focus on convolutional neural networks (CNNs) and revisit the visualization of CNN representations, methods of diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence.


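Summary: This paper summarizes recent research on understanding the internal feature representations of neural networks and on training deep neural networks with interpretable middle-layer representations. Although deep neural networks have performed outstandingly in many artificial intelligence tasks, the interpretability of their middle-layer representations remains a major bottleneck for the field. At present, deep neural networks obtain strong classification power at the cost of black-box representations with low interpretability. We believe that improving the interpretability of middle-layer feature representations can help break many bottlenecks in deep learning, such as training on small data, human-computer interactive training at the semantic level, and targeted, precise repair of defects in middle-layer feature representations based on the semantics of internal features. Focusing on convolutional neural networks, this paper surveys: (1) methods for visualizing network representations; (2) methods for diagnosing network representations; (3) methods for automatically disentangling and explaining convolutional neural networks; (4) methods for learning neural networks with interpretable middle-layer feature representations; (5) middle-to-end deep learning algorithms based on network interpretability. Finally, possible future trends in explainable artificial intelligence are discussed.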


