Ding-Xuan Zhou
City University of Hong Kong
Theory of deep convolutional neural networks
Abstract:
Deep learning has been widely applied and brought breakthroughs in speech recognition,
computer vision, and many other domains. The deep neural network architectures involved and
the associated computational issues have been well studied in machine learning. But a theoretical
foundation is still lacking for understanding the modeling, approximation, or generalization ability
of deep learning models built on architectures such as deep convolutional neural networks (CNNs).
The convolutional architecture makes deep CNNs essentially different from fully-connected deep
neural networks, so the classical theory for fully-connected networks, developed around 30 years
ago, does not apply.
This talk describes a mathematical theory of deep CNNs associated with the rectified linear unit
(ReLU) activation function. In particular, we give the first proof for the universality of deep CNNs,
meaning that a deep CNN can be used to approximate any continuous function to an arbitrary accuracy
when the depth of the neural network is large enough. We also give explicit rates of approximation and
show that the approximation ability of deep CNNs is at least as good as that of fully-connected
multi-layer neural networks. Our quantitative estimate, stated tightly in terms of the number of
free parameters to be computed, verifies the efficiency of deep CNNs in dealing with
high-dimensional data.
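As a rough illustration of the architecture in question (not the construction from the talk), the following minimal NumPy sketch stacks 1-D convolutional layers with the ReLU activation, where each zero-padded convolution widens the vector from one layer to the next. The filters and biases below are random placeholders; the universality proof chooses them explicitly so that the network output approximates a given continuous function.

import numpy as np

def relu(x):
    # ReLU activation, applied componentwise: max(x, 0)
    return np.maximum(x, 0.0)

def conv_layer(v, w, b):
    # One 1-D convolutional layer: zero-padded ("full") convolution,
    # then a bias shift and ReLU. The output has length
    # len(v) + len(w) - 1, so each layer widens the vector.
    return relu(np.convolve(v, w, mode="full") - b)

rng = np.random.default_rng(0)
v = rng.standard_normal(8)               # input of dimension d = 8
for _ in range(5):                       # depth J = 5 convolutional layers
    w = rng.standard_normal(3)           # filter (placeholder values)
    b = rng.standard_normal(v.size + 2)  # one bias per output component
    v = conv_layer(v, w, b)
print(v @ rng.standard_normal(v.size))   # final linear readout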