Deep Learning in Scientific Computing: Understanding the Instability Mystery
Deep learning (DL) is causing profound changes in society. It has inspired unprecedented advances in historically challenging problems, such as image classification and speech recognition. And now, perhaps inevitably, it is markedly affecting scientific computing.
Yet DL has an Achilles’ heel. Current implementations can be highly unstable, meaning that certain small perturbations to the input of a trained neural network can cause substantial changes in its output. This phenomenon is both a nuisance and a major concern for the safety and robustness of DL-based systems in critical applications—like healthcare—where reliable computations are essential. It also raises several questions. Why does this instability occur? Can it be prevented? And what does it mean for scientific computing, a field in which accuracy and stability are paramount? Here we consider these questions in the context of inverse problems, an area of scientific computing where DL has shown significant promise.
Instabilities in Image Classification
The story of instabilities begins with image classification. Researchers first observed these instabilities in 2013 upon the introduction of an algorithm that fooled a trained neural network classifier [10]. Given a fixed input image \(x\) with label \(p\), the algorithm computes a small perturbation \(r\), such that the image \(x+r\)—while indistinguishable from \(x\) to the human eye—is misclassified with label \(q \ne p\). Figure 1 depicts several examples of this effect. Though the perturbations are barely visible, each one prompts the classifier to fail in a dramatic way.
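The algorithm in [10] finds such perturbations by solving a constrained optimization problem. The sketch below illustrates the same idea with a simpler, single-step gradient method (in the style of the fast gradient sign method), assuming PyTorch and a hypothetical trained classifier `model`; it is meant only to show how a tiny, targeted perturbation can be computed, not to reproduce the method of [10].

```python
# Minimal sketch of a gradient-based adversarial perturbation (FGSM-style).
# `model`, `x`, and `label` are placeholders for a trained classifier, an
# input image batch, and its true label(s).
import torch
import torch.nn.functional as F

def adversarial_perturbation(model, x, label, eps=0.01):
    """Return a small r such that model(x + r) is likely misclassified."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)   # loss at the true label
    loss.backward()                           # gradient of the loss w.r.t. the input
    r = eps * x.grad.sign()                   # small step that increases the loss
    return r.detach()

# x_adv = x + adversarial_perturbation(model, x, label)
# model(x_adv) may now output a label q != p, even though x_adv looks like x.
```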
The study of adversarial perturbations (or adversarial attacks) on classification problems has since become an active subfield of machine learning research [11]. Scientists have constructed real-world adversarial perturbations in applications that range from image classification and speech recognition to surveillance, self-driving vehicles, and automated diagnosis.
Deep Learning for Inverse Problems
Although quite different from classification problems, inverse problems—specifically inverse problems in imaging—comprise an area in which DL methods have made particularly rapid progress. Numerous studies have reported superior DL performance over current state-of-the-art techniques in various image reconstruction tasks, including medical imaging modalities like magnetic resonance imaging (MRI) and X-ray computed tomography [1, 5, 8, 12]. Such optimism is perhaps best exemplified by the following quote from Nature Methods [9], which reports on recent work [12]: “AI transforms image reconstruction. A deep-learning-based approach improves the speed, accuracy, and robustness of biomedical image reconstruction.”
The simplest type of inverse problem—but one that is often sufficient in practice—is the discrete linear problem:
\[\textrm {Given} \enspace \textrm{measurements} \enspace y = A x + e \in \mathbb{C}^m \enspace \textrm {of} \enspace x \in \mathbb{C}^N, \enspace \textrm{recover} \enspace x.\tag1\]
Here, \(x \in \mathbb{C}^N\) is the (vectorized) unknown image, \(A \in \mathbb{C}^{m \times N}\) represents the measurement process, and \(e \in \mathbb{C}^m\) is the noise. Because of physical constraints, this problem is often highly undersampled in practice—the number of measurements \(m\) is generally much smaller than the image size \(N\)—and therefore challenging. Typical DL approaches seek to overcome this issue by learning a neural network \(\Psi : \mathbb{C}^m \rightarrow \mathbb{C}^N\) that produces accurate reconstructions \(\Psi(Ax+e) \approx x\) for relevant image classes. This process is facilitated by a set of training data
\[\mathcal{T} = \{ (x^j , y^j ) : j = 1,\ldots,K \},\]
which consists of typical images \(x^j\) (e.g., MRI scans of different brains) and their measurements \(y^j = A x^j + e^j .\)
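To make this setup concrete, the following sketch builds a toy version of \((1)\) in which \(A\) is a subsampled discrete Fourier transform—a common, MRI-like measurement model. It assumes NumPy; the image size, sampling pattern, and noise level are illustrative choices, not values taken from the cited works.

```python
# Toy version of the inverse problem (1): y = Ax + e with m << N,
# where A keeps m of the N Fourier coefficients of a vectorized image.
import numpy as np

N, m = 4096, 1024                              # image size and number of measurements (m << N)
rng = np.random.default_rng(0)
rows = rng.choice(N, size=m, replace=False)    # sampled Fourier frequencies

def A(x):
    """Measurement operator: subsampled DFT of the vectorized image x."""
    return np.fft.fft(x, norm="ortho")[rows]

def measure(x, noise_level=1e-3):
    """Noisy measurements y = Ax + e."""
    e = noise_level * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
    return A(x) + e

# Training set T = {(x^j, y^j)}: pairs of typical images and their measurements.
# training_images = [...]                      # e.g., K vectorized MRI scans
# T = [(x, measure(x)) for x in training_images]
```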
Researchers have proposed many different DL approaches to solve \((1)\). However, growing evidence indicates that many of these approaches are also unstable. Figures 2 and 3 provide examples of this effect. In both cases, a small perturbation causes a significant degradation in the reconstruction’s quality. While Figure 2 is based on a worst-case perturbation (similar to the case of classification problems), Figure 3 indicates that purely random perturbations can sometimes elicit substantial effects. In contrast, state-of-the-art (untrained) sparse regularization methods [1] are typically far less susceptible to perturbations.
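Worst-case perturbations of the kind behind Figure 2 can be searched for numerically. The following schematic sketch, assuming PyTorch, a trained reconstruction network `Psi`, and a differentiable measurement operator `A_op` (all hypothetical placeholders), performs gradient ascent on the reconstruction error while keeping the perturbation of the image inside a small ball.

```python
# Schematic search for a small image perturbation r that degrades the
# reconstruction Psi(A(x + r)) as much as possible.
import torch

def worst_case_perturbation(Psi, A_op, x, radius=0.01, steps=100, lr=1e-3):
    """Search for a small r that maximizes || Psi(A(x + r)) - x ||."""
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([r], lr=lr)
    for _ in range(steps):
        loss = -torch.norm(Psi(A_op(x + r)) - x)   # ascend the reconstruction error
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                      # project back onto the ball ||r|| <= radius
            norm = torch.norm(r)
            if norm > radius:
                r.mul_(radius / norm)
    return r.detach()
```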
The Universal Instability Theorem
While DL approaches perform very well on some image reconstruction tasks, many methods appear to do so at the price of instability. The universal instability theorem sheds light on this issue [4]. Let \(\Psi : \mathbb{C}^m \rightarrow \mathbb{C}^N \) be a continuous reconstruction map for \((1)\), and suppose that there are two vectors \(x,x' \in \mathbb{C}^N\) for which
\[\parallel x-x' \parallel > 2 \eta \quad (x \enspace \textrm{and} \enspace x' \enspace \textrm {are} \enspace \textrm{far} \enspace \textrm{apart}),\tag2\]
\[\parallel Ax - Ax' \parallel \leq \eta \quad (\textrm{the} \enspace \textrm{measurements} \enspace \textrm{of} \enspace x \enspace \textrm{and} \enspace x' \enspace \textrm{are} \enspace \textrm{similar}),\tag3\]
and
\[\parallel \Psi (Ax) - x \parallel + \parallel \Psi(Ax')-x' \parallel < 2 \eta \quad (\Psi \enspace \textrm{recovers} \enspace x \enspace \textrm{and} \enspace x' \enspace \textrm{well})\tag4\]
for some \(\eta >0\). The theorem then states the following:
(a) Instability. There is a closed, non-empty ball \(\mathcal{B}_y \subset \mathbb{C}^m\) centered at \(y=Ax\), such that the local \(\varepsilon\)-Lipschitz constant at any \(\tilde {y} \in \mathcal{B}_y\) satisfies
\[L^{\varepsilon}(\Psi,\tilde y) : = \sup_{0 < \parallel z-\tilde y \parallel \le \varepsilon} \frac{\parallel \Psi(z)-\Psi(\tilde y)\parallel}{\parallel z- \tilde y\parallel} \geq \frac{1}{\eta}\left(\parallel x-x'\parallel - 2\eta \right), \qquad \forall \varepsilon \geq \eta.\]
Because the Lipschitz constant measures the effect of perturbations, this result states that any map that overperforms—i.e., accurately recovers two vectors \(x\) and \(x'\) \((4)\) even though their measurements are similar \((3)\)—must also be unstable. This implies that a delicate tradeoff exists between accuracy and stability, with the quest for too much accuracy (i.e., attempting to extract more from the data than is reasonable) leading to poor stability.
The preceding result helps explain why DL can become unstable. Simply put, DL approaches often have no mechanisms for protecting against overperformance. Recall that a typical training goal is to obtain a small training error, i.e., \(\Psi(y^j) \approx x^j\) for \(j = 1,\ldots,K\). However, if the training set contains two elements \(x,x'\) with \(\parallel x - x' \parallel \gg 2\eta\) and \(\parallel A x - A x' \parallel \leq \eta\), successful training will necessarily cause instabilities. As the training set is often large and \(A\) often has a large null space (e.g., when \(m \ll N\)), this situation can arise in many ways.
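Conditions \((2)\)–\((4)\) are straightforward to check numerically for a given reconstruction map. The sketch below, assuming NumPy and placeholder callables `A` and `Psi`, verifies the hypotheses for a pair \(x, x'\) (for instance, two images whose difference lies nearly in the null space of \(A\)) and returns the resulting lower bound on the local Lipschitz constant from part (a).

```python
# Numerical check of hypotheses (2)-(4) and the lower bound from part (a).
import numpy as np

def instability_lower_bound(x, x_prime, A, Psi, eta):
    """Return the theorem's lower bound (||x - x'|| - 2*eta) / eta when (2)-(4) hold."""
    assert np.linalg.norm(x - x_prime) > 2 * eta              # (2): x and x' are far apart
    assert np.linalg.norm(A(x) - A(x_prime)) <= eta           # (3): their measurements are similar
    err = np.linalg.norm(Psi(A(x)) - x) + np.linalg.norm(Psi(A(x_prime)) - x_prime)
    assert err < 2 * eta                                      # (4): Psi recovers both well
    return (np.linalg.norm(x - x_prime) - 2 * eta) / eta      # lower bound on L^eps(Psi, y)
```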
(b) False negatives. There is a \(z \in \mathbb{C}^N\) with \(\parallel z\parallel \geq \parallel x-x'\parallel\); an \(e \in \mathbb{C}^m\) with \(\parallel e \parallel \leq \eta\); and closed, non-empty balls \(\mathcal{B}_{x}\), \(\mathcal{B}_{e}\) centered at \(x\) and \(e\) respectively, such that
\[\parallel \Psi(A (\tilde {x} +z) + \tilde{e})- \tilde {x} \parallel \leq \eta, \quad \forall \tilde{x} \in \mathcal{B}_x, \, \tilde{e} \in \mathcal{B}_e.\tag5\]
False positives also arise in an analogous way. One can interpret this property by viewing \(x\) as a “healthy” brain image and \(z\) as a “tumor.” It asserts that \(\Psi\) may falsely reconstruct a healthy brain \(x\) given measurements of an unhealthy brain \(x+z\). It also implies that instabilities are not rare events. If \(e\) is a random vector (with mild assumptions on its distribution), then the fact that \((5)\) occurs in a ball means that
\[\mathbb{P}(\parallel\Psi( A (x+z) + e) - x\parallel \leq \eta) \geq c\]
for some \(c>0\). Therefore, purely random perturbations can create false negatives (and positives) with nontrivial probability, as seen in Figure 3.
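Because \((5)\) holds on entire balls around \(x\) and \(e\), this probability can be estimated empirically. A minimal Monte Carlo sketch, again assuming NumPy and placeholder `Psi`, `A`, images `x` and `z`, and an illustrative noise level, is as follows.

```python
# Monte Carlo estimate of the chance that a random perturbation e makes Psi
# reconstruct the "healthy" image x from measurements of the "unhealthy" x + z.
import numpy as np

def false_negative_rate(Psi, A, x, z, eta, sigma=1e-2, trials=1000, seed=0):
    rng = np.random.default_rng(seed)
    m = A(x).shape[0]
    hits = 0
    for _ in range(trials):
        e = sigma * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
        recon = Psi(A(x + z) + e)                 # reconstruct from measurements of x + z
        if np.linalg.norm(recon - x) <= eta:      # ...yet recover the healthy image x
            hits += 1
    return hits / trials                          # empirical estimate of the probability
```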
False Negatives and Threading the Accuracy-Stability Needle
What to do? It is of course elementary to create a stable network; the zero network would do the job, but it would obviously produce many false negatives. The difficulty lies in simultaneously ensuring both stability and performance; Figure 4 highlights this issue. The network was trained on images composed of ellipses and is quite stable in practice. Yet if a small detail that was not in the training set is inserted, the network washes it out almost entirely. The 2019 FastMRI challenge has also reported similar effects on practical MRI datasets, with networks failing to reconstruct small but physically relevant image abnormalities [3]. It is also worth noting that encouraging stability during training is not easy. Common methods like adversarial training, random sampling patterns, and enforcing consistency fail to protect against overperformance and thus remain susceptible to the universal instability theorem [4]. Overall, determining the best approach to walking the tightrope between accuracy and stability remains a significant open problem.
Limits of Deep Learning in Scientific Computing
The universal instability theorem is an example of a methodological boundary. Historically, scientific progress has often been shaped by the presence or absence of such boundaries. Theoretical computer science, for example, developed with a thorough understanding of its limitations thanks to Gödel and Turing’s fundamental work on non-computability. Numerical analysis has its own boundaries, such as the Dahlquist and Butcher barriers in numerical ordinary differential equations, the stability of Gaussian elimination, the performance of the simplex method, and so forth.
Given the tradition of trial-and-error approaches in DL research—often accompanied by grand performance claims—such boundaries are more important now than ever. Neural networks are substantially more complex than the traditional tools of scientific computing. Critical assessment of new DL methods is needed, and further theoretical insight into accuracy-stability tradeoffs is essential for navigating the development of these new methods. To that end, we must ask a guiding question: What are the limits of DL in scientific computing?
Acknowledgments: Nina M. Gottschling acknowledges support from the U.K. Engineering and Physical Sciences Research Council grant EP/L016516/1. Anders C. Hansen thanks NVIDIA for a GPU grant in the form of a Titan X Pascal and acknowledges support from a Royal Society University Research Fellowship and the 2017 Philip Leverhulme Prize. Ben Adcock acknowledges support from the Pacific Institute for the Mathematical Sciences CRG in High-dimensional Data Analysis, Simon Fraser University’s Big Data Initiative “Next Big Question” Fund, and the Natural Sciences and Engineering Research Council through grant R611675.
References
[1] Adcock, B., & Hansen, A.C. (2020). Compressive Imaging: Structure, Sampling, Learning. Cambridge University Press (in press).
[2] Antun, V., Renna, F., Poon, C., Adcock, B., & Hansen, A.C. (2020). On instabilities of deep learning in image reconstruction and the potential costs of AI. Proc. Natl. Acad. Sci., 117(48), 30088-30095.
[3] Cheng, K., Calivá, F., Shah, R., Han, M., Majumdar, S., & Pedoia, V. (2020). Addressing the false negative problem of deep learning MRI reconstruction models by adversarial attacks and robust training. In T. Arbel, I.B. Ayed, M. de Bruijne, M. Descoteaux, H. Lombaert, & C. Pal (Eds.), Proceedings of the Third Conference on Medical Imaging with Deep Learning (pp. 121-135). Montreal, Canada: PMLR.
[4] Gottschling, N.M., Antun, V., Adcock, B., & Hansen, A.C. (2020). The troublesome kernel: Why deep learning for inverse problems is typically unstable. Preprint, arXiv:2001.01258.
[5] Hammernik, K., Klatzer, T., Kobler, E., Recht, M.P., Sodickson, D.K., Pock, T., & Knoll, F. (2018). Learning a variational network for reconstruction of accelerated MRI data. Magn. Reson. Med., 79(6), 3055-3071.
[6] Jin, K.H., McCann, M.T., Froustey, E., & Unser, M. (2017). Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process., 26(9), 4509-4522.
[7] Moosavi-Dezfooli, S., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 86-94). Honolulu, HI: IEEE.
[8] Schlemper, J., Caballero, J., Hajnal, J.V., Price, A., & Rueckert, D. (2017). A deep cascade of convolutional neural networks for MR image reconstruction. In M. Niethammer, M. Styner, S. Aylward, H. Zhu, I. Oguz, P.-T. Yap, & D. Shen (Eds.), Information Processing in Medical Imaging (Vol. 10265). In Lecture Notes in Computer Science (pp. 647-658). Springer.
[9] Strack, R. (2018). Imaging: AI transforms image reconstruction. Nature Methods, 15(5), 309.
[10] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I.J., & Fergus, R. (2013). Intriguing properties of neural networks. Preprint, arXiv:1312.6199.
[11] Wiyatno, R.R., Xu, A., Dia, O., & de Berker, A. (2019). Adversarial examples in modern machine learning: A review. Preprint, arXiv:1911.05268.
[12] Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R., & Rosen, M.S. (2018). Image reconstruction by domain-transform manifold learning. Nature, 555(7697), 487-492.
About the Authors
Vegard Antun
Postdoctoral Fellow, University of Oslo
Vegard Antun is a postdoctoral fellow in applied mathematics at the University of Oslo. His research is centered on deep learning-based techniques for scientific computing, with a particular focus on inverse problems and imaging.
Nina M. Gottschling
Ph.D. Student, University of Cambridge
Nina M. Gottschling is a third-year Ph.D. student in the Applied Functional and Harmonic Analysis group in the Department of Applied Mathematics and Theoretical Physics at the University of Cambridge. Her research interests include harmonic analysis, applied functional analysis, numerical analysis, optimisation theory, and mathematical physics.
Anders C. Hansen
Professor, University of Cambridge and University of Oslo
Anders C. Hansen is a professor of mathematics at the University of Cambridge and the University of Oslo.
Ben Adcock
Associate Professor, Simon Fraser University
Ben Adcock is an associate professor of mathematics at Simon Fraser University. His research interests include numerical analysis, mathematics of data science, approximation theory, and applied and computational harmonic analysis.