Quenched Disorder

Quenched disorder occurs in systems with local random heterogeneities. Formally, quenched disorder is frozen heterogeneity that acts as a background random potential for the fluctuating (thermal and/or quantum-mechanical) degrees of freedom.

Motivation

Classical examples of quenched-disorder systems traditionally studied in the context of solid-state and condensed-matter physics include the localization of electrons, responsible for the existence of Anderson insulators characterized by vanishing zero-temperature conductivity; the finite zero-temperature residual resistivity of metals; the pinning of vortices in superconductors (which would otherwise move and dissipate energy, resulting in finite resistivity); and charge density waves exhibiting impurity-induced nonlinear current-voltage characteristics.[1]

While early studies of condensed matter focused on idealized homogeneous systems, e.g., localized spins and the electron liquid in ideal impurity-free crystals, and on phase transitions and ordered states of homogeneous matter, realistic systems include local random heterogeneities. Quenched disorder is also distinguished from annealed disorder, in which the random degrees of freedom are ergodic. The latter amounts to nothing more than additional thermodynamic degrees of freedom, and so is not qualitatively distinct from a disorder-free multi-component system. Such annealed degrees of freedom can in principle be traced out, yielding a disorder-free system with modified parameters.

Properties

In general, quenched-disorder systems constitute an extremely challenging subject. They require one to understand the behavior of an infinite number of degrees of freedom without translational invariance, in the presence of thermal fluctuations that diverge near a continuous phase transition. Even at zero temperature the problem is difficult, because minimizing a heterogeneous energy functional requires balancing two frustrated tendencies, order and disorder.

It is impossible and, in fact, unnecessary to find solutions for a specific realization of disorder. Instead, we often only need the statistically typical properties of the system. Thus, in many cases, it is sufficient to compute disorder-averaged physical properties, such as the average free energy and order-parameter correlators.
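
As a toy illustration of such a quenched (as opposed to annealed) average, the sketch below, whose model and parameters are illustrative assumptions rather than anything from the text, exactly minimizes the energy of a small random-field Ising chain separately for each frozen realization of the random fields, and only then averages the resulting observable over realizations.

```python
import numpy as np
from itertools import product

def ground_state_energy(J, h):
    """Exhaustively minimize E = -J * sum_i s_i s_{i+1} - sum_i h_i s_i
    over spins s_i = +/-1 for one frozen realization of the random fields h."""
    best = np.inf
    for spins in product([-1, 1], repeat=len(h)):
        s = np.array(spins)
        E = -J * np.sum(s[:-1] * s[1:]) - np.sum(h * s)
        best = min(best, E)
    return best

rng = np.random.default_rng(0)
J, N, n_realizations = 1.0, 10, 200
# Quenched average: solve each frozen disorder realization first, then average the observable.
energies = [ground_state_energy(J, rng.normal(0.0, 1.0, N)) for _ in range(n_realizations)]
print("disorder-averaged ground-state energy per spin:", np.mean(energies) / N)
```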

Impurity defects inside soft matter can typically rearrange and equilibrate easily (acting like annealed "disorder"), so quenched disorder is less common in such soft systems. However, there are many interesting and nontrivial exceptions. These include soft matter encapsulated inside a random solid matrix or in contact with a rough solid substrate. Well-studied examples include liquid crystals confined inside a random aerogel or aerosil matrix and liquid crystal cells perturbed by a random substrate.

The study of quenched disorder has also led to the even more challenging problem of "self-generated disorder", as in structural glasses and jammed systems, where even without background heterogeneity the degrees of freedom become kinetically arrested, falling out of equilibrium.

The influence of ever-present quenched disorder on phase transitions and on the concomitant ordered phases is another extensively developed subject of research. Prominent examples include magnetism and elastic soft media randomly pinned by a defective host atomic matrix or an underlying heterogeneous substrate, as realized in pinned vortex lattices, charge density waves, magnetic domain walls, contact lines, and earthquake and friction phenomena, extensively discussed by Pierre Le Doussal.

Learning From Data

In supervised machine learning, methods for minimizing the training error involve descending the error landscape over the parameter vector of weights $\mathbf{w}$ via (stochastic) gradient descent. The training loss function can be thought of as an energy function over the thermal degrees of freedom $\mathbf{w}$, in which the data introduces quenched disorder of a self-generated type, coupling together the degrees of freedom of the neural network. More classical forms of quenched disorder, introduced by topological heterogeneity, can also shed light on the mechanics of synaptic pruning. From these observations it can be posited that, more generally, Hebbian learning is a problem of quenched disorder on networks.
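
A minimal sketch of this correspondence, with dimensions, loss, and function names that are illustrative assumptions rather than anything prescribed here, treats a dataset drawn once and then held fixed as the frozen disorder, and descends the resulting training-error landscape over $\mathbf{w}$ with plain minibatch stochastic gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 50, 200                       # number of weights and dataset size

# The dataset (X, y) is drawn once and then held fixed: it plays the role of quenched disorder.
X = rng.normal(size=(P, N)) / np.sqrt(N)
y = np.sign(X @ rng.normal(size=N))  # arbitrary fixed labels, for illustration only

def train_error(w, X, y):
    """Quadratic training error, acting as an energy over the 'thermal' weights w."""
    return 0.5 * np.mean((X @ w - y) ** 2)

def sgd(w, X, y, lr=0.1, epochs=200, batch=20):
    """Plain minibatch SGD descending the fixed (disordered) energy landscape."""
    P = len(y)
    for _ in range(epochs):
        idx = rng.permutation(P)
        for start in range(0, P, batch):
            b = idx[start:start + batch]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w = w - lr * grad
    return w

w = sgd(rng.normal(size=N), X, y)
print("final training error:", train_error(w, X, y))
```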

This approach has led to a rich body of work,[2] where the focus is, in contrast to deriving upper bounds on test error, on asymptotically exact calculations of training and test errors in a thermodynamic limit in which both the neural network size $N$ and the dataset size $P$ go to infinity while their ratio $\alpha = P/N$ remains $O(1)$. In this framework, the dataset $\mathcal{D}$ of $P$ points $\{\mathbf{x}^{0,\mu}, \mathbf{y}^\mu\}_{\mu=1}^{P}$ is drawn from the random inputs and outputs of a teacher neural network with ground-truth parameters $\mathbf{w}^\ast$. The training error $\mathcal{E}_\mathrm{Train}(\mathbf{w}, \mathcal{D})$ is then thought of as an energy function over the thermal degrees of freedom $\mathbf{w}$ of a student neural network, where the data $\mathcal{D}$ plays the role of quenched disorder. The ground state of this statistical mechanical system over the student network parameters $\mathbf{w}$ is then compared to the ground-truth teacher weights $\mathbf{w}^\ast$ to assess generalization. Unfortunately, it can be difficult to carry out these calculations for complex modern neural networks, as well as to analytically demonstrate good generalization at small values of $\alpha$ for realistically structured datasets $\mathcal{D}$.
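
A minimal numerical sketch of this teacher-student setup, with a linear teacher and a ridge-regularized least-squares student standing in for the general framework (all specifics below are assumptions for illustration, not the calculations of [2]), generates data from ground-truth weights $\mathbf{w}^\ast$, trains a student on $P = \alpha N$ examples, and assesses generalization by comparing $\mathbf{w}$ to $\mathbf{w}^\ast$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100                                   # student/teacher size
w_star = rng.normal(size=N)               # ground-truth teacher weights

def teacher_data(P, noise=0.1):
    """Draw P random inputs and noisy outputs from the linear teacher."""
    X = rng.normal(size=(P, N)) / np.sqrt(N)
    y = X @ w_star + noise * rng.normal(size=P)
    return X, y

for alpha in (0.5, 1.0, 2.0, 4.0):        # load alpha = P / N
    P = int(alpha * N)
    X, y = teacher_data(P)
    # Student: minimize the training error (ridge-regularized least squares).
    w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(N), X.T @ y)
    X_test, y_test = teacher_data(2000)
    train_err = np.mean((X @ w - y) ** 2)
    test_err = np.mean((X_test @ w - y_test) ** 2)
    overlap = w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star))
    print(f"alpha={alpha:.1f}  train={train_err:.3f}  test={test_err:.3f}  overlap={overlap:.3f}")
```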

In the absence of adequate theory, numerical explorations of generalization abound. One intriguing possibility is that good generalization is a kinetic, nonequilibrium dynamical property of stochastic gradient descent, which biases the learned parameters toward highly flat regions of the training-error landscape;[3][4] such flatness yields stability, which can in turn yield generalization.
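
One simple way to probe such flatness numerically, sketched below as an assumption rather than the specific measures used in [3] or [4], is to check how much the training error rises under small random perturbations of the learned weights: a flat minimum barely responds, a sharp one responds strongly.

```python
import numpy as np

def sharpness(loss_fn, w, scale=1e-2, n_probes=50, rng=None):
    """Average increase in loss under small isotropic Gaussian perturbations of w.
    Small values indicate a flat region of the error landscape."""
    rng = rng or np.random.default_rng(0)
    base = loss_fn(w)
    rises = []
    for _ in range(n_probes):
        eps = rng.normal(size=w.shape)
        eps *= scale * np.linalg.norm(w) / np.linalg.norm(eps)  # relative perturbation size
        rises.append(loss_fn(w + eps) - base)
    return float(np.mean(rises))
```

Applied, e.g., to the landscape of the earlier sketch via `sharpness(lambda v: train_error(v, X, y), w)`, this gives only a crude perturbation-based proxy for the flatness notions studied in [3] and [4].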

Another, information-theoretic approach suggests that the weights do not accumulate much information about the training data, thereby yielding generalization.[5] This is related to the idea that trained neural networks might actually have a much smaller minimum description length than their raw parameter count suggests.[6][7]
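
As a rough companion to the minimum-description-length idea (purely illustrative, not the constructions of [6] or [7]), one can quantize a trained weight vector to a small number of levels and check how little the training error changes, suggesting the network can be described with far fewer bits per weight than a full floating-point encoding.

```python
import numpy as np

def quantize(w, n_levels=16):
    """Uniformly quantize weights to n_levels values spanning their range,
    i.e., log2(n_levels) bits per weight instead of a full float."""
    levels = np.linspace(w.min(), w.max(), n_levels)
    return levels[np.argmin(np.abs(w[:, None] - levels[None, :]), axis=1)]

# With w, X, y and train_error from the earlier sketches (hypothetical usage):
# w_q = quantize(w, n_levels=16)
# print(train_error(w, X, y), train_error(w_q, X, y))   # often nearly unchanged
```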

Bibliography
1. Radzihovsky L. (2015), "Introduction to Quenched Disorder", Soft Matter In and Out of Equilibrium. University of Colorado, Boulder. Available online.
2. Engel A., Van den Broeck C. (2001), Statistical Mechanics of Learning. Cambridge, UK: Cambridge Univ. Press.
3. Hochreiter S., Schmidhuber J. (1997), "Flat Minima", Neural Comput. 9:1–42.
4. Keskar N.S., Mudigere D., Nocedal J., Smelyanskiy M., Tang P.T.P. (2017), "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima". arXiv:1609.04836v2
5. Shwartz-Ziv R., Tishby N. (2017), "Opening the Black Box of Deep Neural Networks via Information". arXiv:1703.00810
6. Hinton G., Van Camp D. (1993), In Proceedings of the 6th Annual Conference on Computational Learning Theory (COLT 1993), ed. L. Pitt, pp. 5–13. New York: Assoc. Comput. Mach.
7. Hochreiter S., Schmidhuber J. (1994), In Advances in Neural Information Processing Systems 7 (NIPS 1994), pp. 529–36. Cambridge, MA: MIT Press.