Singular Learning Theory (SLT) is a mathematical framework that extends and refines traditional statistical learning theory using techniques from algebraic geometry, Bayesian statistics, and statistical physics.
For learning machines such as deep neural networks, in which multiple parameter values correspond to the same statistical distribution, the set of parameters realizing the target distribution may form not isolated points or a smooth manifold but a singular subset of the parameter space.
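As a minimal illustrative example (a hypothetical toy model, not taken from the cited literature), consider a regression model with a single hidden tanh unit and Gaussian noise of unit variance:

$$ f(x; a, b) = a \tanh(bx), \qquad p(y \mid x, a, b) = \mathcal{N}\left(y \mid f(x; a, b),\, 1\right). $$

If the true regression function is the zero function, the set of parameters realizing it is

$$ \{(a, b) : a \tanh(bx) = 0 \ \text{for all } x\} = \{a = 0\} \cup \{b = 0\}, $$

the union of the two coordinate axes. This set is not a manifold: it has a singularity at the origin, where the axes cross.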
Motivation
- Deep learning is able to model extremely complex functions due to its hierarchical structure and hidden variables, which are in general nonidentifiable (the map from a parameter to a statistical model is not one-to-one) and singular (the likelihood function cannot be approximated by any Gaussian function); at such parameters almost all eigenvalues of the Fisher information matrix are zero (see the numerical sketch below).[1][2]
- The maximum likelihood and Bayesian methods have different predictive performance, even as the sample size tends to infinity.
- Even if both the statistical model and the prior distribution are overparametrized relative to the unknown true distribution, the generalization error does not necessarily increase.[4]
These facts show that deep learning is quite different from conventional statistical models. Singular learning theory seeks to construct a mathematical foundation for such models on the basis of algebraic geometry.
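The degeneracy of the Fisher information matrix at singular parameters can be checked numerically. The following sketch reuses the toy tanh model above and assumes inputs x drawn from a standard normal distribution and unit-variance Gaussian noise, in which case the Fisher information reduces to E_x[∇f(x; w) ∇f(x; w)ᵀ]; it estimates this matrix by Monte Carlo and prints its eigenvalues at one regular and two singular parameter values.

```python
import numpy as np

# Toy singular model: f(x; a, b) = a * tanh(b * x) with unit-variance Gaussian noise.
# For this model the Fisher information at (a, b) is E_x[ grad f  grad f^T ].

def grad_f(x, a, b):
    """Gradient of f(x; a, b) = a * tanh(b * x) with respect to (a, b)."""
    t = np.tanh(b * x)
    return np.stack([t, a * x * (1.0 - t ** 2)], axis=-1)   # shape (n, 2)

def fisher(a, b, xs):
    """Monte Carlo estimate of the 2x2 Fisher information matrix over inputs xs."""
    g = grad_f(xs, a, b)          # (n, 2)
    return g.T @ g / len(xs)      # (2, 2)

rng = np.random.default_rng(0)
xs = rng.normal(size=100_000)     # inputs x ~ N(0, 1)

# (1, 1) is a regular point; (0.5, 0) and (0, 0) lie on the singular set {ab = 0}.
for a, b in [(1.0, 1.0), (0.5, 0.0), (0.0, 0.0)]:
    eigenvalues = np.linalg.eigvalsh(fisher(a, b, xs))
    print(f"(a, b) = ({a}, {b}): Fisher eigenvalues ~ {np.round(eigenvalues, 4)}")
```

At the regular point both eigenvalues are positive; at (0.5, 0) one eigenvalue vanishes, and at the origin the entire matrix is zero, so the log-likelihood admits no quadratic (Gaussian) approximation there.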
SLT obtains mathematical theorems that hold for an arbitrary triple (a true distribution, a statistical model, and a prior), and can therefore be applied to real-world data science problems in which the true distribution is unknown. Furthermore, the marginal likelihood and the generalization error of singular learning machines are clarified and can be calculated in real-world problems.[3]
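Concretely, the singular geometry enters these quantities through a birational invariant of the model, the real log canonical threshold (RLCT) λ, together with its multiplicity m. A standard statement of Watanabe's asymptotics, for a model that can realize the true distribution, is

$$ F_n = n S_n + \lambda \log n - (m - 1) \log \log n + O_p(1), \qquad \mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right), $$

where F_n is the Bayes free energy (the negative log marginal likelihood), S_n is the empirical entropy of the true distribution, and G_n is the Bayes generalization error. For a regular model with d parameters, λ = d/2 and m = 1, recovering the classical asymptotics; for singular models λ is typically smaller, which is why overparametrization need not increase the generalization error.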