Information Geometry is the study of statistical manifolds, where each point represents a hypothesis about a state of affairs. In statistics, a hypothesis corresponds to a probability distribution, and in quantum mechanics, it corresponds to a mixed state.
The Fisher information metric is used to measure distances and angles in these manifolds.
Definitions
Information geometry (IG) is a field that explores the world of information using modern geometry. It investigates the information sciences geometrically, providing a differential-geometric structure on manifolds that is useful for statistical decision rules, with applications in neural networks and statistical mechanics.
 Information Manifold: An information manifold $(M, g, \nabla, \nabla^\ast)$ is a manifold $M$ equipped with a metric tensor field $g$ and a pair of dual (conjugate) affine connections $\nabla$ and $\nabla^*$; the triple $(M, g, \nabla^*)$ is called the dual structure.
 Metric Tensor Fields: The metric tensor field $g$ defines a smooth symmetric positive-definite bilinear form on the tangent bundle, allowing the measurement of vector magnitudes and angles. The dual metric tensor $g^*$ is defined as: $$g_{ij} = \langle e_i, e_j \rangle, \quad g^{*}_{ij} = \langle e^*_i, e^*_j \rangle.$$
 Affine Connection: The affine connection $\nabla$ is a differential operator that defines the covariant derivative, parallel transport, $\nabla$-geodesics, and the intrinsic curvature and torsion of the manifold.
 Conjugate Connection Manifolds: These are information manifolds $(M, g, \nabla, \nabla^*)$ that present dualistic structures, including statistical manifolds $(M, g, C)$, where $C$ denotes a cubic tensor.
 Fisher Information Metric: For a parametric family of probability models, the Fisher information metric is coupled to the exponential connection ${}^e\nabla$ and the mixture connection ${}^m\nabla$, which form a dual pair of connections. It is the unique invariant metric (up to a scaling factor).
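The role of the metric tensor field described above can be illustrated numerically. The sketch below, with an arbitrary example matrix for $g$ in local coordinates, shows how a metric measures lengths and angles of tangent vectors:

```python
import numpy as np

# Example metric tensor on a 2-dimensional manifold, expressed in local
# coordinates as a symmetric positive-definite matrix g_ij (values chosen
# for illustration only).
g = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def length(v, g):
    """Norm of a tangent vector v: sqrt(<v, v>_g) = sqrt(v^T g v)."""
    return np.sqrt(v @ g @ v)

def angle(u, v, g):
    """Angle between tangent vectors u and v under the metric g."""
    cos = (u @ g @ v) / (length(u, g) * length(v, g))
    return np.arccos(np.clip(cos, -1.0, 1.0))

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
print(length(u, g))    # sqrt(2): the metric stretches the first axis
print(angle(u, v, g))  # less than pi/2: the coordinate axes are not g-orthogonal
```

Note that the coordinate basis vectors are orthogonal in the Euclidean sense but not with respect to $g$, which is exactly the distinction the metric tensor encodes.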
Theorems and Facts
 Fundamental Theorem of Riemannian Geometry: This theorem states the existence of a unique torsion-free Levi-Civita connection compatible with the metric, derived from the metric tensor.
 Dually Flat Manifolds: When considering Bregman divergences, dually flat manifolds are obtained, and the dualistic structure of statistical manifolds can be applied to mathematical programming and neural network optimization.
 Statistical Invariance: The Fisher information metric is the unique invariant metric (up to scaling), and the f-divergences are the unique separable invariant divergences.
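The f-divergences mentioned above form a family $D_f(p \| q) = \sum_x q(x) f(p(x)/q(x))$ indexed by a convex function $f$ with $f(1) = 0$. A minimal sketch (the distributions are arbitrary illustrative values) showing that the generator $f(t) = t \log t$ recovers the Kullback-Leibler divergence:

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(p || q) = sum_x q(x) f(p(x)/q(x)) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(q * f(p / q))

# The generator f(t) = t log t yields the Kullback-Leibler divergence.
kl_generator = lambda t: t * np.log(t)

p = np.array([0.5, 0.3, 0.2])  # example distributions (assumed)
q = np.array([0.4, 0.4, 0.2])

d = f_divergence(p, q, kl_generator)
kl = np.sum(p * np.log(p / q))  # KL(p || q) computed directly
print(d, kl)  # the two values agree
```

Other choices of $f$ give other members of the family, e.g. $f(t) = (\sqrt{t} - 1)^2$ for the squared Hellinger distance.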
Statistical Manifolds
Main article: statistical manifold
A statistical manifold is a space where each point represents a hypothesis about some state of affairs. In classical statistics, this corresponds to a probability distribution, while in quantum mechanics, it corresponds to a mixed state. In neural networks, statistical manifolds provide a geometric framework for understanding and improving learning algorithms.
Fisher Information Metric
The Fisher information metric is a way of measuring distances and angles in statistical manifolds, where each point represents a probability distribution. Given a family of probability distributions $p(x; \theta)$, parameterized by $\theta$, the Fisher information metric is given by:
$$ g_{ij} = \int p(x; \theta) \frac{\partial \log p(x; \theta)}{\partial \theta_i} \frac{\partial \log p(x; \theta)}{\partial \theta_j} \mathrm{d}x $$
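For a discrete sample space the integral above reduces to a sum. A minimal sketch for the Bernoulli family $p(x;\theta) = \theta^x (1-\theta)^{1-x}$, whose Fisher information has the closed form $g(\theta) = 1/(\theta(1-\theta))$:

```python
# Fisher information for the one-parameter Bernoulli family
# p(x; theta) = theta^x (1 - theta)^(1 - x), x in {0, 1}.
# The integral over x reduces to a sum over the two outcomes.

def fisher_bernoulli(theta):
    # score = d/dtheta log p(x; theta), evaluated at x = 1 and x = 0
    score = {1: 1.0 / theta, 0: -1.0 / (1.0 - theta)}
    probs = {1: theta, 0: 1.0 - theta}
    return sum(probs[x] * score[x] ** 2 for x in (0, 1))

theta = 0.3
print(fisher_bernoulli(theta))        # matches the closed form below
print(1.0 / (theta * (1.0 - theta)))  # g(theta) = 1 / (theta (1 - theta))
```

The metric blows up as $\theta \to 0$ or $\theta \to 1$: near-deterministic distributions are easy to distinguish from their neighbors, so small parameter changes correspond to large statistical distances.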
Properties
 Riemannian Metric: The Fisher information metric is a Riemannian metric, providing a way to measure distances and angles on the manifold.
 Positive Definiteness: The metric is positive definite, meaning every nonzero tangent vector has strictly positive length.
 Connection to Entropy: The metric can be related to entropy, providing insights into the distinguishability of distributions.
This metric measures the local sensitivity of the likelihood function and can be used to define a natural gradient for learning algorithms.
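The natural gradient mentioned above preconditions the ordinary gradient by the inverse Fisher information matrix, $\theta \leftarrow \theta - \eta\, G^{-1} \nabla_\theta L$. A minimal sketch, assuming the Fisher matrix $G$ at the current parameters is already known (in practice it is estimated from data or computed in closed form for the model family):

```python
import numpy as np

def natural_gradient_step(theta, grad, G, lr=0.1):
    """One natural-gradient update: theta - lr * G^{-1} grad.

    Solving the linear system avoids explicitly inverting G.
    """
    return theta - lr * np.linalg.solve(G, grad)

# Illustrative values (assumed): parameters, loss gradient, Fisher matrix.
theta = np.array([1.0, 2.0])
grad = np.array([0.5, -0.2])
G = np.array([[4.0, 0.0],
              [0.0, 1.0]])  # diagonal for clarity

new_theta = natural_gradient_step(theta, grad, G)
print(new_theta)  # the step along the first coordinate is shrunk by G
```

Because the update is measured in the Fisher geometry rather than the raw parameter coordinates, it is invariant under smooth reparameterizations of the model, which is the motivation for its use in learning algorithms.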
Fisher Metric in Mechanics
The formula for the Fisher information metric in classical mechanics is given by:
$$g_{ij}(\theta) = \int \frac{\partial p(x; \theta)}{\partial \theta_i} \frac{\partial \ln p(x; \theta)}{\partial \theta_j} \, \mathrm{d}x$$
Here, $p$ is a smooth function from a manifold $M$ to the space of probability distributions on some measure space. In classical statistical mechanics, the Fisher information metric can be related to thermodynamic quantities. Consider a system described by the Gibbs ensemble:
$$ p(x; \beta) = \frac{e^{-\beta H(x)}}{Z(\beta)} $$
where $H(x)$ is the Hamiltonian, $\beta$ is the inverse temperature, and $Z(\beta)$ is the partition function. The Fisher information metric becomes:
$$g_{\beta\beta} = \int p(x; \beta) \left( \frac{\partial \log p(x; \beta)}{\partial \beta} \right)^2 \mathrm{d}x = \langle H^2 \rangle - \langle H \rangle^2$$
That is, the metric equals the variance of the energy, which is related to the heat capacity via $C = k_B \beta^2 \left( \langle H^2 \rangle - \langle H \rangle^2 \right)$.
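This identity is easy to verify numerically for a discrete Gibbs ensemble, where the integral becomes a sum over states. A minimal sketch with an arbitrary three-level system (the energy levels are illustrative assumptions):

```python
import numpy as np

# Check that for p(x; beta) = exp(-beta * H(x)) / Z(beta), the Fisher
# information in beta equals the energy variance <H^2> - <H>^2.
H = np.array([0.0, 1.0, 3.0])  # example energy levels (assumed)
beta = 0.7

def gibbs(beta):
    w = np.exp(-beta * H)
    return w / w.sum()  # normalizing by Z(beta)

p = gibbs(beta)
mean_H = np.sum(p * H)
var_H = np.sum(p * H ** 2) - mean_H ** 2

# Score function: d/dbeta log p(x; beta) = -H(x) + <H>
score = -H + mean_H
fisher = np.sum(p * score ** 2)

print(fisher, var_H)  # equal up to floating-point error
```

The cancellation of $\partial_\beta \log Z = -\langle H \rangle$ in the score is what turns the Fisher information into a centered second moment of the energy.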
Relevant Concepts
 Gibbs State: The Gibbs state is the unique maximum-entropy state for which the expected value of each observable $X_i$ equals $x_i$.
 Covariance Matrix: The covariance matrix measures correlations in the fluctuations of observables.
 Thermodynamic Length: The length of a path on the manifold, measured with respect to the Fisher information metric, is called the thermodynamic length.
 Amari's α-Connections: These provide a one-parameter family of connections that interpolate between the exponential and mixture connections, and are used in neural networks to define loss functions.
