Sandbox 9

Information Geometry is the study of statistical manifolds, where each point represents a hypothesis about a state of affairs. In statistics, a hypothesis corresponds to a probability distribution, and in quantum mechanics, it corresponds to a mixed state.

The Fisher information metric is used to measure distances and angles in these manifolds.

# Definitions

Information geometry (IG) applies the methods of modern differential geometry to families of probability distributions. It equips these families with a differential-geometric structure that is useful for studying statistical decision rules, with applications in neural networks and statistical mechanics.

• Information Manifold: An information manifold $(M, g, \nabla, \nabla^\ast)$ is a manifold $M$ equipped with a metric tensor field $g$ and an affine connection $\nabla$, together with the dual connection $\nabla^\ast$, which gives the dual structure $(M, g, \nabla^*)$.
• Metric Tensor Fields: The metric tensor field $g$ defines a smooth, symmetric, positive-definite bilinear form on each tangent space, allowing the measurement of vector magnitudes and angles. In a local basis $\{e_i\}$ with reciprocal basis $\{e^{*i}\}$, the components of the metric and of the dual metric $g^*$ are: $$g_{ij} = \langle e_i, e_j \rangle, \quad g^{*ij} = \langle e^{*i}, e^{*j} \rangle.$$
• Affine Connection: The affine connection $\nabla$ is a differential operator that defines the covariant derivative operator, parallel transport, $\nabla$-geodesics, and the intrinsic curvature and torsion of the manifold.
• Conjugate Connection Manifolds: These are information manifolds $(M, g, \nabla, \nabla^*)$ that present dualistic structures, including statistical manifolds $(M, g, C)$, where $C$ denotes a cubic tensor.
• Fisher Information Metric: For a parametric family of probability models, the Fisher information metric is coupled to the exponential connection ${}^e\nabla$ and the mixture connection ${}^m\nabla$, which are dual to each other with respect to it. It is the unique invariant metric (up to a scaling factor).

# Theorems and Facts

• Fundamental Theorem of Riemannian Geometry: On any Riemannian manifold there exists a unique torsion-free affine connection compatible with the metric. This is the Levi-Civita connection, and it is determined entirely by the metric tensor.
• Dually Flat Manifolds: A Bregman divergence induces a dually flat manifold, on which both $\nabla$ and $\nabla^*$ are flat. This dualistic structure can be applied to mathematical programming and neural network optimization.
• Statistical Invariance: The Fisher information metric is the unique invariant metric, and the f-divergences are the unique separable invariant divergences.
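As a small illustration of the Bregman divergences mentioned above, the following sketch evaluates the divergence generated by a convex function $F$, namely $B_F(p : q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle$. The choice of negative Shannon entropy as the generator, and all function names, are assumptions made for this example; with that generator and normalized distributions, the result coincides with the Kullback-Leibler divergence.

```python
import math

def bregman_divergence(F, grad_F, p, q):
    """B_F(p : q) = F(p) - F(q) - <p - q, grad F(q)> for vectors given as lists."""
    inner = sum((pi - qi) * gi for pi, qi, gi in zip(p, q, grad_F(q)))
    return F(p) - F(q) - inner

def neg_entropy(p):
    """Illustrative generator: negative Shannon entropy, sum_i p_i log p_i."""
    return sum(pi * math.log(pi) for pi in p)

def neg_entropy_grad(p):
    """Gradient of the negative entropy: log p_i + 1 in each coordinate."""
    return [math.log(pi) + 1.0 for pi in p]

def kl_divergence(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Two arbitrary strictly positive distributions on three outcomes.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

b = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
kl = kl_divergence(p, q)
print(b, kl)  # the two values agree up to floating-point rounding
```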

# Statistical Manifolds

A statistical manifold is a space where each point represents a hypothesis about some state of affairs. In classical statistics, this corresponds to a probability distribution, while in quantum mechanics, it corresponds to a mixed state. In neural networks, statistical manifolds provide a geometric framework for understanding and improving learning algorithms.

## Fisher Information Metric

The Fisher information metric is a way of measuring distances and angles in statistical manifolds, where each point represents a probability distribution. Given a family of probability distributions $p(x; \theta)$, parameterized by $\theta$, the Fisher information metric is given by:

$$g_{ij} = \int p(x; \theta) \frac{\partial \log p(x; \theta)}{\partial \theta_i} \frac{\partial \log p(x; \theta)}{\partial \theta_j} \mathrm{d}x$$
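To make the definition concrete, here is a minimal sketch that evaluates it for a one-parameter discrete family, where the integral reduces to a sum over the support. The Bernoulli model, the finite-difference step `eps`, and the function names are assumptions chosen for illustration; for this family the closed form $1 / (\theta (1 - \theta))$ is available for comparison.

```python
import math

def bernoulli_log_pmf(x, theta):
    """Log-probability of x in {0, 1} under a Bernoulli(theta) model."""
    return x * math.log(theta) + (1 - x) * math.log(1.0 - theta)

def fisher_information(log_pmf, support, theta, eps=1e-5):
    """Approximate g(theta) = sum_x p(x; theta) * (d/dtheta log p(x; theta))^2."""
    total = 0.0
    for x in support:
        p = math.exp(log_pmf(x, theta))
        # Central finite difference for the score d/dtheta log p(x; theta).
        score = (log_pmf(x, theta + eps) - log_pmf(x, theta - eps)) / (2 * eps)
        total += p * score ** 2
    return total

theta = 0.3
numeric = fisher_information(bernoulli_log_pmf, [0, 1], theta)
closed_form = 1.0 / (theta * (1.0 - theta))
print(numeric, closed_form)  # the two values should agree closely
```

For a multi-parameter family the same sum would be taken over products of partial derivatives, giving the full matrix $g_{ij}$.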

### Properties

• Riemannian Metric: The Fisher information metric is a Riemannian metric, providing a way to measure distances and angles on the manifold.
• Positive Definiteness: For regular (identifiable) models the metric is positive definite, meaning every nonzero tangent vector has strictly positive length.
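• Connection to Entropy: The metric is the second-order term of the relative entropy (Kullback-Leibler divergence) between neighboring distributions, $D_{\mathrm{KL}}(p_\theta \,\|\, p_{\theta + \mathrm{d}\theta}) \approx \tfrac{1}{2} \sum_{i,j} g_{ij} \, \mathrm{d}\theta_i \, \mathrm{d}\theta_j$, so it quantifies how distinguishable nearby distributions are.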

This metric measures the local sensitivity of the likelihood function and can be used to define a natural gradient for learning algorithms.
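The natural gradient mentioned here rescales the ordinary gradient by the inverse of the Fisher information metric. The sketch below applies that idea to maximum-likelihood fitting of a Bernoulli parameter; the model, the learning rate, the step count, and the function names are all assumptions made for this example rather than a prescribed algorithm.

```python
def avg_log_likelihood_grad(theta, mean_x):
    """Gradient in theta of the average Bernoulli log-likelihood for data with mean mean_x."""
    return mean_x / theta - (1.0 - mean_x) / (1.0 - theta)

def fisher(theta):
    """Fisher information of the Bernoulli family at theta."""
    return 1.0 / (theta * (1.0 - theta))

def natural_gradient_ascent(theta0, mean_x, lr=0.5, steps=50):
    """Repeat theta <- theta + lr * g(theta)^{-1} * gradient."""
    theta = theta0
    for _ in range(steps):
        grad = avg_log_likelihood_grad(theta, mean_x)
        theta += lr * grad / fisher(theta)
    return theta

# Hypothetical data summary: 70% of the observations equal 1.
theta_hat = natural_gradient_ascent(theta0=0.05, mean_x=0.7)
print(theta_hat)  # approaches the maximum-likelihood estimate 0.7
```

In this particular family the natural-gradient direction simplifies to `mean_x - theta`, so the step size does not blow up near the boundary of the parameter interval the way the plain gradient does.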

### Fisher Metric in Mechanics

The formula for the Fisher information metric in classical mechanics is given by:

$$g_{ij} = \int \frac{\partial p(x; \theta)}{\partial \theta_i} \frac{\partial \log p(x; \theta)}{\partial \theta_j} \mathrm{d}x$$

Here, $p$ is a smooth function from a manifold $M$ to the space of probability distributions on some measure space. In classical statistical mechanics, the Fisher information metric can be related to thermodynamic quantities. Consider a system described by the Gibbs ensemble:

$$p(x; \beta) = \frac{e^{-\beta H(x)}}{Z(\beta)}$$

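where $H(x)$ is the Hamiltonian, $\beta$ is the inverse temperature, and $Z(\beta)$ is the partition function. Since $\log p(x; \beta) = -\beta H(x) - \log Z(\beta)$ and $\partial \log Z / \partial \beta = -\langle H \rangle$, the score is $\partial \log p / \partial \beta = \langle H \rangle - H(x)$, and the Fisher information metric becomes the variance of the energy:

$$g_{\beta\beta} = \int p(x; \beta) \left( \frac{\partial \log p(x; \beta)}{\partial \beta} \right)^2 \mathrm{d}x = \langle H^2 \rangle - \langle H \rangle^2$$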
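A quick numerical check can be made on a small discrete system, where the integral becomes a sum over energy levels. The sketch below evaluates $g_{\beta\beta}$ directly from the definition using a finite-difference score, and prints the energy variance $\langle H^2 \rangle - \langle H \rangle^2$ next to it for comparison. The three energy levels, the value of `beta`, and the function names are hypothetical choices for illustration.

```python
import math

def gibbs_probs(energies, beta):
    """Gibbs probabilities exp(-beta * E) / Z for a finite list of energy levels."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def fisher_beta(energies, beta, eps=1e-5):
    """g_bb = sum_x p(x) * (d/dbeta log p(x))^2, with a central-difference score."""
    p = gibbs_probs(energies, beta)
    p_plus = gibbs_probs(energies, beta + eps)
    p_minus = gibbs_probs(energies, beta - eps)
    total = 0.0
    for pi, pp, pm in zip(p, p_plus, p_minus):
        score = (math.log(pp) - math.log(pm)) / (2 * eps)
        total += pi * score ** 2
    return total

def energy_variance(energies, beta):
    """<H^2> - <H>^2 under the Gibbs distribution."""
    p = gibbs_probs(energies, beta)
    mean = sum(pi * e for pi, e in zip(p, energies))
    return sum(pi * (e - mean) ** 2 for pi, e in zip(p, energies))

energies = [0.0, 1.0, 2.5]  # hypothetical energy levels
beta = 0.8                  # hypothetical inverse temperature
g_numeric = fisher_beta(energies, beta)
var_h = energy_variance(energies, beta)
print(g_numeric, var_h)
```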

### Relevant Concepts

• Gibbs State: The Gibbs state is the unique maximum-entropy state for which the expected value of each observable $X_i$ is $x_i$.
• Covariance Matrix: The covariance matrix measures correlations in the fluctuations of observables.
• Thermodynamic Length: The length of a path measured with respect to the Fisher information metric is called the thermodynamic length.
• Amari's α-Connections: A one-parameter family of connections $\nabla^{(\alpha)}$ that interpolates between the exponential connection ($\alpha = 1$) and the mixture connection ($\alpha = -1$), used in neural networks to define loss functions.
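The length of a path under the Fisher information metric can be approximated numerically. The sketch below does this for a one-parameter path, where the length is $\int \sqrt{g(t)} \, \mathrm{d}t$. The Bernoulli family, the endpoints, the midpoint rule, and the function names are assumptions chosen so that a closed form, $2 \left( \arcsin\sqrt{\theta_1} - \arcsin\sqrt{\theta_0} \right)$, is available for comparison.

```python
import math

def path_length(metric, start, end, n=10000):
    """Midpoint-rule approximation of the integral of sqrt(g(t)) dt from start to end."""
    h = (end - start) / n
    return sum(math.sqrt(metric(start + (k + 0.5) * h)) for k in range(n)) * abs(h)

def bernoulli_fisher(theta):
    """Fisher information of the Bernoulli family, used here as an example metric."""
    return 1.0 / (theta * (1.0 - theta))

theta0, theta1 = 0.2, 0.9  # hypothetical path endpoints
length = path_length(bernoulli_fisher, theta0, theta1)
closed_form = 2.0 * (math.asin(math.sqrt(theta1)) - math.asin(math.sqrt(theta0)))
print(length, closed_form)  # the two values should agree closely
```

For a thermodynamic system the same routine could be called with a metric function of the control parameter, such as the inverse temperature, in place of `bernoulli_fisher`.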