Information Geometry

Information Geometry is the study of statistical manifolds, where each point represents a hypothesis about a state of affairs. In statistics, a hypothesis corresponds to a probability distribution, and in quantum mechanics, it corresponds to a mixed state.

The Fisher information metric is used to measure distances and angles in these manifolds.

# Definitions

Information geometry (IG) applies the methods of modern differential geometry to the information sciences. It equips families of probability distributions with a differential-geometric structure that is useful for studying statistical decision rules, with applications in neural networks and statistical mechanics.

• Information Manifold: An information manifold $(M, g, \nabla, \nabla^\ast)$ is equipped with a metric tensor field $g$, an affine connection $\nabla$, and a conjugate connection $\nabla^\ast$, so that $(M, g, \nabla^\ast)$ is the dual structure of $(M, g, \nabla)$.
• Metric Tensor Fields: The metric tensor field $g$ defines a smooth symmetric positive-definite bilinear form on each tangent space, allowing the measurement of vector magnitudes and angles. In a basis $\{e_i\}$ with reciprocal basis $\{e^{*i}\}$, the metric tensor and the dual metric tensor $g^*$ have components: $$g_{ij} = \langle e_i, e_j \rangle, \quad g^{*ij} = \langle e^{*i}, e^{*j} \rangle.$$
• Affine Connection: The affine connection $\nabla$ is a differential operator that defines the covariant derivative, parallel transport, $\nabla$-geodesics, and the intrinsic curvature and torsion of the manifold.
• Conjugate Connection Manifolds: These are information manifolds $(M, g, \nabla, \nabla^*)$ that present dualistic structures, including statistical manifolds $(M, g, C)$, where $C$ denotes a cubic tensor.
• Fisher Information Metric: For a parametric family of probability models, the Fisher information metric is coupled to the exponential connection ${}^{e}\nabla$ and the mixture connection ${}^{m}\nabla$, which form a pair of dual connections. Up to a scaling factor, it is the unique metric that is invariant under sufficient statistics.

## Manifold Structure Hierarchy

Manifold Structure in Information Geometry

Smooth manifold $M$ $\to$ manifold with a metric tensor $(M, g)$ $\to$ manifold with an affine connection $(M, g, \nabla)$ $\to$ information manifold $(M, g, \nabla, \nabla^\ast)$

• Connections: $\nabla^g$-connection, divergence connection $\nabla^D$, Amari's α-connection
• Related structures: conjugate connection manifolds, connections coupled to $g$, dual parallel transport on $g$

# Theorems and Facts

• Fundamental Theorem of Riemannian Geometry: On any Riemannian manifold $(M, g)$ there exists a unique torsion-free affine connection that is compatible with the metric. This is the Levi-Civita connection, and it is determined entirely by the metric tensor.
• Dually Flat Manifolds: When considering Bregman divergences, dually flat manifolds are obtained, and the dualistic structure of statistical manifolds can be applied to mathematical programming and neural network optimization.
• Statistical Invariance: The Fisher information metric is the unique invariant metric, and the f-divergences are the unique separable invariant divergences.
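
The two families of divergences named above can be illustrated on a finite sample space. The following is a minimal Python sketch, not a reference implementation: the function names and the example distributions are assumptions chosen for illustration. It evaluates an f-divergence in the form $\sum_i q_i f(p_i / q_i)$ and a Bregman divergence $F(p) - F(q) - \langle \nabla F(q), p - q \rangle$, and shows that both reduce to the Kullback-Leibler divergence for suitable generators.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive p, q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def f_divergence(p, q, f):
    """f-divergence sum_i q_i * f(p_i / q_i) for a convex f with f(1) = 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def bregman_divergence(p, q, F, grad_F):
    """Bregman divergence F(p) - F(q) - <grad F(q), p - q>."""
    g = grad_F(q)
    return F(p) - F(q) - sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))

def neg_entropy(p):
    """Negative Shannon entropy, a convex generator on the simplex."""
    return sum(pi * math.log(pi) for pi in p)

def grad_neg_entropy(p):
    """Gradient of the negative Shannon entropy."""
    return [math.log(pi) + 1.0 for pi in p]

# Hypothetical example distributions on three outcomes.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

kl_direct = kl(p, q)
kl_via_f = f_divergence(p, q, lambda u: u * math.log(u))
kl_via_bregman = bregman_divergence(p, q, neg_entropy, grad_neg_entropy)
```

With $f(u) = u \log u$ and with the negative entropy as Bregman generator, the three quantities agree for normalized distributions, which is why the Kullback-Leibler divergence belongs to both families.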

# Statistical Manifolds

Main article: statistical manifold

A statistical manifold is a space where each point represents a hypothesis about some state of affairs. In classical statistics, this corresponds to a probability distribution, while in quantum mechanics, it corresponds to a mixed state. In neural networks, statistical manifolds provide a geometric framework for understanding and improving learning algorithms.

## Fisher Information Metric

The Fisher information metric is a way of measuring distances and angles in statistical manifolds, where each point represents a probability distribution. Given a family of probability distributions $p(x; \theta)$, parameterized by $\theta$, the Fisher information metric is given by:

$$g_{ij} = \int p(x; \theta) \frac{\partial \log p(x; \theta)}{\partial \theta_i} \frac{\partial \log p(x; \theta)}{\partial \theta_j} \mathrm{d}x$$
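
For a discrete family the integral becomes a sum, so the formula can be evaluated directly. The sketch below is an illustration under stated assumptions: `fisher_metric` and `categorical` are hypothetical helper names, the scores are approximated by central finite differences, and the example family is a three-outcome categorical distribution with parameters $\theta = (\theta_1, \theta_2)$ and $p = (\theta_1, \theta_2, 1 - \theta_1 - \theta_2)$.

```python
import math

def fisher_metric(prob_fn, theta, eps=1e-6):
    """Fisher metric of a discrete family at theta.

    prob_fn maps a parameter list to a list of outcome probabilities.
    Scores d log p / d theta_i are estimated by central finite differences.
    """
    n = len(theta)
    p = prob_fn(theta)
    scores = []  # scores[i][x] approximates d log p(x; theta) / d theta_i
    for i in range(n):
        up = list(theta)
        dn = list(theta)
        up[i] += eps
        dn[i] -= eps
        p_up, p_dn = prob_fn(up), prob_fn(dn)
        scores.append([(math.log(a) - math.log(b)) / (2 * eps)
                       for a, b in zip(p_up, p_dn)])
    # g_ij = sum_x p(x) * score_i(x) * score_j(x)
    return [[sum(px * si * sj for px, si, sj in zip(p, scores[i], scores[j]))
             for j in range(n)]
            for i in range(n)]

def categorical(theta):
    """Three-outcome categorical family (illustrative parameterization)."""
    t1, t2 = theta
    return [t1, t2, 1.0 - t1 - t2]

theta = [0.2, 0.5]
g = fisher_metric(categorical, theta)
```

For this parameterization the closed form is $g_{ij} = \delta_{ij} / \theta_i + 1 / (1 - \theta_1 - \theta_2)$, which the numerical result can be compared against.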

### Properties

• Riemannian Metric: The Fisher information metric is a Riemannian metric, providing a way to measure distances and angles on the manifold.
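• Positive Definiteness: For a regular, identifiable model the metric is positive definite, meaning every nonzero tangent vector has strictly positive length. For degenerate parameterizations it is only positive semi-definite.
• Connection to Entropy: The metric is the leading term of the relative entropy (Kullback-Leibler divergence) between nearby distributions, $D_{\mathrm{KL}}(p_\theta \,\|\, p_{\theta + \mathrm{d}\theta}) \approx \tfrac{1}{2} \sum_{i,j} g_{ij}\, \mathrm{d}\theta_i\, \mathrm{d}\theta_j$, so it quantifies how distinguishable neighboring distributions are.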

This metric measures the local sensitivity of the likelihood function and can be used to define a natural gradient for learning algorithms.
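
A natural gradient step preconditions the ordinary gradient with the inverse Fisher metric, $\theta \leftarrow \theta - \eta\, g^{-1} \nabla L$. The following Python sketch illustrates this for a univariate Gaussian in $(\mu, \sigma)$ coordinates, where the Fisher metric is known in closed form as $\mathrm{diag}(1/\sigma^2,\ 2/\sigma^2)$. The function names, the data, and the learning rate are assumptions made for the example, and the loss is the average negative log-likelihood.

```python
def nll_gradient(data, mu, sigma):
    """Gradient of the average Gaussian negative log-likelihood in (mu, sigma)."""
    n = len(data)
    mean_resid = sum(x - mu for x in data) / n
    mean_sq = sum((x - mu) ** 2 for x in data) / n
    d_mu = -mean_resid / sigma ** 2
    d_sigma = 1.0 / sigma - mean_sq / sigma ** 3
    return d_mu, d_sigma

def natural_gradient_step(data, mu, sigma, lr):
    """One update theta <- theta - lr * g^{-1} * grad.

    The Fisher metric of N(mu, sigma^2) in (mu, sigma) coordinates is
    diag(1 / sigma^2, 2 / sigma^2), so its inverse is applied entrywise.
    """
    d_mu, d_sigma = nll_gradient(data, mu, sigma)
    inv_g_mu = sigma ** 2
    inv_g_sigma = sigma ** 2 / 2.0
    return mu - lr * inv_g_mu * d_mu, sigma - lr * inv_g_sigma * d_sigma

# Hypothetical data and starting point.
data = [1.0, 2.0, 4.0, 5.0]
mu, sigma = 0.0, 1.0
for _ in range(200):
    mu, sigma = natural_gradient_step(data, mu, sigma, lr=0.5)
```

In this example the update for $\mu$ does not depend on $\sigma$: the factor $\sigma^2$ from the inverse metric cancels the $1/\sigma^2$ in the gradient, so a step with learning rate 1 moves $\mu$ directly to the sample mean.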

### Fisher Metric in Mechanics

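In classical mechanics, the Fisher information metric is usually written in the equivalent form:

$$g_{ij} = \int \frac{\partial p(x; \theta)}{\partial \theta_i} \frac{\partial \ln p(x; \theta)}{\partial \theta_j} \mathrm{d}x$$

This agrees with the definition above because $\partial p / \partial \theta_i = p \, \partial \ln p / \partial \theta_i$.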

Here, $p$ is a smooth function from a manifold $M$ to the space of probability distributions on some measure space. In classical statistical mechanics, the Fisher information metric can be related to thermodynamic quantities. Consider a system described by the Gibbs ensemble:

$$p(x; \beta) = \frac{e^{-\beta H(x)}}{Z(\beta)}$$

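where $H(x)$ is the Hamiltonian, $\beta$ is the inverse temperature, and $Z(\beta)$ is the partition function. Since $\log p(x; \beta) = -\beta H(x) - \log Z(\beta)$ and $\partial \log Z / \partial \beta = -\langle H \rangle$, the score is $\partial \log p / \partial \beta = \langle H \rangle - H(x)$, and the Fisher information metric becomes the variance of the energy:

$$g_{\beta\beta} = \int p(x; \beta) \left( \frac{\partial \log p(x; \beta)}{\partial \beta} \right)^2 \mathrm{d}x = \langle H^2 \rangle - \langle H \rangle^2 = \frac{\partial^2 \log Z(\beta)}{\partial \beta^2}$$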
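
The Fisher information of a Gibbs ensemble can be checked numerically on a system with finitely many energy levels. The sketch below is illustrative only: the energy levels, the value of $\beta$, and the helper names are assumptions. It evaluates the defining expression with a finite-difference score and, separately, the energy variance $\langle H^2 \rangle - \langle H \rangle^2$.

```python
import math

def gibbs(energies, beta):
    """Gibbs probabilities p_k = exp(-beta * E_k) / Z for discrete levels."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def fisher_beta(energies, beta, eps=1e-5):
    """g_bb = sum_k p_k * (d log p_k / d beta)^2, score by central differences."""
    p = gibbs(energies, beta)
    p_up = gibbs(energies, beta + eps)
    p_dn = gibbs(energies, beta - eps)
    score = [(math.log(a) - math.log(b)) / (2 * eps) for a, b in zip(p_up, p_dn)]
    return sum(pk * s * s for pk, s in zip(p, score))

def energy_variance(energies, beta):
    """Variance of the energy under the Gibbs distribution."""
    p = gibbs(energies, beta)
    mean = sum(pk * e for pk, e in zip(p, energies))
    return sum(pk * (e - mean) ** 2 for pk, e in zip(p, energies))

# Hypothetical three-level system.
energies = [0.0, 1.0, 2.5]
beta = 0.7
g_bb = fisher_beta(energies, beta)
var_h = energy_variance(energies, beta)
```

For this example the two numbers agree to within the finite-difference error, so in the $\beta$ coordinate the metric is the energy variance itself.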

### Relevant Concepts

• Gibbs State: The Gibbs state is the unique maximum-entropy state for which the expected value of each observable $X_i$ equals a prescribed value $x_i$.
• Covariance Matrix: The covariance matrix measures correlations in the fluctuations of observables.
• Thermodynamic Length: The length of a path measured with respect to the Fisher information metric is called its thermodynamic length.
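
The length of a path under the Fisher information metric can be approximated by summing $\sqrt{g(\theta)}\,|\mathrm{d}\theta|$ along the path. The following Python sketch does this for a one-parameter family using a midpoint rule. It is an illustration under assumptions: the Bernoulli family with $g(\theta) = 1 / (\theta (1 - \theta))$ stands in for a physical system, and the function names and endpoints are hypothetical.

```python
import math

def path_length(metric, path, steps=10000):
    """Approximate Fisher length of t -> path(t), t in [0, 1].

    metric(theta) returns the scalar Fisher information g(theta);
    each segment contributes sqrt(g) at its midpoint times |d theta|.
    """
    total = 0.0
    for k in range(steps):
        t0 = k / steps
        t1 = (k + 1) / steps
        theta_mid = path(0.5 * (t0 + t1))
        d_theta = path(t1) - path(t0)
        total += math.sqrt(metric(theta_mid)) * abs(d_theta)
    return total

def bernoulli_metric(theta):
    """Fisher information of a Bernoulli(theta) distribution."""
    return 1.0 / (theta * (1.0 - theta))

# Straight-line path in the parameter from a to b (illustrative endpoints).
a, b = 0.2, 0.7
length = path_length(bernoulli_metric, lambda t: a + t * (b - a))

# Closed form for comparison: the integral of 1 / sqrt(theta * (1 - theta)).
closed_form = 2.0 * (math.asin(math.sqrt(b)) - math.asin(math.sqrt(a)))
```

For the Bernoulli family the integral has the closed form $2(\arcsin\sqrt{b} - \arcsin\sqrt{a})$, which gives a convenient check on the numerical sum.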

## Amari's α-Connections

Amari's α-connections are a one-parameter family of affine connections that interpolate between the exponential and mixture connections. Each α-connection is associated with an α-divergence, a member of a family of divergences that measure the difference between probability distributions and that are used in neural networks to define loss functions. The geodesics of the α-connections provide a natural way to interpolate between probability distributions.

Given a statistical manifold $(M, g)$ with exponential connection ${}^{e}\nabla$ and mixture connection ${}^{m}\nabla$, Amari's α-connections form a one-parameter family of connections $\nabla^\alpha$ given by:

$$\nabla^\alpha_X Y = (1-\alpha)\, {}^{e}\nabla_X Y + \alpha\, {}^{m}\nabla_X Y$$

where $\alpha \in \mathbb{R}$. The Levi-Civita connection of $g$ is the average of the exponential and mixture connections, so it is recovered at $\alpha = 1/2$.

### Properties

• Interpolation: Amari's α-Connections interpolate between the exponential connection ($\alpha = 0$) and the mixture connection ($\alpha = 1$).
• Duality: The connections exhibit a duality property, where the α-connection is dual to the $(1-\alpha)$-connection.
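• Compatibility with Fisher Metric: A single α-connection is in general not compatible with the Fisher information metric; only the Levi-Civita connection ($\alpha = 1/2$) is. Instead, each dual pair preserves the metric jointly: $X\, g(Y, Z) = g(\nabla^\alpha_X Y, Z) + g(Y, \nabla^{1-\alpha}_X Z)$.
• Uniqueness: On finite sample spaces, the α-connections are, up to the choice of parameterization of $\alpha$, the only affine connections that are invariant under sufficient statistics, in the same sense in which the Fisher information metric is the unique invariant metric.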
• Geodesics: The geodesics of the α-Connections provide a natural way to interpolate between probability distributions, with applications in information retrieval and data analysis.
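
For distributions on a finite sample space, the geodesics of the two endpoint connections have simple closed forms: the mixture geodesic is a straight line in the probabilities, and the exponential geodesic is a straight line in the log-probabilities followed by renormalization. The Python sketch below illustrates both. The function names and the example distributions are assumptions for illustration, and intermediate values of α are not covered.

```python
import math

def mixture_geodesic(p, q, t):
    """Point at time t on the mixture geodesic: (1 - t) * p + t * q."""
    return [(1.0 - t) * pi + t * qi for pi, qi in zip(p, q)]

def exponential_geodesic(p, q, t):
    """Point at time t on the exponential geodesic.

    Interpolates log-probabilities linearly, then renormalizes,
    giving a distribution proportional to p^(1 - t) * q^t.
    """
    logs = [(1.0 - t) * math.log(pi) + t * math.log(qi) for pi, qi in zip(p, q)]
    weights = [math.exp(v) for v in logs]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical strictly positive distributions on three outcomes.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

m_mid = mixture_geodesic(p, q, 0.5)
e_mid = exponential_geodesic(p, q, 0.5)
```

Both curves start at `p` and end at `q`, but their midpoints differ: the mixture midpoint is the arithmetic mean of the two distributions, while the exponential midpoint is their normalized geometric mean.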