Rényi Entropy

In information theory, Rényi entropy refers to a family of entropy measures that are essentially logarithms of diversity indices. For special values of its order parameter, the Rényi entropy reproduces all of: Shannon entropy, Hartley entropy (max-entropy), collision entropy, and min-entropy.

# Definition

Let $p = (p_1, \dots, p_n)$ be a discrete probability distribution. The Rényi entropy of $p$ at order $\alpha \geq 0$, $\alpha \neq 1$, is:
$$H_\alpha(p) := \frac{1}{1-\alpha}\log\left( \sum_{i=1}^n (p_i)^\alpha \right)$$
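The definition above, together with its limiting orders, can be sketched as follows. This is an illustrative helper (the name `renyi_entropy` and the use of natural logarithms are choices made here, not taken from the article):

```python
import math

def renyi_entropy(p, alpha):
    """Rényi entropy H_alpha(p) of a discrete distribution p, in nats.

    alpha = 1 is handled as the Shannon-entropy limit, and
    alpha = math.inf as the min-entropy limit. Assumes p sums to 1.
    """
    p = [pi for pi in p if pi > 0]  # drop zero-probability outcomes
    if alpha == 1:                  # limit alpha -> 1: Shannon entropy
        return -sum(pi * math.log(pi) for pi in p)
    if alpha == math.inf:           # limit alpha -> inf: min-entropy
        return -math.log(max(p))
    return math.log(sum(pi ** alpha for pi in p)) / (1 - alpha)
```

For the uniform distribution on $n$ outcomes, every order gives the same value $\log n$, which is why the inequalities below collapse to equalities in that case.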

# Properties

The Rényi entropy is an anti-monotone (non-increasing) function of the order parameter $\alpha$. That is:
$$\alpha_1 \leq \alpha_2 \implies H_{\alpha_1}(p) \geq H_{\alpha_2}(p)$$
(Equality across all orders holds exactly when $p$ is uniform on its support.)

For various (limiting) values of $\alpha$ the Rényi entropy reduces to notions of entropy that are known by their own names:

| Order | $0$ | $\lim_{\alpha \to 1}$ | $2$ | $\cdots$ | $\lim_{\alpha \to \infty}$ |
|---|---|---|---|---|---|
| Rényi entropy | max-entropy | Shannon entropy | collision entropy | $\cdots$ | min-entropy |

In particular, in terms of the above special cases, this means that:

Hartley entropy $\geq$ Shannon entropy $\geq$ collision entropy $\geq \cdots \geq$ min-entropy
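This ordering can be checked numerically for any sample distribution. The sketch below reuses an illustrative `renyi_entropy` helper (not part of the article) and verifies that the sequence of entropies is non-increasing in $\alpha$:

```python
import math

def renyi_entropy(p, alpha):
    # illustrative helper: Rényi entropy in nats, with limiting orders
    p = [pi for pi in p if pi > 0]
    if alpha == 1:
        return -sum(pi * math.log(pi) for pi in p)
    if alpha == math.inf:
        return -math.log(max(p))
    return math.log(sum(pi ** alpha for pi in p)) / (1 - alpha)

p = [0.5, 0.3, 0.2]
orders = [0, 0.5, 1, 2, 10, math.inf]
values = [renyi_entropy(p, a) for a in orders]

# Hartley >= Shannon >= collision >= ... >= min-entropy
assert all(values[i] >= values[i + 1] - 1e-12 for i in range(len(values) - 1))
```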

# Interpretation

The exponential of the Rényi entropy is a diversity index: it measures the effective number of outcomes of a distribution, with the order $\alpha$ controlling how heavily the most probable outcomes are weighted. As $\alpha \to 1$ it reduces to Shannon entropy, the expected "surprise" or uncertainty of a random variable; small orders count all outcomes in the support almost equally, while large orders are dominated by the most likely outcome.

The associated notion of Rényi divergence (see below) quantifies how far one probability distribution is from another, and can be treated as a distance-like measure on the space of probability distributions.

# Rényi Divergence

Main article: Rényi divergence

Related to Rényi entropy is the concept of Rényi divergence, a measure of how one probability distribution diverges from another. For two probability distributions $p$ and $q$, the Rényi divergence of order $\alpha$ is defined as:

$$D_\alpha(p || q) = \frac{1}{\alpha - 1} \log \left( \sum_i \frac{p_i^\alpha}{q_i^{\alpha-1}} \right)$$

Rényi divergence generalizes Kullback-Leibler divergence in much the same way as Rényi entropy generalizes Shannon entropy:

| Order | $0$ | $\lim_{\alpha \to 1}$ | $2$ | $\cdots$ | $\lim_{\alpha \to \infty}$ |
|---|---|---|---|---|---|
| Rényi divergence | $-\log Q(\{i : p_i > 0\})$ | Kullback-Leibler divergence | $\log \left\langle \frac{p_i}{q_i} \right\rangle$ | $\cdots$ | $\log \sup_i \frac{p_i}{q_i}$ |

Rényi divergence can be seen as a distance-like measure on the space of probability distributions, although it is not a metric in the strict sense: it is asymmetric in its arguments and does not satisfy the triangle inequality. It generalizes the Kullback-Leibler divergence and is closely related to the Hellinger distance (via the order $\alpha = 1/2$), providing a unifying framework for various distance measures in information geometry.

# Further Properties

The value $\alpha = 1$, which gives the Shannon entropy and the Kullback–Leibler divergence, is the only value at which the chain rule of conditional probability holds exactly, both for absolute:

$$H(A,X) = H(A) + \mathbb{E}_{a \sim A} \big[ H(X| A=a) \big]$$

and relative entropies:

$$D_\mathrm{KL}(p(x|a)p(a)\|m(x,a)) = D_\mathrm{KL}(p(a)\|m(a)) + \mathbb{E}_{p(a)}\{D_\mathrm{KL}(p(x|a)\|m(x|a))\}$$

The latter means, in particular, that if we seek a distribution $p(x, a)$ that minimizes the divergence from some underlying prior measure $m(x, a)$, and we acquire new information affecting only the distribution of $a$, then the conditional distribution $p(x|a)$ remains $m(x|a)$, unchanged.
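The relative-entropy chain rule above can be verified numerically on a small joint distribution. This is an illustrative check, not the article's own computation; the helper `kl` and the example tables are assumptions made here:

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence between discrete distributions, in nats
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# small joint distributions over (a, x); rows are indexed by a
p = [[0.2, 0.1], [0.3, 0.4]]
m = [[0.25, 0.25], [0.25, 0.25]]

# left-hand side: divergence between the full joints
lhs = kl([v for row in p for v in row], [v for row in m for v in row])

# right-hand side: marginal term plus expected conditional term
pa = [sum(row) for row in p]
ma = [sum(row) for row in m]
rhs = kl(pa, ma) + sum(
    pa[a] * kl([v / pa[a] for v in p[a]], [v / ma[a] for v in m[a]])
    for a in range(len(p))
)
assert abs(lhs - rhs) < 1e-12
```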

The other Rényi divergences satisfy the criteria of being positive and continuous, being invariant under 1-to-1 co-ordinate transformations, and of combining additively when $A$ and $X$ are independent, so that if $p(A, X) = p(A)p(X)$, then

$$H_\alpha(A,X) = H_\alpha(A) + H_\alpha(X)$$

$$D_\alpha(P(A)P(X)\|Q(A)Q(X)) = D_\alpha(P(A)\|Q(A)) + D_\alpha(P(X)\|Q(X))$$
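The additivity of Rényi entropy under independence can likewise be checked for several orders at once. The sketch below forms the product distribution $p(A)p(X)$ explicitly (helper and example values are assumptions, not from the article):

```python
import math

def renyi_entropy(p, alpha):
    # illustrative helper: Rényi entropy in nats
    p = [pi for pi in p if pi > 0]
    if alpha == 1:
        return -sum(pi * math.log(pi) for pi in p)
    return math.log(sum(pi ** alpha for pi in p)) / (1 - alpha)

pa = [0.6, 0.4]
px = [0.5, 0.3, 0.2]
joint = [a * x for a in pa for x in px]  # independent product p(A)p(X)

# H_alpha(A, X) = H_alpha(A) + H_alpha(X) for every order
for alpha in [0, 0.5, 1, 2, 5]:
    assert abs(renyi_entropy(joint, alpha)
               - renyi_entropy(pa, alpha) - renyi_entropy(px, alpha)) < 1e-9
```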

The stronger properties of the $\alpha = 1$ quantities allow the definition of conditional information and mutual information from communication theory.

Topics in Information Theory and Control Theory
page revision: 3, last edited: 17 Aug 2023 04:11