Rényi Entropy

In information theory, Rényi entropy refers to a family of entropy measures that are essentially logarithms of diversity indices. For special values of its order parameter, Rényi entropy reproduces, among others, the Shannon entropy, the Hartley entropy (max-entropy), the collision entropy, and the min-entropy.

Definition

The Rényi entropy of a probability distribution $p = (p_1, \ldots, p_n)$ at order $\alpha \geq 0$, $\alpha \neq 1$, is:
$$H_\alpha(p) := \frac{1}{1-\alpha}\log\left( \sum_{i=1}^n (p_i)^\alpha \right)$$
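As a concrete illustration, here is a minimal Python sketch of this definition; the function name `renyi_entropy`, the base-2 logarithm, and the treatment of $\alpha = 1$ as the Shannon limit are our own choices, not part of the source.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy (in bits) of a discrete distribution p at order alpha >= 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # outcomes with zero probability do not contribute
    if np.isclose(alpha, 1.0):        # alpha = 1 is taken as the limit: Shannon entropy
        return -np.sum(p * np.log2(p))
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

print(renyi_entropy([0.5, 0.25, 0.25], 2))   # collision entropy of a sample distribution, about 1.415 bits
```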

Properties

The Rényi entropy is an anti-monotone (non-increasing) function of the order parameter $\alpha$. That is:
$$\alpha_1 \leq \alpha_2 \implies H_{\alpha_1}(p) \geq H_{\alpha_2}(p)$$
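As a quick numerical illustration (not from the source), the following snippet reuses the `renyi_entropy` sketch above to check that the entropy does not increase along a grid of orders:

```python
# Numerical check of anti-monotonicity in alpha, reusing the renyi_entropy
# sketch defined above (illustrative only).
p = [0.5, 0.25, 0.125, 0.125]
alphas = [0.0, 0.5, 1.0, 2.0, 5.0, 50.0]
values = [renyi_entropy(p, a) for a in alphas]
assert all(h1 >= h2 - 1e-12 for h1, h2 in zip(values, values[1:]))
```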

For various (limiting) values of $\alpha$ the Rényi entropy reduces to notions of entropy that are known by their own names:

- $\alpha = 0$: Hartley entropy (max-entropy)
- $\alpha \to 1$: Shannon entropy
- $\alpha = 2$: collision entropy
- $\alpha \to \infty$: min-entropy

In particular, in terms of the above special cases, this means that:

Hartley entropy $\geq$ Shannon entropy $\geq$ collision entropy $\geq \cdots \geq$ min-entropy
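These four special cases can also be computed directly from their own formulas. The following Python sketch (the distribution and all names are made up for illustration) checks the ordering:

```python
import numpy as np

# The four named special cases, computed from their own formulas.
p = np.array([0.5, 0.25, 0.125, 0.125])

hartley   = np.log2(np.count_nonzero(p))   # H_0: log of the support size (max-entropy)
shannon   = -np.sum(p * np.log2(p))        # H_1, as a limit
collision = -np.log2(np.sum(p ** 2))       # H_2
min_ent   = -np.log2(np.max(p))            # H_inf, as a limit

assert hartley >= shannon >= collision >= min_ent
```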

Interpretation

The exponential of the Rényi entropy, $\exp(H_\alpha(p))$, is a diversity index of order $\alpha$ (a Hill number): an effective number of outcomes of the distribution. When $\alpha \to 1$, the Rényi entropy reduces to Shannon entropy, the expected "surprise" or uncertainty of a random variable. As $\alpha$ grows, the sum $\sum_i (p_i)^\alpha$ is dominated by the most probable outcomes, so larger orders weight likely events more heavily; as $\alpha$ decreases towards $0$, all outcomes in the support count more nearly equally.

Relatedly, the Rényi divergence (see below) quantifies how much one probability distribution differs from another; it serves as a distance-like, though not symmetric, measure on the space of probability distributions studied in information geometry.

Rényi Divergence

Main article: Rényi divergence

Related to Rényi entropy is the concept of Rényi divergence, a measure of how one probability distribution diverges from another. For two probability distributions $p$ and $q$, the Rényi divergence of order $\alpha$ (for $\alpha > 0$, $\alpha \neq 1$; other values are obtained as limits) is defined as:

$$D_\alpha(p || q) = \frac{1}{\alpha - 1} \log \left( \sum_i \frac{p_i^\alpha}{q_i^{\alpha-1}} \right)$$
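As with the entropy, a minimal Python sketch of this formula may help; the function name, the base-2 logarithm, and the treatment of $\alpha = 1$ as the Kullback-Leibler limit are our own illustrative choices, and we assume $q_i > 0$ wherever $p_i > 0$.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(p || q) in bits; assumes q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    q = q[p > 0]
    p = p[p > 0]
    if np.isclose(alpha, 1.0):        # alpha = 1 is taken as the limit: Kullback-Leibler divergence
        return np.sum(p * np.log2(p / q))
    return np.log2(np.sum(p ** alpha / q ** (alpha - 1.0))) / (alpha - 1.0)
```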

Rényi divergence generalizes Kullback-Leibler divergence in much the same way as Rényi entropy generalizes Shannon entropy:

- $\alpha = 0$: $-\log Q(\{i : p_i > 0\})$
- $\alpha \to 1$: Kullback-Leibler divergence
- $\alpha = 2$: $\log \left\langle \frac{p_i}{q_i} \right\rangle$
- $\alpha \to \infty$: $\log \sup_i \frac{p_i}{q_i}$
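These special cases can be checked numerically against the `renyi_divergence` sketch above; here $\log\langle p_i/q_i\rangle$ is read as the logarithm of the expectation of $p_i/q_i$ under $p$, and the two distributions are made-up illustrative numbers.

```python
import numpy as np

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])

kl     = np.sum(p * np.log2(p / q))      # alpha -> 1: Kullback-Leibler divergence
order2 = np.log2(np.sum(p * (p / q)))    # alpha = 2: log of the expectation of p_i/q_i under p
worst  = np.log2(np.max(p / q))          # alpha -> infinity: log sup_i p_i/q_i

assert np.isclose(renyi_divergence(p, q, 1.0), kl)
assert np.isclose(renyi_divergence(p, q, 2.0), order2)
assert renyi_divergence(p, q, 200.0) <= worst   # D_alpha increases towards the sup as alpha grows
```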

Rényi divergence behaves like a (non-symmetric) distance measure on the space of probability distributions: it is non-negative and vanishes exactly when $p = q$. Its second-order expansion around $p = q$ is proportional to the Fisher information metric, the Riemannian metric used in information geometry. In this way the family of Rényi divergences relates divergence measures such as the Kullback-Leibler divergence ($\alpha \to 1$) and the Hellinger distance (a function of the order-$\tfrac{1}{2}$ divergence) within a common framework.

Properties

The value $\alpha = 1$, which gives the Shannon entropy and the Kullback–Leibler divergence, is the only value at which the chain rule of conditional probability holds exactly, both for absolute entropies:

$$H(A,X) = H(A) + \mathbb{E}_{a \sim A} \big[ H(X| A=a) \big]$$

and relative entropies:

$$D_\mathrm{KL}(p(x|a)p(a)\|m(x,a)) = D_\mathrm{KL}(p(a)\|m(a)) + \mathbb{E}_{p(a)}\{D_\mathrm{KL}(p(x|a)\|m(x|a))\}$$

The latter in particular means that if we seek a distribution $p(x, a)$ that minimizes the divergence from some underlying prior measure $m(x, a)$, and we acquire new information that affects only the distribution of $a$, then the conditional distribution $p(x|a)$ remains $m(x|a)$, unchanged.
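Both $\alpha = 1$ chain rules can be verified numerically on a small joint distribution; the following sketch, with made-up numbers and base-2 logarithms, is purely illustrative.

```python
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl(p, q):
    p, q = np.asarray(p, dtype=float).ravel(), np.asarray(q, dtype=float).ravel()
    return np.sum(p * np.log2(p / q))

# A small made-up joint distribution over (A, X): rows index A, columns index X.
p_joint = np.array([[0.10, 0.30],
                    [0.35, 0.25]])
m_joint = np.array([[0.30, 0.20],
                    [0.30, 0.20]])
p_a, m_a = p_joint.sum(axis=1), m_joint.sum(axis=1)

# H(A, X) = H(A) + E_{a ~ A}[ H(X | A = a) ]
cond_H = sum(p_a[a] * shannon(p_joint[a] / p_a[a]) for a in range(2))
assert np.isclose(shannon(p_joint), shannon(p_a) + cond_H)

# D_KL(p(x|a) p(a) || m(x, a)) = D_KL(p(a) || m(a)) + E_{p(a)}[ D_KL(p(x|a) || m(x|a)) ]
cond_D = sum(p_a[a] * kl(p_joint[a] / p_a[a], m_joint[a] / m_a[a]) for a in range(2))
assert np.isclose(kl(p_joint, m_joint), kl(p_a, m_a) + cond_D)
```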

The Rényi entropies and divergences of other orders still satisfy the criteria of being non-negative and continuous, of being invariant under one-to-one coordinate transformations, and of combining additively when $A$ and $X$ are independent, so that if $p(A, X) = p(A)\,p(X)$, then

$$H_\alpha(A,X) = H_\alpha(A) + H_\alpha(X) $$

$$D_\alpha(P(A)P(X)\|Q(A)Q(X)) = D_\alpha(P(A)\|Q(A)) + D_\alpha(P(X)\|Q(X))$$
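A short check of this additivity for a product distribution, again reusing the `renyi_entropy` sketch from above (illustrative only):

```python
import numpy as np

# Additivity under independence: H_alpha(A, X) = H_alpha(A) + H_alpha(X)
# when p(A, X) = p(A) p(X).
p_A = np.array([0.7, 0.3])
p_X = np.array([0.4, 0.4, 0.2])
joint = np.outer(p_A, p_X).ravel()          # the product (independent) distribution

for alpha in [0.0, 0.5, 1.0, 2.0, 10.0]:
    assert np.isclose(renyi_entropy(joint, alpha),
                      renyi_entropy(p_A, alpha) + renyi_entropy(p_X, alpha))
```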

The stronger properties of the $\alpha = 1$ quantities allow the definition of conditional information and mutual information from communication theory.

