In information theory, Rényi entropy refers to a family of entropy measures that are essentially logarithms of diversity indices. For special values of its order parameter, Rényi entropy reproduces all of: Shannon entropy, Hartley entropy (max-entropy), and min-entropy.
Definition
The Rényi entropy of a probability distribution $p = (p_1, \ldots, p_n)$ at order $\alpha \geq 0$, $\alpha \neq 1$, is:
$$H_\alpha(p) := \frac{1}{1-\alpha}\log\left( \sum_{i=1}^n (p_i)^\alpha \right)$$
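As a sanity check, the definition can be evaluated directly. The following is a minimal Python/NumPy sketch; the function name `renyi_entropy` and the explicit handling of the $\alpha = 1$ and $\alpha = \infty$ limits are our choices, not part of the source:

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha(p) in nats for a finite distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # outcomes with p_i = 0 contribute nothing
    if np.isinf(alpha):
        return -np.log(p.max())          # min-entropy limit
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log(p))    # Shannon limit
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

# On the uniform distribution over n outcomes, every order gives log(n).
uniform = [0.25, 0.25, 0.25, 0.25]
```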
Properties
The Rényi entropy is an antitone (order-reversing) function of the order parameter $\alpha$. That is:
$$\alpha_1 \leq \alpha_2 \implies H_{\alpha_1}(p) \geq H_{\alpha_2}(p)$$
For various (limiting) values of $\alpha$ the Rényi entropy reduces to notions of entropy that are known by their own names:
| Order | $0$ | $\lim_{\alpha \to 1}$ | $2$ | $\cdots$ | $\lim_{\alpha \to \infty}$ |
|---|---|---|---|---|---|
| Rényi entropy | max-entropy (Hartley entropy) | Shannon entropy | collision entropy | $\cdots$ | min-entropy |
In particular, in terms of the above special cases, this means that:
Hartley entropy $\geq$ Shannon entropy $\geq$ collision entropy $\geq \cdots \geq$ min-entropy
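The order-reversing behavior across these special cases can be checked numerically. This sketch evaluates the defining formula directly at several orders (the distribution is illustrative):

```python
import numpy as np

p = np.array([0.5, 0.25, 0.15, 0.1])

def H(p, a):
    # direct evaluation of the Renyi formula (valid for a != 1)
    return np.log(np.sum(p ** a)) / (1.0 - a)

alphas = [0.0, 0.5, 2.0, 5.0, 50.0]
values = [H(p, a) for a in alphas]
# entropies are non-increasing in the order alpha
assert all(x >= y for x, y in zip(values, values[1:]))
```

At order $0$ the value is $\log 4$ (the log of the support size), and for large orders it approaches $-\log 0.5$, the min-entropy of this distribution.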
Interpretation
Rényi entropy has a geometric interpretation on the probability simplex. When $\alpha \to 1$, it reduces to Shannon entropy, the expected "surprise" or uncertainty of a random variable. As $\alpha$ varies, Rényi entropy weights different moments of the distribution: small orders are dominated by the size of the support, while large orders are dominated by the highest-probability outcomes. Its level sets can be visualized as hypersurfaces in the probability simplex.
Relatedly, Rényi divergence can be seen as a distance-like measure between probability distributions on a statistical manifold. It quantifies how much two distributions differ, and its geometry can be explored through the study of geodesics in the space of probability distributions.
Rényi Divergence
Main article: Rényi divergence
Related to Rényi entropy is the concept of Rényi divergence, a measure of how one probability distribution diverges from another. For two probability distributions $p$ and $q$, the Rényi divergence of order $\alpha$ is defined as:
$$D_\alpha(p \,\|\, q) = \frac{1}{\alpha - 1} \log \left( \sum_i \frac{p_i^\alpha}{q_i^{\alpha-1}} \right)$$
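The divergence can likewise be computed directly. A minimal Python/NumPy sketch, assuming finite distributions with $q_i > 0$ wherever $p_i > 0$ (the function name is ours):

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) in nats; assumes q > 0 on the support of p."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    p, q = p[mask], q[mask]
    if np.isclose(alpha, 1.0):
        return np.sum(p * np.log(p / q))   # Kullback-Leibler limit
    return np.log(np.sum(p ** alpha / q ** (alpha - 1))) / (alpha - 1)
```

For identical distributions the divergence is zero at every order, as expected of a distance-like quantity.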
Rényi divergence generalizes Kullback–Leibler divergence in much the same way as Rényi entropy generalizes Shannon entropy:
| Order | $0$ | $\lim_{\alpha \to 1}$ | $2$ | $\cdots$ | $\lim_{\alpha \to \infty}$ |
|---|---|---|---|---|---|
| Rényi divergence | $-\log Q(\{i : p_i > 0\})$ | Kullback–Leibler divergence | $\log \left\langle \frac{p_i}{q_i} \right\rangle$ | $\cdots$ | $\log \sup_i \frac{p_i}{q_i}$ |
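The limiting columns can be checked numerically by evaluating the defining formula at an order near $0$ and at a large order. A sketch with illustrative distributions:

```python
import numpy as np

p = np.array([0.6, 0.4, 0.0])
q = np.array([0.3, 0.3, 0.4])

def D(p, q, a):
    mask = p > 0
    return np.log(np.sum(p[mask] ** a / q[mask] ** (a - 1))) / (a - 1)

# near order 0: approaches -log Q({i : p_i > 0}); here Q gives 0.6 to supp(p)
d0 = D(p, q, 1e-9)
# large order: approaches log sup_i p_i/q_i; here sup is 0.6/0.3 = 2
dinf = D(p, q, 200.0)
```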
Rényi divergence can be seen as a distance-like measure on the space of probability distributions. Its second-order expansion around $p = q$ induces a Riemannian metric (proportional to the Fisher information metric), giving rise to a manifold structure. Rényi divergence thus generalizes other divergence measures such as the Kullback–Leibler divergence and the Hellinger distance, providing a unifying framework for understanding various distance measures in information geometry.
Properties
The value $\alpha = 1$, which gives the Shannon entropy and the Kullback–Leibler divergence, is the only value at which the chain rule of conditional probability holds exactly, both for absolute:
$$H(A,X) = H(A) + \mathbb{E}_{a \sim A} \big[ H(X \mid A=a) \big]$$
and relative entropies:
$$D_\mathrm{KL}\big(p(x \mid a)\,p(a) \,\big\|\, m(x,a)\big) = D_\mathrm{KL}\big(p(a) \,\big\|\, m(a)\big) + \mathbb{E}_{p(a)}\big[ D_\mathrm{KL}\big(p(x \mid a) \,\big\|\, m(x \mid a)\big) \big]$$
The latter in particular means that if we seek a distribution $p(x, a)$ which minimizes the divergence from some underlying prior measure $m(x, a)$, and we acquire new information which only affects the distribution of $a$, then the conditional distribution $p(x \mid a)$ remains $m(x \mid a)$, unchanged.
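The absolute chain rule can be verified on a concrete joint distribution. A Python/NumPy sketch (the $2 \times 3$ joint table is illustrative):

```python
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# a joint distribution p(a, x) on a 2x3 grid (values are illustrative)
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.05, 0.25]])
p_a = joint.sum(axis=1)              # marginal of A
cond = joint / p_a[:, None]          # row i is p(x | a = i)

lhs = shannon(joint)
rhs = shannon(p_a) + np.sum(p_a * np.array([shannon(row) for row in cond]))
assert np.isclose(lhs, rhs)          # chain rule holds exactly at alpha = 1
```

The analogous identity fails for generic Rényi entropies of order $\alpha \neq 1$, which is the point of the paragraph above.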
The other Rényi entropies and divergences satisfy the criteria of being positive and continuous, of being invariant under one-to-one coordinate transformations, and of combining additively when $A$ and $X$ are independent, so that if $p(A, X) = p(A)\,p(X)$, then
$$H_\alpha(A,X) = H_\alpha(A) + H_\alpha(X) $$
$$D_\alpha\big(P(A)\,P(X) \,\big\|\, Q(A)\,Q(X)\big) = D_\alpha\big(P(A) \,\big\|\, Q(A)\big) + D_\alpha\big(P(X) \,\big\|\, Q(X)\big)$$
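Additivity under independence follows because $\sum_{i,j} (p_i q_j)^\alpha = \big(\sum_i p_i^\alpha\big)\big(\sum_j q_j^\alpha\big)$, and can be checked directly (sketch with illustrative marginals):

```python
import numpy as np

def renyi(p, a):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return np.log(np.sum(p ** a)) / (1.0 - a)

pA = np.array([0.7, 0.3])
pX = np.array([0.2, 0.5, 0.3])
joint = np.outer(pA, pX)   # independent joint: p(a, x) = p(a) p(x)

for a in (0.5, 2.0, 3.0):
    assert np.isclose(renyi(joint, a), renyi(pA, a) + renyi(pX, a))
```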
The stronger properties of the $\alpha = 1$ quantities allow the definition of conditional information and mutual information from communication theory.
