The log-likelihood is an essential concept in statistical modeling and machine learning, with roots in probability theory and information theory. It is the logarithmic transformation of the likelihood function and plays a key role in parameter estimation, model selection, and hypothesis testing. The log-likelihood is also related to the Kullback-Leibler divergence and to the Boltzmann distribution as it appears in energy-based models in machine learning.
Overview
The log-likelihood function is the natural logarithm of the likelihood function. For independent and identically distributed observations $x_1, \dots, x_n$ it takes the form:
$$ l(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(x_i; \theta) $$
Here, $L(\theta)$ is the likelihood function, $f(x_i; \theta)$ is the probability density function (pdf) evaluated at observation $x_i$, and $\theta$ is the vector of parameters. The log-likelihood is easier to work with computationally because the logarithm turns the product of densities into a sum.
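As a concrete illustration (a minimal sketch, not taken from this article, assuming an i.i.d. normal model and using NumPy/SciPy), the log-likelihood can be evaluated as a sum of log-densities and maximized numerically:

```python
# A minimal sketch: evaluate and maximize the log-likelihood of an assumed
# i.i.d. normal sample with parameters theta = (mu, sigma).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # synthetic data

def log_likelihood(theta, x):
    """l(theta) = sum_i log f(x_i; theta)."""
    mu, sigma = theta
    if sigma <= 0:                 # outside the parameter space
        return -np.inf
    return np.sum(norm.logpdf(x, loc=mu, scale=sigma))

# Maximize l(theta) by minimizing the negative log-likelihood.
result = minimize(lambda t: -log_likelihood(t, x), x0=[0.0, 1.0],
                  method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(mu_hat, sigma_hat)
```

For the normal model the maximizer is also available in closed form (the sample mean and the uncorrected sample standard deviation), so the numerical result should agree closely with those values.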
Properties
The log-likelihood has several important properties:
- Concavity: Under regularity conditions (for example, exponential-family models in their natural parameterization), the log-likelihood is a concave function of the parameters, so a local maximum is the unique global maximum.
- Invariance: Because the logarithm is strictly increasing, maximizing the log-likelihood yields the same estimator as maximizing the likelihood, and the maximum likelihood estimator (MLE) is equivariant under reparameterization: if $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$.
- Asymptotic Normality: Under regularity conditions, the MLE is asymptotically normally distributed, with covariance given by the inverse of the Fisher information matrix, which is defined through the curvature (second derivatives) of the log-likelihood; a small simulation after this list illustrates this.
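The following hedged sketch (assuming a normal model with known $\sigma = 1$, so the MLE of the mean is the sample mean and the Fisher information per observation equals 1) illustrates asymptotic normality by Monte Carlo:

```python
# Monte Carlo check of asymptotic normality for the MLE of a normal mean
# (known sigma = 1): sqrt(n) * (mu_hat - mu) should be approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(1)
n, mu_true = 200, 0.5

# Repeat the experiment many times to see the sampling distribution of the MLE.
mle_draws = np.array([rng.normal(mu_true, 1.0, size=n).mean()
                      for _ in range(5000)])

print(np.std(np.sqrt(n) * (mle_draws - mu_true)))  # approximately 1
```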
Applications
Applications of the log-likelihood include:
- Information Geometry: The log-likelihood and its derivatives give rise to geometric constructs such as the Fisher information metric on the space of probability distributions. The Kullback-Leibler divergence can also be expressed in terms of expected log-likelihoods, as shown in the article on EBMs.
- Model Selection: Criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) combine the maximized log-likelihood with a penalty on the number of parameters (see the sketch after this list).
- Hypothesis Testing: Likelihood-ratio tests compare nested models through twice the difference of their maximized log-likelihoods.
- Parameter Estimation: Maximizing the log-likelihood yields the MLE of the parameters.
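A hedged sketch of the model-selection and testing uses (the numeric log-likelihood values below are hypothetical; the formulas are the standard AIC, BIC, and likelihood-ratio expressions):

```python
# Model-selection criteria built from a maximized log-likelihood l_hat, for a
# model with k free parameters fitted to n observations (lower is better).
import numpy as np
from scipy.stats import chi2

def aic(l_hat, k):
    return 2 * k - 2 * l_hat

def bic(l_hat, k, n):
    return k * np.log(n) - 2 * l_hat

# Hypothetical maximized log-likelihoods for two nested models.
l_restricted, l_full = -912.3, -910.8
print(aic(l_full, k=3), bic(l_full, k=3, n=500))

# Likelihood-ratio test: 2 * (l_full - l_restricted) is compared against a
# chi-squared distribution with df equal to the difference in parameter counts.
lr_stat = 2 * (l_full - l_restricted)
p_value = chi2.sf(lr_stat, df=1)
print(lr_stat, p_value)
```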
Statistical Mechanics
In statistical mechanics, the log-likelihood plays a role in mean-field theory through its connection to free energy and entropy: the variational free energy of a trial distribution decomposes into an expected energy term minus an entropy term.
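As a standard worked relation (stated here in units with $k_B T = 1$; not specific to this article), for a Boltzmann distribution $p(\mathbf{x}) = e^{-E(\mathbf{x})}/Z$ and any trial distribution $q$:
$$ \mathcal{F}[q] \equiv \langle E(\mathbf{x}) \rangle_q - S[q] = \mathrm{KL}(q \,\|\, p) - \log Z, \qquad S[q] = -\sum_{\mathbf{x}} q(\mathbf{x}) \log q(\mathbf{x}), $$
so minimizing the variational free energy over a mean-field family of $q$ is equivalent to minimizing the KL divergence to $p$, and $\mathcal{F}[q] \geq -\log Z$ with equality when $q = p$.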
Energy-Based Models
Main article: Energy based model
In the context of energy-based models (EBMs), the log-likelihood is connected to the Boltzmann distribution. The negative log-likelihood is often used as the loss function to be minimized during training. Averaged over the data, the negative log-likelihood of an EBM can be written as:
$$ -l(\mathbf{w}) = \langle E(\mathbf{x}; \mathbf{w})\rangle_q - F_\mathbf{w} $$
where $\langle \cdot \rangle_q$ denotes an average with respect to the data distribution $q(\mathbf{x})$, and $F_\mathbf{w} = -\log Z_\mathbf{w}$ is the Helmholtz free energy of the model distribution $p(\mathbf{x}; \mathbf{w}) = e^{-E(\mathbf{x}; \mathbf{w})}/Z_\mathbf{w}$. Learning corresponds to maximizing the log-likelihood, i.e. minimizing this negative log-likelihood.
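As an illustration (a hypothetical toy model over a small discrete state space, not taken from this article), the partition function can be summed exactly, so the negative log-likelihood $\langle E \rangle_q - F_\mathbf{w} = \langle E \rangle_q + \log Z_\mathbf{w}$ can be evaluated directly:

```python
# Toy EBM over a small discrete space, where the partition function Z_w can be
# summed exactly, so -l(w) = <E(x; w)>_q + log Z_w is computable in closed form.
import numpy as np

states = np.arange(-3, 4)                 # x in {-3, ..., 3}
data = np.array([0, 1, 0, -1, 1, 0, 0])   # observed samples defining q(x)

def energy(x, w):
    """Quadratic energy E(x; w) = w * x^2 (an arbitrary illustrative choice)."""
    return w * x ** 2

def neg_log_likelihood(w):
    log_Z = np.log(np.sum(np.exp(-energy(states, w))))   # F_w = -log Z_w
    return np.mean(energy(data, w)) + log_Z               # <E>_q - F_w

# Grid search: the NLL is minimized at some w > 0 for this data.
ws = np.linspace(0.05, 2.0, 50)
print(ws[np.argmin([neg_log_likelihood(w) for w in ws])])
```

In realistic EBMs the sum over states is intractable, and $\log Z_\mathbf{w}$ (or its gradient) must instead be approximated, for example by sampling.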