The log-likelihood is an essential concept in statistical modeling and machine learning, with close ties to probability theory and information theory. It is the logarithmic transformation of the likelihood function and plays a key role in parameter estimation, model selection, and hypothesis testing. The log-likelihood is also related to the Kullback–Leibler divergence and to the Boltzmann distribution, as understood in the context of energy-based models in machine learning.
Overview
The log-likelihood function is the natural logarithm of the likelihood function. For $n$ independent observations $x_1, \ldots, x_n$ it is:
$$ l(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(x_i; \theta) $$
Here, $L(\theta)$ is the likelihood function, $f(x_i; \theta)$ is the probability density function (pdf), and $\theta$ is the vector of parameters. The log-likelihood is easier to work with computationally because it turns products into sums.
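As a minimal sketch of the sum-of-logs form, assuming i.i.d. Gaussian data (the data values and parameters below are illustrative, not from the article):

```python
import math

def gaussian_log_likelihood(xs, mu, sigma):
    """l(theta) = sum_i log f(x_i; theta) for i.i.d. N(mu, sigma^2) observations."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )

xs = [1.2, 0.8, 1.1, 0.9]
# The product of densities becomes a sum of log-densities,
# which avoids numerical underflow for large n.
ll = gaussian_log_likelihood(xs, mu=1.0, sigma=0.5)
```

Summing log-densities instead of multiplying densities is exactly the computational advantage noted above: each factor here is of order 1, while the raw product would shrink toward zero as $n$ grows.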
Properties
The log-likelihood has several important properties:
- Concavity: Under regularity conditions, the log-likelihood is a concave function of the parameters, which guarantees a unique maximum.
- Invariance: The maximum likelihood estimator (MLE) is invariant under reparameterization: if $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$.
- Asymptotic Normality: Asymptotically, the MLE follows a normal distribution whose covariance is the inverse of the Fisher information matrix, itself defined through derivatives of the log-likelihood.
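The concavity property can be checked numerically for a simple case. The sketch below (a hypothetical Bernoulli example, not from the article) verifies that the second central difference of the log-likelihood is negative across the parameter range:

```python
import math

def bernoulli_log_likelihood(p, k, n):
    """l(p) = k*log(p) + (n-k)*log(1-p) for k successes in n Bernoulli trials."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Numerically check concavity: the second central difference of l(p)
# approximates l''(p) and should be negative throughout (0, 1).
k, n, h = 7, 10, 1e-4
second_diffs = [
    (bernoulli_log_likelihood(p + h, k, n)
     - 2 * bernoulli_log_likelihood(p, k, n)
     + bernoulli_log_likelihood(p - h, k, n)) / h**2
    for p in [0.2, 0.5, 0.7, 0.9]
]
```

Here $l''(p) = -k/p^2 - (n-k)/(1-p)^2$ is negative everywhere on $(0, 1)$, so the finite-difference check agrees with the analytic concavity.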
Applications
Applications of the log-likelihood include:
- Information Geometry: The log-likelihood and its derivatives give rise to geometric constructs such as the Fisher information metric on the space of probability distributions. The KL divergence can also be expressed in terms of the log-likelihood, as shown in the article on EBMs.
- Model Selection: Criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are based on the log-likelihood.
- Hypothesis Testing: Likelihood-ratio tests are constructed from differences of log-likelihoods.
- Parameter Estimation: Maximizing the log-likelihood yields the MLE of the parameters.
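The estimation and model-selection uses above can be sketched together. In this toy example (data, grid, and the known-variance assumption are all illustrative), a grid search over the Gaussian log-likelihood recovers the closed-form MLE of the mean, and the maximized log-likelihood feeds into the AIC:

```python
import math

def normal_ll(xs, mu, sigma):
    """Gaussian log-likelihood with known sigma."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma**2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma**2))

xs = [2.1, 1.9, 2.4, 2.0, 1.6]

# Grid-search the log-likelihood over mu; the maximizer matches
# the closed-form MLE for the mean, the sample average.
grid = [i / 1000 for i in range(1000, 3001)]
mu_hat = max(grid, key=lambda mu: normal_ll(xs, mu, sigma=1.0))
sample_mean = sum(xs) / len(xs)

# AIC = 2k - 2*l(theta_hat), with k the number of free parameters (here k=1).
aic = 2 * 1 - 2 * normal_ll(xs, mu_hat, sigma=1.0)
```

In practice the maximization is done with gradient-based optimizers rather than a grid, but the principle is the same: both the estimate and the selection criterion are read off the maximized log-likelihood.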
Statistical Mechanics
In statistical mechanics, the log-likelihood plays a role in mean-field theory, where it connects to the free energy and the entropy.
Energy-Based Models
Main article: Energy-based model
In the context of energy-based models (EBMs), the log-likelihood is connected to the Boltzmann distribution. The negative log-likelihood is often used as the loss function to be minimized during training. An expression for the negative log-likelihood in EBMs is:
$$ l(\mathbf{w}) = \langle E(\mathbf{x}; \mathbf{w})\rangle_q - F_\mathbf{w} $$
where $\langle \cdot \rangle_q$ denotes an average with respect to the data distribution $q(\mathbf{x})$, and $F_\mathbf{w} = -\log Z_\mathbf{w}$ is the Helmholtz free energy of the model distribution $p(\mathbf{x}; \mathbf{w})$. Learning corresponds to maximizing the log-likelihood, or equivalently minimizing the negative log-likelihood.
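The decomposition into average energy and free energy can be verified on a toy model. In this sketch (the three-state energies and data distribution are hypothetical), the identity $l(\mathbf{w}) = \langle E \rangle_q - F_\mathbf{w}$ with $F_\mathbf{w} = -\log Z_\mathbf{w}$ is checked against the negative log-likelihood computed directly from $p(\mathbf{x}; \mathbf{w}) = e^{-E(\mathbf{x}; \mathbf{w})}/Z_\mathbf{w}$:

```python
import math

# Toy discrete EBM over states {0, 1, 2} with hypothetical energies E(x; w).
energies = {0: 0.5, 1: 1.0, 2: 2.0}

# Partition function and Helmholtz free energy F_w = -log Z_w.
Z = sum(math.exp(-E) for E in energies.values())
F = -math.log(Z)

# Empirical data distribution q(x) over the same states (hypothetical).
q = {0: 0.6, 1: 0.3, 2: 0.1}

# Negative log-likelihood via the free-energy decomposition:
# l(w) = <E(x; w)>_q - F_w
avg_energy = sum(q[x] * energies[x] for x in energies)
nll = avg_energy - F

# Direct computation: -sum_x q(x) log p(x; w), with p(x; w) = exp(-E)/Z.
nll_direct = -sum(q[x] * (-energies[x] - math.log(Z)) for x in q)
```

Both routes give the same value, since $\log p = -E - \log Z$, so averaging $-\log p$ over $q$ yields $\langle E \rangle_q + \log Z = \langle E \rangle_q - F_\mathbf{w}$.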