Mercer's theorem is a fundamental result in mathematics, particularly in functional analysis and operator theory. It plays a crucial role in the theory of integral equations and has significant applications in machine learning, especially in the construction of kernel methods.
Overview
Mercer's theorem is a result concerning the representation of a continuous, symmetric, and positive definite kernel function. It states that such a kernel $k(x, y)$ can be expressed as a sum:
$$ k(x, y) = \sum_{n=1}^{\infty} \lambda_n \phi_n(x) \phi_n(y) $$
Where:
- $\lambda_n$ are nonnegative eigenvalues,
- $\phi_n(x)$ are the corresponding eigenfunctions.
The theorem ensures that the kernel can be represented in terms of orthogonal functions, providing a solid foundation for various mathematical and computational applications.
Mercer's theorem provides a bridge between linear algebra and functional analysis, allowing for the decomposition of a kernel function into a series of eigenfunctions. This decomposition is essential in various fields, including statistics, machine learning, and signal processing. By understanding the spectral properties of a kernel, Mercer's theorem offers insights into the underlying structure of the data, facilitating more sophisticated modeling and analysis.
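As a concrete illustration, the expansion above can be approximated numerically by discretizing the kernel on a uniform grid and eigendecomposing the resulting weighted Gram matrix (the idea behind the Nyström method). The following is a minimal sketch, assuming a Gaussian kernel on $[0, 1]$; the names and parameters are illustrative, not from any particular library.

```python
# Sketch: approximating the Mercer expansion of a Gaussian (RBF) kernel
# by discretizing [0, 1] on a uniform grid (Nystroem-style approximation).
import numpy as np

n = 200                                # grid points
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]                        # quadrature weight (Riemann sum)

def k(s, t, ell=0.2):
    """Gaussian kernel: continuous, symmetric, positive definite."""
    return np.exp(-(s - t) ** 2 / (2 * ell ** 2))

K = k(x[:, None], x[None, :])          # Gram matrix on the grid

# Eigendecomposition of the discretized integral operator h * K.
lam, U = np.linalg.eigh(h * K)         # eigh returns ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]         # sort descending

# Grid values of eigenfunctions phi_j, scaled to be L2-orthonormal.
phi = U / np.sqrt(h)

# A truncated Mercer sum with m terms should reconstruct K closely.
m = 30
K_m = (phi[:, :m] * lam[:m]) @ phi[:, :m].T

err = np.max(np.abs(K - K_m))
print(f"max reconstruction error with {m} terms: {err:.2e}")
```

Because the eigenvalues of a smooth kernel decay rapidly, a modest number of terms already reconstructs the kernel to high accuracy; this fast decay is what makes kernel methods computationally tractable in practice.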
Definition
To explain Mercer's theorem, we first consider an important special case. A kernel, in this context, is a continuous function $k : [a,b] \times [a,b] \to \mathbb{R}$ that is symmetric, that is, $k(x,y) = k(y,x)$ for all $x, y \in [a,b]$.
The kernel $k$ is said to be positive-definite if and only if
$$\sum_{i=1}^{n}\sum_{j=1}^{n}k(x_{i},x_{j})\,c_{i}c_{j}\geq 0$$
for all finite sequences of points $x_1, \ldots, x_n$ in $[a, b]$ and all choices of real numbers $c_1, \ldots, c_n$. Note that the term "positive-definite" is well-established in the literature despite the weak inequality in the definition.
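This defining inequality can be checked numerically. The sketch below, assuming a Gaussian kernel and random points in $[0, 1]$, evaluates the quadratic form $\sum_{i,j} k(x_i, x_j) c_i c_j = c^\top K c$ for many random coefficient vectors; all values stay nonnegative up to rounding.

```python
# Sketch: checking the positive-definite inequality numerically for a
# Gaussian kernel on random points in [a, b] = [0, 1]. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def k(s, t, ell=0.3):
    return np.exp(-(s - t) ** 2 / (2 * ell ** 2))

x = rng.uniform(0.0, 1.0, size=50)      # a finite sequence x_1, ..., x_n
K = k(x[:, None], x[None, :])           # K[i, j] = k(x_i, x_j)

# The quadratic form c^T K c must be >= 0 for every real vector c.
qs = [c @ K @ c for c in rng.normal(size=(1000, 50))]
print("smallest quadratic form over 1000 random c:", min(qs))

# Equivalently, all eigenvalues of the symmetric matrix K are nonnegative.
min_eig = np.linalg.eigvalsh(K).min()
print(f"smallest eigenvalue of K: {min_eig:.2e}")
```

The eigenvalue check is equivalent to the quadratic-form check: a symmetric matrix satisfies $c^\top K c \geq 0$ for all $c$ exactly when its eigenvalues are all nonnegative.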
Associated to $k$ is a linear operator (more specifically a Hilbert–Schmidt integral operator) on functions defined by the integral
$$[T_{k}\varphi ](x)=\int _{a}^{b}k(x,s)\varphi (s)\,ds.$$
For technical considerations we assume $\varphi$ can range through the space $L^2[a, b]$ (see Lp space) of square-integrable real-valued functions. Since $T_k$ is a linear operator, we can talk about eigenvalues and eigenfunctions of $T_k$.
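On a computer, applying $T_k$ to a function amounts to a quadrature rule: the integral becomes a weighted matrix-vector product. A minimal sketch, assuming a Gaussian kernel on $[0, 1]$ and a simple Riemann-sum quadrature:

```python
# Sketch: applying the Hilbert-Schmidt integral operator T_k to a function
# by a Riemann-sum quadrature on [0, 1]. Names are illustrative.
import numpy as np

n = 400
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]                       # quadrature weight

def k(s, t, ell=0.25):
    return np.exp(-(s - t) ** 2 / (2 * ell ** 2))

K = k(x[:, None], x[None, :])         # kernel sampled on the grid

def T_k(phi_vals):
    """(T_k phi)(x_i) = integral of k(x_i, s) phi(s) ds ~ h * (K @ phi)."""
    return h * (K @ phi_vals)

phi = np.sin(2 * np.pi * x)           # any square-integrable test function
Tphi = T_k(phi)
print("||T_k phi||_inf =", np.max(np.abs(Tphi)))
```

Linearity of $T_k$ is immediate from this form: scaling or adding inputs scales or adds the outputs, since matrix multiplication is linear.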
Theorem. Let $k$ be a continuous symmetric positive-definite kernel function $k : [a,b]\times[a,b] \to \mathbb{R}$. Then there is an orthonormal basis $\{e_i\}_i$ of $L^2[a, b]$ consisting of eigenfunctions of the Hilbert–Schmidt integral operator $T_k$ such that the corresponding sequence of eigenvalues $\{\lambda_i\}_i$ is nonnegative. The eigenfunctions corresponding to non-vanishing eigenvalues are continuous on $[a, b]$, and $k$ has the representation $$k(s,t)=\sum _{j=1}^{\infty }\lambda _{j}\,e_{j}(s)\,e_{j}(t)$$ where the convergence is absolute and uniform.
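The two conclusions of the theorem, nonnegative eigenvalues and an orthonormal family of eigenfunctions, can be verified numerically on a discretization. A brief sketch, again assuming a Gaussian kernel on $[0, 1]$ with an illustrative length scale:

```python
# Sketch: numerically checking the theorem's conclusions for a Gaussian
# kernel: nonnegative eigenvalues and L2-orthonormal eigenfunctions,
# via a uniform-grid discretization. Illustrative only.
import numpy as np

n = 300
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2 ** 2))

lam, U = np.linalg.eigh(h * K)     # spectrum of the discretized T_k
e = U / np.sqrt(h)                 # grid values of eigenfunctions e_j

# Eigenvalues are nonnegative (up to floating-point rounding).
print("smallest eigenvalue:", lam.min())

# Orthonormality in L^2[0, 1]: h * <e_i, e_j> ~ delta_ij.
G = h * (e.T @ e)
print("max deviation from identity:", np.max(np.abs(G - np.eye(n))))
```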
Spectral Theory
Main article: Spectral theory
The proof of Mercer's theorem relies heavily on the spectral theory of compact operators. It begins by considering the integral operator associated with the kernel and then studies its spectral properties. The compactness of the operator ensures the existence of a countable set of eigenvalues and corresponding eigenfunctions.
The structure of the proof highlights the deep connection between Mercer's theorem and the spectral theory of compact operators. By understanding the eigenvalues and eigenfunctions of the operator, one can derive the representation of the kernel function in terms of these spectral components. This connection has profound implications in various fields, including functional analysis, differential equations, and mathematical physics.
Mercer's Condition
Mercer's condition is a specific requirement that ensures the applicability of Mercer's theorem. It requires that the kernel function be continuous, symmetric, and positive definite. These properties guarantee that the kernel can be decomposed into a series of orthogonal functions, allowing for the application of Mercer's theorem.
Kernel Trick
Main article: Kernel trick
The kernel trick is a technique in machine learning that leverages Mercer's theorem to implicitly map data into a higher-dimensional space. By using a kernel function that satisfies Mercer's condition, one can apply linear methods in the transformed space without explicitly dealing with the high-dimensional representation. This trick has led to groundbreaking developments in various algorithms and models, including support vector machines and kernel PCA.
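A minimal worked example of the trick, using the polynomial kernel $k(x, y) = (x \cdot y)^2$ on $\mathbb{R}^2$, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2) \in \mathbb{R}^3$. The kernel evaluation in the original space equals the inner product of the mapped vectors, so the map never needs to be computed explicitly:

```python
# Sketch: the kernel trick for the polynomial kernel k(x, y) = (x . y)^2.
# Illustrative example; not tied to a specific library.
import numpy as np

def poly_kernel(x, y):
    """One kernel evaluation in the original space R^2."""
    return float(np.dot(x, y)) ** 2

def feature_map(x):
    """Explicit feature map phi: R^2 -> R^3 for this kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

lhs = poly_kernel(x, y)                              # implicit, in R^2
rhs = float(np.dot(feature_map(x), feature_map(y)))  # explicit, in R^3

print(lhs, rhs)   # equal: the kernel computes the inner product implicitly
```

For higher-degree polynomial or Gaussian kernels the explicit feature space is enormous or infinite-dimensional, which is precisely why computing only kernel evaluations is so powerful.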
See Also
- Kernel function
- Hilbert space
- Spectral theory
- Support vector machine (SVM)
- Reproducing kernel Hilbert space
- Eigenfunction
- Eigenvalue problem
- Positive-definite kernel

