The Boltzmann distribution, also known as the Gibbs distribution and closely related to the Maxwell-Boltzmann distribution of particle speeds in a gas, is a fundamental concept in statistical mechanics. It describes the probability distribution of the microscopic states of a system in thermal equilibrium.
Overview
The Boltzmann distribution describes how the number of particles is distributed among available energy levels at thermal equilibrium. It provides a bridge between the macroscopic properties of a system and its microscopic behavior, allowing for a statistical description of the distribution of energies among the constituent particles.
Definition
The Boltzmann distribution is defined for a system in equilibrium at temperature $T$ and is given by:
$$P(E) = \frac{1}{Z} e^{-\frac{E}{kT}}$$
Here,
 $P(E)$ is the probability of a microstate with energy $E$,
 $k$ is the Boltzmann constant,
 $T$ is the absolute temperature, and
 $Z$ is the partition function, defined as: $$Z = \sum_i e^{-\frac{E_i}{kT}}$$
The sum is over all possible microstates of the system, and the partition function ensures the proper normalization of probabilities.
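As a minimal sketch of the definition, the following Python snippet computes Boltzmann probabilities for a finite list of microstate energies. The function name `boltzmann_probabilities` and the example energies are illustrative assumptions, and energies are expressed in the same units as $kT$.

```python
import math

def boltzmann_probabilities(energies, kT):
    """Return P_i = exp(-E_i / kT) / Z for a list of microstate energies."""
    if kT <= 0:
        raise ValueError("kT must be positive")
    # Shift by the lowest energy before exponentiating. The shift cancels
    # between the numerator and Z, and it avoids overflow for large |E_i| / kT.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)  # partition function of the shifted energies
    return [w / z for w in weights]

# Three equally spaced levels (hypothetical values): lower energies are more likely.
print(boltzmann_probabilities([0.0, 1.0, 2.0], kT=1.0))
```

Dividing by the partition function is what makes the probabilities sum to one. Raising `kT` flattens the distribution, while lowering it concentrates the probability in the lowest-energy state.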
Equipartition Theorem
Main article: Equipartition theorem
The Equipartition Theorem asserts that, in thermal equilibrium, energy is shared equally on average among all the available degrees of freedom, such as translational, rotational, and vibrational modes of motion. Equipartition is fundamental in understanding the behavior of macroscopic systems and connects statistical mechanics to thermodynamic quantities like temperature and heat capacity.
Theorem. For a system with a Hamiltonian $H(\mathbf{p}, \mathbf{q})$ with generalized coordinates $\mathbf{q}$ and momenta $\mathbf{p}$: $$\left\langle x_i \frac{\partial H}{\partial x_i} \right\rangle = k T$$ for each phase-space variable $x_i$ (a coordinate $q_i$ or a momentum $p_i$) that enters $H$ quadratically, where $\langle \cdot \rangle$ denotes the canonical ensemble average.
That is, for a system in thermal equilibrium at temperature $T$, each quadratic degree of freedom in the Hamiltonian contributes an average of $\frac{1}{2} k T$ to the total energy of the system, where $k$ is the Boltzmann constant. This principle helps describe the average behavior of an ensemble of particles in systems like ideal gases.
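As a short worked check of the $\frac{1}{2} k T$ value, consider a Hamiltonian containing a single quadratic term $a p^2$ with $a > 0$ (the symbol $a$ is introduced here only for illustration). Averaging over the Boltzmann weight $e^{-a p^2 / kT}$ and using the standard Gaussian integrals gives:
$$\langle a p^2 \rangle = \frac{\int_{-\infty}^{\infty} a p^2 \, e^{-a p^2 / kT} \, dp}{\int_{-\infty}^{\infty} e^{-a p^2 / kT} \, dp} = a \cdot \frac{kT}{2a} = \frac{1}{2} k T$$
The result does not depend on $a$, which is why every quadratic term contributes the same average energy. For a kinetic term, $a = \frac{1}{2m}$.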
For example, in the case of a classical ideal gas with $N$ particles, the Hamiltonian is given by:
$$H = \sum_{i=1}^{3N} \frac{p_i^2}{2m}$$
where $m$ is the mass of a particle, and the sum runs over all 3 spatial directions for each particle. By applying the equipartition theorem, the total average kinetic energy of the gas is:
$$U = \frac{3}{2} N k T$$
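The $\frac{3}{2} N k T$ result can be checked numerically. The sketch below assumes that each momentum component follows the Boltzmann distribution for the Hamiltonian above, which is a Gaussian with variance $m k T$. The function name `mean_kinetic_energy` and its parameters are hypothetical, and units are chosen so that $kT$ is a plain number.

```python
import math
import random

def mean_kinetic_energy(n_particles, mass, kT, n_samples=2000, seed=0):
    """Monte Carlo estimate of <sum_i p_i^2 / (2m)> over 3N momentum components."""
    rng = random.Random(seed)
    # Assumption: P(p_i) is proportional to exp(-p_i^2 / (2 m kT)),
    # i.e. a Gaussian with standard deviation sqrt(m kT).
    sigma = math.sqrt(mass * kT)
    total = 0.0
    for _ in range(n_samples):
        p_squared = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(3 * n_particles))
        total += p_squared / (2.0 * mass)
    return total / n_samples

if __name__ == "__main__":
    n, m, kT = 10, 2.0, 1.5
    print("sampled U      :", mean_kinetic_energy(n, m, kT))
    print("equipartition U:", 1.5 * n * kT)
```

The sampled average should approach $\frac{3}{2} N k T$ as `n_samples` grows, independently of the mass, since the mass cancels between the Gaussian variance and the kinetic energy.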
The Equipartition Theorem is based on classical mechanics and therefore fails to predict the correct behavior at very low temperatures, where quantum effects become significant. For example, the specific heat of a solid approaches zero as temperature approaches absolute zero, contrary to the prediction of equipartition. Quantum mechanical corrections to the equipartition theorem lead to more accurate models of systems at low temperatures, including the behavior of quantum gases like Bose-Einstein condensates and Fermi gases.
In nonlinear systems, equipartition may not hold, and the dynamics can be complex and chaotic. The study of such deviations offers insight into chaos theory and the behavior of complex systems.
Brownian Motion
Main article: Langevin dynamics
Brownian motion refers to the random motion of particles suspended in a fluid. It can be derived from the Langevin equation, which describes the motion of a particle in a viscous medium under the influence of thermal forces. The connection between the Langevin equation and Brownian motion is intimately related to the equipartition theorem.
In one dimension, the Langevin equation balances viscous drag against a stochastic force representing thermal fluctuations:
$$m \frac{dv}{dt} = -\gamma v + \xi(t),$$
where $v$ is the velocity, $m$ is the mass, $\gamma$ is the damping constant, and $\xi(t)$ is a white noise stochastic force with
$$\langle \xi(t) \xi(t') \rangle = 2 \gamma kT \delta(t - t').$$
Applying the equipartition theorem to the kinetic energy $\frac{1}{2} m v^2$ of the particle, we find the following relation for the mean square velocity:
$$\langle v^2 \rangle = \frac{kT}{m},$$
where $m$ is the mass of the particle. Combining the Langevin equation with this result, we can derive the following long-time expression for the mean squared displacement (MSD) of the particle, which characterizes Brownian motion:
$$\langle x^2 \rangle = 2Dt,$$
where $D = \frac{kT}{\gamma}$ is the diffusion coefficient (the Einstein relation). At times long compared with the velocity relaxation time, the MSD grows linearly with time, with a proportionality constant set by the temperature and the damping constant.
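These relations can be illustrated with a small simulation. The sketch below is not a production integrator: it assumes the Langevin equation in the form $m \, dv/dt = -\gamma v + \xi(t)$, uses a simple Euler-Maruyama time step, and starts every particle at rest at the origin. The function name `simulate_langevin` and all parameter values are hypothetical.

```python
import math
import random

def simulate_langevin(n_particles=2000, n_steps=500, dt=0.02,
                      mass=1.0, gamma=1.0, kT=1.0, seed=0):
    """Return (<v^2>, <x^2>) at time n_steps * dt, averaged over particles."""
    rng = random.Random(seed)
    # Over one step the white-noise force delivers a Gaussian impulse with
    # standard deviation sqrt(2 * gamma * kT * dt); dividing by the mass
    # converts that impulse into a velocity change.
    kick = math.sqrt(2.0 * gamma * kT * dt) / mass
    sum_v2 = 0.0
    sum_x2 = 0.0
    for _ in range(n_particles):
        x, v = 0.0, 0.0  # assumption: particle starts at rest at the origin
        for _ in range(n_steps):
            x += v * dt
            v += -(gamma / mass) * v * dt + kick * rng.gauss(0.0, 1.0)
        sum_v2 += v * v
        sum_x2 += x * x
    return sum_v2 / n_particles, sum_x2 / n_particles

if __name__ == "__main__":
    v2, x2 = simulate_langevin()
    print("<v^2> =", v2, " (equipartition predicts kT/m = 1.0)")
    print("<x^2> =", x2, " (long-time limit 2Dt = 20.0)")
```

With these settings the run lasts ten relaxation times $m/\gamma$, so the velocities have thermalized and $\langle v^2 \rangle$ should be close to $kT/m$. The measured $\langle x^2 \rangle$ is expected to fall somewhat below $2Dt$, because the particles start at rest and need a few relaxation times before diffusive growth sets in.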
Ergodicity
In an ergodic system, a trajectory visits all accessible microstates of a given energy with equal likelihood over a long period of time. This property leads to the equipartition of energy among the degrees of freedom, consistent with the equipartition theorem. For an ergodic dynamical system with Hamiltonian dynamics, the time average along the trajectory of the system equals the ensemble average, ensuring the energy distribution prescribed by equipartition.
Deep Learning
The principles underlying the equipartition theorem find parallels in deep learning through the framework of Boltzmann machines and related concepts of ergodicity. Below, we detail the connections.
Boltzmann Machines
Main article: Boltzmann machine
In the context of Boltzmann machines, the equipartition theorem can be loosely interpreted as pertaining to the distribution of weights and activations within the network. Specifically, under certain conditions, the average contribution of each parameter to the total energy is the same, which may inform regularization techniques and influence model architecture.
The connection to the equipartition theorem arises from the Hamiltonian of the system, defined as:
$$H(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} w_{i,j} v_i h_j - \sum_i b_i v_i - \sum_j c_j h_j$$
where $\mathbf{v}$ and $\mathbf{h}$ are visible and hidden units, respectively, and $w_{i,j}$, $b_i$, and $c_j$ are the model's parameters. The joint probability distribution over visible and hidden units is given by the Boltzmann distribution:
$$p(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} \exp\left(-H(\mathbf{v}, \mathbf{h})\right)$$
where $Z$ is the partition function.
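For a very small model, the joint distribution can be computed exactly by enumerating every state. The sketch below assumes binary units taking values 0 or 1 and the standard sign convention $H = -\sum_{i,j} w_{i,j} v_i h_j - \sum_i b_i v_i - \sum_j c_j h_j$ with $p \propto \exp(-H)$. The function names and parameter values are hypothetical.

```python
import itertools
import math

def energy(v, h, w, b, c):
    """H(v, h) = -sum_ij w[i][j] v_i h_j - sum_i b_i v_i - sum_j c_j h_j."""
    interaction = sum(w[i][j] * v[i] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    visible_bias = sum(bi * vi for bi, vi in zip(b, v))
    hidden_bias = sum(cj * hj for cj, hj in zip(c, h))
    return -interaction - visible_bias - hidden_bias

def joint_distribution(w, b, c):
    """Return {(v, h): p(v, h)} by brute-force enumeration of binary states."""
    states = [(v, h)
              for v in itertools.product((0, 1), repeat=len(b))
              for h in itertools.product((0, 1), repeat=len(c))]
    weights = [math.exp(-energy(v, h, w, b, c)) for v, h in states]
    z = sum(weights)  # partition function
    return {state: wt / z for state, wt in zip(states, weights)}

# One visible and one hidden unit with a positive coupling (illustrative values).
for state, prob in joint_distribution(w=[[2.0]], b=[0.0], c=[0.0]).items():
    print(state, prob)
```

Enumeration costs $2^{n_v + n_h}$ evaluations, so it is only practical for toy models. That cost is the reason the partition function is usually approximated by sampling in practice.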
Energy Distribution in Neural Networks
In deep learning, the weights and activations of a neural network can be seen as degrees of freedom. Some studies report that, during training, the gradients' magnitudes tend to become uniform across different layers. This has been likened to the equipartition of kinetic and potential energy in a physical system.
By analogy, if the $i$th degree of freedom enters an effective energy through a quadratic term $\frac{1}{2} q_i^2$, equipartition can be stated as:
$$\langle q_i^2 \rangle = kT,$$
where $\langle q_i^2 \rangle$ represents the mean square of the generalized coordinate corresponding to the $i$th degree of freedom, and $T$ plays the role of an effective temperature.
Significance
The Boltzmann distribution is foundational in various fields and concepts:
 Thermodynamics: It describes the behavior of gases and forms the basis of the kinetic theory of gases.
 Statistical Mechanics: In equilibrium statistical mechanics, the distribution is central to understanding systems at thermal equilibrium.
 Machine Learning: The concept is used in energy-based models (EBMs) as a form of generative models.
 Quantum Statistics: The distribution is a classical limit and is generalized by the Fermi-Dirac and Bose-Einstein distributions for quantum systems.