Main Page
Welcome to a free encyclopedia on complexity sciences and information physics anyone can edit!
Read our Style Guide



"Research is about friendship and having fun."



$$\sigma_\mu(A) = -\log\; \mu(A)$$

Did you know that…

Recently Added Articles:

In the works:
Cognitive Effort • Order parameter • Topological Data Analysis • Statistical mechanics • Equilibrium Statistical mechanics • Non-Equilibrium Statistical mechanics • Spin glass • Kuramoto Model • Neural Mean-Field Theory • Phase Space • Discourse sheaf • Čech cohomology • Čech complex • Topological Entropy • Metric Measure Space • Gromov–Hausdorff Distance (Gromov–Wasserstein Distance) • Holonomy • Kalman Filter • Algorithmic Probability • Salience Network • Autoencoder • Takens's theorem • Manifold Learning • Kullback-Leibler divergence • Maximum Likelihood Estimation • Mean-Field Theory (Curie-Weiss Theorem, Bethe-Peierls Approximation)

Rewrite: Hopf Decomposition

Chosen Page:

Energy-based model

Energy-based probabilistic models are closely related to statistical physics. The model distribution is specified as a Boltzmann distribution (in units where the thermal energy $kT = 1$):

$$p(\mathbf{x}; \mathbf{w}) = \frac{1}{Z_\mathbf{w}}e^{-E(\mathbf{x}; \mathbf{w})}$$
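
As a minimal sketch of such a distribution, the Python snippet below enumerates every state of a three-spin Ising model and normalizes the Boltzmann weights, assuming the standard convention $p \propto e^{-E}$ with $kT = 1$. The pairwise energy function, the helper names `ising_energy` and `boltzmann_distribution`, and the coupling values in `w` are illustrative assumptions, not taken from the article. Brute-force enumeration is only feasible for a handful of spins, since the number of states grows as $2^n$.

```python
import itertools
import math

def ising_energy(spins, w):
    """Assumed pairwise energy: E(x; w) = -sum_{i<j} w[i][j] * x_i * x_j."""
    n = len(spins)
    return -sum(w[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))

def boltzmann_distribution(w):
    """Enumerate all 2^n spin states; return (states, probabilities, Z)."""
    n = len(w)
    states = list(itertools.product([-1, 1], repeat=n))
    # Unnormalized Boltzmann weights exp(-E) with kT = 1
    weights = [math.exp(-ising_energy(s, w)) for s in states]
    z = sum(weights)  # partition function Z_w
    return states, [wt / z for wt in weights], z

# Hypothetical couplings for three spins (upper triangle only is used)
w = [[0.0, 0.5, -0.3],
     [0.0, 0.0, 0.8],
     [0.0, 0.0, 0.0]]

states, probs, z = boltzmann_distribution(w)
for s, p in zip(states, probs):
    print(s, round(p, 4))
```

Low-energy configurations receive the largest probabilities, and because this energy has no external field term, flipping every spin leaves the probability unchanged.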

The earliest energy-based probabilistic models in machine learning were in fact called Boltzmann machines, and they map directly onto Ising spin models with a learned coupling structure $\mathbf{w}$. Inserting the Boltzmann form into the log-likelihood learning objective $l(\mathbf{w}) = \int q(\mathbf{x}) \log p(\mathbf{x}; \mathbf{w}) \mathrm{d}\mathbf{x}$ yields:

$$-l(\mathbf{w}) = \langle E(\mathbf{x}; \mathbf{w})\rangle_q - F_\mathbf{w}$$

where $\langle \cdot \rangle_q$ denotes an average with respect to the data distribution $q(\mathbf{x})$ and $F_\mathbf{w} = -\log Z_\mathbf{w}$ is the Helmholtz free energy of the model distribution $p(\mathbf{x}; \mathbf{w})$. Thus, learning by maximizing the log-likelihood corresponds to lowering the average energy of the observed data while raising the free energy of the model distribution. Maximizing $l(\mathbf{w})$ is also equivalent to minimizing the Kullback-Leibler divergence,

$$D_\mathrm{KL}(q\| p) = \int q(\mathbf{x}) \log \left( \frac{q(\mathbf{x})}{p(\mathbf{x}; \mathbf{w})}\right)\mathrm{d}\mathbf{x} = G_\mathbf{w}(q) - F_\mathbf{w}$$

The Kullback-Leibler divergence $D_\mathrm{KL}(q \| p)$ is a nonnegative measure of the dissimilarity between two distributions $q$ and $p$ that is zero if and only if $q=p$. In the special case when $p$ takes the Boltzmann form, the KL divergence becomes the difference between the Gibbs free energy of $q$, defined as $G_\mathbf{w}(q) = \langle E(\mathbf{x}; \mathbf{w})\rangle_q - S(q)$ (where $S(q) = -\int q(\mathbf{x})\log q(\mathbf{x}) \mathrm{d}\mathbf{x}$ is the entropy of $q$), and the Helmholtz free energy $F_\mathbf{w}$ of $p$.
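
Both free-energy identities can be checked numerically on a small discrete example. The sketch below assumes the standard convention $p \propto e^{-E}$ with $kT = 1$ and replaces the integrals by sums over four states. The energy values and the data distribution `q` are hypothetical numbers chosen only for illustration.

```python
import math

# Hypothetical energies E(x; w) for a model with four discrete states
energies = [0.2, 1.0, -0.5, 0.7]
# Hypothetical data distribution q(x) over the same four states
q = [0.1, 0.2, 0.6, 0.1]

z = sum(math.exp(-e) for e in energies)      # partition function Z_w
free_energy = -math.log(z)                   # Helmholtz free energy F_w
p = [math.exp(-e) / z for e in energies]     # model distribution p(x; w)

mean_energy = sum(qi * e for qi, e in zip(q, energies))          # <E>_q
entropy = -sum(qi * math.log(qi) for qi in q if qi > 0)          # S(q)
gibbs = mean_energy - entropy                                    # G_w(q)
log_likelihood = sum(qi * math.log(pi) for qi, pi in zip(q, p))  # l(w)
kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# -l(w) = <E>_q - F_w
print(-log_likelihood, mean_energy - free_energy)
# D_KL(q || p) = G_w(q) - F_w
print(kl, gibbs - free_energy)
```

Each printed pair should agree up to floating-point rounding. Since $S(q)$ does not depend on $\mathbf{w}$, the two objectives differ only by a constant, which is why maximizing the log-likelihood and minimizing the KL divergence select the same parameters.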

Continue reading...


Topics in Artificial Intelligence and Machine Learning


Topics in Nonlinear Dynamics, Chaos, and Ergodic Theory


Topics in Statistical Mechanics and Thermodynamics


Topics in Biology, Biophysics, and Bioinformatics


Topics in Cognitive and Computational Neuroscience


Topics in Information Theory and Control Theory


Topics in Graph Theory, Networks, and Econophysics


Topics in Topological Data Analysis


Topics in Theoretical Computer Science


Topics in Social Sciences

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License