Understanding KL-Divergence
Definition & Intuition
In mathematical statistics, the Kullback–Leibler (KL) divergence, denoted \(D_{KL}(P \parallel Q)\), is a type of statistical distance: a measure of how one reference probability distribution \(P\) differs from a second probability distribution \(Q\). KL-divergence is also known as relative entropy and I-divergence. For discrete distributions it is defined as
\[D_{KL}(P\parallel Q) = \sum_{i} P(i) \log \bigg(\frac{P(i)}{Q(i)}\bigg)\]and in the continuous case it is
\[D_{KL}(P\parallel Q) = \int P(x) \log \frac{P(x)}{Q(x)} \, dx\]where \(P\) is the true (reference) distribution and \(Q\) is the approximating distribution.
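As a quick illustration of the discrete formula, here is a minimal NumPy sketch (the function name `kl_divergence` and the example distributions are my own, not from the article):

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q), in nats.

    Assumes p and q are probability vectors over the same support,
    with q[i] > 0 wherever p[i] > 0. Terms with p[i] == 0 contribute 0,
    by the convention 0 * log(0) = 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Example: two distributions over a 3-element support.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, q))
```

Using the natural logarithm gives the result in nats; switching to `np.log2` would give bits instead.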
Some important properties of KL-Divergence
- Non-negativity: \(D_{KL}(P \parallel Q) \geq 0\) always, with equality if and only if the two distributions \(P\) and \(Q\) are identical.
- Asymmetry: \(D_{KL}(P \parallel Q) \neq D_{KL}(Q \parallel P)\) in general. Intuitively, \(D_{KL}(P\parallel Q)\) represents the expected additional information needed to encode samples from distribution \(P\) using a code designed for distribution \(Q\).
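The asymmetry is easy to see numerically. A small sketch (my own example distributions, not from the article), computing both directions for a pair of two-point distributions:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) in nats (p[i] == 0 terms contribute 0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# A skewed distribution vs. a uniform one.
p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

forward = kl_divergence(p, q)   # cost of coding samples from P with a code for Q
reverse = kl_divergence(q, p)   # cost of coding samples from Q with a code for P
print(forward, reverse)         # the two values differ
```

The reverse direction is larger here because a code built for the skewed \(P\) pays a heavy price on the rare symbol when the data actually come from the uniform \(Q\).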