Understanding KL-Divergence
Definition & Intuition
In mathematical statistics, the Kullback–Leibler (KL) divergence, denoted \(D_{KL}(P \parallel Q)\), is a type of statistical distance: a measure of how one reference probability distribution \(P\) differs from a second probability distribution \(Q\). KL-divergence is also known as relative entropy and I-divergence. For discrete distributions it is defined as
\[D_{KL}(P\parallel Q) = \sum_{i} P(i) \log \bigg(\frac{P(i)}{Q(i)}\bigg)\]and in the continuous case it is
\[D_{KL}(P\parallel Q) = \int P(x) \log \frac{P(x)}{Q(x)} \, dx\]where \(P\) is the true (reference) distribution and \(Q\) is the approximating distribution.
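As a quick illustration of the discrete formula, here is a minimal NumPy sketch (the function name `kl_divergence` and the example distributions are my own, not from the article):

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q), in nats.

    Assumes p and q are probability vectors over the same support,
    with q[i] > 0 wherever p[i] > 0. Terms with p[i] == 0 contribute 0,
    by the convention 0 * log(0) = 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Example: two distributions over a 3-element support.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, q))
```

Using the natural logarithm gives the result in nats; switching to `np.log2` would give bits instead.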
Some important properties of KL-Divergence
- Non-negativity: \(D_{KL}(P \parallel Q) \geq 0\) always, with equality if and only if the two distributions \(P\) and \(Q\) are identical.
- Asymmetry: \(D_{KL}(P \parallel Q) \neq D_{KL}(Q \parallel P)\) in general. Intuitively, \(D_{KL}(P\parallel Q)\) represents the expected additional information needed to encode samples from distribution \(P\) using a code designed for distribution \(Q\).
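The asymmetry is easy to see numerically. A small sketch (my own example distributions, not from the article), computing both directions for a pair of two-point distributions:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) in nats (p[i] == 0 terms contribute 0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# A skewed distribution vs. a uniform one.
p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

forward = kl_divergence(p, q)   # cost of coding samples from P with a code for Q
reverse = kl_divergence(q, p)   # cost of coding samples from Q with a code for P
print(forward, reverse)         # the two values differ
```

The reverse direction is larger here because a code built for the skewed \(P\) pays a heavy price on the rare symbol when the data actually come from the uniform \(Q\).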