Kullback-Leibler Divergence

The Kullback-Leibler divergence, sometimes also called the relative entropy, of a distribution \(p\) from a distribution \(q\) is defined as:

\[\DKL{p || q} = \sum_{x \in \mathcal{X}} p(x) \log_2 \frac{p(x)}{q(x)}\]

The Kullback-Leibler divergence quantifies the average number of extra bits required to encode samples drawn from \(p\) when using a code optimized for an arbitrary distribution \(q\). This can be seen through the following identity:

\[\DKL{p || q} = \xH{p || q} - \H{p}\]

where the Cross Entropy quantifies the total cost of encoding \(p\) using \(q\), and the Entropy quantifies the true, minimum cost of encoding \(p\). For example, consider the cost of representing a biased coin using a fair one:

In [1]: import dit

In [2]: from dit.divergences import kullback_leibler_divergence

In [3]: p = dit.Distribution(['0', '1'], [3/4, 1/4])

In [4]: q = dit.Distribution(['0', '1'], [1/2, 1/2])

In [5]: kullback_leibler_divergence(p, q)
Out[5]: 0.18872187554086717

That is, using the mismatched distribution \(q\) costs us \(0.1887\) bits of overhead per symbol.
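We can check the identity above by hand: with the fair coin as the code, the cross entropy is exactly 1 bit, the entropy of the biased coin is about 0.811 bits, and their difference recovers the divergence. A minimal sketch using only the standard library (not the dit API):

```python
from math import log2

p = [3/4, 1/4]  # biased coin
q = [1/2, 1/2]  # fair coin

# Cross entropy: average code length when encoding p with a code built for q.
cross_entropy = sum(px * log2(1 / qx) for px, qx in zip(p, q))

# Entropy: the optimal average code length for p.
entropy = sum(px * log2(1 / px) for px in p)

# D_KL(p || q) = H(p, q) - H(p)
dkl = cross_entropy - entropy
print(cross_entropy, entropy, dkl)  # 1.0, ~0.8113, ~0.1887
```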

Not a Metric

Although the Kullback-Leibler divergence is often used to see how “different” two distributions are, it is not a metric. Importantly, it is neither symmetric nor does it obey the triangle inequality. It does, however, have the following property:

\[\DKL{p || q} \ge 0\]

with equality if and only if \(p = q\). This makes it a premetric.
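The asymmetry is easy to see with the coins from above: swapping the roles of \(p\) and \(q\) gives a different value. A quick sketch with the standard library:

```python
from math import log2

def dkl(p, q):
    """D_KL(p || q) in bits, for two distributions given as probability lists."""
    return sum(px * log2(px / qx) for px, qx in zip(p, q))

p = [3/4, 1/4]  # biased coin
q = [1/2, 1/2]  # fair coin

# The two directions disagree, so D_KL cannot be a metric.
print(dkl(p, q))  # ~0.1887
print(dkl(q, p))  # ~0.2075
```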


kullback_leibler_divergence(dist1, dist2, rvs=None, crvs=None, rv_mode=None)[source]

The Kullback-Leibler divergence between dist1 and dist2.

  • dist1 (Distribution) – The first distribution in the Kullback-Leibler divergence.
  • dist2 (Distribution) – The second distribution in the Kullback-Leibler divergence.
  • rvs (list, None) – The indexes of the random variables over which to calculate the Kullback-Leibler divergence. If None, then the Kullback-Leibler divergence is calculated over all random variables.
  • crvs (list, None) – The indexes of the random variables to condition the calculation on.
  • rv_mode (str, None) – Specifies how to interpret rvs and crvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of crvs and rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of dist._rv_mode is consulted, which defaults to ‘indices’.

Returns:

dkl – The Kullback-Leibler divergence between dist1 and dist2.

Return type:

float

Raises:

ditException – Raised if either dist1 or dist2 doesn’t have rvs or, if rvs is None, if dist2 has an outcome length different than dist1.
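For intuition about the nonnegativity property and the exception above, here is a hedged, pure-Python sketch of the computation itself (not the dit implementation), using the conventions \(0 \log 0 = 0\) and \(\DKL{p || q} = \infty\) whenever \(p(x) > 0\) but \(q(x) = 0\):

```python
from math import log2, inf

def kl_divergence(p, q):
    """D_KL(p || q) in bits for two distributions over the same outcomes.

    p and q map outcomes to probabilities. Conventions:
    - terms with p(x) == 0 contribute nothing (0 * log 0 = 0);
    - if p(x) > 0 where q(x) == 0, the divergence is infinite.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0)
        if qx == 0:
            return inf  # p puts mass where q puts none
        total += px * log2(px / qx)
    return total

biased = {'0': 3/4, '1': 1/4}
fair = {'0': 1/2, '1': 1/2}
print(kl_divergence(biased, fair))  # ~0.1887

# Disjoint supports: infinite divergence.
print(kl_divergence({'0': 1.0}, {'1': 1.0}))  # inf
```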