Kullback-Leibler Divergence¶
The Kullback-Leibler divergence, sometimes also called the relative entropy, of a distribution \(p\) from a distribution \(q\) is defined as:

\[D_{KL}(p \,||\, q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}\]
The Kullback-Leibler divergence quantifies the average number of extra bits required to represent a distribution \(p\) when using an arbitrary distribution \(q\). This can be seen through the following identity:

\[D_{KL}(p \,||\, q) = H(p, q) - H(p)\]

where the Cross Entropy \(H(p, q)\) quantifies the total cost of encoding \(p\) using \(q\), and the Entropy \(H(p)\) quantifies the true, minimum cost of encoding \(p\). For example, let’s consider the cost of representing a biased coin using a fair one:
In [1]: import dit

In [2]: from dit.divergences import kullback_leibler_divergence

In [3]: p = dit.Distribution(['0', '1'], [3/4, 1/4])

In [4]: q = dit.Distribution(['0', '1'], [1/2, 1/2])

In [5]: kullback_leibler_divergence(p, q)
Out[5]: 0.18872187554086717
That is, using the mismatched distribution costs us \(0.1887\) bits of wasted overhead per outcome.
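The identity above can be sketched in plain Python, without dit, using only the standard library (a minimal illustration; the function names `entropy`, `cross_entropy`, and `kl_divergence` here are local helpers, not dit's API):

```python
from math import log2

def entropy(p):
    """Shannon entropy H(p) in bits: the minimum cost of encoding p."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross entropy H(p, q) in bits: the cost of encoding p with a code built for q."""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) in bits, computed directly from its definition."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [3/4, 1/4]  # biased coin
q = [1/2, 1/2]  # fair coin

direct = kl_divergence(p, q)                    # ~0.1887, matching the dit result
via_identity = cross_entropy(p, q) - entropy(p)  # H(p, q) - H(p) gives the same value
```

Here the cross entropy is exactly 1 bit (a fair-coin code spends 1 bit per flip), the entropy of the biased coin is about 0.8113 bits, and the difference is the 0.1887 bits of overhead computed above.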
Not a Metric¶
Although the Kullback-Leibler divergence is often used to see how “different” two distributions are, it is not a metric. Importantly, it is neither symmetric nor does it obey the triangle inequality. It does, however, have the following property:

\[D_{KL}(p \,||\, q) \geq 0\]

with equality if and only if \(p = q\). This makes it a premetric.
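The failure of symmetry is easy to check numerically with the same biased and fair coins (a plain-Python sketch; `kl_divergence` is a local helper, not dit's API):

```python
from math import log2

def kl_divergence(p, q):
    # D_KL(p || q) in bits; assumes q is positive wherever p is
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [3/4, 1/4]  # biased coin
q = [1/2, 1/2]  # fair coin

forward = kl_divergence(p, q)  # ~0.1887 bits
reverse = kl_divergence(q, p)  # ~0.2075 bits: a different value, so D_KL is not symmetric
```

Both directions are non-negative, as the property above requires, but they disagree: encoding a fair coin with the biased coin's code wastes more than the reverse.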
API¶

kullback_leibler_divergence(dist1, dist2, rvs=None, crvs=None, rv_mode=None)[source]¶
The Kullback-Leibler divergence between dist1 and dist2.
Parameters:  dist1 (Distribution) – The first distribution in the Kullback-Leibler divergence.
 dist2 (Distribution) – The second distribution in the Kullback-Leibler divergence.
 rvs (list, None) – The indexes of the random variables over which the Kullback-Leibler divergence is calculated. If None, then the Kullback-Leibler divergence is calculated over all random variables.
 rv_mode (str, None) – Specifies how to interpret rvs and crvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of crvs and rvs are interpreted as random variable indices. If equal to ‘names’, the elements are interpreted as random variable names. If None, then the value of dist._rv_mode is consulted, which defaults to ‘indices’.
Returns: dkl – The Kullback-Leibler divergence between dist1 and dist2.
Return type: float
Raises: ditException – Raised if either dist1 or dist2 doesn’t have rvs or, if rvs is None, if dist2 has an outcome length different than dist1.