Jensen-Shannon Divergence¶

The Jensen-Shannon divergence is a principled divergence measure that is always finite for finite random variables. It quantifies how “distinguishable” two or more distributions are from one another. In its basic two-distribution form it is:

$\JSD{X || Y} = \H{\frac{X + Y}{2}} - \frac{\H{X} + \H{Y}}{2}$

That is, it is the entropy of the mixture minus the mixture of the entropies. This can be generalized to an arbitrary number of random variables with arbitrary weights $$w_i$$ (non-negative and summing to one):

$\JSD{X_{0:n}} = \H{\sum w_i X_i} - \sum \left( w_i \H{X_i} \right)$

In [1]: import dit
   ...: from dit.divergences import jensen_shannon_divergence

In [2]: X = dit.ScalarDistribution(['red', 'blue'], [1/2, 1/2])

In [3]: Y = dit.ScalarDistribution(['blue', 'green'], [1/2, 1/2])

In [4]: jensen_shannon_divergence([X, Y])
Out[4]: 0.5

In [5]: jensen_shannon_divergence([X, Y], [3/4, 1/4])
Out[5]: 0.40563906222956647

In [6]: Z = dit.ScalarDistribution(['blue', 'yellow'], [1/2, 1/2])

In [7]: jensen_shannon_divergence([X, Y, Z])
Out[7]: 0.7924812503605778

In [8]: jensen_shannon_divergence([X, Y, Z], [1/2, 1/4, 1/4])
Out[8]: 0.75
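The values above can be reproduced from the definition directly. The sketch below uses plain Python rather than dit; the `entropy` and `jsd` helper names are illustrative, not part of dit's API. It computes the entropy of the weighted mixture minus the weighted average of the entropies:

```python
from math import log2

def entropy(pmf):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def jsd(pmfs, weights):
    """H(sum_i w_i P_i) - sum_i w_i H(P_i), the weighted Jensen-Shannon divergence."""
    mixture = {}
    for pmf, w in zip(pmfs, weights):
        for outcome, p in pmf.items():
            mixture[outcome] = mixture.get(outcome, 0.0) + w * p
    return entropy(mixture) - sum(w * entropy(pmf) for pmf, w in zip(pmfs, weights))

X = {'red': 1/2, 'blue': 1/2}
Y = {'blue': 1/2, 'green': 1/2}

print(jsd([X, Y], [1/2, 1/2]))  # 0.5, matching the dit output above
print(jsd([X, Y], [3/4, 1/4]))  # ~0.4056
```

With equal weights the mixture is {red: 1/4, blue: 1/2, green: 1/4}, whose entropy is 1.5 bits, while each urn contributes 1 bit, giving 1.5 - 1 = 0.5.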


Derivation¶

Where does this equation come from? Consider Jensen’s inequality:

$\Psi \left( \mathbb{E}(x) \right) \geq \mathbb{E} \left( \Psi(x) \right)$

where $$\Psi$$ is a concave function. Since the inequality always holds, the difference between the left and right sides is non-negative:

$\Psi \left( \mathbb{E}(x) \right) - \mathbb{E} \left( \Psi(x) \right) \geq 0$

If we take the concave function $$\Psi$$ to be the Shannon entropy $$\H{}$$, we get the Jensen-Shannon divergence: “Jensen” from Jensen’s inequality, and “Shannon” from the Shannon entropy.

Note

Analogous measures include the Jensen-Rényi divergence (where $$\Psi$$ is the Rényi entropy) and the Jensen-Tsallis divergence (where $$\Psi$$ is the Tsallis entropy).

Metric¶

The square root of the Jensen-Shannon divergence, $$\sqrt{\JSD{}}$$, is a true metric between distributions.
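A quick numerical spot-check of the metric properties, using the three example distributions from above (a hand-rolled sketch in plain Python; the helper names are illustrative, not dit's API):

```python
from math import log2, sqrt

def entropy(pmf):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def jsd(p, q):
    # Equal-weight mixture of two pmfs over the union of their outcomes.
    m = {o: (p.get(o, 0) + q.get(o, 0)) / 2 for o in set(p) | set(q)}
    return entropy(m) - (entropy(p) + entropy(q)) / 2

X = {'red': 1/2, 'blue': 1/2}
Y = {'blue': 1/2, 'green': 1/2}
Z = {'blue': 1/2, 'yellow': 1/2}

d = lambda p, q: sqrt(jsd(p, q))

print(d(X, Z) <= d(X, Y) + d(Y, Z))   # True: triangle inequality holds here
print(abs(d(X, Y) - d(Y, X)) < 1e-12) # True: symmetric
```

This is only a spot-check on one triple, of course; the metric property holds in general.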

Relationship to the Other Measures¶

The Jensen-Shannon divergence can be derived from other, better-known information measures, notably the Kullback-Leibler divergence and the mutual information.

Kullback-Leibler divergence¶

The Jensen-Shannon divergence is the average Kullback-Leibler divergence of $$X$$ and $$Y$$ from their mixture distribution, $$M$$:

$\begin{split}\JSD{X || Y} &= \frac{1}{2} \left( \DKL{X || M} + \DKL{Y || M} \right) \\ M &= \frac{X + Y}{2}\end{split}$
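This identity is easy to check numerically for the red/blue and blue/green urns from the examples above (a plain-Python sketch; the `kl` helper is an illustrative name, not dit's API):

```python
from math import log2

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; assumes supp(p) is a subset of supp(q)."""
    return sum(pi * log2(pi / q[o]) for o, pi in p.items() if pi > 0)

X = {'red': 1/2, 'blue': 1/2}
Y = {'blue': 1/2, 'green': 1/2}

# The mixture M = (X + Y) / 2 over the union of outcomes.
M = {o: (X.get(o, 0) + Y.get(o, 0)) / 2 for o in set(X) | set(Y)}

jsd = (kl(X, M) + kl(Y, M)) / 2
print(jsd)  # 0.5, agreeing with the direct computation above
```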

Mutual Information¶

$\JSD{X || Y} = \I{Z : M}$

where $$M$$ is the mixture distribution as before, and $$Z$$ is an indicator variable over $$X$$ and $$Y$$. In essence, if $$X$$ and $$Y$$ are each an urn containing colored balls, and I randomly select one of the urns and draw a ball from it, then the Jensen-Shannon divergence is the mutual information between which urn the ball was drawn from and the color of the ball.
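The urn picture can be checked directly: build the joint distribution of (urn, color) for the red/blue and blue/green urns with equal selection probability, then compute the mutual information between urn and color (a plain-Python sketch; variable names are illustrative):

```python
from math import log2

# Joint distribution over (urn, color): pick an urn uniformly, then draw a ball.
joint = {
    ('X', 'red'): 1/4, ('X', 'blue'): 1/4,
    ('Y', 'blue'): 1/4, ('Y', 'green'): 1/4,
}

# Marginals over the urn indicator (Z) and the ball color (M).
p_urn, p_color = {}, {}
for (urn, color), p in joint.items():
    p_urn[urn] = p_urn.get(urn, 0.0) + p
    p_color[color] = p_color.get(color, 0.0) + p

# I(Z : M) = sum over (z, m) of p(z, m) * log2( p(z, m) / (p(z) p(m)) )
mi = sum(p * log2(p / (p_urn[urn] * p_color[color]))
         for (urn, color), p in joint.items())
print(mi)  # 0.5, equal to the Jensen-Shannon divergence of the two urns
```

Only the outcomes unique to one urn (red and green) contribute to the sum; shared outcomes (blue) carry no information about which urn was chosen.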

API¶

jensen_shannon_divergence(*args, **kwargs)[source]

The Jensen-Shannon Divergence: H(sum(w_i*P_i)) - sum(w_i*H(P_i)).

The square root of the Jensen-Shannon divergence is a distance metric.

Parameters:
- dists ([Distribution]) – The distributions, P_i, to take the Jensen-Shannon Divergence of.
- weights ([float], None) – The weights, w_i, to give the distributions. If None, the weights are assumed to be uniform.

Returns: jsd – The Jensen-Shannon Divergence.

Return type: float

Raises:
- ditException – Raised if dists and weights have unequal lengths.
- InvalidNormalization – Raised if the weights do not sum to unity.
- InvalidProbability – Raised if the weights are not valid probabilities.