Cross Entropy

The cross entropy between two distributions \(p(x)\) and \(q(x)\) is given by:

\[\xH{p || q} = -\sum_{x \in \mathcal{X}} p(x) \log_2 q(x)\]
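To make the sum concrete, it can be computed directly from two probability vectors. The cross_entropy_bits helper below is a hypothetical NumPy illustration, not part of dit:

import numpy as np

def cross_entropy_bits(p, q):
    """Return -sum_x p(x) * log2(q(x)) for two probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nonzero = p > 0  # terms with p(x) == 0 contribute nothing, by convention
    return -np.sum(p[nonzero] * np.log2(q[nonzero]))

print(cross_entropy_bits([1/2, 1/2], [3/4, 1/4]))  # ~1.2075 bits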

This quantifies the average cost of representing a distribution defined by the probabilities \(p(x)\) using the probabilities \(q(x)\). For example, the cross entropy of a distribution with itself is the entropy of that distribution, because the entropy quantifies the average cost of representing a distribution using its own probabilities:

In [1]: import dit

In [2]: from dit.divergences import cross_entropy

In [3]: p = dit.Distribution(['0', '1'], [1/2, 1/2])

In [4]: cross_entropy(p, p)
Out[4]: 1.0

If, however, we attempted to model a fair coin with a biased one, we could compute this mismatch with the cross entropy:

In [5]: q = dit.Distribution(['0', '1'], [3/4, 1/4])

In [6]: cross_entropy(p, q)
Out[6]: 1.207518749639422

This means we will, on average, use about \(1.2\) bits to represent the flips of a fair coin. Turning things around, what if we had a biased coin that we attempted to represent with a fair coin:

In [7]: cross_entropy(q, p)
Out[7]: 1.0

So although the entropy of \(q\) is less than \(1\), we will use a full bit to represent its outcomes. Both of these results can easily be seen by considering the following identity:

\[\xH{p || q} = \H{p} + \DKL{p || q}\]

So in representing \(p\) using \(q\), we must of course use at least \(\H{p}\) bits, the minimum required to represent \(p\), plus the Kullback-Leibler divergence of \(q\) from \(p\).
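This identity is easy to check numerically. The following is a sketch reusing the \(p\) and \(q\) defined above, assuming dit's kullback_leibler (a sibling of cross_entropy in dit.divergences) and dit.shannon.entropy:

In [8]: from dit.divergences import kullback_leibler

In [9]: dit.shannon.entropy(p) + kullback_leibler(p, q)
Out[9]: 1.207518749639422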

API

cross_entropy(dist1, dist2, rvs=None, crvs=None, rv_mode=None)

The cross entropy between dist1 and dist2.

Parameters
  • dist1 (Distribution) – The first distribution in the cross entropy.

  • dist2 (Distribution) – The second distribution in the cross entropy.

  • rvs (list, None) – The indexes of the random variables used to calculate the cross entropy. If None, then the cross entropy is calculated over all random variables.

  • rv_mode (str, None) – Specifies how to interpret rvs and crvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of crvs and rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of dist._rv_mode is consulted, which defaults to ‘indices’.

Returns

xh – The cross entropy between dist1 and dist2.

Return type

float

Raises

ditException – Raised if either dist1 or dist2 doesn’t have rvs or, if rvs is None, if dist2 has an outcome length different from dist1.
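As a usage sketch for the rvs argument (our own construction, not from the dit documentation): restricting the calculation to the first index of a pair of joint distributions should reduce to the cross entropy between their marginals on that index, which here are the fair and biased coins from above:

In [10]: d1 = dit.Distribution(['00', '01', '10', '11'], [1/4]*4)

In [11]: d2 = dit.Distribution(['00', '01', '10', '11'], [3/8, 3/8, 1/8, 1/8])

In [12]: cross_entropy(d1, d2, rvs=[0])
Out[12]: 1.207518749639422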