Earth Mover’s Distance

The Earth mover’s distance is a distance measure between probability distributions. If we consider each probability mass function as a histogram of dirt, it is equal to the amount of work needed to optimally move the dirt of one histogram into the shape of the other.

For categorical data, the “distance” between unequal symbols is unitary. In this case, \(1/6\) of the probability in symbol ‘0’ needs to be moved to ‘1’, and \(1/6\) needs to be moved to ‘2’, for a total of \(1/3\):

In [1]: from dit.divergences import earth_movers_distance

In [2]: d1 = dit.ScalarDistribution([0, 1, 2], [2/3, 1/6, 1/6])

In [3]: d2 = dit.ScalarDistribution([0, 1, 2], [1/3, 1/3, 1/3])

In [4]: earth_movers_distance(d1, d2)
Out[4]: 0.5

API

earth_movers_distance(dist1, dist2, distances=None)[source]

Compute the Earth Mover’s Distance (EMD) between dist1 and dist2. The EMD is the least amount of “probability mass flow” that must occur to transform dist1 to dist2.

Parameters:
  • dist1 (Distribution) – The first distribution.
  • dist2 (Distribution) – The second distribution.
  • distances (np.ndarray, None) – A matrix of distances between outcomes of the distributions. If None, a distance matrix is constructed; if the distributions are categorical each non-equal event is considered at unit distance, and if numerical abs(x, y) is used as the distance.
Returns:

emd – The Earth Mover’s Distance.

Return type:

float