Numpy-based Distribution

The primary method of constructing a distribution is by supplying both the outcomes and the probability mass function:

In [1]: from dit import Distribution

In [2]: outcomes = ['000', '011', '101', '110']

In [3]: pmf = [1/4]*4

In [4]: xor = Distribution(outcomes, pmf)

In [5]: print(xor)
Class:          Distribution
Alphabet:       ('0', '1') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 3
RV Names:       None

x     p(x)
000   0.25
011   0.25
101   0.25
110   0.25

Another way to construct a distribution is by supplying a dictionary mapping outcomes to probabilities:

In [6]: outcomes_probs = {'000': 1/4, '011': 1/4, '101': 1/4, '110': 1/4}

In [7]: xor2 = Distribution(outcomes_probs)

In [8]: print(xor2)
Class:          Distribution
Alphabet:       ('0', '1') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 3
RV Names:       None

x     p(x)
000   0.25
011   0.25
101   0.25
110   0.25
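
The two input forms carry the same information. As a minimal sketch in plain Python (it does not call dit, and the variable names are chosen only for illustration), the pair form can be converted to the dictionary form and back, and the pmf checked for normalization:

```python
import math

outcomes = ['000', '011', '101', '110']
pmf = [1/4] * 4

# Pair form -> dictionary form.
outcomes_probs = dict(zip(outcomes, pmf))

# Dictionary form -> pair form, ordered by outcome.
outcomes_again = sorted(outcomes_probs)
pmf_again = [outcomes_probs[o] for o in outcomes_again]

# A linear pmf should be non-negative and sum to one.
is_normalized = all(p >= 0 for p in pmf_again) and math.isclose(sum(pmf_again), 1.0)
```

Because both forms describe the same mapping, xor and xor2 print identically above.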

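A third method is to build the distribution from an ndarray, where each axis corresponds to one random variable and each index tuple is an outcome: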

In [9]: pmf = [[0.5, 0.25], [0.25, 0]]

In [10]: d = Distribution.from_ndarray(pmf)

In [11]: print(d)
Class:          Distribution
Alphabet:       (0, 1) for all rvs
Base:           linear
Outcome Class:  tuple
Outcome Length: 2
RV Names:       None

x        p(x)
(0, 0)   0.5
(0, 1)   0.25
(1, 0)   0.25
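
The printed table lists only three outcomes because the entry at index (1, 1) has probability zero. The mapping from array indices to outcomes can be sketched in plain Python. This illustrates the idea only and is not dit's implementation; outcomes_from_nested is a hypothetical helper written for this example.

```python
from itertools import product

pmf = [[0.5, 0.25], [0.25, 0]]

def outcomes_from_nested(table):
    """Map each (row, column) index of a 2-D nested list to its probability."""
    n_rows, n_cols = len(table), len(table[0])
    return {(i, j): table[i][j]
            for i, j in product(range(n_rows), range(n_cols))}

# Every index tuple, including the zero-probability one.
dense = outcomes_from_nested(pmf)

# Dropping null outcomes leaves the three rows shown in the printed table.
sparse = {outcome: p for outcome, p in dense.items() if p > 0}
```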

Distribution.__init__(outcomes, pmf=None, sample_space=None, base=None, prng=None, sort=True, sparse=True, trim=True, validate=True)

Initialize the distribution.

Parameters:
  • outcomes (sequence, dict) – The outcomes of the distribution. If outcomes is a dictionary, then the keys are used as outcomes, and the values of the dictionary are used as pmf instead. The values will not be used if probabilities are passed in via pmf. Outcomes must be hashable, orderable, sized, iterable containers. The length of an outcome must be the same for all outcomes, and every outcome must be of the same type.
  • pmf (sequence, None) – The outcome probabilities or log probabilities. pmf can be None only if outcomes is a dict.
  • sample_space (sequence, CartesianProduct) – A sequence representing the sample space, and corresponding to the complete set of possible outcomes. The order of the sample space is important. If None, then the outcomes are used to determine a Cartesian product sample space instead.
  • base (float, str, None) – If pmf specifies log probabilities, then base should specify the base of the logarithm. If 'linear', then pmf is assumed to represent linear probabilities. If None, then the value for base is taken from ditParams['base'].
  • prng (RandomState) – A pseudo-random number generator with a rand method which can generate random numbers. For now, this is assumed to be something with an API compatible with NumPy’s RandomState class. This attribute is initialized to equal dit.math.prng.
  • sort (bool) – If True, then each random variable’s alphabet is sorted before it is finalized. Usually, this is desirable, as it normalizes the behavior of distributions which have the same sample spaces (when considered as a set). Note that addition and multiplication of distributions are defined only if the sample spaces are compatible.
  • sparse (bool) – Specifies the form of the pmf. If True, then outcomes and pmf will only contain entries for non-null outcomes and probabilities, after initialization. The order of these entries will always obey the order of sample_space, even if their number is not equal to the size of the sample space. If False, then the pmf will be dense and every outcome in the sample space will be represented.
  • trim (bool) – Specifies if null-outcomes should be removed from pmf when make_sparse() is called (assuming sparse is True) during initialization.
  • validate (bool) – If True, then validate the distribution. If False, then assume the distribution is valid, and perform no checks.
Raises:
  • InvalidDistribution – If the lengths of values and outcomes are unequal, or if no outcomes can be obtained from pmf and outcomes is None.
  • See validate() for a list of other potential exceptions.
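
The relationship between outcomes, the Cartesian product sample space, and the sparse and dense forms of the pmf can be sketched in plain Python. This is an illustration under stated assumptions rather than dit's own code: it assumes the per-position alphabets are the sorted sets of symbols observed at each index, and all variable names are hypothetical.

```python
from itertools import product

outcomes = ['000', '011', '101', '110']
pmf = [0.25, 0.25, 0.25, 0.25]

# One alphabet per position: the sorted set of symbols seen at that index.
alphabets = [tuple(sorted(set(symbols))) for symbols in zip(*outcomes)]

# Cartesian product sample space, in product order.
sample_space = [''.join(chars) for chars in product(*alphabets)]

# Dense form: one entry per sample-space element, zero where unobserved.
lookup = dict(zip(outcomes, pmf))
dense_pmf = [lookup.get(o, 0.0) for o in sample_space]

# Sparse form: only non-null outcomes, still in sample-space order.
sparse = [(o, p) for o, p in zip(sample_space, dense_pmf) if p > 0]
```

Here the sample space has eight elements, the dense pmf has eight entries (four of them zero), and the sparse form keeps only the four outcomes of the xor distribution.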

To verify that xor and xor2 are the same distribution, we can use the is_approx_equal method:

In [12]: xor.is_approx_equal(xor2)
Out[12]: True

Distribution.is_approx_equal(other, rtol=None, atol=None)

Returns True if other is approximately equal to this distribution.

For two distributions to be equal, they must have the same sample space and must also agree on the probabilities of each outcome.

Parameters:
  • other (distribution) – The distribution to compare against.
  • rtol (float) – The relative tolerance to use when comparing probabilities.
  • atol (float) – The absolute tolerance to use when comparing probabilities.

Notes

The distributions need not have the same length, but they must have the same base.
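
The role of rtol and atol can be sketched with a tolerance-based comparison over plain dictionaries. This is not dit's implementation: approx_equal is a hypothetical helper, its default tolerances are placeholders chosen for this sketch, and it assumes both mappings already share a sample space and use linear probabilities.

```python
import math

def approx_equal(p, q, rtol=1e-9, atol=1e-9):
    """Compare two outcome -> probability dicts within tolerances.

    The default tolerances are placeholders chosen for this sketch.
    """
    # Outcomes missing from one mapping are treated as probability zero,
    # so the two mappings need not have the same number of entries.
    return all(
        math.isclose(p.get(o, 0.0), q.get(o, 0.0), rel_tol=rtol, abs_tol=atol)
        for o in set(p) | set(q)
    )

xor = dict(zip(['000', '011', '101', '110'], [1/4] * 4))
xor2 = {'000': 1/4, '011': 1/4, '101': 1/4, '110': 1/4}

same = approx_equal(xor, xor2)

# A perturbation well below the tolerances still compares as equal.
nudged = approx_equal(xor, {**xor2, '000': 0.25 + 1e-12})

# A large change in one probability does not.
different = approx_equal(xor, {**xor2, '000': 0.3})
```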