Scalar Distributions

A scalar Distribution represents a probability distribution over real numbers, such as the outcome of a six-sided die or the number of heads in 100 coin flips.

Playing with Distributions

First we will enable two optional features: printing fractions by default, and using __str__() as __repr__(). Use either option with care, since both can incur significant performance hits on some distributions. The session below assumes that dit has already been imported (import dit).

In [1]: dit.ditParams['print.exact'] = dit.ditParams['repr.print'] = True

We next construct a six-sided die:

In [2]: from dit.example_dists import uniform

In [3]: d6 = uniform(1, 7)

In [4]: d6
Out[4]: 
Class:    Distribution
Alphabet: (1, 2, 3, 4, 5, 6)
Base:     linear

x   p(x)
1   1/6
2   1/6
3   1/6
4   1/6
5   1/6
6   1/6

We can combine a distribution with a scalar using standard mathematical operations: addition, subtraction (in either order), multiplication, modulo, and inequality tests.

In [5]: d6 + 3
Out[5]: 
Class:    Distribution
Alphabet: (4, 5, 6, 7, 8, 9)
Base:     linear

x   p(x)
4   1/6
5   1/6
6   1/6
7   1/6
8   1/6
9   1/6

In [6]: d6 - 1
Out[6]: 
Class:    Distribution
Alphabet: (0, 1, 2, 3, 4, 5)
Base:     linear

x   p(x)
0   1/6
1   1/6
2   1/6
3   1/6
4   1/6
5   1/6

In [7]: 10 - d6
Out[7]: 
Class:    Distribution
Alphabet: (4, 5, 6, 7, 8, 9)
Base:     linear

x   p(x)
4   1/6
5   1/6
6   1/6
7   1/6
8   1/6
9   1/6

In [8]: 2 * d6
Out[8]: 
Class:    Distribution
Alphabet: (2, 4, 6, 8, 10, 12)
Base:     linear

x    p(x)
2    1/6
4    1/6
6    1/6
8    1/6
10   1/6
12   1/6

In [9]: d6 % 2
Out[9]: 
Class:    Distribution
Alphabet: (0, 1)
Base:     linear

x   p(x)
0   1/2
1   1/2

In [10]: (d6 % 2).is_approx_equal(d6 <= 3)
Out[10]: True
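
To make the scalar operations above concrete, here is a minimal sketch of what they compute, written with plain dicts and the standard library rather than dit itself. The helper apply_scalar is a hypothetical name used for illustration only; it is not part of dit, and dit's internal implementation may differ. Each outcome is pushed through a function, and outcomes that land on the same value have their probabilities added.

```python
from collections import defaultdict
from fractions import Fraction


def apply_scalar(pmf, func):
    """Map every outcome through func, merging outcomes that collide.

    Hypothetical helper for illustration; not a dit function.
    """
    out = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at 0
    for outcome, prob in pmf.items():
        out[func(outcome)] += prob
    return dict(out)


# A fair six-sided die as a plain {outcome: probability} dict.
d6 = {x: Fraction(1, 6) for x in range(1, 7)}

shifted = apply_scalar(d6, lambda x: 10 - x)  # analogous to 10 - d6
parity = apply_scalar(d6, lambda x: x % 2)    # analogous to d6 % 2
low = apply_scalar(d6, lambda x: x <= 3)      # analogous to d6 <= 3

print(sorted(shifted))  # outcomes 4 through 9
print(parity == low)    # True: Python treats True == 1 and False == 0
```

The last comparison mirrors the is_approx_equal check above: the inequality produces the outcomes False and True, which compare equal to 0 and 1, each with probability 1/2.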

Furthermore, we can perform such operations with two distributions:

In [11]: d6 + d6
Out[11]: 
Class:    Distribution
Alphabet: (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
Base:     linear

x    p(x)
2    1/36
3    1/18
4    1/12
5    1/9
6    5/36
7    1/6
8    5/36
9    1/9
10   1/12
11   1/18
12   1/36

In [12]: (d6 + d6) % 4
Out[12]: 
Class:    Distribution
Alphabet: (0, 1, 2, 3)
Base:     linear

x   p(x)
0   1/4
1   2/9
2   1/4
3   5/18

In [13]: d6 // d6
Out[13]: 
Class:    Distribution
Alphabet: (0, 1, 2, 3, 4, 5, 6)
Base:     linear

x   p(x)
0   5/12
1   1/3
2   1/9
3   1/18
4   1/36
5   1/36
6   1/36

In [14]: d6 % (d6 % 2 + 1)
Out[14]: 
Class:    Distribution
Alphabet: (0, 1)
Base:     linear

x   p(x)
0   3/4
1   1/4
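
The outputs above are consistent with each operand being treated as an independent draw, even when the same name d6 appears twice. If both occurrences in d6 % (d6 % 2 + 1) referred to the same roll, the result would be 0 or 1 with probability 1/2 each, not 3/4 and 1/4. The sketch below reproduces the printed tables under that independence assumption, using only the standard library. The helpers combine and map_outcomes are hypothetical names for illustration and are not dit functions.

```python
import operator
from collections import defaultdict
from fractions import Fraction
from itertools import product


def combine(pmf_a, pmf_b, op):
    """Distribution of op(a, b) for independent a ~ pmf_a and b ~ pmf_b.

    Hypothetical helper for illustration; not a dit function.
    """
    out = defaultdict(Fraction)
    for (a, pa), (b, pb) in product(pmf_a.items(), pmf_b.items()):
        out[op(a, b)] += pa * pb  # independence: joint probability is a product
    return dict(out)


def map_outcomes(pmf, func):
    """Distribution of func(x) for x ~ pmf (hypothetical helper)."""
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[func(x)] += p
    return dict(out)


d6 = {x: Fraction(1, 6) for x in range(1, 7)}

two_dice = combine(d6, d6, operator.add)             # analogous to d6 + d6
mod4 = map_outcomes(two_dice, lambda s: s % 4)       # analogous to (d6 + d6) % 4
floordiv = combine(d6, d6, operator.floordiv)        # analogous to d6 // d6
nested = combine(d6, map_outcomes(d6, lambda x: x % 2 + 1), operator.mod)

for name, pmf in [("mod4", mod4), ("floordiv", floordiv), ("nested", nested)]:
    print(name, {x: str(p) for x, p in sorted(pmf.items())})
```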

There are also statistical functions which can be applied to Distributions:

In [15]: from dit.algorithms.stats import *

In [16]: median(d6+d6)
Out[16]: 7.0

In [17]: from dit.example_dists import binomial

In [18]: d = binomial(10, 1/3)

In [19]: d
Out[19]: 
Class:    Distribution
Alphabet: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Base:     linear

x    p(x)
0    409/23585
1    4302/49615
2    1280/6561
3    5120/19683
4    4480/19683
5    896/6561
6    1120/19683
7    320/19683
8    20/6561
9    9/26572
10   1/59046

In [20]: mean(d)
Out[20]: 3.3333333333333335

In [21]: median(d)
Out[21]: 3.0

In [22]: standard_deviation(d)
Out[22]: 1.4907119849998596
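
As a cross-check on the values above, the following sketch recomputes the same statistics from first principles using exact fractions and the standard library. The function names are local stand-ins and not the dit.algorithms.stats implementations. The median convention used here (the smallest outcome whose cumulative probability reaches 1/2) is an assumption, and dit may use a different convention. Note also that the exact probability of zero successes is (2/3)^10 = 1024/59049; the fractions printed in the table above, such as 409/23585, appear to be rational approximations of floating-point values.

```python
from fractions import Fraction
from math import comb, sqrt


def binomial_pmf(n, p):
    """Exact binomial pmf as {k: probability} (illustrative stand-in)."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def pmf_mean(pmf):
    """Expected value: sum of outcome times probability."""
    return sum(x * p for x, p in pmf.items())


def pmf_median(pmf):
    """Smallest outcome whose cumulative probability is at least 1/2.

    This convention is an assumption made for the sketch.
    """
    cumulative = 0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= Fraction(1, 2):
            return x


def pmf_std(pmf):
    """Square root of the probability-weighted squared deviation from the mean."""
    mu = pmf_mean(pmf)
    return sqrt(sum(p * (x - mu) ** 2 for x, p in pmf.items()))


d = binomial_pmf(10, Fraction(1, 3))

print(pmf_mean(d))    # 10/3
print(pmf_median(d))  # 3
print(pmf_std(d))     # approximately 1.4907
```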

API

Distribution.__init__(data, pmf=None, rv_names=None, free_vars=None, given_vars=None, base='linear', sample_space=None, sparse=True, trim=True, sort=True, validate=True, prng=None)

Initialize a Distribution.

There are three construction modes:

DataArray – pass an xr.DataArray directly (original API).
Outcomes + pmf – pass a sequence of outcomes and a sequence of probabilities, matching the dit.Distribution signature.
Dict – pass a dict mapping outcomes to probabilities.

Parameters:

data (xr.DataArray, sequence, or dict) – If an xr.DataArray, used directly as the probability data. If a dict, keys are outcomes and values are probabilities. Otherwise, treated as a sequence of outcomes (each outcome is an indexable container whose length equals the number of random variables).
pmf (sequence of float, optional) – Probability values corresponding to data when data is a sequence of outcomes. Ignored when data is a DataArray or dict.
rv_names (list of str, optional) – Names for each random variable. Only used when data is outcomes or a dict. Defaults to 'X0', 'X1', …
free_vars (set-like of str, optional) – Names of the free (joint) variables. If both free_vars and given_vars are None, all dimensions are treated as free.
given_vars (set-like of str, optional) – Names of the conditioned variables.
base (str, float, or None) – The probability base. Use 'linear' (default) for raw probabilities; use 2, 'e', or any positive float for log probabilities in that base. If None, the base is auto-detected (linear if the pmf sums to ~1, else ditParams['base']).
sample_space (sequence or CartesianProduct, optional) – Explicit sample space. If provided, used to determine the full set of possible outcomes.
sparse (bool) – If True, outcomes and pmf only report non-zero entries.
trim (bool) – Ignored (kept for API compatibility).
sort (bool) – Ignored (alphabets are always sorted).
validate (bool) – If True, validate normalisation after construction.
prng (random state, optional) – Pseudo-random number generator. Defaults to dit.math.prng.
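
To illustrate the difference between a 'linear' pmf and a log-base pmf described under base, here is a small standard-library sketch. It is not dit code: the function looks_linear is a hypothetical stand-in for the "sums to ~1" auto-detection heuristic, and its tolerance is an arbitrary choice made for the example.

```python
from math import isclose, log2

# Raw ("linear") probabilities for two outcomes.
linear_pmf = {"00": 0.5, "11": 0.5}

# The same distribution stored as base-2 log probabilities.
log2_pmf = {outcome: log2(p) for outcome, p in linear_pmf.items()}

# Converting back: exponentiate with the same base.
recovered = {outcome: 2**lp for outcome, lp in log2_pmf.items()}


def looks_linear(values, tol=1e-9):
    """Hypothetical heuristic: a linear pmf sums to roughly 1."""
    return isclose(sum(values), 1.0, abs_tol=tol)


print(log2_pmf)                           # {'00': -1.0, '11': -1.0}
print(looks_linear(linear_pmf.values()))  # True
print(looks_linear(log2_pmf.values()))    # False: log probabilities sum to -2
```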

Examples

From outcomes and pmf (like dit.Distribution):

>>> xrd = Distribution(['00', '01', '10', '11'],
...                    [.25, .25, .25, .25],
...                    rv_names=['X', 'Y'])

From a dict:

>>> xrd = Distribution({'00': .5, '11': .5}, rv_names=['X', 'Y'])

From a DataArray (original API):

>>> xrd = Distribution(my_dataarray, free_vars={'X', 'Y'})