NumPy-based ScalarDistribution

ScalarDistributions are used to represent distributions over real numbers, for example a six-sided die or the number of heads when flipping 100 coins.

Playing with ScalarDistributions

First, with dit imported, we enable two optional features: printing fractions by default, and using __str__() as __repr__(). Use either option with care, as both can incur significant performance hits on some distributions.

In [1]: dit.ditParams['print.exact'] = dit.ditParams['repr.print'] = True

We next construct a six-sided die. Note that uniform(1, 7) excludes its upper bound, so the outcomes are 1 through 6:

In [2]: from dit.example_dists import uniform

In [3]: d6 = uniform(1, 7)

In [4]: d6
Out[4]: 
Class:    ScalarDistribution
Alphabet: (1, 2, 3, 4, 5, 6)
Base:     linear

x   p(x)
1   1/6
2   1/6
3   1/6
4   1/6
5   1/6
6   1/6

We can perform standard mathematical operations with scalars, such as adding, subtracting (in either order), multiplying, taking the modulo, or testing inequalities.

In [5]: d6 + 3
Out[5]: 
Class:    ScalarDistribution
Alphabet: (4, 5, 6, 7, 8, 9)
Base:     linear

x   p(x)
4   1/6
5   1/6
6   1/6
7   1/6
8   1/6
9   1/6

In [6]: d6 - 1
Out[6]: 
Class:    ScalarDistribution
Alphabet: (0, 1, 2, 3, 4, 5)
Base:     linear

x   p(x)
0   1/6
1   1/6
2   1/6
3   1/6
4   1/6
5   1/6

In [7]: 10 - d6
Out[7]: 
Class:    ScalarDistribution
Alphabet: (4, 5, 6, 7, 8, 9)
Base:     linear

x   p(x)
4   1/6
5   1/6
6   1/6
7   1/6
8   1/6
9   1/6

In [8]: 2 * d6
Out[8]: 
Class:    ScalarDistribution
Alphabet: (2, 4, 6, 8, 10, 12)
Base:     linear

x    p(x)
2    1/6
4    1/6
6    1/6
8    1/6
10   1/6
12   1/6

In [9]: d6 % 2
Out[9]: 
Class:    ScalarDistribution
Alphabet: (0, 1)
Base:     linear

x   p(x)
0   1/2
1   1/2

In [10]: (d6 % 2).is_approx_equal(d6 <= 3)
Out[10]: True
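
To make the scalar operations above concrete, here is a minimal sketch in plain Python that does not use dit at all. It assumes a distribution can be modeled as a dict mapping outcomes to exact probabilities, and the helper name push_forward is hypothetical. Applying a function to every outcome and summing the probabilities of outcomes that collide reproduces the tables shown for d6 + 3 and d6 % 2.

```python
from collections import defaultdict
from fractions import Fraction


def push_forward(pmf, func):
    """Apply func to each outcome, merging probabilities of equal results.

    Hypothetical helper for illustration; this is not dit's implementation.
    """
    result = defaultdict(Fraction)  # Fraction() is exactly zero
    for outcome, prob in pmf.items():
        result[func(outcome)] += prob
    return dict(result)


# A fair six-sided die as an {outcome: probability} mapping.
d6 = {x: Fraction(1, 6) for x in range(1, 7)}

shifted = push_forward(d6, lambda x: x + 3)   # outcomes 4..9, each 1/6
parity = push_forward(d6, lambda x: x % 2)    # outcomes 0 and 1, each 1/2
low = push_forward(d6, lambda x: x <= 3)      # outcomes False and True, each 1/2

for outcome in sorted(parity):
    print(outcome, parity[outcome])
```

In Python, False == 0 and True == 1, so parity and low compare equal as dicts, which is consistent with the True result of is_approx_equal above.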

Furthermore, we can perform such operations with two distributions:

In [11]: d6 + d6
Out[11]: 
Class:    ScalarDistribution
Alphabet: (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
Base:     linear

x    p(x)
2    1/36
3    1/18
4    1/12
5    1/9
6    5/36
7    1/6
8    5/36
9    1/9
10   1/12
11   1/18
12   1/36

In [12]: (d6 + d6) % 4
Out[12]: 
Class:    ScalarDistribution
Alphabet: (0, 1, 2, 3)
Base:     linear

x   p(x)
0   1/4
1   2/9
2   1/4
3   5/18

In [13]: d6 // d6
Out[13]: 
Class:    ScalarDistribution
Alphabet: (0, 1, 2, 3, 4, 5, 6)
Base:     linear

x   p(x)
0   5/12
1   1/3
2   1/9
3   1/18
4   1/36
5   1/36
6   1/36

In [14]: d6 % (d6 % 2 + 1)
Out[14]: 
Class:    ScalarDistribution
Alphabet: (0, 1)
Base:     linear

x   p(x)
0   3/4
1   1/4
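
The outputs above are consistent with every occurrence of d6 in an expression being treated as a separate, independent roll. The following sketch, again in plain Python and independent of dit, checks that reading. The helper name combine and the dict representation are assumptions made for illustration only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
import operator


def combine(p, q, op):
    """Distribution of op(X, Y) for independent X ~ p and Y ~ q.

    Hypothetical helper for illustration; this is not dit's implementation.
    """
    result = defaultdict(Fraction)
    for (x, px), (y, qy) in product(p.items(), q.items()):
        result[op(x, y)] += px * qy
    return dict(result)


d6 = {x: Fraction(1, 6) for x in range(1, 7)}

two_dice = combine(d6, d6, operator.add)        # sum of two independent rolls
quotient = combine(d6, d6, operator.floordiv)   # corresponds to d6 // d6

# d6 % (d6 % 2 + 1): the inner term takes the values 1 and 2 with equal odds.
inner = {1: Fraction(1, 2), 2: Fraction(1, 2)}
nested = combine(d6, inner, operator.mod)

# For contrast: reusing the *same* roll on both sides gives a different answer.
same_roll = defaultdict(Fraction)
for x, px in d6.items():
    same_roll[x % (x % 2 + 1)] += px

print(sorted(nested.items()))
print(sorted(same_roll.items()))
```

Under independence, nested assigns 3/4 to 0 and 1/4 to 1, matching Out[14]; the same-roll variant would give 1/2 to each outcome instead.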

There are also statistical functions which can be applied to ScalarDistributions:

In [15]: from dit.algorithms.stats import mean, median, standard_deviation

In [16]: median(d6+d6)
Out[16]: 7.0

In [17]: from dit.example_dists import binomial

In [18]: d = binomial(10, 1/3)

In [19]: d
Out[19]: 
Class:    ScalarDistribution
Alphabet: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Base:     linear

x    p(x)
0    409/23585
1    4302/49615
2    1280/6561
3    5120/19683
4    4480/19683
5    896/6561
6    1120/19683
7    320/19683
8    20/6561
9    9/26572
10   1/59046
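
The fractions printed above deserve a second look. The exact binomial probabilities all have denominators dividing 3^10 = 59049, yet the table shows entries such as 409/23585 and 1/59046. A plausible reading, which is an assumption rather than something stated here, is that probabilities are stored as floats and converted to a nearby fraction for display. The sketch below uses only the standard library to compute the exact values and measure how far the displayed ones are from them.

```python
from fractions import Fraction
from math import comb  # requires Python 3.8+

n, p = 10, Fraction(1, 3)

# Exact binomial probabilities P(X = k) = C(n, k) p^k (1 - p)^(n - k).
exact = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Values copied from the table above for the rows that are not exact.
displayed = {
    0: Fraction(409, 23585),
    1: Fraction(4302, 49615),
    9: Fraction(9, 26572),
    10: Fraction(1, 59046),
}

for k, shown in displayed.items():
    error = abs(shown - exact[k])
    print(k, exact[k], shown, float(error))
```

Each difference is below 1e-9, so the displayed fractions are very close to, but not identical with, the true probabilities. The remaining rows of the table agree exactly with the values computed here.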

In [20]: mean(d)
Out[20]: 3.3333333333333335

In [21]: median(d)
Out[21]: 3.0

In [22]: standard_deviation(d)
Out[22]: 1.4907119849998596
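
As a cross-check on the numbers above, the same statistics can be computed from first principles. This is a minimal sketch in plain Python. The three functions are hypothetical stand-ins that happen to share names with the dit functions, and the median here is taken to be the smallest outcome whose cumulative probability reaches 1/2, which is one common definition and may differ from dit's in edge cases.

```python
from fractions import Fraction
from math import comb, sqrt  # comb requires Python 3.8+


def mean(pmf):
    """Expected value of an {outcome: probability} mapping."""
    return sum(x * p for x, p in pmf.items())


def median(pmf):
    """Smallest outcome whose cumulative probability reaches 1/2."""
    total = Fraction(0)
    for x in sorted(pmf):
        total += pmf[x]
        if total >= Fraction(1, 2):
            return x


def standard_deviation(pmf):
    """Square root of the variance, returned as a float."""
    mu = mean(pmf)
    variance = sum(p * (x - mu) ** 2 for x, p in pmf.items())
    return sqrt(variance)


n, q = 10, Fraction(1, 3)
binom = {k: comb(n, k) * q**k * (1 - q)**(n - k) for k in range(n + 1)}

print(float(mean(binom)))
print(median(binom))
print(standard_deviation(binom))
```

The exact mean is n * q = 10/3 and the exact variance is n * q * (1 - q) = 20/9, in agreement with the floating-point results reported above.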

API

ScalarDistribution.__init__(outcomes, pmf=None, sample_space=None, base=None, prng=None, sort=True, sparse=True, trim=True, validate=True)

Initialize the distribution.

Parameters:
  • outcomes (sequence, dict) – The outcomes of the distribution. If outcomes is a dictionary, then the keys are used as outcomes, and the values of the dictionary are used as pmf instead. Note: an outcome is any hashable object (except None) which is equality comparable. If sort is True, then outcomes must also be orderable.
  • pmf (sequence) – The outcome probabilities or log probabilities. If None, then outcomes is treated as the probability mass function and the outcomes are consecutive integers beginning from zero.
  • sample_space (sequence) – A sequence representing the sample space, and corresponding to the complete set of possible outcomes. The order of the sample space is important. If None, then the outcomes are used to determine the sample space instead.
  • base (float, None) – If pmf specifies log probabilities, then base should specify the base of the logarithm. If ‘linear’, then pmf is assumed to represent linear probabilities. If None, then the value for base is taken from ditParams[‘base’].
  • prng (RandomState) – A pseudo-random number generator with a rand method which can generate random numbers. For now, this is assumed to be something with an API compatible with NumPy’s RandomState class. This attribute is initialized to equal dit.math.prng.
  • sort (bool) – If True, then the sample space is sorted before finalizing it. Usually, this is desirable, as it normalizes the behavior of distributions which have the same sample space (when considered as a set). Note that addition and multiplication of distributions are defined only if the sample spaces (as tuples) are equal.
  • sparse (bool) – Specifies the form of the pmf. If True, then outcomes and pmf will only contain entries for non-null outcomes and probabilities, after initialization. The order of these entries will always obey the order of sample_space, even if their number is not equal to the size of the sample space. If False, then the pmf will be dense and every outcome in the sample space will be represented.
  • trim (bool) – Specifies if null-outcomes should be removed from pmf when make_sparse() is called (assuming sparse is True) during initialization.
  • validate (bool) – If True, then validate the distribution. If False, then assume the distribution is valid, and perform no checks.
Raises:
  • InvalidDistribution – If the lengths of values and outcomes are unequal.
  • See validate() for a list of other potential exceptions.
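
The conventions for outcomes and pmf described above can be sketched as follows. This is a simplified illustration in plain Python, not dit's code: the function name normalize_args is hypothetical, ValueError stands in for InvalidDistribution, and trim here simply drops zero-probability entries, which is a simplification of the documented sparse and trim behavior.

```python
def normalize_args(outcomes, pmf=None, sort=True, trim=True):
    """Return (outcomes, pmf) tuples following the documented conventions.

    Hypothetical helper; a simplified illustration, not dit's implementation.
    """
    if isinstance(outcomes, dict):
        # Keys are the outcomes, values are the probabilities.
        outcomes, pmf = list(outcomes.keys()), list(outcomes.values())
    elif pmf is None:
        # A bare sequence is the pmf; outcomes are 0, 1, 2, ...
        pmf = list(outcomes)
        outcomes = list(range(len(pmf)))
    else:
        outcomes, pmf = list(outcomes), list(pmf)

    if len(outcomes) != len(pmf):
        # dit documents InvalidDistribution here; ValueError stands in for it.
        raise ValueError("outcomes and pmf must have the same length")

    pairs = list(zip(outcomes, pmf))
    if sort:
        # Requires orderable outcomes, as noted for the sort parameter.
        pairs.sort(key=lambda pair: pair[0])
    if trim:
        pairs = [(o, p) for o, p in pairs if p != 0]
    return tuple(o for o, _ in pairs), tuple(p for _, p in pairs)


print(normalize_args([0.5, 0.25, 0.25]))
print(normalize_args({3: 0.5, 1: 0.5, 2: 0.0}))
```

The first call treats the list as a pmf over the outcomes 0, 1, 2; the second takes outcomes from the dictionary keys, sorts them, and drops the zero-probability outcome.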