NumPy-based ScalarDistribution

ScalarDistributions are used to represent distributions over real numbers, for example a six-sided die or the number of heads when flipping 100 coins.

Playing with ScalarDistributions

First we will enable two optional features: printing fractions by default, and using __str__() as __repr__(). Use either option with care, since both can incur significant performance hits on some distributions.

In [1]: dit.ditParams['print.exact'] = dit.ditParams['repr.print'] = True

We next construct a six-sided die:

In [2]: from dit.example_dists import uniform
In [3]: d6 = uniform(1, 7)

We can perform standard mathematical operations with scalars, such as adding, subtracting from or by, multiplying, taking the modulo, or testing inequalities.

In [5]: d6 + 3
Out[5]:
Class:    ScalarDistribution
Alphabet: (4, 5, 6, 7, 8, 9)
Base:     linear

x   p(x)
4   1/6
5   1/6
6   1/6
7   1/6
8   1/6
9   1/6
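
To make the effect of a scalar operation concrete, the following is a minimal pure-Python sketch that does not use dit at all. It assumes a distribution can be modeled as a plain dict from outcome to probability; the helper name `apply_scalar_op` is hypothetical and is not part of dit's API. Each outcome is passed through the operation, and outcomes that land on the same value pool their probability.

```python
from collections import defaultdict
from fractions import Fraction


def apply_scalar_op(pmf, op):
    """Push a pmf (dict: outcome -> probability) through a function of the outcome.

    Hypothetical helper for illustration only; not dit's implementation.
    """
    result = defaultdict(Fraction)
    for outcome, prob in pmf.items():
        # Outcomes that map to the same value pool their probability.
        result[op(outcome)] += prob
    return dict(sorted(result.items()))


# A fair six-sided die with exact (fractional) probabilities.
d6 = {face: Fraction(1, 6) for face in range(1, 7)}

shifted = apply_scalar_op(d6, lambda x: x + 3)  # outcomes 4..9, each 1/6
parity = apply_scalar_op(d6, lambda x: x % 2)   # outcomes 0 and 1, each 1/2
```

Adding 3 only relabels the outcomes, whereas taking the modulo merges several faces into one outcome, which is why the probabilities have to be accumulated rather than copied.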

Furthermore, we can perform such operations with two distributions:

In [11]: d6 + d6
Out[11]:
Class:    ScalarDistribution
Alphabet: (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
Base:     linear

x    p(x)
2    1/36
3    1/18
4    1/12
5    1/9
6    5/36
7    1/6
8    5/36
9    1/9
10   1/12
11   1/18
12   1/36
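
The probabilities for the sum of two dice can be reproduced by hand. The sketch below is plain Python rather than dit code, and it assumes the two distributions are independent, so every pair of outcomes contributes the product of its probabilities to the outcome given by their sum. The function name `add_independent` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def add_independent(pmf_a, pmf_b):
    """Distribution of A + B for independent A and B (dict: outcome -> probability).

    Illustrative sketch only; not dit's implementation.
    """
    result = defaultdict(Fraction)
    for (a, prob_a), (b, prob_b) in product(pmf_a.items(), pmf_b.items()):
        # Each pair (a, b) contributes prob_a * prob_b to the outcome a + b.
        result[a + b] += prob_a * prob_b
    return dict(sorted(result.items()))


d6 = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = add_independent(d6, d6)

for total, prob in two_dice.items():
    print(total, prob)
```

Because `Fraction` reduces automatically, the printed probabilities appear in lowest terms (for example 2/36 is shown as 1/18).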

There are also statistical functions which can be applied to ScalarDistributions:

In [15]: from dit.algorithms.stats import *
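
As a rough illustration of what such statistics compute, here is a pure-Python sketch of the mean and variance of a distribution stored as a dict from outcome to probability. The helper names `pmf_mean` and `pmf_variance` are hypothetical and are not the functions exported by `dit.algorithms.stats`; consult that module for the actual names and signatures.

```python
from fractions import Fraction


def pmf_mean(pmf):
    """Expected value: sum of outcome * probability."""
    return sum(x * p for x, p in pmf.items())


def pmf_variance(pmf):
    """Variance: expected squared deviation from the mean."""
    mu = pmf_mean(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())


# A fair six-sided die with exact probabilities.
d6 = {face: Fraction(1, 6) for face in range(1, 7)}

die_mean = pmf_mean(d6)          # Fraction(7, 2)
die_variance = pmf_variance(d6)  # Fraction(35, 12)
```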

API

ScalarDistribution.__init__(outcomes, pmf=None, sample_space=None, base=None, prng=None, sort=True, sparse=True, trim=True, validate=True)

Initialize the distribution.

Parameters
  • outcomes (sequence, dict) – The outcomes of the distribution. If outcomes is a dictionary, then the keys are used as outcomes, and the values of the dictionary are used as pmf instead. Note: an outcome is any hashable object (except None) which is equality comparable. If sort is True, then outcomes must also be orderable.

  • pmf (sequence) – The outcome probabilities or log probabilities. If None, then outcomes is treated as the probability mass function and the outcomes are consecutive integers beginning from zero.

  • sample_space (sequence) – A sequence representing the sample space, and corresponding to the complete set of possible outcomes. The order of the sample space is important. If None, then the outcomes are used to determine the sample space instead.

  • base (float, None) – If pmf specifies log probabilities, then base should specify the base of the logarithm. If 'linear', then pmf is assumed to represent linear probabilities. If None, then the value for base is taken from ditParams['base'].

  • prng (RandomState) – A pseudo-random number generator with a rand method which can generate random numbers. For now, this is assumed to be something with an API compatible with NumPy’s RandomState class. This attribute is initialized to equal dit.math.prng.

  • sort (bool) – If True, then the sample space is sorted before finalizing it. Usually, this is desirable, as it normalizes the behavior of distributions which have the same sample space (when considered as a set). Note that addition and multiplication of distributions are defined only if the sample spaces (as tuples) are equal.

  • sparse (bool) – Specifies the form of the pmf. If True, then outcomes and pmf will only contain entries for non-null outcomes and probabilities, after initialization. The order of these entries will always obey the order of sample_space, even if their number is not equal to the size of the sample space. If False, then the pmf will be dense and every outcome in the sample space will be represented.

  • trim (bool) – Specifies if null-outcomes should be removed from pmf when make_sparse() is called (assuming sparse is True) during initialization.

  • validate (bool) – If True, then validate the distribution. If False, then assume the distribution is valid, and perform no checks.

Raises

InvalidDistribution – If the lengths of pmf and outcomes are unequal.

See validate() for a list of other potential exceptions.
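
The rules for how outcomes, pmf, and base interact can be summarized in a short sketch. This is plain Python written from the parameter descriptions above, not dit's source: the function name `normalize_arguments` is hypothetical, the default of 'linear' for base is an assumption (dit instead falls back to its ditParams setting when base is None), and a built-in ValueError stands in for InvalidDistribution.

```python
def normalize_arguments(outcomes, pmf=None, base='linear'):
    """Return (outcomes, linear_pmf) following the documented argument rules.

    Hypothetical helper for illustration; it ignores sample_space, sorting,
    sparsity, trimming, and validation.
    """
    if isinstance(outcomes, dict):
        # Dictionary input: keys are the outcomes, values are the pmf.
        outcomes, pmf = list(outcomes.keys()), list(outcomes.values())
    elif pmf is None:
        # A lone sequence is the pmf; outcomes are 0, 1, 2, ...
        pmf = list(outcomes)
        outcomes = list(range(len(pmf)))
    else:
        outcomes, pmf = list(outcomes), list(pmf)

    if len(outcomes) != len(pmf):
        # Stand-in for the InvalidDistribution exception described above.
        raise ValueError("outcomes and pmf must have the same length")

    if base != 'linear':
        # Log probabilities in the given numeric base: p = base ** log_p.
        pmf = [base ** log_p for log_p in pmf]
    return outcomes, pmf


coin = normalize_arguments({'H': 0.5, 'T': 0.5})
implicit = normalize_arguments([0.25, 0.75])
from_logs = normalize_arguments([1, 2], [-1, -1], base=2)
```

Here `coin` takes its outcomes from the dictionary keys, `implicit` receives the integer outcomes 0 and 1, and `from_logs` converts base-2 log probabilities of -1 into linear probabilities of 0.5.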