General Information¶

Documentation:

http://docs.dit.io

Downloads:

https://pypi.org/project/dit/

https://anaconda.org/conda-forge/dit

Dependencies:

Python 2.7, 3.3, 3.4, 3.5, or 3.6
boltons
contextlib2
debtcollector
networkx
numpy
prettytable
scipy
six

Optional Dependencies¶

colorama: colored column heads in PID indicating failure modes
cython: faster sampling from distributions
hypothesis: random sampling of distributions
matplotlib, python-ternary: plotting of various information-theoretic expansions
numdifftools: numerical evaluation of gradients and hessians during optimization
pint: add units to informational values
scikit-learn: faster nearest-neighbor lookups during entropy/mutual information estimation from samples

Mailing list: None
Code and bug tracker: https://github.com/dit/dit
License: BSD 3-Clause, see LICENSE.txt for details.

Quickstart¶

Basic usage of dit consists of creating distributions, modifying them if need be, and then computing properties of those distributions. First, we import:

In [1]: import dit

Suppose we have a really thick coin, one so thick that there is a reasonable chance of it landing on its edge. Here is how we might represent the coin in dit.

In [2]: d = dit.Distribution(['H', 'T', 'E'], [.4, .4, .2])

In [3]: print(d)
Class:          Distribution
Alphabet:       ('E', 'H', 'T') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 1
RV Names:       None

x   p(x)
E   1/5
H   2/5
T   2/5

Calculate the probability of \(H\) and also of the combination: \(H~\mathbf{or}~T\).

In [4]: d['H']
Out[4]: 0.4

In [5]: d.event_probability(['H', 'T'])
Out[5]: 0.8
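
As a cross-check that does not depend on dit, the probability of an event is the sum of the probabilities of the outcomes it contains. The following is a minimal sketch using a plain dictionary; the names `coin` and `event_probability` are illustrative helpers defined here, not dit's API.

```python
from fractions import Fraction

# The thick coin as a plain mapping from outcome to probability.
coin = {'H': Fraction(2, 5), 'T': Fraction(2, 5), 'E': Fraction(1, 5)}

def event_probability(pmf, event):
    """Sum the probabilities of the outcomes that make up `event`."""
    return sum(pmf[outcome] for outcome in event)

# P(H or T) = P(H) + P(T), since the outcomes are mutually exclusive.
p_h_or_t = event_probability(coin, ['H', 'T'])
print(float(p_h_or_t))
```

Exact fractions are used so that the sum is free of floating-point rounding.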

Calculate the Shannon entropy and extropy of the joint distribution.

In [6]: dit.shannon.entropy(d)
Out[6]: 1.5219280948873621

In [7]: dit.other.extropy(d)
Out[7]: 1.1419011889093373
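
Both values can be reproduced by hand. The sketch below assumes the usual definitions in bits, entropy as \(-\sum_x p(x) \log_2 p(x)\) and extropy as \(-\sum_x (1 - p(x)) \log_2 (1 - p(x))\); the function names are local helpers, not dit's implementation.

```python
import math

# Probabilities of H, T, and E for the thick coin.
pmf = [0.4, 0.4, 0.2]

def entropy(probs):
    """Shannon entropy in bits: -sum p * log2(p), skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def extropy(probs):
    """Extropy in bits: -sum (1 - p) * log2(1 - p), skipping p == 1."""
    return -sum((1 - p) * math.log2(1 - p) for p in probs if p < 1)

H = entropy(pmf)
J = extropy(pmf)
print(H, J)
```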

Create a distribution representing the \(\mathbf{xor}\) logic function. Here, we have two inputs, \(X\) and \(Y\), and then an output \(Z = \mathbf{xor}(X,Y)\).

In [8]: import dit.example_dists

In [9]: d = dit.example_dists.Xor()

In [10]: d.set_rv_names(['X', 'Y', 'Z'])

Calculate the Shannon mutual informations \(\I[X:Z]\), \(\I[Y:Z]\), and \(\I[X,Y:Z]\).

In [11]: dit.shannon.mutual_information(d, ['X'], ['Z'])
Out[11]: 0.0

In [12]: dit.shannon.mutual_information(d, ['Y'], ['Z'])
Out[12]: 0.0

In [13]: dit.shannon.mutual_information(d, ['X', 'Y'], ['Z'])
Out[13]: 1.0
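
The values 0.0, 0.0, and 1.0 follow from the identity \(I[A:B] = H[A] + H[B] - H[A,B]\). Here is a minimal, dit-independent sketch that assumes uniform, independent input bits; `joint`, `marginal`, `entropy`, and `mutual_information` are helpers defined here for illustration only.

```python
import math
from collections import defaultdict

# Joint distribution over (X, Y, Z) with Z = X xor Y and uniform inputs.
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

def marginal(pmf, indices):
    """Sum out every position not listed in `indices`."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[i] for i in indices)] += p
    return dict(out)

def entropy(pmf):
    """Shannon entropy in bits of a mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def mutual_information(pmf, a, b):
    """I[A:B] = H[A] + H[B] - H[A,B]; A and B are lists of positions."""
    both = sorted(set(a) | set(b))
    return (entropy(marginal(pmf, a)) + entropy(marginal(pmf, b))
            - entropy(marginal(pmf, both)))

i_xz = mutual_information(joint, [0], [2])      # X alone vs. Z
i_yz = mutual_information(joint, [1], [2])      # Y alone vs. Z
i_xyz = mutual_information(joint, [0, 1], [2])  # X and Y together vs. Z
print(i_xz, i_yz, i_xyz)
```

Neither input alone tells us anything about the output, but the two together determine it completely, which is why the xor distribution is a common test case.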

Calculate the marginal distribution \(P(X,Z)\). Then print its probabilities as fractions, showing the mask.

In [14]: d2 = d.marginal(['X', 'Z'])

In [15]: print(d2.to_string(show_mask=True, exact=True))
Class:          Distribution
Alphabet:       ('0', '1') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 2 (mask: 3)
RV Names:       ('X', 'Z')

x     p(x)
0*0   1/4
0*1   1/4
1*0   1/4
1*1   1/4
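
Marginalizing sums the probabilities of all joint outcomes that agree on the retained variables. The sketch below does this without dit for the xor distribution and marks each summed-out position with `*`, on the assumption that this is what "showing the mask" refers to; `marginal_with_mask` is a hypothetical helper, not a dit function.

```python
from collections import defaultdict
from fractions import Fraction

# Xor joint distribution over the string outcomes 'xyz'.
joint = {'000': Fraction(1, 4), '011': Fraction(1, 4),
         '101': Fraction(1, 4), '110': Fraction(1, 4)}

def marginal_with_mask(pmf, keep):
    """Sum out positions not in `keep`, writing '*' in their place."""
    out = defaultdict(Fraction)  # Fraction() is an exact zero
    for outcome, p in pmf.items():
        masked = ''.join(c if i in keep else '*'
                         for i, c in enumerate(outcome))
        out[masked] += p
    return dict(out)

# Keep X (position 0) and Z (position 2); Y (position 1) is summed out.
p_xz = marginal_with_mask(joint, keep={0, 2})
for outcome in sorted(p_xz):
    print(outcome, p_xz[outcome])
```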

Convert the distribution probabilities to log (base 3.5) probabilities, and access its probability mass function.

In [16]: d2.set_base(3.5)

In [17]: d2.pmf
Out[17]: array([-1.10658951, -1.10658951, -1.10658951, -1.10658951])
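
A log probability in base \(b\) is \(\log_b p\), so a probability of 1/4 in base 3.5 becomes \(\ln(1/4) / \ln(3.5) \approx -1.1066\). A short standard-library sketch of the conversion and its inverse follows; the variable names are illustrative.

```python
import math

base = 3.5
linear_pmf = [0.25, 0.25, 0.25, 0.25]

# Convert each linear probability to a base-3.5 log probability.
log_pmf = [math.log(p, base) for p in linear_pmf]

# Exponentiating in the same base recovers the linear probabilities.
recovered = [base ** lp for lp in log_pmf]

print(log_pmf)
print(recovered)
```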

Draw 5 random samples from this distribution.

In [18]: d2.rand(5)
Out[18]: ['01', '10', '00', '01', '00']
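
Because the samples are random, repeated calls will generally return different sequences. For comparison, weighted sampling from the same four outcomes can be sketched with the standard library alone. This does not reproduce dit's sampler or its random stream; the fixed seed is only there to make the sketch repeatable.

```python
import random

# Outcomes and probabilities of the marginal P(X, Z).
outcomes = ['00', '01', '10', '11']
weights = [0.25, 0.25, 0.25, 0.25]

# A dedicated generator with a fixed seed keeps runs repeatable.
rng = random.Random(0)
samples = rng.choices(outcomes, weights=weights, k=5)
print(samples)
```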

Enjoy!