# Notation¶

`dit` is a scientific tool, and so much of this documentation contains mathematical expressions. Here we describe the notation used throughout.

## Basic Notation¶

A random variable \(X\) consists of *outcomes* \(x\) drawn from an *alphabet* \(\mathcal{X}\). As such, we write the entropy of a distribution as \(\H{X} = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)\), where \(p(x)\) denotes the probability of the outcome \(x\) occurring.

Many distributions are *joint* distributions. In the absence of variable names, we index each random variable with a subscript. For example, a distribution over three variables is written \(X_0X_1X_2\). As a shorthand, we also denote those random variables as \(X_{0:3}\), meaning start with \(X_0\) and go up to, but do not include, \(X_3\) — just like Python slice notation.
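The shorthand mirrors Python's slice semantics exactly, which can be checked directly (the variable names below are purely illustrative):

```python
# A sequence of names standing in for the random variables X_0, ..., X_3:
names = ('X0', 'X1', 'X2', 'X3')

# X_{0:3} starts at X_0 and stops before X_3, just as a slice does:
print(names[0:3])  # ('X0', 'X1', 'X2')
```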

If a set of variables \(X_{0:n}\) are independent, we will write \(\ind X_{0:n}\). If a set of variables \(X_{0:n}\) are independent conditioned on \(V\), we write \(\ind X_{0:n} \mid V\).
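Concretely, two variables are independent when their joint distribution factors as \(p(x, y) = p(x)p(y)\). The following sketch checks that numerically; the function name and the dictionary representation of a joint pmf are our own, not part of `dit`:

```python
from itertools import product

def is_independent(joint, tol=1e-12):
    """Check whether p(x, y) = p(x) p(y) for every pair of outcomes."""
    px, py = {}, {}
    for (x, y), p in joint.items():  # accumulate the two marginals
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return all(abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
               for x, y in product(px, py))

# Two fair coins tossed independently:
fair = {(x, y): 0.25 for x in 'HT' for y in 'HT'}
# Two perfectly correlated coins:
corr = {('H', 'H'): 0.5, ('T', 'T'): 0.5}
print(is_independent(fair), is_independent(corr))  # True False
```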

If we ever need to describe an infinitely long chain of variables we drop the index from the side that is infinite. So \(X_{:0} = \ldots X_{-3}X_{-2}X_{-1}\) and \(X_{0:} = X_0X_1X_2\ldots\). For an arbitrary set of indices \(A\), the corresponding collection of random variables is denoted \(X_A\). For example, if \(A = \{0,2,4\}\), then \(X_A = X_0 X_2 X_4\). The complement of \(A\) (with respect to some universal set) is denoted \(\overline{A}\).
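Selecting the sub-collection \(X_A\) and its complement works like indexing into a sequence of variables (again, the names here are illustrative only):

```python
names = ('X0', 'X1', 'X2', 'X3', 'X4')

# X_A for the index set A = {0, 2, 4}:
A = {0, 2, 4}
X_A = tuple(names[i] for i in sorted(A))
print(X_A)  # ('X0', 'X2', 'X4')

# The complement of A with respect to all indices picks out the rest:
A_bar = sorted(set(range(len(names))) - A)
X_A_bar = tuple(names[i] for i in A_bar)
print(X_A_bar)  # ('X1', 'X3')
```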

Furthermore, we define \(0 \log_2 0 = 0\).
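Putting the entropy formula and the \(0 \log_2 0 = 0\) convention together, a minimal pure-Python sketch (not `dit`'s implementation) is:

```python
from math import log2

def entropy(pmf):
    """Shannon entropy: -sum of p(x) log2 p(x), in bits."""
    # Skipping zero-probability outcomes implements the 0 log2 0 = 0 convention.
    return sum(-p * log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))   # 1.0  (a fair coin carries one bit)
print(entropy([1.0, 0.0]))   # 0.0  (a certain outcome carries none)
print(entropy([0.25] * 4))   # 2.0  (two fair bits)
```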

## Advanced Notation¶

When there exists a function \(Y = f(X)\) we write \(X \imore Y\), meaning that \(X\) is *informationally richer* than \(Y\). Similarly, if \(f(Y) = X\) then we write \(X \iless Y\) and say that \(X\) is *informationally poorer* than \(Y\). If \(X \iless Y\) and \(X \imore Y\), then we write \(X \ieq Y\) and say that \(X\) is *informationally equivalent* to \(Y\). Of all the variables that are poorer than both \(X\) and \(Y\), there is a richest one. This variable is known as the *meet* of \(X\) and \(Y\) and is denoted \(X \meet Y\). By definition, for all \(Z\) such that \(Z \iless X\) and \(Z \iless Y\), we have \(Z \iless X \meet Y\). Similarly, of all the variables richer than both \(X\) and \(Y\), there is a poorest one. This variable is known as the *join* of \(X\) and \(Y\) and is denoted \(X \join Y\). The joint random variable \((X,Y)\) and the join are informationally equivalent: \((X,Y) \ieq X \join Y\).
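The ordering \(X \imore Y\) can be tested directly from a joint distribution's support: \(Y\) is a function of \(X\) exactly when no outcome of \(X\) co-occurs with two different outcomes of \(Y\). A small sketch (the representation and function name are our own, not `dit`'s API):

```python
def is_function_of(support, x_idx, y_idx):
    """True if the variable at y_idx is a deterministic function of the
    variable at x_idx across the outcomes in `support`."""
    mapping = {}
    for outcome in support:
        x, y = outcome[x_idx], outcome[y_idx]
        if mapping.setdefault(x, y) != y:
            return False  # one x co-occurs with two different y values
    return True

# X is a pair of fair bits, Y is their XOR: X is richer than Y.
support = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(is_function_of(support, 0, 1))  # True:  Y = f(X)
print(is_function_of(support, 1, 0))  # False: Y does not determine X
```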

Lastly, we use \(X \mss Y\) to denote the minimal sufficient statistic of \(X\) about the random variable \(Y\).
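One standard construction of \(X \mss Y\) aggregates the outcomes of \(X\) that induce the same conditional distribution \(p(y \mid x)\). The sketch below (our own helper, not `dit`'s API) labels each \(x\) by its equivalence class:

```python
from collections import defaultdict

def mss_labels(joint, tol=1e-9):
    """Label each x by its class of equal conditionals p(y | x); the class
    label is (a representation of) the minimal sufficient statistic of X
    about Y."""
    px = defaultdict(float)
    for (x, _), p in joint.items():
        px[x] += p
    cond = defaultdict(dict)
    for (x, y), p in joint.items():
        cond[x][y] = p / px[x]
    classes, labels = [], {}
    for x, dist in cond.items():
        for i, rep in enumerate(classes):
            if set(rep) == set(dist) and all(abs(rep[y] - dist[y]) <= tol
                                             for y in rep):
                labels[x] = i  # same conditional as an earlier x
                break
        else:
            labels[x] = len(classes)
            classes.append(dist)
    return labels

# X uniform on {0, 1, 2}, Y = X mod 2: the MSS merges x = 0 and x = 2.
joint = {(0, 0): 1/3, (1, 1): 1/3, (2, 0): 1/3}
labels = mss_labels(joint)
print(labels[0] == labels[2], labels[0] != labels[1])  # True True
```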