# distributions

Probability distributions as Python objects.

## Overview

This module lets users define probability distributions as Python objects.

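The probability distributions defined in this module may be used, for instance, to define state-space models or to specify prior distributions; both uses are illustrated in the sections that follow.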

## Univariate distributions

The module defines the following classes of univariate continuous distributions:

| class (with signature) | description |
|---|---|
| `Beta(a=1., b=1.)` | |
| `Dirac(loc=0.)` | Dirac mass at point loc |
| `FlatNormal(loc=0.)` | Normal with infinite variance (missing data) |
| `Gamma(a=1., b=1.)` | scale = 1/b |
| `InvGamma(a=1., b=1.)` | Distribution of 1/X for X ~ Gamma(a, b) |
| `Laplace(loc=0., scale=1.)` | |
| `Logistic(loc=0., scale=1.)` | |
| `LogNormal(mu=0., sigma=1.)` | Distribution of Y = e^X, X ~ N(μ, σ^2) |
| `Normal(loc=0., scale=1.)` | N(loc, scale^2) distribution |
| `Student(loc=0., scale=1., df=3)` | |
| `TruncNormal(mu=0, sigma=1., a=0., b=1.)` | N(mu, sigma^2) truncated to interval [a, b] |
| `Uniform(a=0., b=1.)` | uniform over interval [a, b] |

and the following classes of univariate discrete distributions:

| class (with signature) | description |
|---|---|
| `Binomial(n=1, p=0.5)` | |
| `Categorical(p=None)` | returns i with prob p[i] |
| `DiscreteUniform(lo=0, hi=2)` | uniform over lo, …, hi-1 |
| `Geometric(p=0.5)` | |
| `Poisson(rate=1.)` | Poisson with expectation `rate` |

Note that all the parameters of these distributions have default values, e.g.:

```python
some_norm = Normal(loc=2.4)  # N(2.4, 1)
some_gam = Gamma()  # Gamma(1, 1)
```

## Mixture distributions (new in version 0.4)

A (univariate) mixture distribution may be specified as follows:

```python
mix = Mixture([0.5, 0.5], Normal(loc=-1), Normal(loc=1.))
```

The first argument is the vector of probabilities, the next arguments are the k component distributions.
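To make the definition concrete, here is a NumPy-only sketch of what a two-component Gaussian mixture such as the one above computes. It does not use the module's `Mixture` class; the helper names `normal_logpdf`, `mixture_logpdf` and `mixture_rvs` are hypothetical and only illustrate the usual definitions (weighted sum of densities, and sampling a component label first).

```python
import numpy as np

def normal_logpdf(x, loc=0.0, scale=1.0):
    """Log-density of N(loc, scale^2), evaluated elementwise."""
    z = (np.asarray(x, dtype=float) - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

def mixture_logpdf(x, probs, locs, scale=1.0):
    """Log-density of sum_k probs[k] * N(locs[k], scale^2)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # One row per component: log(p_k) + log N(x; loc_k, scale^2)
    terms = np.array([np.log(p) + normal_logpdf(x, loc, scale)
                      for p, loc in zip(probs, locs)])
    # Numerically stable log-sum-exp over the component axis
    return np.logaddexp.reduce(terms, axis=0)

def mixture_rvs(size, probs, locs, scale=1.0, rng=None):
    """Draw a component label per variate, then sample from that component."""
    rng = np.random.default_rng() if rng is None else rng
    labels = rng.choice(len(probs), size=size, p=probs)
    return rng.normal(loc=np.asarray(locs, dtype=float)[labels], scale=scale)

probs, locs = [0.5, 0.5], [-1.0, 1.0]
x = mixture_rvs(1000, probs, locs, rng=np.random.default_rng(0))
lp = mixture_logpdf(x, probs, locs)  # shape (1000,)
```

Working on the log scale with a log-sum-exp avoids underflow when a point lies far in the tail of every component.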

See also `MixMissing`, which defines a mixture between one component that generates the label “missing” and another component:

```python
mixmiss = MixMissing(pmiss=0.1, base_dist=Normal(loc=2.))
```

This particular distribution is useful to specify a state-space model where the observation may be missing with a certain probability.

## Transformed distributions

To further enrich the list of available univariate distributions, the module lets you define transformed distributions, that is, the distribution of Y=f(X), for a certain function f, and a certain base distribution for X.

| class name (and signature) | description |
|---|---|
| `LinearD(base_dist, a=1., b=0.)` | Y = a * X + b |
| `LogD(base_dist)` | Y = log(X) |
| `LogitD(base_dist, a=0., b=1.)` | Y = logit( (X-a)/(b-a) ) |

A quick example:

```python
from particles import distributions as dists
d = dists.LogD(dists.Gamma(a=2., b=2.))  # law of Y=log(X), X~Gamma(2, 2)
```

Note

These transforms are often used to obtain random variables defined over the full real line. This is convenient in particular when implementing random walk Metropolis steps.
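The density of a transformed variable follows from the change-of-variables formula. The sketch below works this out for Y = log(X) with X ~ Gamma(a, b), using NumPy and the standard library only. It is an illustration of the formula, not the module's implementation of `LogD`, and the function names are hypothetical.

```python
import math
import numpy as np

def gamma_logpdf(x, a=1.0, b=1.0):
    """Log-density of Gamma(a, b) with rate b (scale = 1/b), for x > 0."""
    x = np.asarray(x, dtype=float)
    return a * math.log(b) - math.lgamma(a) + (a - 1.0) * np.log(x) - b * x

def log_gamma_logpdf(y, a=1.0, b=1.0):
    """Log-density of Y = log(X): base log-pdf at e^y plus the log-Jacobian y."""
    y = np.asarray(y, dtype=float)
    return gamma_logpdf(np.exp(y), a, b) + y

# Sanity check: the transformed density should integrate to about one.
grid = np.linspace(-20.0, 5.0, 20001)
dens = np.exp(log_gamma_logpdf(grid, a=2.0, b=2.0))
total_mass = float(np.sum(dens) * (grid[1] - grid[0]))  # Riemann sum
```

The extra term `+ y` is the log of the Jacobian dx/dy = e^y; forgetting it is a common source of bias when running Metropolis steps on a transformed scale.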

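## Multivariate distributions

The module implements one multivariate distribution class, for Gaussian distributions; see `MvNormal`.

Furthermore, the module provides two ways to construct multivariate distributions from a collection of univariate distributions:

• `IndepProd`: product of independent univariate distributions;

• `StructDist`: a distribution whose inputs and outputs are structured arrays.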

## Under the hood

Probability distributions are represented as objects of classes that inherit from base class `ProbDist`, and implement the following methods:

• `logpdf(self, x)`: computes the log-pdf (probability density function) at point `x`;

• `rvs(self, size=None)`: simulates `size` random variates; if `size` is None, the number of samples is one when all parameters are scalars, and otherwise equals the common size of the parameters (see below);

• `ppf(self, u)`: computes the quantile function (or Rosenblatt transform for a multivariate distribution) at point `u`.

A quick example:

```python
some_dist = dists.Normal(loc=2., scale=3.)
x = some_dist.rvs(size=30)  # a (30,) ndarray containing IID N(2, 3^2) variates
z = some_dist.logpdf(x)  # a (30,) ndarray containing the log-pdf at x
```

By default, the inputs and outputs of these methods are either scalars or Numpy arrays (with appropriate type and shape). In particular, passing a Numpy array to a distribution parameter makes it possible to define “array distributions”. For instance:

```python
import numpy as np

some_dist = dists.Normal(loc=np.arange(1., 11.))
x = some_dist.rvs(size=10)
```

generates 10 Gaussian-distributed variates, with respective means 1., …, 10. This is how we manage to define “Markov kernels” in state-space models; e.g. when defining the distribution of X_t given X_{t-1} in a state-space model:

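```python
from particles import distributions as dists
from particles import state_space_models as ssm

class StochVol(ssm.StateSpaceModel):
    def PX(self, t, xp):  # distribution of X_t given X_{t-1} = xp
        return dists.Normal(loc=xp)
    ### ... see module state_space_models for more details
```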

Then, in practice, in e.g. the bootstrap filter, when we generate particles X_t^n, we call method `PX` and pass as an argument a numpy array of shape (N,) containing the N ancestors.
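The following NumPy-only sketch mimics that propagation step for a Gaussian random-walk kernel. It illustrates the broadcasting idea rather than the package's filter code; the variable names and the choice of N are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 5
ancestors = rng.normal(size=N)  # X_{t-1}^n, shape (N,)

# One draw per particle: the n-th variate is centred on the n-th ancestor,
# which is what an array-valued `loc` achieves.
particles = rng.normal(loc=ancestors, scale=1.0)  # X_t^n, shape (N,)

# Log-density of each new particle given its own ancestor, also shape (N,)
log_trans = -0.5 * (particles - ancestors) ** 2 - 0.5 * np.log(2.0 * np.pi)
```

Because the location parameter is an array, a single vectorised call handles all N particles, with no explicit Python loop.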

Note

ProbDist objects are roughly similar to the frozen distributions of package `scipy.stats`. However, they are not equivalent. Using such a frozen distribution when e.g. defining a state-space model will raise an error.

## Posterior distributions

A few classes also implement a `posterior` method, which returns the posterior distribution that corresponds to a prior set to `self`, a model which is conjugate for the considered class, and some data. Here is a quick example:

```python
from numpy import random
from particles import distributions as dists

prior = dists.InvGamma(a=.3, b=.3)
data = random.randn(20)  # 20 points generated from N(0,1)
post = prior.posterior(data)
# prior is conjugate wrt model X_1, ..., X_n ~ N(0, theta)
print("posterior is InvGamma(%f, %f)" % (post.a, post.b))
```

Here is a list of distributions implementing posteriors:

| Distribution | Corresponding model |
|---|---|
| Normal | N(theta, sigma^2), sigma fixed (passed as extra argument) |
| TruncNormal | same |
| Gamma | N(0, 1/theta) |
| InvGamma | N(0, theta) |
| MvNormal | N(theta, Sigma), Sigma fixed (passed as extra argument) |
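For the InvGamma row, the textbook conjugate update can be written in a few lines. The sketch below is not taken from the module's source: `invgamma_posterior` is a hypothetical helper, and it assumes InvGamma(a, b) has density proportional to theta^(-a-1) exp(-b/theta), consistent with the definition of `InvGamma` as the law of 1/X for X ~ Gamma(a, b).

```python
import numpy as np

def invgamma_posterior(a, b, data):
    """InvGamma(a, b) prior on theta, data IID N(0, theta).

    Returns the parameters (a', b') of the InvGamma posterior:
    a' = a + n/2,  b' = b + sum(x_i^2)/2.
    """
    data = np.asarray(data, dtype=float)
    return a + 0.5 * data.size, b + 0.5 * float(np.sum(data ** 2))

# Small deterministic data set so the update is easy to verify by hand
a_post, b_post = invgamma_posterior(0.3, 0.3, [1.0, -1.0, 2.0, 0.0])
# n = 4 and sum of squares = 6, so a_post = 2.3 and b_post = 3.3
```

The same formulas give the Gamma(a, b) posterior for the precision in the N(0, 1/theta) model, since the likelihood has the same form in theta.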

## Implementing your own distributions

If you would like to create your own univariate probability distribution, the easiest way to do so is to sub-class `ProbDist`, for a continuous distribution, or `DiscreteDist`, for a discrete distribution. This will properly set class attributes `dim` (the dimension, set to one, for a univariate distribution), and `dtype`, so that they play nicely with `StructDist` and so on. You will also have to properly define methods `rvs`, `logpdf` and `ppf`. You may omit `ppf` if you do not plan to use SQMC (Sequential quasi Monte Carlo).
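As an illustration of the three methods, here is a sketch of an exponential distribution. The class `Exponential` is a hypothetical example and is not part of the module. To keep the sketch runnable with NumPy alone, it does not inherit from `ProbDist`; in real use you would sub-class `ProbDist` so that `dim` and `dtype` are set for you.

```python
import numpy as np

class Exponential:
    """Exponential(rate) distribution (hypothetical example).

    In practice this class would inherit from ProbDist; the base class
    is left out here so the sketch has no dependency beyond NumPy.
    """

    def __init__(self, rate=1.0):
        self.rate = rate

    def rvs(self, size=None):
        # NumPy parametrises the exponential by its scale = 1 / rate
        return np.random.exponential(scale=1.0 / self.rate, size=size)

    def logpdf(self, x):
        x = np.asarray(x, dtype=float)
        # Density is rate * exp(-rate * x) on [0, inf), zero elsewhere
        return np.where(x >= 0.0, np.log(self.rate) - self.rate * x, -np.inf)

    def ppf(self, u):
        # Quantile function: inverse of the CDF 1 - exp(-rate * x)
        return -np.log1p(-np.asarray(u, dtype=float)) / self.rate

d = Exponential(rate=2.0)
sample = d.rvs(size=10)    # shape (10,)
median = d.ppf(0.5)        # equals log(2) / 2
```

Writing `ppf` with `log1p` keeps the quantile accurate for `u` close to zero.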

## Summary of module

| class | description |
|---|---|
| `DiscreteDist` | Base class for discrete probability distributions. |
| `Beta` | Beta(a,b) distribution. |
| `Dirac` | Dirac mass. |
| `FlatNormal` | Normal with infinite variance. |
| `Gamma` | Gamma(a,b) distribution, scale=1/b. |
| `InvGamma` | Inverse Gamma(a,b) distribution. |
| `Laplace` | Laplace(loc,scale) distribution. |
| `Logistic` | Logistic(loc, scale) distribution. |
| `LogNormal` | Distribution of Y=e^X, with X ~ N(mu, sigma^2). |
| `Normal` | N(loc, scale^2) distribution. |
| `Student` | Student distribution. |
| `TruncNormal` | Normal(mu, sigma^2) truncated to [a, b] interval. |
| `Uniform` | Uniform([a,b]) distribution. |
| `Binomial` | Binomial(n,p) distribution. |
| `Geometric` | Geometric(p) distribution. |
| `Poisson` | Poisson(rate) distribution. |
| `LinearD` | Distribution of Y = a*X + b. |
| `LogD` | Distribution of Y = log(X). |
| `LogitD` | Distribution of Y=logit((X-a)/(b-a)). |
| `MvNormal` | Multivariate Normal distribution. |
| `IndepProd` | Product of independent univariate distributions. |
| `ProbDist` | Base class for probability distributions. |
| `StructDist` | A distribution such that inputs/outputs are structured arrays. |