distributions

Probability distributions as Python objects.

Overview

This module lets users define probability distributions as Python objects.

The probability distributions defined in this module may be used:

  • to define state-space models (see module state_space_models);
  • to define a prior distribution, in order to perform parameter estimation (see modules smc_samplers and mcmc).

Univariate distributions

The module defines the following classes of univariate continuous distributions:

class (with signature)                    comments
Normal(loc=0., scale=1.)                  N(loc,scale^2) distribution
Logistic(loc=0., scale=1.)
Laplace(loc=0., scale=1.)
Beta(a=1., b=1.)
Gamma(a=1., b=1.)                         scale = 1/b
InvGamma(a=1., b=1.)                      Distribution of 1/X for X~Gamma(a,b)
Uniform(a=0., b=1.)                       uniform over interval [a,b]
Student(loc=0., scale=1., df=3)
TruncNormal(mu=0, sigma=1., a=0., b=1.)   N(mu, sigma^2) truncated to interval [a,b]
Dirac(loc=0.)                             Dirac mass at point loc

and the following classes of univariate discrete distributions:

class (with signature)   comments
Poisson(rate=1.)         Poisson distribution, with expectation rate
Binomial(n=1, p=0.5)
Geometric(p=0.5)

Note that all the parameters of these distributions have default values, e.g.:

some_norm = Normal(loc=2.4)  # N(2.4, 1)
some_gam = Gamma()  # Gamma(1, 1)

Transformed distributions

To further enrich the list of available univariate distributions, the module lets you define transformed distributions, that is, the distribution of Y=f(X), for a certain function f, and a certain base distribution for X.

class name (and signature)       description
LinearD(base_dist, a=1., b=0.)   Y = a * X + b
LogD(base_dist)                  Y = log(X)
LogitD(base_dist, a=0., b=1.)    Y = logit( (X-a)/(b-a) )

A quick example:

from particles import distributions as dists
d = dists.LogD(dists.Gamma(a=2., b=2.))  # law of Y=log(X), X~Gamma(2, 2)

Note

These transforms are often used to obtain random variables defined over the full real line. This is convenient in particular when implementing random walk Metropolis steps.
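
To make the change of variables concrete, the sketch below computes the log-density of Y = log(X) for X ~ Gamma(a, b) by hand, using log p_Y(y) = log p_X(exp(y)) + y, and checks numerically that the resulting density integrates to one. It relies only on the standard library. The helper names (gamma_logpdf, log_gamma_logpdf, trapezoid) are illustrative and are not part of the module; whether LogD evaluates the density in exactly this way is an assumption.

```python
import math


def gamma_logpdf(x, a=1.0, b=1.0):
    """Log-density of Gamma(a, b) at x > 0, with rate b (scale = 1/b)."""
    return a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(x) - b * x


def log_gamma_logpdf(y, a=1.0, b=1.0):
    """Log-density of Y = log(X), for X ~ Gamma(a, b).

    Change of variables with x = exp(y): the Jacobian term is log|dx/dy| = y.
    """
    return gamma_logpdf(math.exp(y), a, b) + y


def trapezoid(f, lo, hi, n=20000):
    """Composite trapezoid rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


# Y lives on the whole real line; its density should integrate to one.
total = trapezoid(lambda y: math.exp(log_gamma_logpdf(y, 2.0, 2.0)), -30.0, 10.0)
```

Because Y is unconstrained, a Gaussian random walk proposal on Y never leaves the support, which is the convenience the note above refers to.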

Multivariate distributions

The only standard multivariate distribution currently implemented is MvNormal (the multivariate Normal distribution).

However, the module provides two ways to construct multivariate distributions from a collection of univariate distributions:

  • IndepProd: product of independent distributions. May be used to define state-space models.
  • StructDist: distributions for named variables; may be used to specify prior distributions; see modules smc_samplers and mcmc (and the corresponding tutorials).
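
StructDist works with named variables, and its inputs and outputs are structured arrays. The snippet below is not StructDist itself; it is a plain Numpy sketch of what a structured array holding several draws from a prior over two named parameters looks like. The field names mu and sigma2, and the priors chosen for them, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# One record per draw, one named field per parameter.
theta = np.empty(n, dtype=[("mu", float), ("sigma2", float)])

# mu ~ N(0, 10^2)
theta["mu"] = rng.normal(loc=0.0, scale=10.0, size=n)

# sigma2 ~ InvGamma(2, 2), simulated as 1/X with X ~ Gamma(2, 2);
# Numpy's gamma takes a scale, so rate b = 2 becomes scale = 1/2.
theta["sigma2"] = 1.0 / rng.gamma(shape=2.0, scale=1.0 / 2.0, size=n)

# Fields are accessed by name; theta[0] is a single (mu, sigma2) record.
first_draw = theta[0]
```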

Under the hood

Probability distributions are represented as objects of classes that inherit from base class ProbDist, and implement the following methods:

  • logpdf(self, x): computes the log-pdf (probability density function) at point x;
  • rvs(self, size=None): simulates size random variates; if size is None, the number of variates is one when all parameters are scalars, and otherwise equals the common size of the array parameters (see below);
  • ppf(self, u): computes the quantile function (or the inverse Rosenblatt transform, for a multivariate distribution) at point u.

A quick example:

some_dist = dists.Normal(loc=2., scale=3.)
x = some_dist.rvs(size=30)  # a (30,) ndarray containing IID N(2, 3^2) variates
z = some_dist.logpdf(x)  # a (30,) ndarray containing the log-pdf at x
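
The role of ppf can be illustrated without the package: pushing points of (0, 1) through a quantile function yields variates from the corresponding distribution, which is why quasi-Monte Carlo variants need it. The sketch below uses statistics.NormalDist from the standard library as a stand-in for N(2, 3^2), and an evenly spaced grid as a crude substitute for a proper low-discrepancy point set; both choices are assumptions made for illustration.

```python
from statistics import NormalDist

base = NormalDist(mu=2.0, sigma=3.0)  # stand-in for N(2, 3^2)

n = 1000
us = [(i + 0.5) / n for i in range(n)]  # evenly spread points in (0, 1)
xs = [base.inv_cdf(u) for u in us]      # quantile function applied pointwise

# The transformed points are spread according to N(2, 3^2).
mean = sum(xs) / n
```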

By default, the inputs and outputs of these methods are either scalars or Numpy arrays (with appropriate type and shape). In particular, passing a Numpy array to a distribution parameter makes it possible to define “array distributions”. For instance:

import numpy as np

some_dist = dists.Normal(loc=np.arange(1., 11.))
x = some_dist.rvs(size=10)

generates 10 Gaussian-distributed variates, with respective means 1., …, 10. This is how we manage to define “Markov kernels” in state-space models; e.g. when defining the distribution of X_t given X_{t-1} in a state-space model:

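from particles import distributions as dists
from particles import state_space_models as ssm

class StochVol(ssm.StateSpaceModel):
    def PX(self, t, xp):  # law of X_t given X_{t-1} = xp
        return dists.Normal(loc=xp)  # xp may be an array of N ancestors
    ### ... see module state_space_models for more details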

In practice, when an algorithm such as the bootstrap filter generates the particles X_t^n, it calls method PX and passes as an argument a Numpy array of shape (N,) containing the N ancestors.
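
The vectorised behaviour described above can be mimicked with Numpy alone: passing an array of N ancestors as the location parameter yields one draw per ancestor in a single call. The sketch below uses numpy.random directly rather than the classes of this module, and the AR(1) coefficients rho and sigma are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1000

xp = rng.normal(size=N)  # ancestors X_{t-1}^n, shape (N,)

rho, sigma = 0.9, 0.5    # hypothetical AR(1) parameters

# One draw per ancestor: X_t^n ~ N(rho * X_{t-1}^n, sigma^2), shape (N,)
x = rng.normal(loc=rho * xp, scale=sigma)
```

No explicit loop over particles is needed, since the array-valued parameter is broadcast across the N draws.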

Note

ProbDist objects are roughly similar to the frozen distributions of package scipy.stats. However, they are not equivalent. Using such a frozen distribution when e.g. defining a state-space model will raise an error.

Posterior distributions

A few classes also implement a posterior method, which returns the posterior distribution that corresponds to a prior set to self, a model which is conjugate for the considered class, and some data. Here is a quick example:

from numpy import random
from particles import distributions as dists

prior = dists.InvGamma(a=.3, b=.3)
data = random.randn(20)  # 20 points generated from N(0,1)
post = prior.posterior(data)
# prior is conjugate wrt model X_1, ..., X_n ~ N(0, theta)
print("posterior is InvGamma(%f, %f)" % (post.a, post.b))

Here is a list of distributions implementing posteriors:

Distribution  Corresponding model  comments
Normal        N(theta, sigma^2)    sigma fixed (passed as extra argument)
TruncNormal   same
Gamma         N(0, 1/theta)
InvGamma      N(0, theta)
MvNormal      N(theta, Sigma)      Sigma fixed (passed as extra argument)
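
As an illustration of what such a conjugate update involves, the sketch below works out the InvGamma case by hand: with prior theta ~ InvGamma(a, b) and data X_1, ..., X_n ~ N(0, theta), the standard result is a posterior InvGamma(a + n/2, b + sum(x_i^2)/2). The code checks this by verifying that prior times likelihood, divided by the candidate posterior, does not depend on theta. It uses only Numpy and the standard library. The helper functions are illustrative, and it is an assumption (not confirmed by the text above) that the posterior method returns exactly these parameters.

```python
import math

import numpy as np


def invgamma_logpdf(theta, a, b):
    """Log-density of InvGamma(a, b) at theta > 0."""
    return a * math.log(b) - math.lgamma(a) - (a + 1.0) * math.log(theta) - b / theta


def normal_loglik(theta, x):
    """Log-likelihood of x_1, ..., x_n IID N(0, theta), theta being the variance."""
    n = x.size
    return -0.5 * n * math.log(2.0 * math.pi * theta) - 0.5 * np.sum(x**2) / theta


rng = np.random.default_rng(1)
x = rng.normal(size=20)  # 20 points generated from N(0, 1)

a0, b0 = 0.3, 0.3                 # prior parameters
a1 = a0 + 0.5 * x.size            # posterior shape
b1 = b0 + 0.5 * np.sum(x**2)      # posterior scale

# log prior + log likelihood - log posterior must be constant in theta
# (it equals the log marginal likelihood of the data).
gaps = [
    invgamma_logpdf(t, a0, b0) + normal_loglik(t, x) - invgamma_logpdf(t, a1, b1)
    for t in (0.5, 1.0, 2.0)
]
```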

Implementing your own distributions

If you would like to create your own univariate probability distribution, the easiest way to do so is to sub-class ProbDist, for a continuous distribution, or DiscreteDist, for a discrete distribution. This will properly set class attributes dim (the dimension, set to one, for a univariate distribution), and dtype, so that they play nicely with StructDist and so on. You will also have to properly define methods rvs, logpdf and ppf. You may omit ppf if you do not plan to use SQMC (Sequential quasi Monte Carlo).
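
A minimal sketch of such a sub-class is given below, for an Exponential(rate) distribution. The class and its rate parameter are hypothetical and chosen only for illustration. The fallback to object is not something the package requires; it is there only so that the snippet also runs when particles is not installed.

```python
import numpy as np

try:
    from particles.distributions import ProbDist
except ImportError:  # fallback so that the sketch runs without the package
    ProbDist = object


class Exponential(ProbDist):
    """Hypothetical Exponential(rate) distribution, for illustration only."""

    def __init__(self, rate=1.0):
        self.rate = rate

    def rvs(self, size=None):
        # Numpy parametrises the exponential by its scale, i.e. 1 / rate.
        return np.random.exponential(scale=1.0 / self.rate, size=size)

    def logpdf(self, x):
        x = np.asarray(x, dtype=float)
        # log-density is log(rate) - rate * x on [0, inf), -inf elsewhere
        return np.where(x >= 0.0, np.log(self.rate) - self.rate * x, -np.inf)

    def ppf(self, u):
        # quantile function: F^{-1}(u) = -log(1 - u) / rate
        return -np.log1p(-np.asarray(u, dtype=float)) / self.rate
```

As in the built-in classes, the three methods accept and return scalars or Numpy arrays, so that rate may itself be an array.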

Summary of module

IndepProd     Product of independent univariate distributions.
StructDist    A distribution such that inputs/outputs are structured arrays.
ProbDist      Base class for probability distributions.
DiscreteDist  Base class for discrete probability distributions.
Normal        N(loc,scale^2) distribution.
Logistic      Logistic(loc,scale) distribution.
Laplace       Laplace(loc,scale) distribution.
Beta          Beta(a,b) distribution.
Gamma         Gamma(a,b) distribution, scale=1/b.
InvGamma      Inverse Gamma(a,b) distribution.
Uniform       Uniform([a,b]) distribution.
Student       Student distribution.
TruncNormal   Normal(mu, sigma^2) truncated to [a, b] interval.
Dirac         Dirac mass.
Poisson       Poisson(rate) distribution.
Binomial      Binomial(n,p) distribution.
Geometric     Geometric(p) distribution.
LinearD       Distribution of Y = a*X + b.
LogD          Distribution of Y = log(X).
LogitD        Distribution of Y = logit((X-a)/(b-a)).
MvNormal      Multivariate Normal distribution.