Where datasets live.

This module gives access to several useful datasets. Each dataset is represented as a class that inherits from the base class Dataset. Instantiating such a class gives an object with two attributes:

  • raw_data: data in the original file;
  • data: data obtained after a pre-processing step was applied to the raw data.

The pre-processing step is performed by the preprocess method of the class. For instance, for a regression dataset, the pre-processing step normalises the predictors and adds an intercept. In the base class Dataset, preprocess does nothing, so raw_data and data point to the same object.
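
The relationship between raw_data, data and preprocess can be sketched as follows. This is a minimal illustration, not the library's code: the classes SketchDataset and CenteredSketchDataset are hypothetical, and passing the raw data to the constructor is an assumption made to keep the example self-contained (the text above describes raw_data as the data in the original file).

```python
class SketchDataset:
    """Hypothetical stand-in for a base dataset class; preprocess is a no-op."""

    def __init__(self, raw_data):
        # Assumption: data is handed in directly instead of being read from a file.
        self.raw_data = raw_data
        self.data = self.preprocess(self.raw_data)

    def preprocess(self, raw_data):
        # Base behaviour: return the very same object, untouched.
        return raw_data


class CenteredSketchDataset(SketchDataset):
    """Hypothetical sub-class whose pre-processing step centres the values."""

    def preprocess(self, raw_data):
        mean = sum(raw_data) / len(raw_data)
        # Build a new list so that raw_data is left unchanged.
        return [x - mean for x in raw_data]


base = SketchDataset([1.0, 2.0, 3.0])
centered = CenteredSketchDataset([1.0, 2.0, 3.0])

print(base.data is base.raw_data)  # same object in the base class
print(centered.raw_data, centered.data)
```

Overriding only preprocess leaves raw_data untouched, so the original values stay available next to the transformed ones.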

Here is a quick example:

from particles import datasets as dts

dataset = dts.Pima()
help(dataset)  # basic info on dataset
help(dataset.preprocess)  # info on how data was pre-processed
data = dataset.data  # typically a numpy array

The table below lists the available datasets; see the documentation of each sub-class for more details on its pre-processing step.

Dataset            Parent class         Typical use/model
---------------    -----------------    ---------------------
Boston             RegressionDataset    regression
Eeg                BinaryRegDataset     logistic regression
GBP_vs_USD_9798    LogReturnsDataset    stochastic volatility
Nutria             Dataset              population ecology
Pima               BinaryRegDataset     logistic regression
Sonar              BinaryRegDataset     logistic regression
Neuro              Dataset              neuroscience ssm

See also the utility function prepare_predictors, which rescales predictors (features) and optionally adds an intercept, for use in a regression or classification task.
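
As a rough sketch of what such a preparation step involves, the function below centres each predictor, scales it to unit standard deviation, and prepends a column of ones. The name rescale_and_add_intercept, its arguments, and the choice of unit standard deviation are assumptions made for illustration; consult the documentation of prepare_predictors for its actual signature and scaling.

```python
import numpy as np


def rescale_and_add_intercept(predictors, add_intercept=True):
    """Hypothetical helper: standardise columns, optionally add an intercept."""
    preds = np.asarray(predictors, dtype=float)
    if preds.ndim == 1:
        # Treat a (n,) array as a single predictor of shape (n, 1).
        preds = preds[:, np.newaxis]
    # Column-wise standardisation: mean 0, standard deviation 1.
    # A constant column has zero standard deviation and is not handled here.
    rescaled = (preds - preds.mean(axis=0)) / preds.std(axis=0)
    if not add_intercept:
        return rescaled
    # The intercept is a leading column filled with ones.
    ones = np.ones((preds.shape[0], 1))
    return np.hstack([ones, rescaled])


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
prepared = rescale_and_add_intercept(X)
print(prepared.shape)
```

With the intercept stored as an extra column, a linear predictor reduces to a single matrix-vector product between the prepared array and the coefficient vector.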

Module summary

prepare_predictors    Rescale predictors and (optionally) add an intercept.
Dataset               Base class for datasets.
Nutria                Nutria dataset.
Neuro                 Neuroscience experiment data from Temereanca et al (2008).
RegressionDataset     Regression dataset.
Boston                Boston house-price data of Harrison et al (1978).
BinaryRegDataset      Binary regression (classification) dataset.
Eeg                   EEG dataset from UCI repository.
Pima                  Pima Indians Diabetes.
Sonar                 Sonar dataset from UCI repository.
LogReturnsDataset     Log returns dataset.
GBP_vs_USD_9798       GBP vs USD daily rates in 1997-98.