particles.datasets

Where datasets live.

This module gives access to several useful datasets. Each dataset is represented as a class that inherits from the base class Dataset. Instantiating such a class gives an object with the following attributes:

  • raw_data: data in the original file;

  • data: data obtained after a pre-processing step has been applied to raw_data.

The pre-processing step is performed by the preprocess method of the class. For instance, for a regression dataset, the pre-processing step normalises the predictors and adds an intercept. The pre-processing step of the base class Dataset does nothing (raw_data and data point to the same object).
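The raw_data / data / preprocess pattern can be illustrated with a minimal, hypothetical sketch in plain Python. MiniDataset and MiniRegressionDataset are illustrative names, not classes from particles.datasets. Unlike the real classes, which read raw_data from a file, the sketch takes raw_data as a constructor argument; forwarding keyword arguments to preprocess is also an assumption.

```python
# Hypothetical sketch of the raw_data / data / preprocess pattern.
# These are NOT the particles.datasets classes; raw_data is passed in
# directly instead of being read from a file.


class MiniDataset:
    """Base class: the default preprocess step returns raw_data unchanged."""

    def __init__(self, raw_data, **kwargs):
        self.raw_data = raw_data
        # Subclasses customise behaviour by overriding preprocess.
        self.data = self.preprocess(raw_data, **kwargs)

    def preprocess(self, raw_data, **kwargs):
        # No-op: data and raw_data end up being the same object.
        return raw_data


class MiniRegressionDataset(MiniDataset):
    """Subclass: normalise each predictor and prepend an intercept column."""

    def preprocess(self, raw_data, **kwargs):
        # raw_data: list of rows, each row a list of predictor values.
        # Assumes no constant column (its standard deviation would be zero).
        n, ncols = len(raw_data), len(raw_data[0])
        means = [sum(row[j] for row in raw_data) / n for j in range(ncols)]
        sds = [
            (sum((row[j] - means[j]) ** 2 for row in raw_data) / n) ** 0.5
            for j in range(ncols)
        ]
        # Builds a new table, so data is a different object from raw_data.
        return [
            [1.0] + [(row[j] - means[j]) / sds[j] for j in range(ncols)]
            for row in raw_data
        ]


base = MiniDataset([[1.0, 2.0], [3.0, 4.0]])
print(base.data is base.raw_data)  # True
reg = MiniRegressionDataset([[1.0], [3.0]])
print(reg.data)  # [[1.0, -1.0], [1.0, 1.0]]
```

Overriding a single hook method keeps the constructor shared by every dataset, so each subclass only has to say how its data is transformed.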

Here is a quick example:

from particles import datasets as dts

dataset = dts.Pima()
help(dataset)  # basic info on dataset
help(dataset.preprocess)  # info on how data was pre-processed
data = dataset.data  # typically a numpy array

And here is a table of the available datasets; see the documentation of each sub-class for more details on the preprocessing step.

Dataset            parent class        typical use/model
-----------------  ------------------  ---------------------
Boston             RegressionDataset   regression
Eeg                BinaryRegDataset    binary regression
GBP_vs_USD_9798    LogReturnsDataset   stochastic volatility
Liver              BinaryRegDataset    binary regression
Nutria             Dataset             population ecology
Pima               BinaryRegDataset    binary regression
Sonar              BinaryRegDataset    binary regression
Neuro              Dataset             neuroscience ssm
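In the table, GBP_vs_USD_9798 derives from LogReturnsDataset and is meant for stochastic volatility models, which are usually fitted to log returns rather than to raw rates. Assuming (this page does not say so explicitly) that the pre-processing step turns a series of prices or rates p_0, ..., p_T into log returns y_t = log(p_t / p_{t-1}), possibly multiplied by a constant, a NumPy sketch of that step could look like this; log_returns and its scale parameter are hypothetical names.

```python
import numpy as np


def log_returns(prices, scale=1.0):
    """Hypothetical sketch (not the particles implementation).

    Returns y_t = scale * log(p_t / p_{t-1}) for t = 1, ..., T, so the
    output has one entry fewer than the input along the first axis.
    """
    p = np.asarray(prices, dtype=float)
    # Differences of log prices are logs of price ratios.
    return scale * np.diff(np.log(p), axis=0)


toy_prices = [100.0, 110.0, 99.0]  # made-up toy values, not the GBP/USD data
print(log_returns(toy_prices, scale=100.0).shape)  # (2,)
```

Working with differences of logs avoids an explicit division and extends directly to 2-D arrays with one series per column.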

See also the utility function prepare_predictors, which prepares predictors/features for a regression or classification task: it rescales them and, optionally, adds an intercept.
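This page describes prepare_predictors only as rescaling predictors and optionally adding an intercept; its full argument list and target scale are not given here. The NumPy sketch below shows one common version of that step under stated assumptions: rescale_with_intercept and target_sd are hypothetical names, each column is centred and scaled to standard deviation target_sd (population standard deviation), and a column of ones is prepended when add_intercept is true.

```python
import numpy as np


def rescale_with_intercept(predictors, add_intercept=True, target_sd=1.0):
    """Hypothetical sketch of a prepare_predictors-style step.

    Not the particles implementation: the name, the target_sd argument
    and the choice of population standard deviation are assumptions.
    """
    x = np.asarray(predictors, dtype=float)
    if x.ndim == 1:
        # Treat a 1-D input as a single predictor observed n times.
        x = x[:, np.newaxis]
    centred = x - x.mean(axis=0)
    # Assumes no constant column (its standard deviation would be zero).
    out = target_sd * centred / x.std(axis=0)
    if add_intercept:
        # Prepend the intercept after rescaling, so it stays a column of ones.
        out = np.hstack([np.ones((out.shape[0], 1)), out])
    return out


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = rescale_with_intercept(X)
print(Z.shape)  # (3, 3)
```

The intercept is added after rescaling because a column of ones has zero variance and could not itself be rescaled.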

Functions

get_path(file_name)
prepare_predictors(predictors[, ...])   Rescale predictors and (optionally) add an intercept.

Classes

BinaryRegDataset(**kwargs)    Binary regression (classification) dataset.
Boston(**kwargs)              Boston house-price data of Harrison et al. (1978).
Concrete(**kwargs)            Concrete compressive strength data of Yeh (1998).
Dataset(**kwargs)             Base class for datasets.
Eeg(**kwargs)                 EEG dataset from UCI repository.
GBP_vs_USD_9798(**kwargs)     GBP vs USD daily rates in 1997-98.
Liver(**kwargs)               Indian liver patient dataset (ILPD).
LogReturnsDataset(**kwargs)   Log returns dataset.
Neuro(**kwargs)               Neuroscience experiment data from Temereanca et al. (2008).
Nutria(**kwargs)              Nutria dataset.
Pima(**kwargs)                Pima Indians Diabetes.
RegressionDataset(**kwargs)   Regression dataset.
Sonar(**kwargs)               Sonar dataset from UCI repository.