particles.datasets

Where datasets live.

This module gives access to several useful datasets. Each dataset is represented by a class that inherits from the base class Dataset. Instantiating such a class gives you an object with two attributes:

raw_data: data in the original file;
data: data obtained after a pre-processing step was applied to the raw data.

The pre-processing step is performed by the preprocess method of the class. For instance, for a regression dataset, the pre-processing step normalises the predictors and adds an intercept. The pre-processing step of the base class Dataset does nothing (raw_data and data point to the same object).
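The pattern described above can be sketched as follows. This is a minimal illustration, not the library's source: the class names SketchDataset and SketchRegressionDataset and the load method are hypothetical, the data are hard-coded rather than read from a file, and the sub-class only adds an intercept to keep the example short.

```python
class SketchDataset:
    """Hypothetical stand-in for a base dataset class (not the library's code)."""

    def __init__(self):
        self.raw_data = self.load()
        self.data = self.preprocess(self.raw_data)

    def load(self):
        # A real dataset would read a file here; we hard-code a few rows.
        return [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

    def preprocess(self, raw_data):
        # Base behaviour: no pre-processing, return the very same object.
        return raw_data


class SketchRegressionDataset(SketchDataset):
    def preprocess(self, raw_data):
        # Prepend a constant 1.0 (intercept) to every row; raw_data is untouched.
        return [[1.0] + row for row in raw_data]


base = SketchDataset()
reg = SketchRegressionDataset()
print(base.data is base.raw_data)  # True
print(reg.data[0])  # [1.0, 1.0, 2.0]
```

Overriding only preprocess keeps the loading logic in one place, which is why the two attributes can be compared side by side on any sub-class.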

Here is a quick example:

```python
from particles import datasets as dts

dataset = dts.Pima()
help(dataset)  # basic info on dataset
help(dataset.preprocess)  # info on how data was pre-processed
data = dataset.data  # typically a numpy array
```

Here is a table of the available datasets; see the documentation of each sub-class for more details on the pre-processing step.

Dataset	parent class	typical use/model
`Boston`	`RegressionDataset`	regression
`Eeg`	`BinaryRegDataset`	binary regression
`GBP_vs_USD_9798`	`LogReturnsDataset`	stochastic volatility
`Liver`	`BinaryRegDataset`	binary regression
`Nutria`	`Dataset`	population ecology
`Pima`	`BinaryRegDataset`	binary regression
`Sonar`	`BinaryRegDataset`	binary regression
`Neuro`	`Dataset`	neuroscience state-space model

See also utility function prepare_predictors, which prepares (rescales, adds an intercept) predictors/features for a regression or classification task.
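To make "rescales, adds an intercept" concrete, here is a minimal NumPy sketch of that kind of pre-processing. It does not reproduce the actual implementation or signature of prepare_predictors: the function name rescale_with_intercept, its parameters, and the default target standard deviation of 1 are all assumptions made for illustration.

```python
import numpy as np


def rescale_with_intercept(predictors, add_intercept=True, scale=1.0):
    """Hypothetical helper: centre each column, rescale it to standard
    deviation `scale`, and optionally prepend a column of ones."""
    x = np.asarray(predictors, dtype=float)
    if x.ndim == 1:
        x = x[:, np.newaxis]  # treat a 1-D input as a single predictor
    out = scale * (x - x.mean(axis=0)) / x.std(axis=0)
    if add_intercept:
        ones = np.ones((out.shape[0], 1))
        out = np.concatenate((ones, out), axis=1)
    return out


x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
z = rescale_with_intercept(x)
print(z.shape)  # (3, 3)
```

After the call, the first column of z is the intercept and every other column has mean 0 and standard deviation scale. A constant predictor column would cause a division by zero in this sketch and would need to be handled separately.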

Functions

`get_path`(file_name)
`prepare_predictors`(predictors[, ...])	Rescale predictors and (optionally) add an intercept.

Classes

`BinaryRegDataset`(**kwargs)	Binary regression (classification) dataset.
`Boston`(**kwargs)	Boston house-price data of Harrison et al (1978).
`Concrete`(**kwargs)	Concrete compressive strength data of Yeh (1998).
`Dataset`(**kwargs)	Base class for datasets.
`Eeg`(**kwargs)	EEG dataset from UCI repository.
`GBP_vs_USD_9798`(**kwargs)	GBP vs USD daily rates in 1997-98.
`Liver`(**kwargs)	Indian liver patient dataset (ILPD).
`LogReturnsDataset`(**kwargs)	Log returns dataset.
`Neuro`(**kwargs)	Neuroscience experiment data from Temereanca et al (2008).
`Nutria`(**kwargs)	Nutria dataset.
`Pima`(**kwargs)	Pima Indians Diabetes.
`RegressionDataset`(**kwargs)	Regression dataset.
`Sonar`(**kwargs)	Sonar dataset from UCI repository.
