datasets
Where datasets live.
This module gives access to several useful datasets. A dataset is represented
as a class that inherits from the base class Dataset. When you instantiate such
a class, you get an object with two attributes:

- raw_data: the data as stored in the original file;
- data: the data obtained after a pre-processing step has been applied to the raw data.

The pre-processing step is performed by the method preprocess of the class. For
instance, for a regression dataset, the pre-processing step normalises the
predictors and adds an intercept. The pre-processing step of the base class
Dataset does nothing (raw_data and data point to the same object).
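To make the relationship between raw_data, data and preprocess concrete, here is a minimal sketch of the pattern described above. It is not the library's code: the class names SketchDataset and SketchLogReturns are hypothetical, the constructor takes the raw data directly instead of reading a file, and the log-returns transformation is only an assumed example of a pre-processing step.

```python
import math


class SketchDataset:
    """Hypothetical stand-in for a base dataset class (not the library's code)."""

    def __init__(self, raw_data):
        # Assumption: raw data is passed in directly rather than loaded from a file.
        self.raw_data = raw_data
        self.data = self.preprocess(raw_data)

    def preprocess(self, raw_data):
        # Base behaviour: no pre-processing, so data is the same object as raw_data.
        return raw_data


class SketchLogReturns(SketchDataset):
    """Hypothetical subclass that turns a price series into log returns."""

    def preprocess(self, raw_data):
        # One log return per pair of consecutive prices.
        return [math.log(b / a) for a, b in zip(raw_data, raw_data[1:])]


prices = [100.0, 110.0, 99.0]
base = SketchDataset(prices)        # base.data is base.raw_data
returns = SketchLogReturns(prices)  # returns.data has one entry fewer than prices
```

In this sketch, a sub-class only has to override preprocess; the raw data stays available unchanged through raw_data.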
Here is a quick example:
from particles import datasets as dts
dataset = dts.Pima()
help(dataset) # basic info on dataset
help(dataset.preprocess) # info on how data was pre-processed
data = dataset.data # typically a numpy array
And here is a table of the available datasets; see the documentation of each sub-class for more details on the preprocessing step.
Dataset | Parent class | Typical use/model |
---|---|---|
Boston | RegressionDataset | regression |
Eeg | BinaryRegDataset | logistic regression |
GBP_vs_USD_9798 | LogReturnsDataset | stochastic volatility |
Nutria | Dataset | population ecology |
Pima | BinaryRegDataset | logistic regression |
Sonar | BinaryRegDataset | logistic regression |
Neuro | Dataset | neuroscience ssm |
See also the utility function prepare_predictors, which prepares (rescales,
adds an intercept) predictors/features for a regression or classification task.
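As an illustration of what "rescales, adds an intercept" means in practice, the following is a pure-Python sketch of such a preparation step. It is a hypothetical stand-in, not the implementation of prepare_predictors: the function name sketch_prepare_predictors, its parameters add_intercept and scale, and the choice of standardising each column to zero mean and unit (population) standard deviation are all assumptions made for this example.

```python
from statistics import fmean, pstdev


def sketch_prepare_predictors(rows, add_intercept=True, scale=1.0):
    """Hypothetical sketch: centre and rescale each column, optionally add an intercept.

    rows is a list of equal-length lists of floats (one list per observation).
    Assumes no column is constant (otherwise the standard deviation is zero).
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    out = []
    for row in rows:
        scaled = [scale * (x - m) / s for x, m, s in zip(row, means, stds)]
        # The intercept is a leading column of ones.
        out.append([1.0] + scaled if add_intercept else scaled)
    return out


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
design = sketch_prepare_predictors(rows)  # 3 rows, 3 columns (intercept + 2 predictors)
```

Putting predictors on a common scale is a standard step before fitting a regression or classification model, since it makes coefficients and priors comparable across features.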
Module summary
Object | Description |
---|---|
prepare_predictors | Rescale predictors and (optionally) add an intercept. |
Dataset | Base class for datasets. |
Nutria | Nutria dataset. |
Neuro | Neuroscience experiment data from Temereanca et al (2008). |
RegressionDataset | Regression dataset. |
Boston | Boston house-price data of Harrison et al (1978). |
BinaryRegDataset | Binary regression (classification) dataset. |
Eeg | EEG dataset from UCI repository. |
Pima | Pima Indians Diabetes. |
Sonar | Sonar dataset from UCI repository. |
LogReturnsDataset | Log returns dataset. |
GBP_vs_USD_9798 | GBP vs USD daily rates in 1997-98. |