particles.datasets
Where datasets live.
This module gives access to several useful datasets. A dataset is represented
as a class that inherits from base class Dataset. When instantiating such a
class, you get an object with attributes:
raw_data: data in the original file;
data: data obtained after a pre-processing step was applied to the raw data.
The pre-processing step is performed by method preprocess of the class. For
instance, for a regression dataset, the pre-processing step normalises the
predictors and adds an intercept. The pre-processing step of base class
Dataset does nothing (raw_data and data point to the same object).
Here is a quick example:
from particles import datasets as dts
dataset = dts.Pima()
help(dataset) # basic info on dataset
help(dataset.preprocess) # info on how data was pre-processed
data = dataset.data # typically a numpy array
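The relationship between raw_data, data and preprocess described above can be sketched without the library. The classes below (SketchDataset, SketchCentredDataset) are hypothetical stand-ins, not the module's actual implementation, and the column-centring step is a toy pre-processing chosen only for brevity.

```python
import numpy as np


class SketchDataset:
    """Hypothetical stand-in for a base dataset class (not the library's code)."""

    def __init__(self, raw_data):
        self.raw_data = raw_data                     # data as loaded
        self.data = self.preprocess(self.raw_data)   # data after pre-processing

    def preprocess(self, raw_data):
        # The base class does nothing: data and raw_data are the same object.
        return raw_data


class SketchCentredDataset(SketchDataset):
    """Hypothetical sub-class that overrides the pre-processing step."""

    def preprocess(self, raw_data):
        # Toy pre-processing (assumption): subtract the mean of each column.
        return raw_data - raw_data.mean(axis=0)


raw = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
base = SketchDataset(raw)
centred = SketchCentredDataset(raw)
```

In this sketch, a sub-class only has to override preprocess; the constructor takes care of filling both attributes, so raw_data is always available for inspection next to data.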
And here is a table of the available datasets; see the documentation of each sub-class for more details on the preprocessing step.
Dataset | parent class | typical use/model |
---|---|---|
 | | regression |
 | | binary regression |
 | | stochastic volatility |
 | | binary regression |
 | | population ecology |
 | | binary regression |
 | | binary regression |
 | | neuroscience ssm |
See also utility function prepare_predictors, which prepares (rescales,
adds an intercept) predictors/features for a regression or classification task.
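As an illustration of what rescaling predictors and adding an intercept typically involves, here is a minimal NumPy sketch. The function name rescale_and_add_intercept, its parameters, and the choice of scaling each column to zero mean and unit standard deviation are assumptions made for this example; they are not the signature or the exact scaling used by prepare_predictors.

```python
import numpy as np


def rescale_and_add_intercept(predictors, add_intercept=True):
    """Hypothetical helper: standardise columns, optionally prepend a column of ones."""
    predictors = np.asarray(predictors, dtype=float)
    mean = predictors.mean(axis=0)
    std = predictors.std(axis=0)
    # Leave constant columns unscaled to avoid a division by zero.
    std = np.where(std > 0.0, std, 1.0)
    rescaled = (predictors - mean) / std
    if add_intercept:
        ones = np.ones((rescaled.shape[0], 1))
        rescaled = np.hstack([ones, rescaled])
    return rescaled


features = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
design = rescale_and_add_intercept(features)
no_intercept = rescale_and_add_intercept(features, add_intercept=False)
```

Putting predictors on a common scale is a common way to make a single prior or step size reasonable across all regression coefficients, which is presumably why the datasets above apply this kind of step by default.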
Functions
prepare_predictors | Rescale predictors and (optionally) add an intercept. |
Classes
 | Binary regression (classification) dataset. |
 | Boston house-price data of Harrison et al (1978). |
 | Concrete compressive strength data of Yeh (1998). |
Dataset | Base class for datasets. |
 | EEG dataset from UCI repository. |
 | GBP vs USD daily rates in 1997-98. |
 | Indian liver patient dataset (ILPD). |
 | Log returns dataset. |
 | Neuroscience experiment data from Temereanca et al (2008). |
 | Nutria dataset. |
Pima | Pima Indians Diabetes. |
 | Regression dataset. |
 | Sonar dataset from UCI repository. |