skmultiflow.core.Pipeline

class skmultiflow.core.Pipeline(steps)

[Experimental] Holds a sequence of operations (transforms), followed by a single estimator.

It simplifies working with datasets that need several transformations before a learner can use them. It also allows several steps to be cross-validated together.

Each intermediate step should extend the BaseTransform class, or at least implement either both transform and partial_fit, or partial_fit_transform.

The last step should be an estimator (learner), so it should implement at least partial_fit and predict.

Because its last step is an estimator, the Pipeline itself behaves like an estimator and can be passed directly to evaluation objects as if it were a learner.
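The contract described above can be sketched without the library. The following is a minimal, pure-Python illustration and not skmultiflow's implementation: SelectColumns, NearestCentroid1D and ToyPipeline are hypothetical names introduced only for this example, and plain lists stand in for numpy arrays.

```python
class SelectColumns:
    """Hypothetical transform: keeps only the given feature columns."""

    def __init__(self, columns):
        self.columns = columns

    def partial_fit(self, X, y=None):
        # Stateless transform, so there is nothing to learn.
        return self

    def transform(self, X):
        return [[row[c] for c in self.columns] for row in X]


class NearestCentroid1D:
    """Hypothetical learner: nearest class centroid on the first feature."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def partial_fit(self, X, y, classes=None):
        for row, label in zip(X, y):
            self.sums[label] = self.sums.get(label, 0.0) + row[0]
            self.counts[label] = self.counts.get(label, 0) + 1
        return self

    def predict(self, X):
        centroids = {k: self.sums[k] / self.counts[k] for k in self.sums}
        return [min(centroids, key=lambda k: abs(row[0] - centroids[k]))
                for row in X]


class ToyPipeline:
    """Hypothetical pipeline: transforms first, a single estimator last."""

    def __init__(self, steps):
        self.steps = list(steps)

    def partial_fit(self, X, y, classes=None):
        Xt = X
        for _, transform in self.steps[:-1]:
            Xt = transform.partial_fit(Xt, y).transform(Xt)
        self.steps[-1][1].partial_fit(Xt, y, classes=classes)
        return self

    def predict(self, X):
        Xt = X
        for _, transform in self.steps[:-1]:
            Xt = transform.transform(Xt)
        return self.steps[-1][1].predict(Xt)


pipe = ToyPipeline([("select", SelectColumns([1])),
                    ("centroid", NearestCentroid1D())])
# Column 0 is noise; column 1 carries the signal the learner needs.
pipe.partial_fit([[9, 0.0], [1, 1.0], [5, 0.1], [5, 0.9]], [0, 1, 0, 1])
predictions = pipe.predict([[100, 0.2], [-3, 0.8]])
```

Because ToyPipeline exposes partial_fit and predict, any code that expects a learner with those two methods can use it unchanged.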

Parameters

steps: list of tuple: List of tuples containing the transforms followed by the final estimator. Transforms are optional, but the final estimator is required. Each tuple should have the format ('name', estimator).

Raises

TypeError: If the intermediate steps or the final estimator do not implement the functions the pipeline needs in order to work.
NotImplementedError: Raised by functions that are not implemented yet.
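As a rough illustration of the kind of check that leads to the TypeError, the sketch below validates a list of steps against the requirements stated in the class description. The function validate_steps and the two small classes are hypothetical and written for this example; the library's own validation may differ in detail.

```python
def validate_steps(steps):
    """Hypothetical duck-typing check for (name, object) step tuples."""
    transforms, (final_name, final) = steps[:-1], steps[-1]

    for name, step in transforms:
        has_split = hasattr(step, "partial_fit") and hasattr(step, "transform")
        has_joint = hasattr(step, "partial_fit_transform")
        if not (has_split or has_joint):
            raise TypeError(
                f"Step '{name}' must implement partial_fit and transform, "
                "or partial_fit_transform."
            )

    if not (hasattr(final, "partial_fit") and hasattr(final, "predict")):
        raise TypeError(
            f"Final step '{final_name}' must implement partial_fit and predict."
        )
    return True


class GoodLearner:
    def partial_fit(self, X, y, classes=None):
        return self

    def predict(self, X):
        return [0 for _ in X]


class PredictOnly:
    def predict(self, X):
        return [0 for _ in X]


ok = validate_steps([("learner", GoodLearner())])

try:
    validate_steps([("learner", PredictOnly())])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```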

Notes

This code is an experimental feature. Use with caution.

Examples

>>> # Imports
>>> from skmultiflow.lazy import KNNADWINClassifier
>>> from skmultiflow.core import Pipeline
>>> from skmultiflow.data import FileStream
>>> from skmultiflow.evaluation import EvaluatePrequential
>>> from skmultiflow.transform import OneHotToCategorical
>>> # Setting up the stream
>>> stream = FileStream("https://raw.githubusercontent.com/scikit-multiflow/"
...                     "streaming-datasets/master/covtype.csv")
>>> # Columns 10-13 and 14-53 each hold one one-hot encoded attribute
>>> transform = OneHotToCategorical([[10, 11, 12, 13],
...                                  list(range(14, 54))])
>>> # Setting up the classifier
>>> classifier = KNNADWINClassifier(n_neighbors=8, max_window_size=2000, leaf_size=40)
>>> # Setup the pipeline
>>> pipe = Pipeline([('transform', transform), ('knn_adwin', classifier)])
>>> # Setup the evaluator
>>> evaluator = EvaluatePrequential(show_plot=True, pretrain_size=1000, max_samples=500000)
>>> # Evaluate
>>> evaluator.evaluate(stream=stream, model=pipe)
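The example above hands the pipeline to a prequential evaluator, which only needs an object with partial_fit and predict. As a rough sketch of the interleaved test-then-train idea, assuming a pretraining phase followed by one-sample-at-a-time evaluation, the loop below uses a hypothetical MajorityClass learner and a hypothetical prequential_accuracy helper. It is a simplification for illustration and not the evaluator's actual code.

```python
class MajorityClass:
    """Hypothetical learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def partial_fit(self, X, y, classes=None):
        for label in y:
            self.counts[label] = self.counts.get(label, 0) + 1
        return self

    def predict(self, X):
        most_common = max(self.counts, key=self.counts.get)
        return [most_common for _ in X]


def prequential_accuracy(model, samples, pretrain_size):
    """Pretrain on the first samples, then test and train on each remaining one."""
    pretrain, rest = samples[:pretrain_size], samples[pretrain_size:]
    model.partial_fit([x for x, _ in pretrain], [y for _, y in pretrain])

    correct = 0
    for x, y in rest:
        # Test first, so the model never sees the label before predicting.
        correct += int(model.predict([x])[0] == y)
        # Then train on the same sample.
        model.partial_fit([x], [y])
    return correct / len(rest)


samples = [([0.0], 1), ([0.1], 1), ([0.2], 1),
           ([0.3], 0), ([0.4], 1), ([0.5], 1)]
accuracy = prequential_accuracy(MajorityClass(), samples, pretrain_size=2)
```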

Methods

`fit`(self, X, y)	Sequentially fits and transforms data in all but the last step, then fits the model in the last step.
`get_info`(self)	Collects and returns the information about the configuration of the estimator.
`get_params`(self[, deep])	Get parameters for this estimator.
`named_steps`(self)	Generates a dictionary to access all the steps’ properties.
`partial_fit`(self, X, y[, classes])	Sequentially partial fits and transforms data in all but the last step, then partial fits data in the last step.
`partial_fit_predict`(self, X, y)	Partial fits and transforms data in all but the last step, then partial fits and predicts in the last step.
`partial_fit_transform`(self, X[, y])	Partial fits and transforms data in all but the last step, then partial fits the last step.
`predict`(self, X)	Sequentially applies all transforms, then predicts with the last step.
`reset`(self)	Resets the estimator to its initial state.
`set_params`(self, **params)	Set the parameters of this estimator.

fit(self, X, y)

Sequentially fits and transforms data in all but the last step, then fits the model in the last step.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The data upon which the transforms/estimator will create their model.
y: An array_like object of length n_samples: Contains the true class labels for all the samples in X.

Returns

Pipeline: self

get_info(self)

Collects and returns the information about the configuration of the estimator.

Returns

string: Configuration of the estimator.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

named_steps(self)

Generates a dictionary to access all the steps’ properties.

Returns

dictionary: A steps dictionary, so that each step can be accessed by name.
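A minimal sketch of the idea, assuming the dictionary simply maps each step name to its object. The step list and the StepStub class below are hypothetical placeholders, not library objects.

```python
class StepStub:
    """Hypothetical stand-in for a transform or learner."""

    def __init__(self, description):
        self.description = description


steps = [("transform", StepStub("one-hot to categorical")),
         ("classifier", StepStub("k-nearest neighbours"))]

# Building a dict from (name, object) tuples gives access by name.
named = dict(steps)
classifier_description = named["classifier"].description
```

Since the names become dictionary keys, each step in a pipeline needs a unique name for this lookup to reach every step.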

partial_fit(self, X, y, classes=None)

Sequentially partial fits and transforms data in all but the last step, then partial fits data in the last step.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y: numpy.ndarray of shape (n_samples,): An array-like with the class labels of all samples in X.
classes: numpy.ndarray: Array with all possible/known class labels. Optional, except on the first partial_fit call, where it is compulsory.

Returns

Pipeline: self
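The role of classes on the first call can be illustrated with a small incremental learner. CountingLearner is a hypothetical class written for this sketch; it assumes, as one possible design, that the learner allocates its per-class state from classes the first time partial_fit is called and rejects a first call that omits it.

```python
class CountingLearner:
    """Hypothetical incremental learner that counts labels per class."""

    def __init__(self):
        self.classes = None
        self.counts = {}

    def partial_fit(self, X, y, classes=None):
        if self.classes is None:
            # First call: the full label set must be known up front,
            # because a single batch may not contain every class.
            if classes is None:
                raise ValueError("classes is required on the first partial_fit call")
            self.classes = list(classes)
            self.counts = {label: 0 for label in self.classes}
        for label in y:
            self.counts[label] += 1
        return self


learner = CountingLearner()

# The first batch contains only class 0, yet all three classes are declared.
learner.partial_fit([[0.1], [0.2]], [0, 0], classes=[0, 1, 2])

# Later batches may omit classes.
learner.partial_fit([[0.7], [0.9]], [1, 2])
```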

partial_fit_predict(self, X, y)

Partial fits and transforms data in all but the last step, then partial fits and predicts in the last step.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): All the samples we want to predict the label for.
y: An array_like object of length n_samples: Contains the true class labels for all the samples in X.

Returns

list: The predicted class label for all the samples in X.

partial_fit_transform(self, X, y=None)

Partial fits and transforms data in all but the last step, then partial fits the last step.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The data upon which the transforms/estimator will create their model.
y: An array_like object of length n_samples: Contains the true class labels for all the samples in X.

Returns

Pipeline: self

predict(self, X)

Sequentially applies all transforms, then predicts with the last step.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): All the samples we want to predict the label for.

Returns

list: The predicted class label for all the samples in X.

reset(self)

Resets the estimator to its initial state.

Returns

self

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
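The <component>__<parameter> naming convention can be illustrated with a short sketch. The helper set_nested_params and the Knn class are hypothetical and only show how such a key might be split and routed to the named step; the sketch handles the nested form only and is not the library's implementation.

```python
class Knn:
    """Hypothetical learner with a single tunable parameter."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors


def set_nested_params(steps, **params):
    """Route each '<component>__<parameter>' key to the step named <component>."""
    named = dict(steps)
    for key, value in params.items():
        component, separator, parameter = key.partition("__")
        if not separator:
            raise ValueError(f"Expected '<component>__<parameter>', got '{key}'")
        if component not in named:
            raise ValueError(f"Unknown component '{component}'")
        setattr(named[component], parameter, value)
    return steps


steps = [("classifier", Knn())]
set_nested_params(steps, classifier__n_neighbors=8)
updated_value = steps[0][1].n_neighbors
```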