skmultiflow.core.Pipeline

class skmultiflow.core.Pipeline(steps)[source]

[Experimental] Holds a sequence of operations (transforms), followed by a single estimator.

It simplifies working with datasets that require several transformation steps before being used by a learner. It also allows several steps to be cross-validated together.

Each intermediate step should extend the BaseTransform class, or at least implement either both the transform and partial_fit functions, or the partial_fit_transform function.

The last step should be an estimator (learner), so it should implement at least partial_fit and predict.

Because its last step is an estimator, the Pipeline itself behaves like an estimator and can be passed directly to evaluation objects as if it were a learner.
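The step contract described above can be illustrated without the library. The following is a minimal pure-Python sketch, not scikit-multiflow's implementation: RunningMeanCenterer, MajorityClassLearner, chain_partial_fit and chain_predict are hypothetical names introduced only for illustration. It shows an intermediate step exposing partial_fit and transform, and a final step exposing partial_fit and predict.

```python
from collections import Counter


class RunningMeanCenterer:
    """Toy transform (hypothetical): subtracts a running per-feature mean."""

    def __init__(self):
        self.n_seen = 0
        self.sums = None

    def partial_fit(self, X, y=None):
        # Accumulate per-feature sums over every sample seen so far.
        for row in X:
            if self.sums is None:
                self.sums = [0.0] * len(row)
            self.sums = [s + v for s, v in zip(self.sums, row)]
            self.n_seen += 1
        return self

    def transform(self, X):
        if self.n_seen == 0:
            return [list(row) for row in X]
        means = [s / self.n_seen for s in self.sums]
        return [[v - m for v, m in zip(row, means)] for row in X]


class MajorityClassLearner:
    """Toy estimator (hypothetical): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, X, y, classes=None):
        self.counts.update(y)
        return self

    def predict(self, X):
        label = self.counts.most_common(1)[0][0]
        return [label for _ in X]


def chain_partial_fit(steps, X, y):
    """Partially fit and transform with all but the last step,
    then partially fit the last step on the transformed data."""
    for _, transform in steps[:-1]:
        X = transform.partial_fit(X, y).transform(X)
    steps[-1][1].partial_fit(X, y)


def chain_predict(steps, X):
    """Apply every transform in order, then predict with the last step."""
    for _, transform in steps[:-1]:
        X = transform.transform(X)
    return steps[-1][1].predict(X)


steps = [("center", RunningMeanCenterer()), ("majority", MajorityClassLearner())]
chain_partial_fit(steps, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 1, 1])
predictions = chain_predict(steps, [[2.0, 2.0], [9.0, 9.0]])
```

Any object with these method names would fit the same chain, which is the sense in which the steps only need to implement the listed functions rather than inherit from a particular class.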

Parameters
steps: list of tuple

List of tuples containing the transforms and the final estimator. Transforms are optional, but the final estimator is required. Each tuple should be of the format (‘name’, estimator).

Raises
TypeError: If the intermediate steps or the final estimator do not implement the functions the pipeline needs to work.
NotImplementedError: Raised by functions that are yet to be implemented.
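The kind of check that leads to a TypeError can be sketched as follows. This is an assumption about the general approach (duck-typing checks with hasattr), not the library's actual validation code; validate_steps and NotALearner are hypothetical names.

```python
def validate_steps(steps):
    """Raise TypeError if a step lacks the methods a pipeline relies on."""
    transforms = steps[:-1]
    final_name, final_step = steps[-1]

    for name, step in transforms:
        fits_then_transforms = (hasattr(step, "partial_fit")
                                and hasattr(step, "transform"))
        if not (fits_then_transforms or hasattr(step, "partial_fit_transform")):
            raise TypeError(f"Intermediate step '{name}' cannot act as a transform.")

    for method in ("partial_fit", "predict"):
        if not hasattr(final_step, method):
            raise TypeError(f"Final step '{final_name}' does not implement {method}.")


class NotALearner:
    """Hypothetical object that implements neither partial_fit nor predict."""


try:
    validate_steps([("broken", NotALearner())])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```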

Notes

This code is an experimental feature. Use with caution.

Examples

>>> # Imports
>>> from skmultiflow.lazy import KNNADWINClassifier
>>> from skmultiflow.core import Pipeline
>>> from skmultiflow.data import FileStream
>>> from skmultiflow.evaluation import EvaluatePrequential
>>> from skmultiflow.transform import OneHotToCategorical
>>> # Setting up the stream
>>> stream = FileStream("https://raw.githubusercontent.com/scikit-multiflow/"
...                     "streaming-datasets/master/covtype.csv")
>>> # Columns 10-13 and 14-53 hold two one-hot encoded attributes
>>> transform = OneHotToCategorical([[10, 11, 12, 13],
...                                  list(range(14, 54))])
>>> # Setting up the classifier
>>> classifier = KNNADWINClassifier(n_neighbors=8, max_window_size=2000, leaf_size=40)
>>> # Set up the pipeline
>>> pipe = Pipeline([('transform', transform), ('knn_adwin', classifier)])
>>> # Set up the evaluator
>>> evaluator = EvaluatePrequential(show_plot=True, pretrain_size=1000, max_samples=500000)
>>> # Evaluate
>>> evaluator.evaluate(stream=stream, model=pipe)

Methods

fit(self, X, y)

Sequentially fit and transform data in all but last step, then fit the model in last step.

get_info(self)

Collects and returns the information about the configuration of the estimator.

get_params(self[, deep])

Get parameters for this estimator.

named_steps(self)

Generates a dictionary to access all the steps’ properties.

partial_fit(self, X, y[, classes])

Sequentially partial fit and transform data in all but last step, then partial fit data in last step.

partial_fit_predict(self, X, y)

Partially fits and transforms data in all but the last step, then partially fits and predicts in the last step.

partial_fit_transform(self, X[, y])

Partially fits and transforms data in all but the last step, then partially fits the last step.

predict(self, X)

Sequentially applies all transforms and then predicts with the last step.

reset(self)

Resets the estimator to its initial state.

set_params(self, **params)

Set the parameters of this estimator.

fit(self, X, y)[source]

Sequentially fit and transform data in all but last step, then fit the model in last step.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The data upon which the transforms/estimator will create their model.

y: An array_like object of length n_samples

Contains the true class labels for all the samples in X.

Returns
Pipeline

self

get_info(self)[source]

Collects and returns the information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

named_steps(self)[source]

Generates a dictionary to access all the steps’ properties.

Returns
dictionary

A steps dictionary, so that each step can be accessed by name.
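As a rough illustration of name-based access, and assuming the returned dictionary maps each step name to the object supplied in steps, the lookup behaves like a plain dict built from the (name, object) tuples. The strings below are placeholders standing in for real transform and estimator objects.

```python
# Placeholder strings stand in for the transform and estimator objects.
steps = [("transform", "<transform object>"), ("knn_adwin", "<estimator object>")]

# Build a name -> step mapping from the (name, object) tuples.
named = dict(steps)

final_step = named["knn_adwin"]
step_names = list(named)
```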

partial_fit(self, X, y, classes=None)[source]

Sequentially partial fit and transform data in all but last step, then partial fit data in last step.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray

Array with all possible/known class labels. This is an optional parameter, except for the first partial_fit call where it is compulsory.

Returns
Pipeline

self
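The reason classes is compulsory on the first partial_fit call is that the first batch of a stream may not contain every label. The sketch below uses a hypothetical CountingLearner, not a scikit-multiflow class, to show the pattern: the label set is fixed on the first call and reused afterwards.

```python
class CountingLearner:
    """Hypothetical incremental learner that must know all labels up front."""

    def __init__(self):
        self.class_counts = None

    def partial_fit(self, X, y, classes=None):
        if self.class_counts is None:
            # First call: the full label set has to be supplied explicitly,
            # since this batch may not contain every class.
            if classes is None:
                raise ValueError("classes is required on the first partial_fit call")
            self.class_counts = {label: 0 for label in classes}
        for label in y:
            self.class_counts[label] += 1
        return self


learner = CountingLearner()
# The first batch only contains label 0, but labels 1 and 2 are declared.
learner.partial_fit([[0.1], [0.4]], [0, 0], classes=[0, 1, 2])
# Later calls can omit classes.
learner.partial_fit([[0.9]], [2])
```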

partial_fit_predict(self, X, y)[source]

Partially fits and transforms data in all but the last step, then partially fits and predicts in the last step.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

All the samples we want to predict the label for.

y: An array_like object of length n_samples

Contains the true class labels for all the samples in X.

Returns
list

The predicted class label for all the samples in X.

partial_fit_transform(self, X, y=None)[source]

Partially fits and transforms data in all but the last step, then partially fits the last step.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The data upon which the transforms/estimator will create their model.

y: An array_like object of length n_samples

Contains the true class labels for all the samples in X.

Returns
Pipeline

self

predict(self, X)[source]

Sequentially applies all transforms and then predicts with the last step.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

All the samples we want to predict the label for.

Returns
list

The predicted class label for all the samples in X.

reset(self)[source]

Resets the estimator to its initial state.

Returns
self

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self
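The <component>__<parameter> naming scheme can be sketched in plain Python. This is an illustrative assumption about how such names are resolved, not the library's code; split_param_name, set_nested_params and ToyClassifier are hypothetical.

```python
def split_param_name(key):
    """Split 'component__parameter' into (component, parameter).

    A key without '__' refers to the object itself, so component is None.
    """
    component, separator, parameter = key.partition("__")
    if not separator:
        return None, key
    return component, parameter


def set_nested_params(named_steps, **params):
    """Route each 'component__parameter' value to the named component."""
    for key, value in params.items():
        component, parameter = split_param_name(key)
        if component is None:
            raise ValueError(f"'{key}' does not name a component")
        setattr(named_steps[component], parameter, value)


class ToyClassifier:
    """Hypothetical estimator with a single tunable parameter."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors


named_steps = {"classifier": ToyClassifier()}
set_nested_params(named_steps, classifier__n_neighbors=8)
```

The double underscore is what lets one flat keyword argument address a parameter of a specific component inside the nested object.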