skmultiflow.meta.LearnPPNSEClassifier

class skmultiflow.meta.LearnPPNSEClassifier(base_estimator=DecisionTreeClassifier(), window_size=250, slope=0.5, crossing_point=10, n_estimators=15, pruning=None)[source]

Learn++.NSE ensemble classifier.

Learn++.NSE [1] is an ensemble of classifiers for incremental learning from non-stationary environments (NSEs) where the underlying data distributions change over time. It learns from consecutive batches of data that experience constant or variable rate of drift, addition or deletion of concept classes, as well as cyclical drift.
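A minimal sketch of the batch-wise protocol described above, assuming batches of the default window_size (250) drawn from a SEAGenerator stream; each window-sized batch passed to partial_fit lets the ensemble train a new member:

>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import LearnPPNSEClassifier
>>>
>>> stream = SEAGenerator(random_state=1)
>>> learn_pp_nse = LearnPPNSEClassifier(window_size=250)
>>>
>>> # Feed the stream in window-sized batches
>>> for _ in range(4):
>>>     X, y = stream.next_sample(250)
>>>     learn_pp_nse.partial_fit(X, y, classes=stream.target_values)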

Parameters
base_estimator: StreamModel or sklearn.BaseEstimator (default=DecisionTreeClassifier)

Each member of the ensemble is an instance of the base estimator.

n_estimators: int (default=15)

The number of base estimators in the ensemble.

window_size: int (default=250)

The size of the training window (batch), in other words, how many instances are kept for training.

crossing_point: float (default=10)

Halfway crossing point of the sigmoid function controlling the number of previous periods taken into account during weighting.

slope: float (default=0.5)

Slope of the sigmoid function controlling the number of previous periods taken into account during weighting.

pruning: string (default=None)

Classifier pruning strategy to be used:
pruning=None: don't prune classifiers
pruning='age': age-based pruning
pruning='error': error-based pruning
See the constructor example after this parameter list.
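Putting these parameters together, a minimal constructor sketch; the values are illustrative placeholders rather than tuning advice, and the base estimator shown is scikit-learn's DecisionTreeClassifier, which is also the default:

>>> from sklearn.tree import DecisionTreeClassifier
>>> from skmultiflow.meta import LearnPPNSEClassifier
>>>
>>> # Ensemble of up to 15 shallow trees, trained on windows of 250 samples,
>>> # with error-based pruning to bound the ensemble size.
>>> learn_pp_nse = LearnPPNSEClassifier(
>>>     base_estimator=DecisionTreeClassifier(max_depth=5),
>>>     window_size=250,
>>>     slope=0.5,
>>>     crossing_point=10,
>>>     n_estimators=15,
>>>     pruning='error')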

References

[1] Ryan Elwell and Robi Polikar. Incremental learning of concept drift in non-stationary environments. IEEE Transactions on Neural Networks, 22(10):1517-1531, October 2011. ISSN 1045-9227. URL: http://dx.doi.org/10.1109/TNN.2011.2160459

Examples

>>> # Imports
>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import LearnPPNSEClassifier
>>>
>>> # Setup a data stream
>>> stream = SEAGenerator(random_state=1)
>>>
>>> # Setup Learn++.NSE Classifier
>>> learn_pp_nse = LearnPPNSEClassifier()
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Train the classifier with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
>>>     X, y = stream.next_sample()
>>>     y_pred = learn_pp_nse.predict(X)
>>>     if y[0] == y_pred[0]:
>>>         correct_cnt += 1
>>>     learn_pp_nse.partial_fit(X, y, classes=stream.target_values)
>>>     n_samples += 1
>>>
>>> # Display results
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Learn++.NSE classifier accuracy: {}'.format(correct_cnt / n_samples))
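As an alternative to the manual loop, a similar test-then-train evaluation can be run with skmultiflow's EvaluatePrequential. This is a hedged sketch; the keyword arguments and metric names below should be checked against the installed skmultiflow version:

>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import LearnPPNSEClassifier
>>> from skmultiflow.evaluation import EvaluatePrequential
>>>
>>> stream = SEAGenerator(random_state=1)
>>> learn_pp_nse = LearnPPNSEClassifier()
>>>
>>> # Prequential (test-then-train) evaluation over 2000 samples
>>> evaluator = EvaluatePrequential(max_samples=2000, metrics=['accuracy'])
>>> evaluator.evaluate(stream=stream, model=learn_pp_nse, model_names=['Learn++.NSE'])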

Methods

fit(self, X, y[, classes, sample_weight])

Fit the model.

get_info(self)

Collects and returns the information about the configuration of the estimator

get_params(self[, deep])

Get parameters for this estimator.

partial_fit(self, X[, y, classes, sample_weight])

Partially fits the model on the feature matrix X and the labels y.

predict(self, X)

Predicts the class for a given sample by majority vote from all the members of the ensemble.

predict_proba(self, X)

Predicts the probability of each sample belonging to each one of the known classes.

reset(self)

Resets the estimator to its initial state.

score(self, X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

set_params(self, **params)

Set the parameters of this estimator.

fit(self, X, y, classes=None, sample_weight=None)[source]

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Contains all possible/known class labels. Usage varies depending on the learning method.

sample_weight: numpy.ndarray, optional (default=None)

Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self
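
A minimal usage sketch for fit, assuming a window-sized batch drawn from a SEAGenerator stream:

>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import LearnPPNSEClassifier
>>>
>>> stream = SEAGenerator(random_state=1)
>>> learn_pp_nse = LearnPPNSEClassifier()
>>> X, y = stream.next_sample(250)   # one window-sized batch
>>> learn_pp_nse.fit(X, y, classes=stream.target_values)
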
get_info(self)[source]

Collects and returns the information about the configuration of the estimator

Returns
string

Configuration of the estimator.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.
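A brief sketch of inspecting nested parameters, assuming the default DecisionTreeClassifier base estimator and scikit-learn-style parameter nesting:

>>> from skmultiflow.meta import LearnPPNSEClassifier
>>>
>>> learn_pp_nse = LearnPPNSEClassifier()
>>> params = learn_pp_nse.get_params(deep=True)
>>> # With deep=True the base estimator's own parameters are listed under the
>>> # 'base_estimator__' prefix.
>>> sorted(k for k in params if k.startswith('base_estimator__'))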

partial_fit(self, X, y=None, classes=None, sample_weight=None)[source]

Partially fits the model on the feature matrix X and the labels y.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Features matrix used for partially updating the model.

y: Array-like

An array-like of all the class labels for the samples in X.

classes: numpy.ndarray, optional (default=None)

Array with all possible/known class labels. This is an optional parameter, except for the first partial_fit call where it is compulsory.

sample_weight: Not used (default=None)

Returns
self: LearnPPNSEClassifier

Raises
RuntimeError:

A RuntimeError is raised if the 'classes' parameter is not passed in the first partial_fit call, or if it is passed in later calls but differs from the class list seen initially. A RuntimeError is also raised if the base_estimator is too weak, i.e. its accuracy on the training batch is too low.
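A minimal sketch of the calling convention described above: the first partial_fit call must receive the complete class list, while later calls may omit it:

>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import LearnPPNSEClassifier
>>>
>>> stream = SEAGenerator(random_state=1)
>>> learn_pp_nse = LearnPPNSEClassifier()
>>> X, y = stream.next_sample(250)
>>> learn_pp_nse.partial_fit(X, y, classes=stream.target_values)  # first call: classes required
>>> X, y = stream.next_sample(250)
>>> learn_pp_nse.partial_fit(X, y)  # later calls: classes may be omitted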

predict(self, X)[source]

Predicts the class for a given sample by majority vote from all the members of the ensemble.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

A matrix of the samples we want to predict.

Returns
numpy.ndarray

A numpy.ndarray with the label prediction for all the samples in X.

predict_proba(self, X)[source]

Predicts the probability of each sample belonging to each one of the known classes.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

A matrix of the samples we want to predict.

Returns
numpy.ndarray

An array of shape (n_samples, n_classes), in which the row at index [i] is associated with the i-th entry of X and contains len(self.target_values) elements, each representing the probability that the i-th sample belongs to the corresponding class.
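A brief sketch of inspecting the returned probabilities, continuing from the partial_fit sketch above; the expected column count assumes the two-class SEAGenerator stream:

>>> X, _ = stream.next_sample(5)
>>> proba = learn_pp_nse.predict_proba(X)
>>> proba.shape   # expected (5, 2) for a two-class stream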

reset(self)[source]

Resets the estimator to its initial state.

Returns
self
score(self, X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples.

y: array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight: array-like, shape = [n_samples], optional

Sample weights.

Returns
score: float

Mean accuracy of self.predict(X) wrt. y.
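A brief sketch of scoring on a held-out batch, continuing from the sketches above, where learn_pp_nse has already been trained on samples from stream:

>>> X_test, y_test = stream.next_sample(500)
>>> learn_pp_nse.score(X_test, y_test)   # mean accuracy on the held-out batch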

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self
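
A minimal sketch of the <component>__<parameter> form mentioned above, assuming the default DecisionTreeClassifier base estimator (its max_depth is used purely for illustration):

>>> from skmultiflow.meta import LearnPPNSEClassifier
>>>
>>> learn_pp_nse = LearnPPNSEClassifier()
>>> # Update a top-level parameter and, via the nested form, the base
>>> # estimator's max_depth.
>>> learn_pp_nse.set_params(n_estimators=10, base_estimator__max_depth=5)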