skmultiflow.meta.AdditiveExpertEnsembleClassifier

class skmultiflow.meta.AdditiveExpertEnsembleClassifier(n_estimators=5, base_estimator=NaiveBayes(nominal_attributes=None), beta=0.8, gamma=0.1, pruning='weakest')[source]

Additive Expert ensemble classifier.

Parameters
n_estimators: int (default=5)

Maximum number of estimators to hold.

base_estimator: skmultiflow.core.BaseSKMObject or sklearn.BaseEstimator (default=NaiveBayes)

Each member of the ensemble is an instance of the base estimator.

beta: float (default=0.8)

Factor by which an expert's weight is decreased when it makes a mistake.

gamma: float (default=0.1)

Weight assigned to each new expert, as a fraction of the total ensemble weight.

pruning: ‘oldest’ or ‘weakest’ (default=’weakest’)

Pruning strategy to use.

Notes

The Additive Expert Ensemble (AddExp) [1] is a general method for using any online learner under concept drift. The ‘oldest’ pruning strategy comes with known mistake and error bounds, but ‘weakest’ generally performs better in practice.
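As a rough illustration of the update rule from the paper (not skmultiflow's actual implementation), one AddExp step on a single labeled sample can be sketched as follows; `MajorityClassExpert` is a toy stand-in estimator invented for this sketch, and `experts` is a plain list of (estimator, weight) pairs:

```python
class MajorityClassExpert:
    """Toy stand-in estimator: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = {}

    def fit(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0


def addexp_step(experts, x, y, beta=0.8, gamma=0.1,
                max_experts=5, pruning='weakest'):
    """One AddExp update on a single sample (x, y).

    Returns the updated expert list and the ensemble's prediction for x.
    """
    # Weighted vote over the experts' current predictions.
    preds = [est.predict(x) for est, _ in experts]
    votes = {}
    for p, (_, w) in zip(preds, experts):
        votes[p] = votes.get(p, 0.0) + w
    y_hat = max(votes, key=votes.get)

    # Multiply the weight of every mistaken expert by beta.
    experts = [(est, w * beta if p != y else w)
               for (est, w), p in zip(experts, preds)]

    # On an ensemble mistake, add a fresh expert whose weight is
    # gamma times the total ensemble weight.
    if y_hat != y:
        total = sum(w for _, w in experts)
        experts.append((MajorityClassExpert(), gamma * total))

    # Prune back to the maximum ensemble size.
    if len(experts) > max_experts:
        if pruning == 'oldest':
            experts.pop(0)
        else:  # 'weakest': drop the lowest-weight expert
            experts.remove(min(experts, key=lambda ew: ew[1]))

    # Train every expert on the new sample.
    for est, _ in experts:
        est.fit(x, y)
    return experts, y_hat
```

Feeding a short stream through `addexp_step` shows the ensemble growing on mistakes and decaying the weights of wrong experts.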

Bound on mistakes when using the ‘oldest’ pruning strategy (Theorem 3.1 in the paper): let \(W_i\) denote the total weight of the ensemble at time step \(i\), and \(M_i\) the number of mistakes made by the ensemble at all time steps up to \(i-1\); then for any time steps \(t_1 < t_2\), provided that \(\beta + 2\gamma < 1\),

\(M_{t_2} - M_{t_1} \leq \log(W_{t_1} / W_{t_2}) / \log(2 / (1 + \beta + 2\gamma))\)
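Note that the theorem's condition is not satisfied by the defaults, since \(\beta = 0.8, \gamma = 0.1\) give \(\beta + 2\gamma = 1.0\) exactly; the bound only applies for stricter settings. As a quick numeric sanity check, the bound's denominator can be evaluated with a small hypothetical helper (not part of the library):

```python
import math

def addexp_bound_rate(beta, gamma):
    """Denominator of the Theorem 3.1 bound, log(2 / (1 + beta + 2*gamma)).

    Positive (so the bound is meaningful) exactly when beta + 2*gamma < 1.
    """
    if beta + 2 * gamma >= 1:
        raise ValueError("theorem 3.1 requires beta + 2*gamma < 1")
    return math.log(2 / (1 + beta + 2 * gamma))

# beta=0.5, gamma=0.1 satisfies the condition: beta + 2*gamma = 0.7 < 1.
rate = addexp_bound_rate(0.5, 0.1)
```

A larger denominator (i.e., smaller \(\beta + 2\gamma\)) gives a tighter mistake bound.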

References

1

Kolter, J. Z. and Maloof, M. A. Using additive expert ensembles to cope with concept drift. In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005.

Examples

>>> # Imports
>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import AdditiveExpertEnsembleClassifier
>>>
>>> # Setup a data stream
>>> stream = SEAGenerator(random_state=1)
>>>
>>> # Setup Additive Expert Ensemble Classifier
>>> add_exp = AdditiveExpertEnsembleClassifier()
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Train the classifier with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
>>>     X, y = stream.next_sample()
>>>     y_pred = add_exp.predict(X)
>>>     if y[0] == y_pred[0]:
>>>         correct_cnt += 1
>>>     add_exp.partial_fit(X, y)
>>>     n_samples += 1
>>>
>>> # Display results
>>> print('{} samples analyzed'.format(n_samples))
>>> print('Additive Expert Ensemble accuracy: {}'.format(correct_cnt / n_samples))

Methods

fit(self, X, y[, classes, sample_weight])

Fit the model.

fit_single_sample(self, X, y[, classes, …])

Predict + update weights + modify experts + train on new sample.

get_expert_predictions(self, X)

Returns the class prediction of each expert for the given samples.

get_info(self)

Collects and returns the information about the configuration of the estimator.

get_params(self[, deep])

Get parameters for this estimator.

partial_fit(self, X, y[, classes, sample_weight])

Partially fits the model on the supplied X and y matrices.

predict(self, X)

Predicts the class labels of X in a general classification setting.

predict_proba(self, X)

Not implemented for this method.

reset(self)

Resets the estimator to its initial state.

score(self, X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

set_params(self, **params)

Set the parameters of this estimator.

class WeightedExpert(estimator, weight)[source]

Wrapper that includes an estimator and its weight.

Parameters
estimator: StreamModel or sklearn.BaseEstimator

The estimator to wrap.

weight: float

The estimator’s weight.

fit(self, X, y, classes=None, sample_weight=None)[source]

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Contains all possible/known class labels. Usage varies depending on the learning method.

sample_weight: numpy.ndarray, optional (default=None)

Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self
fit_single_sample(self, X, y, classes=None, sample_weight=None)[source]

Predict + update weights + modify experts + train on new sample. (As described in the original paper.)

get_expert_predictions(self, X)[source]

Returns the class prediction of each expert, as an array-like of shape (n_experts,).

get_info(self)[source]

Collects and returns the information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

partial_fit(self, X, y, classes=None, sample_weight=None)[source]

Partially fits the model on the supplied X and y matrices.

Since this is an ensemble learner, if X and y contain more than one sample, the algorithm fits the model one sample at a time.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Features matrix used for partially updating the model.

y: Array-like

An array-like of all the class labels for the samples in X.

classes: numpy.ndarray (default=None)

Array with all possible/known class labels.

sample_weight: Not used (default=None)

Returns
AdditiveExpertEnsembleClassifier

self

predict(self, X)[source]

Predicts the class labels of X in a general classification setting.

The predict function will take an average of the predictions of its learners, weighted by their respective weights, and return the most likely class.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

A matrix of the samples we want to predict.

Returns
numpy.ndarray

A numpy.ndarray with the label prediction for all the samples in X.
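The weighted vote described above can be sketched in plain NumPy; this is an illustration of the voting rule under simplified assumptions (integer class labels, one hard prediction per expert), not the library's internal code:

```python
import numpy as np

def weighted_vote(expert_preds, weights, n_classes):
    """Combine per-expert class predictions into one label per sample.

    expert_preds: array of shape (n_experts, n_samples), integer labels.
    weights:      array of shape (n_experts,), one weight per expert.
    """
    expert_preds = np.asarray(expert_preds)
    weights = np.asarray(weights, dtype=float)
    n_samples = expert_preds.shape[1]
    scores = np.zeros((n_samples, n_classes))
    for preds, w in zip(expert_preds, weights):
        # Each expert adds its weight to the class it voted for.
        scores[np.arange(n_samples), preds] += w
    return scores.argmax(axis=1)

# Three experts, two samples: on the first sample, experts 0 and 2
# outvote expert 1 because their combined weight (0.8) is larger.
labels = weighted_vote([[0, 1], [1, 1], [0, 0]],
                       weights=[0.5, 0.4, 0.3], n_classes=2)
```

The returned label for each sample is the class with the largest total expert weight behind it.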

predict_proba(self, X)[source]

Not implemented for this method.

reset(self)[source]

Resets the estimator to its initial state.

Returns
self
score(self, X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that, for each sample, every label set be correctly predicted.

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples.

y: array-like, shape = (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight: array-like, shape = (n_samples,), optional

Sample weights.

Returns
score: float

Mean accuracy of self.predict(X) wrt. y.

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self