skmultiflow.meta.AdditiveExpertEnsembleClassifier

class skmultiflow.meta.AdditiveExpertEnsembleClassifier(n_estimators=5, base_estimator=NaiveBayes(nominal_attributes=None), beta=0.8, gamma=0.1, pruning='weakest')[source]

Additive Expert ensemble classifier.

Parameters
n_estimators: int (default=5)

Maximum number of estimators to hold.

base_estimator: skmultiflow.core.BaseSKMObject or sklearn.BaseEstimator (default=NaiveBayes)

Each member of the ensemble is an instance of the base estimator.

beta: float (default=0.8)

Factor by which an expert's weight is decreased when it makes a mistake.

gamma: float (default=0.1)

Weight assigned to each new expert, as a fraction of the total ensemble weight.

pruning: ‘oldest’ or ‘weakest’ (default=’weakest’)

Pruning strategy to use.

Notes

The Additive Expert Ensemble (AddExp) [1] is a general method for using any online learner under concept drift. The ‘oldest’ pruning strategy comes with known mistake and error bounds, but ‘weakest’ generally performs better in practice.
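As a rough illustration of the update rule from the paper (not skmultiflow's actual implementation), one AddExp step on a single labeled sample can be sketched as follows; `MajorityClassExpert` is a toy stand-in estimator invented for this sketch, and `experts` is a plain list of (estimator, weight) pairs:

```python
class MajorityClassExpert:
    """Toy stand-in estimator: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = {}

    def fit(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0


def addexp_step(experts, x, y, beta=0.8, gamma=0.1,
                max_experts=5, pruning='weakest'):
    """One AddExp update on a single sample (x, y).

    Returns the updated expert list and the ensemble's prediction for x.
    """
    # Weighted vote over the experts' current predictions.
    preds = [est.predict(x) for est, _ in experts]
    votes = {}
    for p, (_, w) in zip(preds, experts):
        votes[p] = votes.get(p, 0.0) + w
    y_hat = max(votes, key=votes.get)

    # Multiply the weight of every mistaken expert by beta.
    experts = [(est, w * beta if p != y else w)
               for (est, w), p in zip(experts, preds)]

    # On an ensemble mistake, add a fresh expert whose weight is
    # gamma times the total ensemble weight.
    if y_hat != y:
        total = sum(w for _, w in experts)
        experts.append((MajorityClassExpert(), gamma * total))

    # Prune back to the maximum ensemble size.
    if len(experts) > max_experts:
        if pruning == 'oldest':
            experts.pop(0)
        else:  # 'weakest': drop the lowest-weight expert
            experts.remove(min(experts, key=lambda ew: ew[1]))

    # Train every expert on the new sample.
    for est, _ in experts:
        est.fit(x, y)
    return experts, y_hat
```

Feeding a short stream through `addexp_step` shows the ensemble growing on mistakes and decaying the weights of wrong experts.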

Bound on mistakes when using the ‘oldest’ pruning strategy (Theorem 3.1 in the paper): let \(W_i\) denote the total weight of the ensemble at time step \(i\), and \(M_i\) the number of mistakes made by the ensemble at all time steps up to \(i-1\); then for any time steps \(t_1 < t_2\), provided that \(\beta + 2\gamma < 1\),

\(M_{t_2} - M_{t_1} \leq \log(W_{t_1} / W_{t_2}) / \log(2 / (1 + \beta + 2\gamma))\)
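Note that the theorem's condition is not satisfied by the defaults, since \(\beta = 0.8, \gamma = 0.1\) give \(\beta + 2\gamma = 1.0\) exactly; the bound only applies for stricter settings. As a quick numeric sanity check, the bound's denominator can be evaluated with a small hypothetical helper (not part of the library):

```python
import math

def addexp_bound_rate(beta, gamma):
    """Denominator of the Theorem 3.1 bound, log(2 / (1 + beta + 2*gamma)).

    Positive (so the bound is meaningful) exactly when beta + 2*gamma < 1.
    """
    if beta + 2 * gamma >= 1:
        raise ValueError("theorem 3.1 requires beta + 2*gamma < 1")
    return math.log(2 / (1 + beta + 2 * gamma))

# beta=0.5, gamma=0.1 satisfies the condition: beta + 2*gamma = 0.7 < 1.
rate = addexp_bound_rate(0.5, 0.1)
```

A larger denominator (i.e., smaller \(\beta + 2\gamma\)) gives a tighter mistake bound.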

References

1

Kolter, J. Z. and Maloof, M. A. Using additive expert ensembles to cope with concept drift. In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005.

Examples

>>> # Imports
>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import AdditiveExpertEnsembleClassifier
>>>
>>> # Setup a data stream
>>> stream = SEAGenerator(random_state=1)
>>>
>>> # Setup Additive Expert Ensemble Classifier
>>> add_exp = AdditiveExpertEnsembleClassifier()
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Train the classifier with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
>>>     X, y = stream.next_sample()
>>>     y_pred = add_exp.predict(X)
>>>     if y[0] == y_pred[0]:
>>>         correct_cnt += 1
>>>     add_exp.partial_fit(X, y)
>>>     n_samples += 1
>>>
>>> # Display results
>>> print('{} samples analyzed'.format(n_samples))
>>> print('Additive Expert Ensemble accuracy: {}'.format(correct_cnt / n_samples))

Methods

fit(self, X, y[, classes, sample_weight])

Fit the model.

fit_single_sample(self, X, y[, classes, …])

Predict + update weights + modify experts + train on new sample.

get_expert_predictions(self, X)

Returns the class prediction of each expert for the given samples.

get_info(self)

Collects and returns the information about the configuration of the estimator.

get_params(self[, deep])

Get parameters for this estimator.

partial_fit(self, X, y[, classes, sample_weight])

Partially fits the model on the supplied X and y matrices.

predict(self, X)

Predicts the class labels of X in a general classification setting.

predict_proba(self, X)

Not implemented for this method.

reset(self)

Resets the estimator to its initial state.

score(self, X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

set_params(self, **params)

Set the parameters of this estimator.

class WeightedExpert(estimator, weight)[source]

Wrapper that includes an estimator and its weight.

Parameters
estimator: StreamModel or sklearn.BaseEstimator

The estimator to wrap.

weight: float

The estimator’s weight.

fit(self, X, y, classes=None, sample_weight=None)[source]

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Contains all possible/known class labels. Usage varies depending on the learning method.

sample_weight: numpy.ndarray, optional (default=None)

Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self
fit_single_sample(self, X, y, classes=None, sample_weight=None)[source]

Predict + update weights + modify experts + train on new sample. (As described in the original paper.)

get_expert_predictions(self, X)[source]

Returns the class prediction of each expert, as an array-like of shape (n_experts,).

get_info(self)[source]

Collects and returns the information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

partial_fit(self, X, y, classes=None, sample_weight=None)[source]

Partially fits the model on the supplied X and y matrices.

Since this is an ensemble learner, if X and y contain more than one sample, the algorithm fits the model one sample at a time.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Features matrix used for partially updating the model.

y: Array-like

An array-like of all the class labels for the samples in X.

classes: numpy.ndarray (default=None)

Array with all possible/known class labels.

sample_weight: Not used (default=None)

Returns
AdditiveExpertEnsembleClassifier

self

predict(self, X)[source]

Predicts the class labels of X in a general classification setting.

The predict function will take an average of the predictions of its learners, weighted by their respective weights, and return the most likely class.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

A matrix of the samples we want to predict.

Returns
numpy.ndarray

A numpy.ndarray with the label prediction for all the samples in X.
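The weighted vote described above can be sketched in plain NumPy; this is an illustration of the voting rule under simplified assumptions (integer class labels, one hard prediction per expert), not the library's internal code:

```python
import numpy as np

def weighted_vote(expert_preds, weights, n_classes):
    """Combine per-expert class predictions into one label per sample.

    expert_preds: array of shape (n_experts, n_samples), integer labels.
    weights:      array of shape (n_experts,), one weight per expert.
    """
    expert_preds = np.asarray(expert_preds)
    weights = np.asarray(weights, dtype=float)
    n_samples = expert_preds.shape[1]
    scores = np.zeros((n_samples, n_classes))
    for preds, w in zip(expert_preds, weights):
        # Each expert adds its weight to the class it voted for.
        scores[np.arange(n_samples), preds] += w
    return scores.argmax(axis=1)

# Three experts, two samples: on the first sample, experts 0 and 2
# outvote expert 1 because their combined weight (0.8) is larger.
labels = weighted_vote([[0, 1], [1, 1], [0, 0]],
                       weights=[0.5, 0.4, 0.3], n_classes=2)
```

The returned label for each sample is the class with the largest total expert weight behind it.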

predict_proba(self, X)[source]

Not implemented for this method.

reset(self)[source]

Resets the estimator to its initial state.

Returns
self
score(self, X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that, for each sample, every label set be correctly predicted.

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples.

y: array-like, shape = (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight: array-like, shape = (n_samples,), optional

Sample weights.

Returns
score: float

Mean accuracy of self.predict(X) wrt. y.

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self