skmultiflow.meta.DynamicWeightedMajorityClassifier

class skmultiflow.meta.DynamicWeightedMajorityClassifier(n_estimators=5, base_estimator=NaiveBayes(nominal_attributes=None), period=50, beta=0.5, theta=0.01)[source]

Dynamic Weighted Majority ensemble classifier.

Parameters
n_estimators: int (default=5)

Maximum number of estimators to hold.

base_estimator: StreamModel or sklearn.BaseEstimator (default=NaiveBayes)

Each member of the ensemble is an instance of the base estimator.

period: int (default=50)

Number of samples between expert removal, creation, and weight updates.

beta: float (default=0.5)

Factor by which weights are multiplied to decrease them.

theta: float (default=0.01)

Minimum fraction of weight per model. Experts whose weight falls below this threshold are removed.
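
To make the interplay of beta and theta concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the helper names are hypothetical, and it assumes that the best expert keeps a weight of 1.0, so a penalized expert's weight relative to the best one is simply beta raised to the number of penalties.

```python
def decayed_weight(initial_weight, beta, n_mistakes):
    """Weight after n_mistakes penalized mistakes, each multiplying by beta."""
    return initial_weight * beta ** n_mistakes


def mistakes_until_below(theta, beta):
    """Smallest number of penalties that pushes a weight of 1.0 below theta."""
    n = 0
    weight = 1.0
    while weight >= theta:
        weight *= beta
        n += 1
    return n


# With the defaults (beta=0.5, theta=0.01), the weight halves on every penalty.
weights_after = [decayed_weight(1.0, 0.5, k) for k in range(4)]
print(weights_after)  # [1.0, 0.5, 0.25, 0.125]
print(mistakes_until_below(theta=0.01, beta=0.5))  # 7
```

Under these assumptions, an expert that is wrong at seven consecutive update points would drop below the default theta. Since penalties are applied at most once per period, a larger period slows this decay.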

Notes

Dynamic Weighted Majority (DWM) [1] uses four mechanisms to cope with concept drift: it trains the online learners of the ensemble, it weights those learners based on their performance, it removes them based on their performance, and it adds new experts based on the global performance of the ensemble.

References

[1] Kolter and Maloof. Dynamic weighted majority: An ensemble method for drifting concepts. The Journal of Machine Learning Research, 8:2755-2790, December 2007. ISSN 1532-4435.

Examples

>>> # Imports
>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import DynamicWeightedMajorityClassifier
>>>
>>> # Setup a data stream
>>> stream = SEAGenerator(random_state=1)
>>>
>>> # Setup Dynamic Weighted Majority Ensemble Classifier
>>> dwm = DynamicWeightedMajorityClassifier()
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Train the classifier with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_pred = dwm.predict(X)
...     if y[0] == y_pred[0]:
...         correct_cnt += 1
...     dwm.partial_fit(X, y)
...     n_samples += 1
...
>>> # Display results
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Dynamic Weighted Majority accuracy: {}'.format(correct_cnt / n_samples))

Methods

fit(self, X, y[, classes, sample_weight])

Fit the model.

fit_single_sample(self, X, y[, classes, …])

Fits a single sample, with X.shape=(1, n_attributes) and y.shape=(1,).

get_expert_predictions(self, X)

Returns the class prediction of each expert for each sample.

get_info(self)

Collects and returns information about the configuration of the estimator.

get_params(self[, deep])

Get parameters for this estimator.

partial_fit(self, X, y[, classes, sample_weight])

Partially fits the model on the supplied X and y matrices.

predict(self, X)

The predict function will take an average of the predictions of its learners, weighted by their respective weights, and return the most likely class.

predict_proba(self, X)

Estimates the probability of each sample in X belonging to each of the class-labels.

reset(self)

Reset this ensemble learner.

score(self, X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

set_params(self, **params)

Set the parameters of this estimator.

class WeightedExpert(estimator, weight)[source]

Wrapper that includes an estimator and its weight.

Parameters
estimator: StreamModel or sklearn.BaseEstimator

The estimator to wrap.

weight: float

The estimator’s weight.

fit(self, X, y, classes=None, sample_weight=None)[source]

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Contains all possible/known class labels. Usage varies depending on the learning method.

sample_weight: numpy.ndarray, optional (default=None)

Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

fit_single_sample(self, X, y, classes=None, sample_weight=None)[source]

Fits a single sample, with X.shape=(1, n_attributes) and y.shape=(1,).

Aggregates all experts’ predictions, decreases the weight of experts whose predictions were wrong, and may create or remove experts every period-th sample.

Finally, trains each individual expert on the provided data.

This is the training loop described by Kolter and Maloof in the original paper [1].

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Features matrix used for partially updating the model.

y: Array-like

An array-like of all the class labels for the samples in X.

classes: list

List of all existing classes. This is an optional parameter.

sample_weight: numpy.ndarray of shape (n_samples), optional (default=None)

Sample weights. If not provided, uniform weights are assumed. Applicability depends on the base estimator.
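
The single-sample update described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's source: MajorityLearner and dwm_step are hypothetical names, experts are stored as [learner, weight] pairs, and weights are rescaled so that the largest one equals 1 before the theta check.

```python
class MajorityLearner:
    """Toy online learner (hypothetical): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = {}

    def predict_one(self, x):
        # Default to label 0 before any sample has been seen.
        return max(self.counts, key=self.counts.get) if self.counts else 0

    def learn_one(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def dwm_step(experts, x, y, step, period=50, beta=0.5, theta=0.01, max_experts=5):
    """Apply one DWM update in place; experts is a list of [learner, weight] pairs."""
    update = step % period == 0
    votes = {}
    for pair in experts:
        y_hat = pair[0].predict_one(x)
        if update and y_hat != y:
            pair[1] *= beta  # penalize an expert that was wrong
        votes[y_hat] = votes.get(y_hat, 0.0) + pair[1]
    ensemble_pred = max(votes, key=votes.get)

    if update:
        top = max(weight for _, weight in experts)
        for pair in experts:
            pair[1] /= top  # rescale so the largest weight is 1 (assumption)
        # Drop experts whose rescaled weight fell below theta.
        experts[:] = [pair for pair in experts if pair[1] >= theta]
        if ensemble_pred != y:
            # The ensemble as a whole was wrong: make room and add a new expert.
            if len(experts) >= max_experts:
                weakest = min(range(len(experts)), key=lambda i: experts[i][1])
                experts.pop(weakest)
            experts.append([MajorityLearner(), 1.0])

    # Finally, train every expert on the new sample.
    for learner, _ in experts:
        learner.learn_one(x, y)
    return ensemble_pred


experts = [[MajorityLearner(), 1.0]]
labels = [0] * 100 + [1] * 100  # abrupt concept drift after 100 samples
correct = 0
for step, label in enumerate(labels, start=1):
    correct += dwm_step(experts, None, label, step, period=10) == label
print(len(experts), correct)
```

In this toy stream the label flips halfway through. The first expert keeps predicting the old label, is penalized at each update point, and is eventually removed, while an expert created after the drift takes over the vote.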

get_expert_predictions(self, X)[source]

Returns the class prediction of each expert for each sample, as an array of shape (n_experts, n_samples).

get_info(self)[source]

Collects and returns information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

partial_fit(self, X, y, classes=None, sample_weight=None)[source]

Partially fits the model on the supplied X and y matrices.

Since this is an ensemble learner, if X and y contain more than one sample, the algorithm partially fits the model one sample at a time.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Array with all possible/known class labels. This is an optional parameter, except for the first partial_fit call where it is compulsory.

sample_weight: numpy.ndarray of shape (n_samples), optional (default=None)

Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the base estimator.

Returns
DynamicWeightedMajorityClassifier

self
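
The one-sample-at-a-time behavior can be pictured with the following plain-Python sketch. The helper iter_single_samples is hypothetical and only illustrates how a batch might be split into single-row inputs; it does not reproduce the library's code.

```python
def iter_single_samples(X, y):
    """Yield (X_i, y_i) pairs: X_i is a one-row matrix, y_i a one-element list."""
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    for row, label in zip(X, y):
        yield [list(row)], [label]


batch_X = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
batch_y = [0, 1]
for X_i, y_i in iter_single_samples(batch_X, batch_y):
    # Each pair has the shapes expected by a single-sample fit: (1, n_features) and (1,).
    print(X_i, y_i)
```

Because samples are processed in order, the ensemble's weights and membership can change partway through a batch.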

predict(self, X)[source]

The predict function will take an average of the predictions of its learners, weighted by their respective weights, and return the most likely class.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

A matrix of the samples we want to predict.

Returns
numpy.ndarray

A numpy.ndarray with the label prediction for all the samples in X.
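
One common way to realize such a weighted vote is to sum the weights of the experts behind each label and return the label with the largest total. The sketch below shows that idea in plain Python for a single sample; weighted_vote is a hypothetical helper, and the library's internal computation may differ.

```python
def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among the experts' votes."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# One heavily weighted expert outvotes two lightly weighted ones...
print(weighted_vote([0, 1, 1], [1.0, 0.3, 0.4]))  # 0
# ...but not once its own weight has been reduced.
print(weighted_vote([0, 1, 1], [0.5, 0.3, 0.4]))  # 1
```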

predict_proba(self, X)[source]

Estimates the probability of each sample in X belonging to each of the class-labels.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The matrix of samples one wants to predict the class probabilities for.

Returns
numpy.ndarray of shape (n_samples, n_labels)

Each row corresponds to the sample of X with the same index. Row i contains len(self.target_values) elements, each of which is the probability that the i-th sample of X belongs to the corresponding class label.

reset(self)[source]

Reset this ensemble learner.

score(self, X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples.

y: array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight: array-like, shape = [n_samples], optional

Sample weights.

Returns
score: float

Mean accuracy of self.predict(X) with respect to y.
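
For the single-label case without sample weights, mean accuracy is simply the fraction of predictions that match the true labels. The following plain-Python sketch shows that computation; mean_accuracy is a hypothetical helper, not part of the library.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


print(mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```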

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self
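
The <component>__<parameter> naming convention can be illustrated with a small plain-Python sketch that separates top-level parameters from nested ones. The helper split_nested_params is hypothetical and only mirrors the naming rule; the example keys assume the constructor arguments shown in the class signature above.

```python
def split_nested_params(params):
    """Split a params dict into top-level entries and per-component entries."""
    top_level, nested = {}, {}
    for key, value in params.items():
        component, separator, sub_key = key.partition("__")
        if separator:
            # "base_estimator__nominal_attributes" -> nested["base_estimator"]["nominal_attributes"]
            nested.setdefault(component, {})[sub_key] = value
        else:
            top_level[component] = value
    return top_level, nested


top_level, nested = split_nested_params(
    {"beta": 0.7, "base_estimator__nominal_attributes": [0]}
)
print(top_level)  # {'beta': 0.7}
print(nested)     # {'base_estimator': {'nominal_attributes': [0]}}
```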