skmultiflow.meta.OnlineBoostingClassifier

class skmultiflow.meta.OnlineBoostingClassifier(base_estimator=KNNADWINClassifier(leaf_size=30, max_window_size=1000, metric='euclidean', n_neighbors=5), n_estimators=10, drift_detection=True, random_state=None)

Online Boosting ensemble classifier.

Online Boosting [1] is the online version of the boosting ensemble method (AdaBoost).

AdaBoost focuses on difficult examples: the examples misclassified by the current classifier \(h_m\) are given more weight in the training set of the following learner \(h_{m+1}\).

In the online setting there is no fixed training dataset, only a stream of samples, so drawing samples with replacement cannot be done directly. The Online Boosting algorithm simulates this step by training on each arriving sample K times, where K is drawn from a binomial distribution. Because the data stream can be considered infinite, and because the binomial distribution \(Binomial(p, N)\) tends to a \(Poisson(\lambda)\) distribution with \(\lambda = Np\) as \(N\) grows with \(Np\) held fixed, K is drawn from \(Poisson(\lambda)\) instead. \(\lambda\) is computed by tracking the total weights of the correctly classified and misclassified examples.
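The binomial-to-Poisson limit used above can be checked numerically. The following is a minimal sketch using only the standard library; the helper names (`binomial_pmf`, `poisson_pmf`, `max_pmf_gap`) are illustrative and not part of skmultiflow. It keeps \(\lambda = Np\) fixed at 1 and shows that the largest gap between the two probability mass functions shrinks as \(N\) grows.

```python
import math


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_pmf_gap(n, lam, k_max=10):
    """Largest absolute pmf difference over k = 0..k_max, with p = lam / n."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )


lam = 1.0
gaps = {n: max_pmf_gap(n, lam) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print("N = {:>5}: max pmf gap = {:.6f}".format(n, gap))
```

The gap decreases roughly in proportion to 1/N, which is why sampling K from a Poisson distribution is a reasonable stand-in when the number of samples is unbounded.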

This online ensemble learner method is improved by the addition of an ADWIN change detector.

ADWIN stands for Adaptive Windowing. It maintains up-to-date statistics over a variable-sized window, which lets it detect changes and cut the window so that the learning algorithm adapts to the most recent data.
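To make the windowing idea concrete, here is a naive sketch of an adaptive window in plain Python. It is not the library's ADWIN implementation: the class name `NaiveAdaptiveWindow` is hypothetical, the window is stored as a plain list, and every split is re-checked on each update, which is far slower than a production detector. It assumes values in [0, 1] and uses the Hoeffding-style cut threshold from the original ADWIN formulation: a split into two sub-windows of sizes n0 and n1 is rejected when their means differ by more than sqrt(ln(4n / delta) / (2m)), with m = 1 / (1/n0 + 1/n1).

```python
import math


class NaiveAdaptiveWindow:
    """Naive adaptive window over values in [0, 1] (illustrative only)."""

    def __init__(self, delta=0.002):
        self.delta = delta  # confidence parameter of the cut test
        self.window = []    # oldest value first

    def add(self, value):
        """Append a value; return True if the window had to be cut."""
        self.window.append(value)
        changed = False
        while self._has_violating_split():
            self.window.pop(0)  # drop the oldest value, then re-check
            changed = True
        return changed

    def _has_violating_split(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        log_term = math.log(4.0 * n / self.delta)
        head_sum = 0.0
        for n0 in range(1, n):
            head_sum += self.window[n0 - 1]
            n1 = n - n0
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(log_term / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                return True
        return False


# A stream whose mean jumps from 0 to 1 after 100 values
stream = [0.0] * 100 + [1.0] * 100
detector = NaiveAdaptiveWindow()
change_points = [i for i, v in enumerate(stream) if detector.add(v)]
window_mean = sum(detector.window) / len(detector.window)
print("first cut at index", change_points[0])
print("mean of the final window: {:.3f}".format(window_mean))
```

No cut happens while the stream is stable. Shortly after the jump the old values are dropped, so the statistics kept in the window follow the new distribution.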

Parameters

base_estimator: skmultiflow.core.BaseSKMObject or sklearn.BaseEstimator (default=KNNADWINClassifier)

Each member of the ensemble is an instance of the base estimator.

n_estimators: int, optional (default=10)

The size of the ensemble, in other words, how many classifiers to train.

drift_detection: bool, optional (default=True)

A drift detector (ADWIN) can be used by the method to track the performance of the classifiers and adapt when a drift is detected.

random_state: int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Raises

NotImplementedError: A few of the functions described here are not implemented since they have no application in this context.
ValueError: A ValueError is raised if the ‘classes’ parameter is not passed in the first partial_fit call.

References

1: B. Wang and J. Pineau, “Online Bagging and Boosting for Imbalanced Data Streams,” in IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 12, pp. 3353-3366, 1 Dec. 2016. doi: 10.1109/TKDE.2016.2609424

Examples

>>> # Imports
>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import OnlineBoostingClassifier
>>>
>>> # Setup a data stream
>>> stream = SEAGenerator(random_state=1)
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Setup the Online Boosting Classifier
>>> online_boosting = OnlineBoostingClassifier()
>>>
>>> # Train the classifier with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_pred = online_boosting.predict(X)
...     if y[0] == y_pred[0]:
...         correct_cnt += 1
...     # The class list is compulsory on the first partial_fit call
...     online_boosting.partial_fit(X, y, classes=stream.target_values)
...     n_samples += 1
>>>
>>> # Display results
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Online Boosting performance: {}'.format(correct_cnt / n_samples))

Methods

`fit`(self, X, y[, classes, sample_weight])	Fit the model.
`get_info`(self)	Collects and returns the information about the configuration of the estimator.
`get_params`(self[, deep])	Get parameters for this estimator.
`partial_fit`(self, X, y[, classes, sample_weight])	Partially fits the model, based on the X and y matrix.
`predict`(self, X)	The predict function will average the predictions from all its learners to find the most likely prediction for the sample matrix X.
`predict_proba`(self, X)	Predicts the probability of each sample belonging to each one of the known classes.
`reset`(self)	Resets the estimator to its initial state.
`score`(self, X, y[, sample_weight])	Returns the mean accuracy on the given test data and labels.
`set_params`(self, **params)	Set the parameters of this estimator.

fit(self, X, y, classes=None, sample_weight=None)

Fit the model.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y: numpy.ndarray of shape (n_samples, n_targets): An array-like with the class labels of all samples in X.
classes: numpy.ndarray, optional (default=None): Contains all possible/known class labels. Usage varies depending on the learning method.
sample_weight: numpy.ndarray, optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns

self

get_info(self)

Collects and returns the information about the configuration of the estimator.

Returns

string: Configuration of the estimator.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

partial_fit(self, X, y, classes=None, sample_weight=None)

Partially fits the model, based on the X and y matrix.

Because this is an ensemble learner, if X and y contain more than one sample, the algorithm partially fits the model one sample at a time.

Each sample is used to train each classifier a total of K times, where K is drawn from a \(Poisson(\lambda)\) distribution. \(\lambda\) is updated after every example using \(\lambda_{sc}\) if the estimator classifies the example correctly, or \(\lambda_{sw}\) otherwise.
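The weight update described above can be sketched without any learners. This is a minimal illustration that assumes an Oza and Russell style rule, in which each ensemble member keeps running totals of the weight it classified correctly (\(\lambda_{sc}\)) and incorrectly (\(\lambda_{sw}\)); the exact rule used by the library may differ. The helpers `sample_poisson` and `update_lambda` and the pretend outcomes are hypothetical names introduced only for this example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw K ~ Poisson(lam) with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def update_lambda(lam, correct, lam_sc, lam_sw):
    """Return the weight passed to the next member and the updated totals.

    Assumed rule: a correct prediction shrinks the weight,
    a wrong prediction grows it.
    """
    if correct:
        lam_sc += lam
    else:
        lam_sw += lam
    epsilon = lam_sw / (lam_sc + lam_sw)  # weighted error rate of this member
    if correct:
        lam = lam / (2.0 * (1.0 - epsilon))
    else:
        lam = lam / (2.0 * epsilon)
    return lam, lam_sc, lam_sw


rng = random.Random(7)
lam = 1.0                        # every new sample starts with weight 1
lam_sc = [8.0, 8.0, 8.0]         # made-up history for three members
lam_sw = [2.0, 2.0, 2.0]
outcomes = [True, False, True]   # pretend predictions: correct / wrong
for m, correct in enumerate(outcomes):
    k = sample_poisson(lam, rng)  # member m would train on the sample k times
    lam, lam_sc[m], lam_sw[m] = update_lambda(lam, correct, lam_sc[m], lam_sw[m])
    print("member {}: k = {}, next lambda = {:.3f}".format(m, k, lam))
```

Under this rule a sample that a member gets wrong reaches the next member with a larger \(\lambda\), so later members train on it more often, which mirrors the reweighting of batch AdaBoost.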

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y: numpy.ndarray of shape (n_samples): An array-like with the class labels of all samples in X.
classes: numpy.ndarray, optional (default=None): Array with all possible/known class labels. This is an optional parameter, except for the first partial_fit call where it is compulsory.
sample_weight: array-like, optional (default=None): Instance weights. If not provided, uniform weights are assumed. Usage varies depending on the base estimator.

Returns

self

Raises

ValueError: A ValueError is raised if the ‘classes’ parameter is not passed in the first partial_fit call, or if it is passed in further calls but differs from the initial classes list.

predict(self, X)

The predict function will average the predictions from all its learners to find the most likely prediction for the sample matrix X.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): A matrix of the samples we want to predict.

Returns

numpy.ndarray: A numpy.ndarray with the label prediction for all the samples in X.
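The averaging step can be illustrated in plain Python. This is a minimal sketch that assumes an unweighted average of per-learner class probabilities followed by an argmax over classes; the helper names `average_probabilities` and `argmax_labels` are hypothetical, and the library may combine its members differently (for example with per-member weights).

```python
def average_probabilities(per_learner_proba):
    """Average a list of (n_samples x n_classes) probability tables."""
    n_learners = len(per_learner_proba)
    n_samples = len(per_learner_proba[0])
    n_classes = len(per_learner_proba[0][0])
    averaged = [[0.0] * n_classes for _ in range(n_samples)]
    for proba in per_learner_proba:
        for i in range(n_samples):
            for c in range(n_classes):
                averaged[i][c] += proba[i][c] / n_learners
    return averaged


def argmax_labels(proba, classes):
    """Pick, for each sample, the class with the highest probability."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in proba]


# Three learners, two samples, two classes (made-up numbers)
per_learner_proba = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.6, 0.4], [0.2, 0.8]],
    [[0.3, 0.7], [0.6, 0.4]],
]
averaged = average_probabilities(per_learner_proba)
labels = argmax_labels(averaged, classes=[0, 1])
print(labels)
```

The third learner disagrees with the other two on both samples, but the averaged probabilities still favour the majority view.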

predict_proba(self, X)

Predicts the probability of each sample belonging to each one of the known classes.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): A matrix of the samples we want to predict.

Returns

numpy.ndarray: An array of shape (n_samples, n_classes), in which each outer entry is associated with the X entry of the same index. The entry at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain label.

Raises

ValueError: A ValueError is raised if the number of classes in the base_estimator learner differs from that of the ensemble learner.

reset(self)

Resets the estimator to its initial state.

Returns

self

score(self, X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

X: array-like, shape = (n_samples, n_features): Test samples.
y: array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X.
sample_weight: array-like, shape = [n_samples], optional: Sample weights.

Returns

score: float: Mean accuracy of self.predict(X) wrt. y.
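To see why subset accuracy is harsh in the multi-label case, the following plain-Python sketch compares it with per-label accuracy on made-up data. The helper names `subset_accuracy` and `per_label_accuracy` are illustrative and not part of the library.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label set is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual labels predicted correctly."""
    pairs = [(a, b) for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    return sum(1 for a, b in pairs if a == b) / len(pairs)


y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]  # one wrong label in sample 2

subset = subset_accuracy(y_true, y_pred)
per_label = per_label_accuracy(y_true, y_pred)
print("subset accuracy: {:.3f}".format(subset))
print("per-label accuracy: {:.3f}".format(per_label))
```

A single wrong label makes the whole second sample count as an error, so the subset accuracy is noticeably lower than the per-label accuracy.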

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
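The <component>__<parameter> naming convention can be illustrated with a small standalone helper. This sketch shows only how such keys split into a component name and a parameter name; `split_nested_params` is a hypothetical function written for this example and is not how the library implements set_params.

```python
def split_nested_params(params):
    """Separate top-level parameters from <component>__<parameter> entries."""
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            # e.g. "base_estimator__n_neighbors" -> ("base_estimator", "n_neighbors")
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params({
    "n_estimators": 5,
    "base_estimator__n_neighbors": 8,
    "base_estimator__leaf_size": 40,
})
print(own)
print(nested)
```

With the real estimator, the equivalent call would presumably look like online_boosting.set_params(n_estimators=5, base_estimator__n_neighbors=8), assuming the base estimator exposes an n_neighbors parameter as the default KNNADWINClassifier does.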