skmultiflow.meta.OnlineCSB2Classifier
Online CSB2 ensemble classifier.
Online CSB2 [1] is the online version of the ensemble learner CSB2.
The CSB2 algorithm is a compromise between AdaBoost and AdaC2: it treats correctly classified examples in the same way as AdaBoost, and misclassified examples in the same way as AdaC2. In addition, the voting weight of each base learner in CSB2 is computed as in AdaBoost.
This online ensemble learner is augmented with an ADWIN change detector.
ADWIN stands for Adaptive Windowing. It maintains updated statistics over a variable-sized window of recent observations, so it can detect changes and cut the window, allowing the learning algorithm to adapt.
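A simplified, pure-Python sketch of the adaptive-windowing idea follows. It is illustrative only: the real ADWIN compresses the window into exponential histograms for efficiency, whereas this version stores every value and checks each split point directly.

```python
import math
from collections import deque

class SimpleADWIN:
    """Simplified ADWIN0 sketch: keep a window of recent values and
    drop old elements whenever some split of the window into two
    sub-windows shows significantly different means (Hoeffding-style
    bound with confidence parameter delta)."""

    def __init__(self, delta=0.002):
        self.delta = delta
        self.window = deque()

    def add_element(self, value):
        """Insert a new observation; return True if a change was
        detected (and the window head was dropped)."""
        self.window.append(value)
        changed = False
        while self._cut_detected():
            self.window.popleft()
            changed = True
        return changed

    def _cut_detected(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        left_sum, left_n = 0.0, 0
        for v in list(self.window)[:-1]:
            left_sum += v
            left_n += 1
            right_n = n - left_n
            mu_left = left_sum / left_n
            mu_right = (total - left_sum) / right_n
            # m is half the harmonic mean of the two sub-window sizes
            m = 1.0 / (1.0 / left_n + 1.0 / right_n)
            eps_cut = math.sqrt((1.0 / (2 * m)) * math.log(4.0 / self.delta))
            if abs(mu_left - mu_right) > eps_cut:
                return True
        return False
```

Feeding it a stream of per-sample accuracies (0/1), the window shrinks shortly after the accuracy distribution shifts, which is how the ensemble decides when to adapt.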
Each member of the ensemble is an instance of the base estimator.
The size of the ensemble, in other words, how many classifiers to train.
The cost of misclassifying a positive sample.
The cost of misclassifying a negative sample.
A drift detector (ADWIN) can be used to track the performance of the classifiers and adapt when a drift is detected.
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
References
B. Wang and J. Pineau, “Online Bagging and Boosting for Imbalanced Data Streams,” in IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 12, pp. 3353-3366, 1 Dec. 2016. doi: 10.1109/TKDE.2016.2609424
Examples
>>> # Imports
>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import OnlineCSB2Classifier
>>>
>>> # Setup a data stream
>>> stream = SEAGenerator(random_state=1)
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Setup the Online CSB2 Classifier
>>> online_csb2 = OnlineCSB2Classifier()
>>>
>>> # Train the classifier with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
>>>     X, y = stream.next_sample()
>>>     y_pred = online_csb2.predict(X)
>>>     if y[0] == y_pred[0]:
>>>         correct_cnt += 1
>>>     online_csb2.partial_fit(X, y)
>>>     n_samples += 1
>>>
>>> # Display results
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Online CSB2 performance: {}'.format(correct_cnt / n_samples))
Methods
fit(self, X, y[, classes, sample_weight])
Fit the model.
get_info(self)
Collects and returns information about the configuration of the estimator.
get_params(self[, deep])
Get parameters for this estimator.
partial_fit(self, X, y[, classes, sample_weight])
Partially fits the model on the samples in X and the labels in y.
predict(self, X)
Averages the predictions from all learners to find the most likely prediction for each sample in the matrix X.
predict_proba(self, X)
Predicts the probability of each sample belonging to each one of the known classes.
reset(self)
Resets the estimator to its initial state.
score(self, X, y[, sample_weight])
Returns the mean accuracy on the given test data and labels.
set_params(self, **params)
Set the parameters of this estimator.
The features to train the model.
An array-like with the class labels of all samples in X.
Contains all possible/known class labels. Usage varies depending on the learning method.
Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.
Configuration of the estimator.
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Parameter names mapped to their values.
Since this is an ensemble learner, if X and y contain more than one sample, the algorithm partially fits the model one sample at a time.
Each sample is used to train each classifier a total of K times, where K is drawn from a Poisson(\(\lambda\)) distribution. \(\lambda\) is updated after every example, using \(\lambda_{sc}\) if the estimator correctly classifies the example, or \(\lambda_{sw}\) otherwise.
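The per-sample update above can be sketched as follows. This is an illustrative assumption, not the exact Online CSB2 implementation: `train_sample`, the stub learner interface, and the concrete `lambda_sc`/`lambda_sw` factors are hypothetical, since in Online CSB2 they are derived from the misclassification costs and running weight totals.

```python
import math
import random

rng = random.Random(42)

def poisson(lam, rng):
    """Draw K ~ Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def train_sample(ensemble, x, y, lambda_sc=0.5, lambda_sw=2.0, lam=1.0):
    """Hypothetical per-sample boosting step: each learner sees the
    sample K ~ Poisson(lam) times; lam is then scaled by lambda_sc
    (correct) or lambda_sw (wrong) before moving to the next learner."""
    for learner in ensemble:
        k = poisson(lam, rng)
        for _ in range(k):
            learner.partial_fit(x, y)
        if learner.predict(x) == y:
            lam *= lambda_sc  # shrink the weight of easy samples
        else:
            lam *= lambda_sw  # boost the weight of hard samples
    return lam
```

The net effect is the usual online-boosting behaviour: samples that early ensemble members misclassify arrive at later members with a larger expected number of training repetitions.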
Array with all possible/known class labels. This is an optional parameter, except for the first partial_fit call where it is compulsory.
Instance weight. If not provided, uniform weights are assumed. Usage varies depending on the base estimator.
A matrix of the samples we want to predict.
A numpy.ndarray with the label prediction for all the samples in X.
An array of shape (n_samples, n_classes), in which each outer entry is associated with the X entry of the same index. The entry at index [i] contains len(self.target_values) elements, each representing the probability that the i-th sample of X belongs to the corresponding class.
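As an illustration of that shape, a minimal pure-Python sketch of the averaging step over ensemble members (`average_proba` is a hypothetical helper, not part of the skmultiflow API):

```python
def average_proba(per_learner_probs):
    """Average class-probability estimates across ensemble members.

    per_learner_probs: list with one entry per learner, each a nested
    list of shape (n_samples, n_classes).
    Returns one (n_samples, n_classes) matrix of averaged probabilities.
    """
    n_learners = len(per_learner_probs)
    n_samples = len(per_learner_probs[0])
    n_classes = len(per_learner_probs[0][0])
    return [
        [
            sum(p[i][c] for p in per_learner_probs) / n_learners
            for c in range(n_classes)
        ]
        for i in range(n_samples)
    ]
```

For two learners that return [[0.8, 0.2]] and [[0.4, 0.6]] for a single sample, the averaged result is [[0.6, 0.4]], and the predicted label is the class with the highest averaged probability.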
In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires, for each sample, that every label in the label set be correctly predicted.
Test samples.
True labels for X.
Sample weights.
Mean accuracy of self.predict(X) with respect to y.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.