skmultiflow.lazy.KNNADWINClassifier
K-Nearest Neighbors classifier with ADWIN change detector.
This classifier is an improvement over the regular KNNClassifier, as it is resistant to concept drift. It utilises the ADWIN change detector to decide which samples to keep and which to forget, thereby regulating the size of its sample window.
To know more about the ADWIN change detector, please see skmultiflow.drift_detection.ADWIN.
It uses the regular KNNClassifier as a base class, with two major differences: this class keeps a variable-size window instead of a fixed-size one, and it updates the ADWIN detector on every partial_fit call.
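For reference, here is a minimal sketch of the detector on its own (the synthetic stream below is illustrative): ADWIN flags a change when the mean of the monitored value shifts, and shrinks its internal window accordingly.

import numpy as np
from skmultiflow.drift_detection import ADWIN

# Monitor a binary stream whose mean shifts halfway through; ADWIN
# reports the change and adapts its internal window width.
adwin = ADWIN()
data = np.concatenate([np.zeros(1000), np.ones(1000)])
for i, value in enumerate(data):
    adwin.add_element(value)
    if adwin.detected_change():
        print('Change detected at index', i, '- window width:', adwin.width)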
Parameters

n_neighbors: The number of nearest neighbors to search for.

max_window_size: The maximum size of the window storing the last viewed samples.

leaf_size: The maximum number of samples that can be stored in one leaf node, which determines the point at which the algorithm switches to a brute-force approach. The bigger this number, the faster the tree construction time, but the slower the query time.

metric: sklearn.KDTree parameter. The distance metric to use for the KDTree (default='euclidean'). KNNClassifier.valid_metrics() gives a list of the metrics which are valid for the KDTree.
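For example (a short sketch; 'manhattan' is used here only as an illustration of a non-default metric):

from skmultiflow.lazy import KNNADWINClassifier

# List the metrics the underlying KDTree accepts, then pick one.
print(KNNADWINClassifier.valid_metrics())
knn_adwin = KNNADWINClassifier(n_neighbors=5, metric='manhattan')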
Notes
This estimator is not optimal for a mixture of categorical and numerical features. This implementation treats all features from a given stream as numerical.
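When a stream does carry categorical columns, one option is to encode them before feeding samples to the classifier. A sketch of this idea (the feature layout below is hypothetical):

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical batch: column 0 is categorical, column 1 is numerical.
X = np.array([['red', 1.0], ['blue', 2.5], ['red', 0.3]], dtype=object)

# One-hot encode the categorical column so every feature is numerical.
encoder = OneHotEncoder(handle_unknown='ignore')
X_cat = encoder.fit_transform(X[:, [0]]).toarray()
X_num = X[:, [1]].astype(float)
X_encoded = np.hstack([X_cat, X_num])  # safe to feed to the classifier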
Examples
>>> # Imports
>>> from skmultiflow.lazy import KNNADWINClassifier
>>> from skmultiflow.data import ConceptDriftStream
>>> # Setting up the stream
>>> stream = ConceptDriftStream(position=2500, width=100, random_state=1)
>>> # Setting up the KNNADWINClassifier
>>> knn_adwin = KNNADWINClassifier(n_neighbors=8, leaf_size=40, max_window_size=1000)
>>> # Keep track of sample count and correct prediction count
>>> n_samples = 0
>>> corrects = 0
>>> while n_samples < 5000:
...     X, y = stream.next_sample()
...     pred = knn_adwin.predict(X)
...     if y[0] == pred[0]:
...         corrects += 1
...     knn_adwin = knn_adwin.partial_fit(X, y)
...     n_samples += 1
>>>
>>> # Displaying the results
>>> print('KNNADWINClassifier usage example')
KNNADWINClassifier usage example
>>> print(str(n_samples) + ' samples analyzed.')
5000 samples analyzed.
>>> print("KNNADWINClassifier's performance: " + str(corrects/n_samples))
KNNADWINClassifier's performance: 0.5714
Methods

fit(self, X, y[, classes, sample_weight])
    Fit the model.
get_info(self)
    Collects and returns the information about the configuration of the estimator.
get_params(self[, deep])
    Get the parameters of this estimator.
partial_fit(self, X, y[, classes, sample_weight])
    Partially (incrementally) fit the model.
predict(self, X)
    Predict the class label for sample X.
predict_proba(self, X)
    Estimate the probability of X belonging to each class label.
reset(self)
    Reset the estimator.
score(self, X, y[, sample_weight])
    Returns the mean accuracy on the given test data and labels.
set_params(self, **params)
    Set the parameters of this estimator.
valid_metrics()
    Get the valid distance metrics for the KDTree.
fit(self, X, y[, classes, sample_weight])
    Fit the model.

    X: The features to train the model.
    y: An array-like with the class labels of all samples in X.
    classes: Contains all possible/known class labels. Usage varies depending on the learning method.
    sample_weight: Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

get_info(self)
    Collects and returns the information about the configuration of the estimator.

    Returns: Configuration of the estimator.

get_params(self[, deep])
    Get the parameters of this estimator.

    deep: If True, will return the parameters for this estimator and contained subobjects that are estimators.

    Returns: Parameter names mapped to their values.

partial_fit(self, X, y[, classes, sample_weight])
    Partially fits the model by updating the window with new samples while also updating the ADWIN algorithm. If ADWIN detects a change, the window is split in such a way that samples from the previous concept are dropped.

    X: The data upon which the algorithm will create its model.
    y: An array-like containing the classification targets for all samples in X.
    classes: Array with all possible/known classes.

    Returns: self
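The window-splitting step can be illustrated with a simplified sketch (illustrative only, not the library's actual implementation): the prediction outcome is fed to ADWIN as a 0/1 error signal, and when a change is flagged the stored window is trimmed to the detector's new width.

from collections import deque
from skmultiflow.drift_detection import ADWIN

adwin = ADWIN()
window = deque()  # (sample, label) pairs, oldest on the left

def update_window(sample, label, was_correct):
    # Feed the prediction outcome to ADWIN as a 0/1 error signal.
    window.append((sample, label))
    adwin.add_element(0.0 if was_correct else 1.0)
    # On a detected change, keep only the adwin.width most recent
    # samples; older ones are assumed to belong to the previous concept.
    if adwin.detected_change():
        while len(window) > adwin.width:
            window.popleft()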
predict(self, X)
    Predict the class label for sample X.

    X: All the samples we want to predict the label for.

    Returns: A 1D array of shape (n_samples,), containing the predicted class labels for all instances in X.

predict_proba(self, X)
    Estimate the probability of X belonging to each class label.

    Returns: A 2D array of shape (n_samples, n_classes), where each i-th row contains len(self.target_value) elements, representing the probability that the i-th sample of X belongs to a certain class label.
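A minimal usage sketch (the SEAGenerator stream and the parameter values here are illustrative, not part of this class's API):

import numpy as np
from skmultiflow.data import SEAGenerator
from skmultiflow.lazy import KNNADWINClassifier

# Warm the model up on a few labelled samples from a synthetic stream.
stream = SEAGenerator(random_state=1)
X, y = stream.next_sample(200)
knn_adwin = KNNADWINClassifier(n_neighbors=3)
knn_adwin.partial_fit(X, y, classes=[0, 1])

# Each row of proba holds one probability per known class label.
X_batch, _ = stream.next_sample(5)
proba = knn_adwin.predict_proba(X_batch)
print(proba.shape)               # (5, 2)
print(np.argmax(proba, axis=1))  # hard predictions recovered from proba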
reset(self)
    Reset the estimator. This resets the ADWIN drift detector as well as the KNN model.

score(self, X, y[, sample_weight])
    Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for each sample.

    X: Test samples.
    y: True labels for X.
    sample_weight: Sample weights.

    Returns: Mean accuracy of self.predict(X) with respect to y.

set_params(self, **params)
    Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.
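A short round-trip sketch (the printed dictionary is illustrative; the exact keys follow the constructor parameters):

from skmultiflow.lazy import KNNADWINClassifier

knn_adwin = KNNADWINClassifier(n_neighbors=8)
print(knn_adwin.get_params())        # e.g. {'leaf_size': 30, 'max_window_size': 1000, ...}
knn_adwin.set_params(n_neighbors=5)  # update a single parameter in place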