skmultiflow.lazy.KNNADWINClassifier

class skmultiflow.lazy.KNNADWINClassifier(n_neighbors=5, max_window_size=1000, leaf_size=30, metric='euclidean')[source]

K-Nearest Neighbors classifier with ADWIN change detector.

This classifier extends the regular KNNClassifier to make it more robust to concept drift. It uses the ADWIN change detector to decide which samples to keep and which to forget, and in doing so regulates the size of the sample window.

For more about the ADWIN change detector, see skmultiflow.drift_detection.ADWIN.

It uses the regular KNNClassifier as a base class. The major differences are that this class keeps a variable-size window instead of a fixed-size one, and that it updates the ADWIN detector at each partial_fit call.

Parameters
n_neighbors: int (default=5)

The number of nearest neighbors to search for.

max_window_size: int (default=1000)

The maximum size of the window storing the last viewed samples.

leaf_size: int (default=30)

The maximum number of samples that can be stored in one leaf node, which determines the point at which the algorithm switches to a brute-force approach. The larger this number, the faster the tree construction, but the slower the queries.

metric: string or sklearn.DistanceMetric object

sklearn.KDTree parameter. The distance metric to use for the KDTree. Default='euclidean'. KNNClassifier.valid_metrics() returns the list of metrics that are valid for KDTree.

Notes

This estimator is not optimal for a mixture of categorical and numerical features. This implementation treats all features from a given stream as numerical.

Examples

>>> # Imports
>>> from skmultiflow.lazy import KNNADWINClassifier
>>> from skmultiflow.data import ConceptDriftStream
>>> # Setting up the stream
>>> stream = ConceptDriftStream(position=2500, width=100, random_state=1)
>>> # Setting up the KNNAdwin classifier
>>> knn_adwin = KNNADWINClassifier(n_neighbors=8, leaf_size=40, max_window_size=1000)
>>> # Keep track of sample count and correct prediction count
>>> n_samples = 0
>>> corrects = 0
>>> while n_samples < 5000:
...     X, y = stream.next_sample()
...     pred = knn_adwin.predict(X)
...     if y[0] == pred[0]:
...         corrects += 1
...     knn_adwin = knn_adwin.partial_fit(X, y)
...     n_samples += 1
...
>>> # Displaying the results
>>> print('KNNADWINClassifier usage example')
KNNADWINClassifier usage example
>>> print(str(n_samples) + ' samples analyzed.')
5000 samples analyzed.
>>> print("KNNADWINClassifier's performance: " + str(corrects/n_samples))
KNNADWINClassifier's performance: 0.5714

Methods

fit(self, X, y[, classes, sample_weight])

Fit the model.

get_info(self)

Collects and returns information about the configuration of the estimator.

get_params(self[, deep])

Get parameters for this estimator.

partial_fit(self, X, y[, classes, sample_weight])

Partially (incrementally) fit the model.

predict(self, X)

Predict the class label for each sample in X.

predict_proba(self, X)

Estimate the probability of X belonging to each class label.

reset(self)

Reset the estimator.

score(self, X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

set_params(self, **params)

Set the parameters of this estimator.

valid_metrics()

Get valid distance metrics for the KDTree.

fit(self, X, y, classes=None, sample_weight=None)[source]

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Contains all possible/known class labels. Usage varies depending on the learning method.

sample_weight: numpy.ndarray, optional (default=None)

Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

get_info(self)[source]

Collects and returns information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

partial_fit(self, X, y, classes=None, sample_weight=None)[source]

Partially (incrementally) fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The data upon which the algorithm will create its model.

y: Array-like

An array-like containing the classification targets for all samples in X.

classes: numpy.ndarray, optional (default=None)

Array with all possible/known classes.

sample_weight: Not used.

Returns
KNNADWINClassifier

self

Notes

Partially fits the model by updating the window with new samples while also updating the ADWIN algorithm. If ADWIN detects a change, the window is split in such a way that samples from the previous concept are dropped.
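
The window update described in this note can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: ToyChangeDetector is a hypothetical stand-in that simply compares the two halves of its buffer and is not the ADWIN algorithm, and ToyAdaptiveWindow, its update method, and the was_correct flag are names invented for this sketch.

```python
from collections import deque


class ToyChangeDetector:
    """Hypothetical stand-in for a change detector (NOT the ADWIN algorithm).

    It flags a change when the mean of the older half of its buffer
    differs from the mean of the newer half by more than a threshold.
    """

    def __init__(self, min_size=20, threshold=0.4):
        self.values = []
        self.min_size = min_size
        self.threshold = threshold

    @property
    def width(self):
        return len(self.values)

    def add_element(self, value):
        self.values.append(value)

    def detected_change(self):
        n = len(self.values)
        if n < self.min_size:
            return False
        half = n // 2
        old_mean = sum(self.values[:half]) / half
        new_mean = sum(self.values[half:]) / (n - half)
        if abs(old_mean - new_mean) > self.threshold:
            # Forget the older half, so the detector's width shrinks.
            self.values = self.values[half:]
            return True
        return False


class ToyAdaptiveWindow:
    """Sample window whose size follows the detector's width (illustrative)."""

    def __init__(self, max_window_size=1000):
        # deque(maxlen=...) silently drops the oldest sample when full.
        self.samples = deque(maxlen=max_window_size)
        self.detector = ToyChangeDetector()

    def update(self, x, y, was_correct):
        self.samples.append((x, y))
        self.detector.add_element(1.0 if was_correct else 0.0)
        if self.detector.detected_change():
            # Drop the oldest samples until the window matches the detector.
            while len(self.samples) > self.detector.width:
                self.samples.popleft()


window = ToyAdaptiveWindow(max_window_size=1000)
for _ in range(30):  # stable concept: every prediction is correct
    window.update([0.0], 0, was_correct=True)
size_before_drift = len(window.samples)
for _ in range(30):  # after the drift: every prediction is wrong
    window.update([1.0], 1, was_correct=False)
size_after_drift = len(window.samples)
print(size_before_drift, size_after_drift)
```

In this sketch the window grows while predictions stay consistent and shrinks once the stream of correct/incorrect outcomes shifts, which is the behaviour the note attributes to the ADWIN-driven window.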

predict(self, X)[source]

Predict the class label for each sample in X.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

All the samples we want to predict the label for.

Returns
numpy.ndarray

A 1D array of shape (n_samples,), containing the predicted class labels for all instances in X.

predict_proba(self, X)[source]

Estimate the probability of X belonging to each class label.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The samples for which to estimate the class probabilities.

Returns
numpy.ndarray

A 2D array of shape (n_samples, n_classes), where the i-th row contains len(self.target_value) elements, each representing the probability that the i-th sample of X belongs to the corresponding class label.
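
One common way for a nearest-neighbors classifier to produce such a row of probabilities is to report the fraction of the nearest neighbors that vote for each class. The sketch below illustrates that idea in plain Python with a brute-force search; it assumes vote fractions and Euclidean distance, uses a hypothetical helper named knn_vote_proba, and does not reproduce the KDTree-based search the estimator relies on.

```python
import math
from collections import Counter


def knn_vote_proba(window, query, classes, n_neighbors=3):
    """Return one probability per class for a single query sample.

    window  : list of (features, label) pairs currently stored
    query   : list of feature values
    classes : ordered list of known class labels
    """
    # Brute-force search: sort stored samples by Euclidean distance.
    by_distance = sorted(window, key=lambda sample: math.dist(sample[0], query))
    nearest = by_distance[:n_neighbors]
    # Each neighbor casts one vote for its own label.
    votes = Counter(label for _, label in nearest)
    return [votes[c] / len(nearest) for c in classes]


window = [
    ([0.0, 0.0], 0),
    ([0.1, 0.0], 0),
    ([0.2, 0.1], 0),
    ([1.0, 1.0], 1),
    ([0.9, 1.0], 1),
]
proba = knn_vote_proba(window, [0.6, 0.6], classes=[0, 1], n_neighbors=3)
print(proba)
```

Each returned row sums to 1, and its entries follow the order of the classes list, matching the (n_samples, n_classes) layout described above for a single sample.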

reset(self)[source]

Reset the estimator.

Resets the ADWIN Drift detector as well as the KNN model.

Returns
KNNADWINClassifier

self

score(self, X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be correctly predicted.

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples.

y: array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight: array-like, shape = [n_samples], optional

Sample weights.

Returns
score: float

Mean accuracy of self.predict(X) with respect to y.
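
As a rough illustration of the metric, mean accuracy is the fraction of samples whose prediction matches the true label exactly. The helper below, mean_accuracy, is a hypothetical name for this sketch; it ignores sample_weight and is not the library's code.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of samples whose prediction equals the true label.

    For multi-label data, pass each label set as a tuple: a sample counts
    as correct only if the whole tuple matches (subset accuracy).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Single-label case: 3 of 4 predictions are correct.
single = mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0])

# Multi-label case: the second sample has one wrong label, so it counts
# as entirely wrong under subset accuracy.
multi = mean_accuracy([(1, 0), (0, 1)], [(1, 0), (1, 1)])
print(single, multi)
```

The multi-label call shows why subset accuracy is called harsh: a prediction that gets one of two labels right still scores zero for that sample.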

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
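
The <component>__<parameter> naming convention can be illustrated by splitting each key at its first double underscore. The function split_nested_params and the component name knn are hypothetical and serve only to show how such keys are interpreted; this is a sketch, not the library's parsing code.

```python
def split_nested_params(params):
    """Separate an estimator's own parameters from nested-component ones.

    Keys of the form '<component>__<parameter>' are grouped by component;
    all other keys are treated as parameters of the estimator itself.
    """
    own, nested = {}, {}
    for key, value in params.items():
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params({
    "n_neighbors": 8,        # parameter of the estimator itself
    "knn__leaf_size": 40,    # parameter of a hypothetical 'knn' component
})
print(own, nested)
```

For a simple estimator such as this classifier, only the plain keys apply; the nested form matters when the estimator is wrapped in a composite object.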

Returns
self

static valid_metrics()[source]

Get valid distance metrics for the KDTree.