# skmultiflow.lazy.KNNClassifier¶

class skmultiflow.lazy.KNNClassifier(n_neighbors=5, max_window_size=1000, leaf_size=30, metric='euclidean')[source]

k-Nearest Neighbors classifier.

This non-parametric classification method keeps track of the last max_window_size training samples. The predicted class label for a given query sample is obtained in two steps:

1. Find the n_neighbors samples closest to the query sample in the data window.

2. Aggregate the class labels of these n_neighbors to define the predicted class for the query sample.
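
A minimal sketch of these two steps in plain numpy (illustrative only, not the library's internal implementation; window_X and window_y stand for the buffered features and labels):

>>> import numpy as np
>>> def knn_predict(window_X, window_y, query, n_neighbors=5):
...     # Step 1: rank all window samples by distance to the query
...     dists = np.linalg.norm(np.asarray(window_X) - query, axis=1)
...     nearest = np.argsort(dists)[:n_neighbors]
...     # Step 2: majority vote over the neighbors' class labels
...     labels, counts = np.unique(np.asarray(window_y)[nearest], return_counts=True)
...     return labels[np.argmax(counts)]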

Parameters
n_neighbors: int (default=5)

The number of nearest neighbors to search for.

max_window_size: int (default=1000)

The maximum size of the window storing the last observed samples.

leaf_size: int (default=30)

sklearn.KDTree parameter. The maximum number of samples that can be stored in one leaf node, which determines the point at which the algorithm switches to a brute-force approach. The larger this number, the faster the tree construction, but the slower the queries.

metric: string or sklearn.DistanceMetric object

sklearn.KDTree parameter. The distance metric to use for the KDTree. Default='euclidean'. KNNClassifier.valid_metrics() gives a list of the metrics that are valid for KDTree.
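
As a quick illustration (a sketch, not part of the original docs), a metric can be checked against valid_metrics() before it is passed in; the exact list depends on the installed scikit-learn version:

>>> from skmultiflow.lazy import KNNClassifier
>>> assert 'manhattan' in KNNClassifier.valid_metrics()
>>> knn = KNNClassifier(n_neighbors=5, metric='manhattan')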

Notes

This estimator is not optimal for a mixture of categorical and numerical features. This implementation treats all features from a given stream as numerical.

Examples

>>> # Imports
>>> from skmultiflow.lazy import KNNClassifier
>>> from skmultiflow.data import SEAGenerator
>>> # Setting up the stream
>>> stream = SEAGenerator(random_state=1, noise_percentage=.1)
>>> knn = KNNClassifier(n_neighbors=8, max_window_size=2000, leaf_size=40)
>>> # Keep track of sample count and correct prediction count
>>> n_samples = 0
>>> corrects = 0
>>> while n_samples < 5000:
...     X, y = stream.next_sample()
...     my_pred = knn.predict(X)
...     if y[0] == my_pred[0]:
...         corrects += 1
...     knn = knn.partial_fit(X, y)
...     n_samples += 1
>>>
>>> # Displaying results
>>> print('KNNClassifier usage example')
KNNClassifier usage example
>>> print('{} samples analyzed.'.format(n_samples))
5000 samples analyzed.
>>> print("KNNClassifier's performance: {}".format(corrects/n_samples))
KNNClassifier's performance: 0.8776


Methods

| Method | Description |
|---|---|
| fit(self, X, y[, classes, sample_weight]) | Fit the model. |
| get_info(self) | Collects and returns information about the configuration of the estimator. |
| get_params(self[, deep]) | Get parameters for this estimator. |
| partial_fit(self, X, y[, classes, sample_weight]) | Partially (incrementally) fit the model. |
| predict(self, X) | Predict the class label for sample X. |
| predict_proba(self, X) | Estimate the probability of X belonging to each class label. |
| reset(self) | Reset estimator. |
| score(self, X, y[, sample_weight]) | Returns the mean accuracy on the given test data and labels. |
| set_params(self, **params) | Set the parameters of this estimator. |
| valid_metrics() | Get valid distance metrics for the KDTree. |

fit(self, X, y, classes=None, sample_weight=None)[source]

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Contains all possible/known class labels. Usage varies depending on the learning method.

sample_weight: numpy.ndarray, optional (default=None)

Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self
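
A possible usage sketch (assuming SEAGenerator and its target_values attribute, as in the example above) is to pre-train on an initial batch before going online:

>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.lazy import KNNClassifier
>>> stream = SEAGenerator(random_state=1)
>>> X, y = stream.next_sample(200)   # initial batch of 200 samples
>>> knn = KNNClassifier().fit(X, y, classes=stream.target_values)
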
get_info(self)[source]

Collects and returns information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.
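
For example (a small sketch), the constructor arguments come back as a dictionary:

>>> knn = KNNClassifier(n_neighbors=8)
>>> knn.get_params()['n_neighbors']
8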

partial_fit(self, X, y, classes=None, sample_weight=None)[source]

Partially (incrementally) fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The data upon which the algorithm will create its model.

y: array-like

An array-like containing the classification targets for all samples in X.

classes: numpy.ndarray, optional (default=None)

Array with all possible/known classes.

sample_weight: Not used.

Returns
KNNClassifier

self

Notes

For the K-Nearest Neighbors Classifier, fitting the model is equivalent to inserting the newer samples into the observed window and, once the size limit (max_window_size) is reached, removing the oldest samples. The observed samples are stored in an InstanceWindow object; for that class's documentation, see skmultiflow.core.utils.data_structures.
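
Conceptually, the window behaves like a bounded FIFO buffer. A rough sketch of that insertion behaviour (using collections.deque for illustration, not the actual InstanceWindow class):

>>> from collections import deque
>>> window = deque(maxlen=1000)        # maxlen plays the role of max_window_size
>>> def insert_samples(window, X, y):
...     for xi, yi in zip(X, y):
...         window.append((xi, yi))    # when full, the oldest pair is dropped automatically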

predict(self, X)[source]

Predict the class label for sample X

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

All the samples we want to predict the label for.

Returns
numpy.ndarray

A 1D array of shape (n_samples,), containing the predicted class labels for all instances in X.

predict_proba(self, X)[source]

Estimate the probability of X belonging to each class label.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The samples for which to estimate class-membership probabilities.

Returns
numpy.ndarray

A 2D array of shape (n_samples, n_classes), where entry (i, j) is the estimated probability that the i-th sample of X belongs to the j-th class label.
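
As an interpretation aid (a sketch, continuing from a fitted knn as in the Examples section; it assumes class labels are the column indices 0..n_classes-1, as with SEAGenerator), the predicted label is the column with the highest estimated probability:

>>> proba = knn.predict_proba(X)       # shape (n_samples, n_classes)
>>> hard_pred = proba.argmax(axis=1)   # most probable class per sample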

reset(self)[source]

Reset estimator.

score(self, X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that every label set be predicted correctly for each sample.

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples.

y: array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight: array-like, shape = (n_samples), optional

Sample weights.

Returns
score: float

Mean accuracy of self.predict(X) wrt. y.
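
For a fitted model (a sketch, continuing the stream example above), score amounts to comparing predictions with the true labels:

>>> X_test, y_test = stream.next_sample(500)
>>> acc = knn.score(X_test, y_test)
>>> # equivalent to: (knn.predict(X_test) == y_test).mean()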

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self
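
KNNClassifier has no nested estimators, so parameters are set directly (a sketch; set_params returns the estimator itself):

>>> knn = knn.set_params(n_neighbors=10, leaf_size=50)
>>> knn.get_params()['n_neighbors']
10
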
static valid_metrics()[source]

Get valid distance metrics for the KDTree.