skmultiflow.lazy.KNNClassifier

k-Nearest Neighbors classifier.
This non-parametric classification method keeps track of the last max_window_size training samples. The predicted class label for a given query sample is obtained in two steps:

1. Find the n_neighbors samples closest to the query sample in the data window.
2. Aggregate the class labels of those n_neighbors to define the predicted class for the query sample.
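The two steps above can be sketched in plain Python (a simplified illustration, not the library's implementation — the library uses a KDTree over a bounded window, while this sketch uses a brute-force search over a deque):

```python
from collections import deque, Counter
import math

def knn_predict(window, query, n_neighbors=3):
    """window: deque of (features, label) pairs; query: feature list."""
    # Step 1: find the n_neighbors samples closest to the query.
    by_distance = sorted(window, key=lambda pair: math.dist(pair[0], query))
    nearest = by_distance[:n_neighbors]
    # Step 2: aggregate their class labels by majority vote.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

window = deque(maxlen=5)  # plays the role of max_window_size
for x, y in [([0.0], 0), ([0.1], 0), ([1.0], 1), ([1.1], 1), ([0.2], 0)]:
    window.append((x, y))

print(knn_predict(window, [0.05], n_neighbors=3))  # -> 0
```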
Parameters

n_neighbors
    The number of nearest neighbors to search for.
max_window_size
    The maximum size of the window storing the last observed samples.
leaf_size
    sklearn.KDTree parameter. The maximum number of samples that can be stored in one leaf node, which determines the point at which the algorithm switches to a brute-force approach. The larger this number, the faster the tree construction, but the slower the queries.
metric
    sklearn.KDTree parameter. The distance metric to use for the KDTree. Default='euclidean'. KNNClassifier.valid_metrics() gives a list of the metrics which are valid for the KDTree.
Notes
This estimator is not optimal for a mixture of categorical and numerical features. This implementation treats all features from a given stream as numerical.
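Since all features are treated as numerical, categorical features should be encoded before training. A minimal one-hot sketch (the category list and helper below are assumptions for illustration, not part of the library):

```python
def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 indicator features."""
    return [1.0 if value == c else 0.0 for c in categories]

# Hypothetical mixed sample: one categorical feature, one numeric feature.
colors = ["red", "green", "blue"]
sample = ["green", 4.2]
encoded = one_hot(sample[0], colors) + [sample[1]]
print(encoded)  # -> [0.0, 1.0, 0.0, 4.2]
```

The encoded vector is all-numerical and can then be fed to the classifier.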
Examples
>>> # Imports
>>> from skmultiflow.lazy import KNNClassifier
>>> from skmultiflow.data import SEAGenerator
>>> # Setting up the stream
>>> stream = SEAGenerator(random_state=1, noise_percentage=.1)
>>> knn = KNNClassifier(n_neighbors=8, max_window_size=2000, leaf_size=40)
>>> # Keep track of sample count and correct prediction count
>>> n_samples = 0
>>> corrects = 0
>>> while n_samples < 5000:
...     X, y = stream.next_sample()
...     my_pred = knn.predict(X)
...     if y[0] == my_pred[0]:
...         corrects += 1
...     knn = knn.partial_fit(X, y)
...     n_samples += 1
>>>
>>> # Displaying results
>>> print('KNNClassifier usage example')
KNNClassifier usage example
>>> print('{} samples analyzed.'.format(n_samples))
5000 samples analyzed.
>>> print("KNNClassifier's performance: {}".format(corrects/n_samples))
KNNClassifier's performance: 0.8776
Methods

fit(self, X, y[, classes, sample_weight])
    Fit the model.
get_info(self)
    Collects and returns information about the configuration of the estimator.
get_params(self[, deep])
    Get parameters for this estimator.
partial_fit(self, X, y[, classes, sample_weight])
    Partially (incrementally) fit the model.
predict(self, X)
    Predict the class label for sample X.
predict_proba(self, X)
    Estimate the probability of X belonging to each class label.
reset(self)
    Reset estimator.
score(self, X, y[, sample_weight])
    Returns the mean accuracy on the given test data and labels.
set_params(self, **params)
    Set the parameters of this estimator.
valid_metrics()
    Get valid distance metrics for the KDTree.
fit(self, X, y[, classes, sample_weight])

Fit the model.

X: The features to train the model.
y: An array-like with the class labels of all samples in X.
classes: Contains all possible/known class labels. Usage varies depending on the learning method.
sample_weight: Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

get_info(self)

Collects and returns information about the configuration of the estimator.

Returns: Configuration of the estimator.

get_params(self[, deep])

Get parameters for this estimator.

deep: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns: Parameter names mapped to their values.

partial_fit(self, X, y[, classes, sample_weight])

Partially (incrementally) fit the model.

X: The data upon which the algorithm will create its model.
y: An array-like containing the classification targets for all samples in X.
classes: Array with all possible/known classes.

Returns: self

Notes: For the K-Nearest Neighbors Classifier, fitting the model is equivalent to inserting the newer samples into the observed window and, once the window's size limit is reached, removing the oldest ones. The observed samples are stored in an InstanceWindow object. For this class's documentation, please visit skmultiflow.core.utils.data_structures.
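The windowed fitting behavior described above can be sketched with a bounded deque (a simplified stand-in for the InstanceWindow structure, not the library's implementation): new samples are appended and, once the size limit is reached, the oldest are dropped automatically.

```python
from collections import deque

max_window_size = 3
window = deque(maxlen=max_window_size)  # bounded sliding window

# Appending a fourth sample evicts the oldest one.
for sample in [([0.1], 0), ([0.2], 0), ([0.3], 1), ([0.4], 1)]:
    window.append(sample)

print([x for x, _ in window])  # -> [[0.2], [0.3], [0.4]]
```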
predict(self, X)

Predict the class label for sample X.

X: All the samples we want to predict the label for.

Returns: A 1D array of shape (n_samples,), containing the predicted class labels for all instances in X.

predict_proba(self, X)

Estimate the probability of X belonging to each class label.

Returns: A 2D array of shape (n_samples, n_classes), where each i-th row contains len(self.target_value) elements, representing the probability that the i-th sample of X belongs to a certain class label.
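One common way to obtain such per-class probabilities in a kNN setting is the fraction of the n_neighbors that vote for each class; the helper below is a simplified illustration of that idea, not the library's implementation:

```python
from collections import Counter

def vote_proba(neighbor_labels, classes):
    """Return the fraction of neighbor votes for each class, in class order."""
    votes = Counter(neighbor_labels)
    n = len(neighbor_labels)
    return [votes[c] / n for c in classes]

# 3 of 4 neighbors vote for class 0, 1 of 4 for class 1.
print(vote_proba([0, 0, 1, 0], classes=[0, 1]))  # -> [0.75, 0.25]
```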
score(self, X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set for each sample be correctly predicted.

X: Test samples.
y: True labels for X.
sample_weight: Sample weights.

Returns: Mean accuracy of self.predict(X) w.r.t. y.
set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.
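The <component>__<parameter> routing convention can be illustrated with a toy sketch; the Pipeline and Leaf classes below are hypothetical stand-ins, not skmultiflow or scikit-learn code:

```python
class Leaf:
    """Hypothetical sub-estimator with one tunable parameter."""
    def __init__(self):
        self.n_neighbors = 5

class Pipeline:
    """Hypothetical container that routes nested parameter names."""
    def __init__(self):
        self.knn = Leaf()

    def set_params(self, **params):
        for name, value in params.items():
            # "knn__n_neighbors" -> component "knn", parameter "n_neighbors".
            component, _, parameter = name.partition("__")
            target = getattr(self, component) if parameter else self
            setattr(target, parameter or component, value)
        return self

pipe = Pipeline().set_params(knn__n_neighbors=8)
print(pipe.knn.n_neighbors)  # -> 8
```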