skmultiflow.lazy.KNNRegressor

class skmultiflow.lazy.KNNRegressor(n_neighbors=5, max_window_size=1000, leaf_size=30, metric='euclidean', aggregation_method='mean')

k-Nearest Neighbors regressor.

This non-parametric regression method keeps track of the last max_window_size training samples. Predictions are obtained by aggregating the target values of the n_neighbors stored samples closest to a query sample.
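For illustration, the windowed prediction rule can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the skmultiflow implementation: it scans the stored window by brute force with Euclidean distance (math.dist, Python 3.8+) instead of querying a KDTree, and the helper name knn_predict is hypothetical.

```python
import math
import statistics


def knn_predict(window_X, window_y, query, n_neighbors=5, aggregation_method="mean"):
    """Hypothetical helper: aggregate the targets of the n_neighbors stored
    samples closest to `query` (brute-force Euclidean search, no KDTree)."""
    # Pair each stored sample's distance to the query with its target value.
    scored = [(math.dist(x, query), y) for x, y in zip(window_X, window_y)]
    # Keep only the n_neighbors closest stored samples.
    nearest = sorted(scored, key=lambda pair: pair[0])[:n_neighbors]
    targets = [y for _, y in nearest]
    if aggregation_method == "mean":
        return statistics.mean(targets)
    if aggregation_method == "median":
        return statistics.median(targets)
    raise ValueError("aggregation_method must be 'mean' or 'median'")


# A tiny window of stored samples (one feature each) and their target values.
window_X = [[0.0], [1.0], [2.0], [3.0], [10.0]]
window_y = [0.0, 1.0, 2.0, 3.0, 100.0]

# The 3 nearest samples to 1.2 are 1.0, 2.0 and 0.0, so the mean target is 1.0.
print(knn_predict(window_X, window_y, [1.2], n_neighbors=3))  # 1.0
```

Because this sketch compares the query against every stored sample, each prediction costs time proportional to the window size; a KDTree is meant to reduce that query cost.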

Parameters
n_neighbors: int (default=5)

The number of nearest neighbors to search for.

max_window_size: int (default=1000)

The maximum size of the window storing the last observed samples.

leaf_size: int (default=30)

sklearn.KDTree parameter. The maximum number of samples that can be stored in one leaf node, which determines the point at which the algorithm switches to a brute-force approach. The larger this number, the faster the tree construction, but the slower the queries.

metric: string or sklearn.DistanceMetric object (default='euclidean')

sklearn.KDTree parameter. The distance metric to use for the KDTree. KNNRegressor.valid_metrics() returns the list of metrics that are valid for KDTree.

aggregation_method: str (default='mean')

The method used to aggregate the target values of the neighbors:

'mean': the mean of the neighbors' target values.

'median': the median of the neighbors' target values.
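To show how the two aggregation methods can differ, the following standard-library sketch aggregates the same five neighbor targets, one of which is an outlier. The target values are made up for illustration; they are not output from KNNRegressor.

```python
import statistics

# Hypothetical target values of the 5 nearest neighbors of one query sample;
# the last neighbor is an outlier.
neighbor_targets = [2.0, 2.0, 3.0, 2.0, 51.0]

mean_prediction = statistics.mean(neighbor_targets)      # pulled up by the outlier
median_prediction = statistics.median(neighbor_targets)  # unaffected by the outlier

print(mean_prediction, median_prediction)  # 12.0 2.0
```

With 'mean', a single extreme neighbor shifts the prediction substantially; 'median' is more robust to such outliers.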

Notes

This estimator is not optimal for a mixture of categorical and numerical features. This implementation treats all features from a given stream as numerical.

Examples

>>> # Imports
>>> from skmultiflow.data import RegressionGenerator
>>> from skmultiflow.lazy import KNNRegressor
>>> import numpy as np
>>>
>>> # Setup the data stream
>>> stream = RegressionGenerator(random_state=1)
>>> # Setup the estimator
>>> knn = KNNRegressor()
>>>
>>> # Auxiliary variables to control the loop and track performance
>>> n_samples = 0
>>> max_samples = 2000
>>> y_pred = np.zeros(max_samples)
>>> y_true = np.zeros(max_samples)
>>>
>>> # Run test-then-train loop for max_samples or while there is data in the stream
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_true[n_samples] = y[0]
...     y_pred[n_samples] = knn.predict(X)[0]
...     knn.partial_fit(X, y)
...     n_samples += 1
>>>
>>> # Display results
>>> print('{} samples analyzed.'.format(n_samples))
2000 samples analyzed.
>>> print('KNN regressor mean absolute error: {}'.format(np.mean(np.abs(y_true - y_pred))))
KNN regressor mean absolute error: 144.5672450178514

Methods

fit(self, X, y[, sample_weight])

Fit the model.

get_info(self)

Collect and return information about the configuration of the estimator.

get_params(self[, deep])

Get parameters for this estimator.

partial_fit(self, X, y[, sample_weight])

Partially (incrementally) fit the model.

predict(self, X)

Predict the target values for the samples in X.

predict_proba(self, X)

Estimate the probability for probabilistic/Bayesian regressors.

reset(self)

Reset estimator.

score(self, X, y[, sample_weight])

Returns the coefficient of determination R^2 of the prediction.

set_params(self, **params)

Set the parameters of this estimator.

valid_metrics()

Get valid distance metrics for the KDTree.

fit(self, X, y, sample_weight=None)

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the target values of all samples in X.

sample_weight: numpy.ndarray, optional (default=None)

Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

get_info(self)

Collect and return information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters
deep: boolean, optional (default=True)

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

partial_fit(self, X, y, sample_weight=None)

Partially (incrementally) fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The data upon which the algorithm will create its model.

y: numpy.ndarray of shape (n_samples)

An array-like containing the target values for all samples in X.

sample_weight: Not used.

Returns
KNNRegressor

self

Notes

For the K-Nearest Neighbors regressor, fitting the model is equivalent to inserting the new samples into the observed window and, once max_window_size is reached, removing the oldest samples.
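The window behavior described in this note can be illustrated with collections.deque and its maxlen argument. This is a sketch of the idea only: the deque stands in for skmultiflow's internal window, which is not shown here, and max_window_size is reused as a plain variable name.

```python
from collections import deque

max_window_size = 3
# A bounded deque drops its oldest entry when a new one is appended at capacity.
window = deque(maxlen=max_window_size)

# Hypothetical (X, y) samples arriving one at a time from a stream.
samples = [([0.1], 1.0), ([0.2], 2.0), ([0.3], 3.0), ([0.4], 4.0), ([0.5], 5.0)]
for X, y in samples:
    # Fitting a windowed kNN model amounts to storing the sample.
    window.append((X, y))

print([y for _, y in window])  # [3.0, 4.0, 5.0]
```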

predict(self, X)

Predict the target values for the samples in X.

Search the KDTree for the n_neighbors nearest neighbors.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The samples for which to predict target values.

Returns
numpy.ndarray

An array containing the predicted target values for each sample in X.

predict_proba(self, X)

Estimate the probability for probabilistic/Bayesian regressors.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The matrix of samples for which to predict the probabilities.

Returns
numpy.ndarray

reset(self)

Reset estimator.

score(self, X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative because the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
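As a worked example of this definition, the sketch below computes R^2 by hand in plain Python for made-up predictions. The helper name r2_by_hand is hypothetical; it is not part of skmultiflow or scikit-learn.

```python
def r2_by_hand(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - u/v, following the definition above."""
    mean_true = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1 - u / v


y_true = [3.0, 5.0, 7.0]
# u = 1 + 0 + 1 = 2 and v = 4 + 0 + 4 = 8, so R^2 = 1 - 2/8.
print(r2_by_hand(y_true, [2.0, 5.0, 8.0]))  # 0.75
# A constant model predicting the mean of y_true gives u == v.
print(r2_by_hand(y_true, [5.0, 5.0, 5.0]))  # 0.0
```

A negative result would mean the predictions are worse than always predicting the mean of y_true.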

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y: array-like, shape = (n_samples) or (n_samples, n_outputs)

True values for X.

sample_weight: array-like, shape = (n_samples), optional

Sample weights.

Returns
score: float

R^2 of self.predict(X) with respect to y.

Notes

The R^2 score used when calling score on a regressor will use multioutput='uniform_average' from scikit-learn version 0.23 to stay consistent with metrics.r2_score. This affects the score method of all multioutput regressors (except multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self

static valid_metrics()

Get valid distance metrics for the KDTree.