skmultiflow.lazy.KNNRegressor

class skmultiflow.lazy.KNNRegressor(n_neighbors=5, max_window_size=1000, leaf_size=30, metric='euclidean', aggregation_method='mean')

k-Nearest Neighbors regressor.

This non-parametric regression method keeps track of the last max_window_size training samples. A prediction is obtained by aggregating the target values of the n_neighbors stored samples that are closest to the query sample.
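The prediction step described above can be sketched in plain Python. This is a brute-force illustration only, not the library's implementation (which searches a KDTree); the helper name knn_predict and the toy window are assumptions made for the example.

```python
import math
from statistics import mean


def knn_predict(window, query, n_neighbors=5):
    """Brute-force k-NN regression over stored (features, target) pairs."""
    # Rank the stored samples by Euclidean distance to the query.
    ranked = sorted(window, key=lambda sample: math.dist(sample[0], query))
    # Aggregate (here: average) the targets of the closest n_neighbors samples.
    return mean([target for _, target in ranked[:n_neighbors]])


# Toy window in which the target is twice the single feature.
window = [((float(i),), 2.0 * i) for i in range(10)]

# The three stored samples closest to 4.2 have features 4, 5 and 3.
prediction = knn_predict(window, (4.2,), n_neighbors=3)
print(prediction)  # 8.0
```

A linear scan costs one distance computation per stored sample for every query; a tree index is a way to reduce that cost on larger windows.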

Parameters

n_neighbors: int (default=5): The number of nearest neighbors to search for.
max_window_size: int (default=1000): The maximum size of the window storing the last observed samples.
leaf_size: int (default=30): sklearn.KDTree parameter. The maximum number of samples that can be stored in one leaf node, which determines the point at which the algorithm switches to a brute-force approach. The larger this number, the faster the tree is built, but the slower queries become.
metric: string or sklearn.DistanceMetric object (default='euclidean'): sklearn.KDTree parameter. The distance metric to use for the KDTree. KNNRegressor.valid_metrics() returns a list of the metrics that are valid for KDTree.
aggregation_method: str (default='mean'): The method used to aggregate the target values of the neighbors. One of:

'mean'

'median'
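To illustrate how the two aggregation options can differ, the sketch below applies both to the same hypothetical set of neighbor targets, one of which is an outlier. The values are made up for the example and the aggregation is done with the standard library, not with KNNRegressor itself.

```python
from statistics import mean, median

# Hypothetical target values of the 5 nearest neighbors; 100.0 is an outlier.
neighbor_targets = [10.0, 11.0, 12.0, 13.0, 100.0]

mean_prediction = mean(neighbor_targets)      # pulled towards the outlier
median_prediction = median(neighbor_targets)  # unaffected by the outlier

print(mean_prediction)    # 29.2
print(median_prediction)  # 12.0
```

The median is generally the more robust choice when the window may contain noisy targets, while the mean uses the values of all neighbors.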

Notes

This estimator is not optimal for a mixture of categorical and numerical features. This implementation treats all features from a given stream as numerical.

Examples

>>> # Imports
>>> from skmultiflow.data import RegressionGenerator
>>> from skmultiflow.lazy import KNNRegressor
>>> import numpy as np
>>>
>>> # Setup the data stream
>>> stream = RegressionGenerator(random_state=1)
>>> # Setup the estimator
>>> knn = KNNRegressor()
>>>
>>> # Auxiliary variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 2000
>>> y_pred = np.zeros(max_samples)
>>> y_true = np.zeros(max_samples)
>>>
>>> # Run test-then-train loop for max_samples or while there is data in the stream
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_true[n_samples] = y[0]
...     y_pred[n_samples] = knn.predict(X)[0]
...     knn.partial_fit(X, y)
...     n_samples += 1
>>>
>>> # Display results
>>> print('{} samples analyzed.'.format(n_samples))
2000 samples analyzed.
>>> print('KNN regressor mean absolute error: {}'.format(np.mean(np.abs(y_true - y_pred))))
KNN regressor mean absolute error: 144.5672450178514

Methods

`fit`(self, X, y[, sample_weight])	Fit the model.
`get_info`(self)	Collects and returns the information about the configuration of the estimator
`get_params`(self[, deep])	Get parameters for this estimator.
`partial_fit`(self, X, y[, sample_weight])	Partially (incrementally) fit the model.
`predict`(self, X)	Predict the target value for sample X
`predict_proba`(self, X)	Estimates the probability for probabilistic/bayesian regressors
`reset`(self)	Reset estimator.
`score`(self, X, y[, sample_weight])	Returns the coefficient of determination R^2 of the prediction.
`set_params`(self, **params)	Set the parameters of this estimator.
`valid_metrics`()	Get valid distance metrics for the KDTree.

fit(self, X, y, sample_weight=None)

Fit the model.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y: numpy.ndarray of shape (n_samples, n_targets): An array-like with the target values of all samples in X.
sample_weight: numpy.ndarray, optional (default=None): Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns

self

get_info(self)

Collects and returns the information about the configuration of the estimator.

Returns

string: Configuration of the estimator.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

partial_fit(self, X, y, sample_weight=None)

Partially (incrementally) fit the model.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The data upon which the algorithm will create its model.
y: numpy.ndarray of shape (n_samples): An array-like containing the target values for all samples in X.
sample_weight: Not used.

Returns

KNNRegressor: self

Notes

For the k-Nearest Neighbors regressor, fitting the model is equivalent to inserting the new samples into the observed window and, once the window holds max_window_size samples, removing the oldest ones.
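The window behaviour described in this note can be sketched with a bounded deque. This is only an illustration of the eviction policy under the assumption of first-in, first-out removal; it does not reproduce the library's internal data structure, and the sample values are made up.

```python
from collections import deque

max_window_size = 3

# A deque with maxlen drops its oldest entry when a new one is appended
# to a full window.
window = deque(maxlen=max_window_size)

stream = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((3.0,), 4.0)]
for x, y in stream:
    window.append((x, y))  # the "partial fit" step: store the new sample

stored_targets = [y for _, y in window]
print(stored_targets)  # [2.0, 3.0, 4.0]
```

After the fourth sample arrives, the first one has been discarded, so later predictions depend only on the most recent max_window_size samples.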

predict(self, X)

Predict the target value for sample X.

Search the KDTree for the n_neighbors nearest neighbors.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): All the samples we want to predict the target value for.

Returns

np.ndarray: An array containing the predicted target values for each sample in X.

predict_proba(self, X)

Estimates the probability for probabilistic/bayesian regressors.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the probabilities for.

Returns

numpy.ndarray

reset(self)

Reset estimator.

score(self, X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
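As a worked illustration of this definition, the sketch below computes R^2 for three sets of predictions using plain Python. The helper name r2 and the data are assumptions made for the example; it covers a single unweighted target only and is not the score method itself.

```python
def r2(y_true, y_pred):
    """Coefficient of determination, 1 - u/v, for a single unweighted target."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares (zero if y_true is constant, which this
    # sketch does not handle).
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2(y_true, [1.0, 2.0, 3.0, 4.0])   # exact predictions
constant = r2(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean of y
reversed_ = r2(y_true, [4.0, 3.0, 2.0, 1.0]) # worse than the constant model

print(perfect, constant, reversed_)  # 1.0 0.0 -3.0
```

Here v = 5.0 in all three cases; the residual sums of squares are 0.0, 5.0 and 20.0, which gives the three scores shown.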

Parameters

X: array-like, shape = (n_samples, n_features): Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y: array-like, shape = (n_samples) or (n_samples, n_outputs): True values for X.
sample_weight: array-like, shape = (n_samples), optional: Sample weights.

Returns

score: float: R^2 of self.predict(X) with respect to y.

Notes

The R^2 score used when calling score on a regressor will use multioutput='uniform_average' from scikit-learn version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
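The <component>__<parameter> convention can be illustrated with a small stand-in class. SketchEstimator below is hypothetical and written only for this example; it is not the scikit-multiflow or scikit-learn base class, and it omits the parameter validation a real estimator performs.

```python
class SketchEstimator:
    """Hypothetical stand-in that mimics the nested set_params convention."""

    def __init__(self, **params):
        for name, value in params.items():
            setattr(self, name, value)

    def set_params(self, **params):
        for key, value in params.items():
            # Split 'component__parameter' at the first double underscore.
            name, sep, sub_key = key.partition('__')
            if sep:
                # Forward the remainder of the key to the nested object.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


inner = SketchEstimator(n_neighbors=5)
outer = SketchEstimator(regressor=inner, verbose=False)

# Update a parameter of the nested object and one of the outer object.
result = outer.set_params(regressor__n_neighbors=10, verbose=True)

print(inner.n_neighbors, outer.verbose)  # 10 True
```

Returning self, as in the sketch, allows calls to be chained.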

static valid_metrics()

Get valid distance metrics for the KDTree.