skmultiflow.lazy.KNNRegressor

class skmultiflow.lazy.KNNRegressor(n_neighbors=5, max_window_size=1000, leaf_size=30, metric='euclidean', aggregation_method='mean')
k-Nearest Neighbors regressor.
This non-parametric regression method keeps track of the last max_window_size training samples. Predictions are obtained by aggregating the target values of the n_neighbors stored samples closest to a query sample.
Parameters
 n_neighbors: int (default=5)
The number of nearest neighbors to search for.
 max_window_size: int (default=1000)
The maximum size of the window storing the last observed samples.
 leaf_size: int (default=30)
sklearn.KDTree parameter. The maximum number of samples that can be stored in one leaf node, which determines the point at which the algorithm switches to a brute-force approach. The larger this number, the faster the tree construction, but the slower the queries.
 metric: string or sklearn.DistanceMetric object
sklearn.KDTree parameter. The distance metric to use for the KDTree. Default=’euclidean’. KNNRegressor.valid_metrics() gives a list of the metrics which are valid for KDTree.
 aggregation_method: str (default=’mean’)
 The method used to aggregate the target values of the neighbors. Either ‘mean’ or ‘median’.
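The effect of n_neighbors and aggregation_method can be illustrated with a minimal standard-library sketch. It is not the library's implementation: it swaps the KDTree for a brute-force Euclidean search, and the function name knn_predict and the toy data are hypothetical.

```python
import math
import statistics


def knn_predict(window_X, window_y, query, n_neighbors=5, aggregation_method="mean"):
    """Aggregate the targets of the n_neighbors stored samples closest to query.

    Hypothetical helper: brute-force search stands in for the KDTree.
    """
    # Euclidean distance from the query to every stored sample.
    distances = [math.dist(x, query) for x in window_X]
    # Indices of the stored samples, nearest first, truncated to n_neighbors.
    nearest = sorted(range(len(window_X)), key=distances.__getitem__)[:n_neighbors]
    targets = [window_y[i] for i in nearest]
    if aggregation_method == "mean":
        return statistics.fmean(targets)
    if aggregation_method == "median":
        return statistics.median(targets)
    raise ValueError("aggregation_method must be 'mean' or 'median'")


# Toy window: one feature per sample, with one outlying target (30.0).
window_X = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
window_y = [0.0, 1.0, 2.0, 30.0, 100.0]

mean_pred = knn_predict(window_X, window_y, (2.0,), n_neighbors=3, aggregation_method="mean")
median_pred = knn_predict(window_X, window_y, (2.0,), n_neighbors=3, aggregation_method="median")
print(mean_pred, median_pred)
```

With the three nearest targets being 2.0, 1.0 and 30.0, the mean is pulled toward the outlier while the median is not, which is the usual reason to prefer ‘median’ on noisy targets.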
Notes
This estimator is not optimal for a mixture of categorical and numerical features. This implementation treats all features from a given stream as numerical.
Examples
>>> # Imports
>>> from skmultiflow.data import RegressionGenerator
>>> from skmultiflow.lazy import KNNRegressor
>>> import numpy as np
>>>
>>> # Set up the data stream
>>> stream = RegressionGenerator(random_state=1)
>>> # Set up the estimator
>>> knn = KNNRegressor()
>>>
>>> # Auxiliary variables to control the loop and track performance
>>> n_samples = 0
>>> max_samples = 2000
>>> y_pred = np.zeros(max_samples)
>>> y_true = np.zeros(max_samples)
>>>
>>> # Run the test-then-train loop for max_samples or while there is data in the stream
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_true[n_samples] = y[0]
...     y_pred[n_samples] = knn.predict(X)[0]
...     knn.partial_fit(X, y)
...     n_samples += 1
>>>
>>> # Display results
>>> print('{} samples analyzed.'.format(n_samples))
2000 samples analyzed.
>>> print('KNN regressor mean absolute error: {}'.format(np.mean(np.abs(y_true - y_pred))))
KNN regressor mean absolute error: 144.5672450178514
Methods
fit(self, X, y[, sample_weight]): Fit the model.
get_info(self): Collects and returns the information about the configuration of the estimator.
get_params(self[, deep]): Get parameters for this estimator.
partial_fit(self, X, y[, sample_weight]): Partially (incrementally) fit the model.
predict(self, X): Predict the target value for sample X.
predict_proba(self, X): Estimates the probability for probabilistic/Bayesian regressors.
reset(self): Reset estimator.
score(self, X, y[, sample_weight]): Returns the coefficient of determination R^2 of the prediction.
set_params(self, **params): Set the parameters of this estimator.
valid_metrics(): Get valid distance metrics for the KDTree.

fit(self, X, y, sample_weight=None)
Fit the model.
 Parameters
 X: numpy.ndarray of shape (n_samples, n_features)
The features to train the model.
 y: numpy.ndarray of shape (n_samples, n_targets)
An array-like with the target values of all samples in X.
 sample_weight: numpy.ndarray, optional (default=None)
Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.
 Returns
 self

get_info(self)
Collects and returns the information about the configuration of the estimator.
 Returns
 string
Configuration of the estimator.

get_params(self, deep=True)
Get parameters for this estimator.
 Parameters
 deep: boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
 Returns
 params: mapping of string to any
Parameter names mapped to their values.

partial_fit(self, X, y, sample_weight=None)
Partially (incrementally) fit the model.
 Parameters
 X: numpy.ndarray of shape (n_samples, n_features)
The data upon which the algorithm will create its model.
 y: numpy.ndarray of shape (n_samples)
An array-like containing the target values for all samples in X.
 sample_weight: Not used.
 Returns
 KNNRegressor
self
Notes
For the K-Nearest Neighbors regressor, fitting the model is equivalent to inserting the newer samples into the observed window and, once the window's size limit (max_window_size) is reached, removing the oldest samples.
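The windowing behavior described in this note can be sketched with a bounded deque from the standard library. This is an illustration under assumptions, not the library's internal data structure: the class name SlidingWindow and its methods are hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical fixed-size buffer of (features, target) pairs."""

    def __init__(self, max_window_size=1000):
        # A deque with maxlen discards its oldest item when a new one
        # is appended to a full buffer.
        self._buffer = deque(maxlen=max_window_size)

    def add(self, x, y):
        """Store one sample; evicts the oldest sample if the window is full."""
        self._buffer.append((tuple(x), y))

    def __len__(self):
        return len(self._buffer)

    def targets(self):
        """Target values currently in the window, oldest first."""
        return [y for _, y in self._buffer]


# Insert five samples into a window that holds at most three.
window = SlidingWindow(max_window_size=3)
for i in range(5):
    window.add([float(i)], float(i) * 10)

print(len(window), window.targets())
```

Only the three most recent samples remain, so later predictions depend only on recent data, which is what lets a windowed learner follow a drifting stream.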

predict(self, X)
Predict the target value for sample X.
Search the KDTree for the n_neighbors nearest neighbors.
 Parameters
 X: numpy.ndarray of shape (n_samples, n_features)
All the samples we want to predict the target value for.
 Returns
 np.ndarray
An array containing the predicted target values for each sample in X.

predict_proba(self, X)
Estimates the probability for probabilistic/Bayesian regressors.
 Parameters
 X: numpy.ndarray of shape (n_samples, n_features)
The matrix of samples one wants to predict the probabilities for.
 Returns
 numpy.ndarray

score(self, X, y, sample_weight=None)
Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
 Parameters
 X: array-like, shape = (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
 y: array-like, shape = (n_samples) or (n_samples, n_outputs)
True values for X.
 sample_weight: array-like, shape = [n_samples], optional
Sample weights.
 Returns
 scorefloat
R^2 of self.predict(X) with respect to y.
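The definition of R^2 given above can be checked by hand with a short standard-library sketch. The function name r2_from_definition is hypothetical, the sketch ignores sample_weight and multiple outputs, and it does not guard against v being zero (constant y_true).

```python
def r2_from_definition(y_true, y_pred):
    """Compute R^2 = 1 - u/v for single-output, unweighted data."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares around the mean of y_true.
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_from_definition(y_true, y_true)                 # exact predictions
constant = r2_from_definition(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
poor = r2_from_definition(y_true, [4.0, 3.0, 2.0, 1.0])      # reversed predictions
print(perfect, constant, poor)
```

The three cases correspond to the statements in the description: a perfect model scores 1.0, a constant mean predictor scores 0.0, and a model worse than the mean predictor scores below zero (here -3.0).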
Notes
The R^2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').

set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
 Returns
 self