skmultiflow.lazy.KNNRegressor
k-Nearest Neighbors regressor.
This non-parametric regression method keeps track of the last max_window_size training samples. Predictions are obtained by aggregating the target values of the n_neighbors closest stored samples with respect to a query sample.
n_neighbors
The number of nearest neighbors to search for.
max_window_size
The maximum size of the window storing the last observed samples.
leaf_size
sklearn.KDTree parameter. The maximum number of samples that can be stored in one leaf node, which determines the point at which the algorithm switches to a brute-force approach. The bigger this number, the faster the tree construction, but the slower the queries will be.
metric
sklearn.KDTree parameter. The distance metric to use for the KDTree. Default='euclidean'. KNNRegressor.valid_metrics() gives a list of the metrics that are valid for the KDTree.
Notes
This estimator is not optimal for a mixture of categorical and numerical features. This implementation treats all features from a given stream as numerical.
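The prediction scheme described above (average the target values of the n_neighbors stored samples closest to the query) can be sketched in plain NumPy. This is an illustrative sketch, not the library's internals: the real estimator uses a sklearn.KDTree for the neighbor search, and the names here (knn_window_predict, window_X, window_y) are made up for the example.

```python
import numpy as np

def knn_window_predict(window_X, window_y, query, n_neighbors=5):
    """Average the targets of the n_neighbors stored samples closest to query.

    Illustrative brute-force version of the windowed k-NN prediction step;
    the actual estimator queries a KDTree built over the window.
    """
    # Euclidean distance from the query to every sample in the window
    dists = np.linalg.norm(window_X - query, axis=1)
    # Indices of the n_neighbors closest stored samples
    nearest = np.argsort(dists)[:n_neighbors]
    # Aggregate their target values by a simple mean
    return window_y[nearest].mean()

# Toy window: 1-D feature, target equal to the feature value
X = np.arange(10.0).reshape(-1, 1)
y = np.arange(10.0)
print(knn_window_predict(X, y, np.array([4.0]), n_neighbors=3))  # mean of targets 3, 4, 5 -> 4.0
```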
Examples
>>> # Imports
>>> from skmultiflow.data import RegressionGenerator
>>> from skmultiflow.lazy import KNNRegressor
>>> import numpy as np
>>>
>>> # Setup the data stream
>>> stream = RegressionGenerator(random_state=1)
>>> # Setup the estimator
>>> knn = KNNRegressor()
>>>
>>> # Auxiliary variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 2000
>>> y_pred = np.zeros(max_samples)
>>> y_true = np.zeros(max_samples)
>>>
>>> # Run test-then-train loop for max_samples or while there is data in the stream
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_true[n_samples] = y[0]
...     y_pred[n_samples] = knn.predict(X)[0]
...     knn.partial_fit(X, y)
...     n_samples += 1
>>>
>>> # Display results
>>> print('{} samples analyzed.'.format(n_samples))
2000 samples analyzed.
>>> print('KNN regressor mean absolute error: {}'.format(np.mean(np.abs(y_true - y_pred))))
KNN regressor mean absolute error: 144.5672450178514
Methods
fit(self, X, y[, sample_weight])
Fit the model.
get_info(self)
Collects and returns the information about the configuration of the estimator.
get_params(self[, deep])
Get parameters for this estimator.
partial_fit(self, X, y[, sample_weight])
Partially (incrementally) fit the model.
predict(self, X)
Predict the target value for each sample in X.
predict_proba(self, X)
Estimates the probability for probabilistic/Bayesian regressors.
reset(self)
Reset estimator.
score(self, X, y[, sample_weight])
Returns the coefficient of determination R^2 of the prediction.
set_params(self, **params)
Set the parameters of this estimator.
valid_metrics()
Get valid distance metrics for the KDTree.
X
The features to train the model.
y
An array-like with the target values of all samples in X.
sample_weight
Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.
Configuration of the estimator.
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Parameter names mapped to their values.
X
The data upon which the algorithm will create its model.
y
An array-like containing the target values for all samples in X.
Returns: self
Notes: For the k-Nearest Neighbors regressor, fitting the model is equivalent to inserting the newer samples into the observed window and, if the size limit is reached, removing the oldest ones.
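The window-update behaviour described in this note can be sketched with a collections.deque, which drops the oldest entries automatically once full. The max_window_size value of 3 here is a toy assumption for illustration; the library's actual window data structure differs.

```python
from collections import deque

max_window_size = 3                       # toy value; the library's default is larger
window = deque(maxlen=max_window_size)    # oldest samples drop out automatically

# Stream of (X, y) pairs; "fitting" is just inserting into the window
for sample in [(x, float(x)) for x in range(5)]:
    window.append(sample)

# Only the last max_window_size samples remain
print(list(window))  # [(2, 2.0), (3, 3.0), (4, 4.0)]
```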
Search the KDTree for the n_neighbors nearest neighbors.
All the samples we want to predict the target value for.
An array containing the predicted target values for each sample in X.
The matrix of samples one wants to predict the probabilities for.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
Test samples. For some estimators this may be a precomputed kernel matrix instead, with shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting of the estimator.
True values for X.
Sample weights.
R^2 of self.predict(X) w.r.t. y.
The R2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').
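A small worked example of the R^2 definition above; the data values are made up for illustration.

```python
import numpy as np

# Made-up true and predicted target values
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares: 0.5
v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares: 5.0
r2 = 1 - u / v
print(r2)  # 0.9
```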
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
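The <component>__<parameter> naming convention can be illustrated with a simplified resolver. The Pipeline and KNN classes below are hypothetical stand-ins for nested estimators, and the real set_params performs additional validation beyond this sketch.

```python
def set_nested_param(obj, name, value):
    """Resolve a '<component>__<parameter>' name and set the attribute.

    Simplified sketch of the nested-parameter convention; not the real
    implementation.
    """
    if "__" in name:
        # Split off the first component, e.g. "knn__n_neighbors" -> ("knn", "n_neighbors")
        component, _, sub = name.partition("__")
        set_nested_param(getattr(obj, component), sub, value)
    else:
        setattr(obj, name, value)

class KNN:                 # hypothetical inner estimator
    n_neighbors = 5

class Pipeline:            # hypothetical outer estimator holding a component
    knn = KNN()

pipe = Pipeline()
set_nested_param(pipe, "knn__n_neighbors", 8)
print(pipe.knn.n_neighbors)  # 8
```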