skmultiflow.meta.AdaptiveRandomForestRegressor

Adaptive Random Forest regressor.
n_estimators: Number of trees in the ensemble.
max_features: Number of features to consider when looking for the best split:
- If int, max_features features are considered at each split.
- If float, int(max_features * n_features) features are considered.
- If 'sqrt', max_features=sqrt(n_features).
- If 'log2', max_features=log2(n_features).
- If None, max_features=n_features.
lambda_value: The lambda value for bagging (lambda=6 corresponds to Leverage Bagging).
aggregation_method: Method used to aggregate the ensemble members' predictions (e.g. aggregation_method='mean').
Drift detection method. Set to None to disable drift detection.
Warning detection method. Set to None to disable warning detection.
(ARFHoeffdingTreeRegressor parameter) Maximum memory consumed by the tree.
(ARFHoeffdingTreeRegressor parameter) Number of instances between memory consumption checks.
(ARFHoeffdingTreeRegressor parameter) Number of instances a leaf should observe between split attempts.
(ARFHoeffdingTreeRegressor parameter) Allowed error in split decision, a value closer to 0 takes longer to decide.
(ARFHoeffdingTreeRegressor parameter) Threshold below which a split will be forced to break ties.
(ARFHoeffdingTreeRegressor parameter) If True, only allow binary splits.
(ARFHoeffdingTreeRegressor parameter) If True, stop growing as soon as memory limit is hit.
(ARFHoeffdingTreeRegressor parameter) If True, disable poor attributes.
(ARFHoeffdingTreeRegressor parameter) If True, disable pre-pruning.
(ARFHoeffdingTreeRegressor parameter) List of nominal attributes. If empty, all attributes are assumed to be numerical.
(ARFHoeffdingTreeRegressor parameter) The learning rate of the perceptron.
(ARFHoeffdingTreeRegressor parameter) Decay multiplier for the learning rate of the perceptron.
(ARFHoeffdingTreeRegressor parameter) If False, the learning ratio will decay with the number of examples seen.
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when leaf_prediction is 'perceptron'.
Notes
The 3 most important aspects of Adaptive Random Forest [1] are: (1) inducing diversity through re-sampling; (2) inducing diversity through randomly selecting subsets of features for node splits (see skmultiflow.trees.arf_hoeffding_tree); (3) drift detectors per base tree, which cause selective resets in response to drifts. It also allows training background trees, which start training if a warning is detected and replace the active tree if the warning escalates to a drift.
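To illustrate point (1), online bagging approximates bootstrap re-sampling on a stream by weighting each incoming instance with a draw from a Poisson(lambda) distribution. The sketch below shows the idea only; it is an assumption about how such weights could be drawn, not skmultiflow's internal code:

```python
import numpy as np

def online_bagging_weights(n_estimators, lambda_value=6, seed=42):
    """Draw one Poisson(lambda) weight per ensemble member for an instance.

    A weight of 0 means that tree skips the instance; a weight k > 0 means
    the tree learns the instance with weight k, as if it were seen k times.
    """
    rng = np.random.RandomState(seed)
    return rng.poisson(lam=lambda_value, size=n_estimators)

weights = online_bagging_weights(n_estimators=10)
```

Larger lambda values (such as the lambda=6 used by Leverage Bagging) make it more likely that each tree sees each instance multiple times, increasing diversity through the varying weights rather than through omission.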
Notice that this implementation is slightly different from the original algorithm proposed in [2]. The HoeffdingTreeRegressor is used as base learner, instead of FIMT-DD. It also adds a new strategy to monitor the incoming data and check for concept drifts. The monitored data (either the trees' errors or their predictions) are centered and scaled (z-score normalization) to have zero mean and unit standard deviation. Transformed values are then again normalized in the [0, 1] range to fulfil ADWIN's requirements. We assume that the data subjected to the z-score normalization lies within the interval mean ± 3σ, as it occurs in normal distributions.
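The two-step normalization described above can be sketched as follows. This is an illustration of the transformation, under the stated ±3σ assumption, not skmultiflow's exact code:

```python
import numpy as np

def normalize_for_adwin(values):
    """Z-score the monitored values, then map them to [0, 1] for ADWIN.

    Assumes the z-scored values lie within [-3, 3] (mean +/- 3 standard
    deviations); anything outside that band is clipped.
    """
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()  # zero mean, unit std
    scaled = (z + 3.0) / 6.0                     # map [-3, 3] -> [0, 1]
    return np.clip(scaled, 0.0, 1.0)             # guard the rare |z| > 3

normed = normalize_for_adwin([1.0, 2.0, 3.0, 4.0, 5.0])
```

A value equal to the mean gets z = 0 and maps to 0.5, the center of ADWIN's expected [0, 1] input range.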
References
[1] Gomes, H.M., Bifet, A., Read, J., Barddal, J.P., Enembreck, F., Pfahringer, B., Holmes, G. and Abdessalem, T., 2017. Adaptive random forests for evolving data stream classification. Machine Learning, 106(9-10), pp.1469-1495.
[2] Gomes, H.M., Barddal, J.P., Boiko, L.E., Bifet, A., 2018. Adaptive random forests for data stream regression. ESANN 2018.
Examples
>>> # Imports
>>> from skmultiflow.data import RegressionGenerator
>>> from skmultiflow.meta import AdaptiveRandomForestRegressor
>>> import numpy as np
>>>
>>> # Setup a data stream
>>> stream = RegressionGenerator(random_state=1, n_samples=200)
>>>
>>> # Setup the Adaptive Random Forest regressor
>>> arf_reg = AdaptiveRandomForestRegressor(random_state=123456)
>>>
>>> # Auxiliary variables to control loop and track performance
>>> n_samples = 0
>>> max_samples = 200
>>> y_pred = np.zeros(max_samples)
>>> y_true = np.zeros(max_samples)
>>>
>>> # Run test-then-train loop for max_samples and while there is data
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_true[n_samples] = y[0]
...     y_pred[n_samples] = arf_reg.predict(X)[0]
...     arf_reg.partial_fit(X, y)
...     n_samples += 1
>>>
>>> # Display results
>>> print('Adaptive Random Forest regressor example')
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Mean absolute error: {}'.format(np.mean(np.abs(y_true - y_pred))))
Methods

fit(self, X, y[, sample_weight])
    Fit the model.
get_info(self)
    Collects and returns information about the configuration of the estimator.
get_params(self[, deep])
    Get parameters for this estimator.
get_votes_for_instance(self, X)
partial_fit(self, X, y[, sample_weight])
    Partially (incrementally) fit the model.
predict(self, X)
    Predict target values for the passed data.
predict_proba(self, X)
    Not implemented for this method.
reset(self)
    Reset ARFR.
score(self, X, y[, sample_weight])
    Returns the coefficient of determination R^2 of the prediction.
set_params(self, **params)
    Set the parameters of this estimator.
The features to train the model.
An array-like with the target values of all samples in X.
Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.
Configuration of the estimator.
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Parameter names mapped to their values.
This parameter is not used by AdaptiveRandomForestRegressor, since the ensemble algorithm internally assigns different weights to the incoming instances. It is kept only for compatibility with the method's signature.
The set of data samples for which to predict the target value.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
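The definition above can be checked with a few lines of NumPy. This is a worked example of the stated formula, not the library's implementation:

```python
import numpy as np

def r2_score_simple(y_true, y_pred):
    """R^2 = 1 - u/v, per the definition above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - u / v

y = np.array([1.0, 2.0, 3.0, 4.0])
constant_pred = np.full_like(y, y.mean())  # always predicts the mean of y
```

Predicting y exactly gives r2_score_simple(y, y) == 1.0, while the constant mean predictor gives exactly 0.0, since u equals v in that case.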
Test samples. For some estimators this may be a precomputed kernel matrix instead, of shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in fitting the estimator.
True values for X.
Sample weights.
R^2 of self.predict(X) w.r.t. y.
The R2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
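The `<component>__<parameter>` routing can be illustrated with a small self-contained sketch. The class and attribute names here are hypothetical, chosen only to demonstrate the convention; they are not part of skmultiflow:

```python
# Minimal sketch of scikit-learn-style nested parameter routing using the
# <component>__<parameter> naming convention (hypothetical classes).
class Leaf:
    def __init__(self, grace_period=200):
        self.grace_period = grace_period

class Ensemble:
    def __init__(self, n_estimators=10):
        self.n_estimators = n_estimators
        self.base = Leaf()  # nested component addressable as 'base__...'

    def set_params(self, **params):
        for key, value in params.items():
            if "__" in key:
                # Split 'base__grace_period' into component and sub-parameter,
                # then set the attribute on the nested object.
                component, _, sub_key = key.partition("__")
                setattr(getattr(self, component), sub_key, value)
            else:
                setattr(self, key, value)
        return self

model = Ensemble().set_params(n_estimators=30, base__grace_period=50)
```

Here `n_estimators=30` lands on the ensemble itself, while `base__grace_period=50` is routed through to the nested `base` component.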