skmultiflow.trees.StackedSingleTargetHoeffdingTreeRegressor
Stacked Single-target Hoeffding Tree regressor.
Implementation of the Stacked Single-target Hoeffding Tree (SST-HT) method for multi-target regression as proposed by S. M. Mastelini, S. Barbon Jr., and A. C. P. L. F. de Carvalho [1].
Maximum memory consumed by the tree.
Number of instances between memory consumption checks.
Number of instances a leaf should observe between split attempts.
Allowed error in split decision; a value closer to 0 takes longer to decide.
Threshold below which a split will be forced to break ties.
If True, only allow binary splits.
If True, stop growing as soon as memory limit is hit.
If True, disable poor attributes.
If True, disable pre-pruning.
Number of instances a leaf should observe before allowing Naive Bayes.
List of nominal attributes. If empty, then assume that all attributes are numerical.
The learning rate of the perceptron.
Decay multiplier for the learning rate of the perceptron.
If False, the learning ratio will decay with the number of examples seen.
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when leaf_prediction is ‘perceptron’.
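Several of the parameters above (split_confidence, tie_threshold, grace_period) govern the split decision of a Hoeffding tree, which rests on the Hoeffding bound. The sketch below illustrates the standard bound and is only an approximation of what the library computes internally, not its exact code:

```python
import math

def hoeffding_bound(range_val, confidence, n):
    """Hoeffding bound: with probability 1 - confidence, the true mean of a
    random variable with range range_val lies within epsilon of the sample
    mean after n observations."""
    return math.sqrt((range_val ** 2) * math.log(1.0 / confidence) / (2.0 * n))

# With typical defaults (split_confidence=1e-7, tie_threshold=0.05),
# the bound shrinks as a leaf observes more instances:
for n in (200, 2000, 20000, 200000):
    print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))

# Once the bound drops below tie_threshold, a split is forced even if the
# two best candidate splits are statistically indistinguishable.
```

This is why a smaller split_confidence (error closer to 0) delays splits: the bound stays larger for longer, so more instances are needed before a decision.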
References
Mastelini, S. M., Barbon Jr., S., &amp; de Carvalho, A. C. P. L. F. (2019). “Online Multi-target regression trees with stacked leaf models”. arXiv preprint arXiv:1903.12483.
Examples
>>> # Imports
>>> from skmultiflow.data import RegressionGenerator
>>> from skmultiflow.trees import StackedSingleTargetHoeffdingTreeRegressor
>>> import numpy as np
>>>
>>> # Setup a data stream
>>> n_targets = 3
>>> stream = RegressionGenerator(n_targets=n_targets, random_state=1, n_samples=200)
>>>
>>> # Setup the Stacked Single-target Hoeffding Tree Regressor
>>> sst_ht = StackedSingleTargetHoeffdingTreeRegressor()
>>>
>>> # Auxiliary variables to control loop and track performance
>>> n_samples = 0
>>> max_samples = 200
>>> y_pred = np.zeros((max_samples, n_targets))
>>> y_true = np.zeros((max_samples, n_targets))
>>>
>>> # Run test-then-train loop for max_samples and while there is data
>>> while n_samples < max_samples and stream.has_more_samples():
>>>     X, y = stream.next_sample()
>>>     y_true[n_samples] = y[0]
>>>     y_pred[n_samples] = sst_ht.predict(X)[0]
>>>     sst_ht.partial_fit(X, y)
>>>     n_samples += 1
>>>
>>> # Display results
>>> print('Stacked Single-target Hoeffding Tree regressor example')
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Mean absolute error: {}'.format(np.mean(np.abs(y_true - y_pred))))
Methods
fit(X, y[, sample_weight])
fit
Fit the model.
get_info()
get_info
Collects and returns the information about the configuration of the estimator.
get_model_description()
get_model_description
Walk the tree and return its structure in a buffer.
get_model_rules()
get_model_rules
Returns a list of rules describing the tree.
get_params([deep])
get_params
Get parameters for this estimator.
get_rules_description()
get_rules_description
Prints the description of the tree using rules.
measure_byte_size()
measure_byte_size
Calculate the size of the tree.
normalize_sample(X)
normalize_sample
Normalize the features in order to have the same influence during the process of training.
normalize_target_value(y)
normalize_target_value
Normalize the targets in order to have the same influence during the process of training.
partial_fit(X, y[, sample_weight])
partial_fit
Incrementally trains the model.
predict(X)
predict
Predicts the target value using mean class or the perceptron.
predict_proba(X)
predict_proba
Not implemented for this method.
reset()
reset
Reset the Hoeffding Tree to default values.
score(X, y[, sample_weight])
score
Returns the coefficient of determination R^2 of the prediction.
set_params(**params)
set_params
Set the parameters of this estimator.
Attributes
leaf_prediction
model_measurements
Collect metrics corresponding to the current status of the tree.
split_criterion
The features to train the model.
An array-like with the target values of all samples in X.
Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.
Configuration of the estimator.
The description of the model.
List of the rules describing the tree.
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Parameter names mapped to their values.
Size of the tree in bytes.
A string buffer containing the measurements of the tree.
The features of the sample to normalize.
The normalized sample.
The target values to normalize.
The normalized target values.
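Both normalize_sample and normalize_target_value rescale values so that every feature and target carries the same influence during training. A minimal sketch of one common streaming scheme, running mean/standard-deviation standardization, is shown below; the OnlineNormalizer class is hypothetical and not the library's exact formula:

```python
import math

class OnlineNormalizer:
    """Running mean/std normalizer: a sketch of how a streaming model can
    give every feature the same scale without storing past samples."""
    def __init__(self, n_features):
        self.count = 0
        self.sums = [0.0] * n_features
        self.sq_sums = [0.0] * n_features

    def update(self, x):
        # Accumulate sufficient statistics for mean and variance.
        self.count += 1
        for i, v in enumerate(x):
            self.sums[i] += v
            self.sq_sums[i] += v * v

    def normalize(self, x):
        if self.count < 2:
            return list(x)  # not enough data to estimate spread yet
        out = []
        for i, v in enumerate(x):
            mean = self.sums[i] / self.count
            var = max(self.sq_sums[i] / self.count - mean * mean, 0.0)
            sd = math.sqrt(var)
            out.append((v - mean) / sd if sd > 0 else 0.0)
        return out

norm = OnlineNormalizer(n_features=2)
for sample in ([1.0, 100.0], [2.0, 200.0], [3.0, 300.0]):
    norm.update(sample)
# Despite the two features living on very different scales, both map to
# the same normalized value for corresponding positions:
print(norm.normalize([2.0, 200.0]))  # [0.0, 0.0]
```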
Incrementally trains the model. Train samples (instances) are composed of X attributes and their corresponding targets y.
Tasks performed before training:
Verify instance weight. If not provided, uniform weights (1.0) are assumed.
If more than one instance is passed, loop through X and pass instances one at a time.
Update weight seen by model.
Training tasks:
If the tree is empty, create a leaf node as the root.
If the tree is already initialized, find the corresponding leaf for the instance and update the leaf node statistics.
If growth is allowed and the number of instances that the leaf has observed between split attempts exceeds the grace period, then attempt to split.
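The training steps above can be sketched in simplified form. The class below is purely illustrative: it mirrors the control flow described (weight check, per-instance loop, leaf update, grace-period split attempt) but is not the library's implementation, and the split logic itself is omitted:

```python
class HoeffdingTreeSketch:
    """Illustrative sketch of the partial_fit flow described above; the real
    skmultiflow implementation is considerably more involved."""
    def __init__(self, grace_period=200):
        self.grace_period = grace_period
        self.root = None
        self.samples_seen = 0.0

    def partial_fit(self, X, y, sample_weight=None):
        if sample_weight is None:                 # 1. verify instance weights
            sample_weight = [1.0] * len(X)
        for x_i, y_i, w_i in zip(X, y, sample_weight):
            self.samples_seen += w_i              # 2. update weight seen by model
            self._train_on_instance(x_i, y_i, w_i)  # 3. one instance at a time

    def _train_on_instance(self, x, y, weight):
        if self.root is None:                     # empty tree: create root leaf
            self.root = {'stats': [], 'seen_since_split': 0}
        leaf = self.root                          # (a real tree sorts x to a leaf)
        leaf['stats'].append((x, y, weight))      # update leaf statistics
        leaf['seen_since_split'] += 1
        if leaf['seen_since_split'] >= self.grace_period:
            leaf['seen_since_split'] = 0
            self._attempt_split(leaf)             # try to split after grace period

    def _attempt_split(self, leaf):
        pass  # evaluate candidate splits with the Hoeffding bound (omitted)

tree = HoeffdingTreeSketch(grace_period=2)
tree.partial_fit([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [[1.0], [2.0], [3.0]])
print(tree.samples_seen)  # 3.0
```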
Instance attributes.
Target values.
Samples weight. If not provided, uniform weights are assumed.
Samples for which we want to predict the labels.
Predicted target values.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
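The definition above translates directly into code. The helper below is a plain restatement of that formula for a single target, not the estimator's score method:

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - u/v, exactly as defined above."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - u / v

y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, [1.1, 1.9, 3.2, 3.9]))  # close to 1: good fit
print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))  # constant mean predictor: 0.0
```

Note how a model that always predicts the mean of y scores exactly 0.0, and anything worse than that goes negative.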
Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
True values for X.
Sample weights.
R^2 of self.predict(X) w.r.t. y.
Notes
The R2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
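The `<component>__<parameter>` routing can be illustrated with a small standalone sketch. Both classes below are hypothetical and exist only to show the naming convention, not skmultiflow's own implementation:

```python
# Hypothetical components illustrating the "<component>__<parameter>" naming.
class Scaler:
    def __init__(self, factor=1.0):
        self.factor = factor

class Model:
    def __init__(self, scaler, grace_period=200):
        self.scaler = scaler
        self.grace_period = grace_period

    def set_params(self, **params):
        for key, value in params.items():
            if '__' in key:                      # route to the nested component
                component, _, sub_key = key.partition('__')
                setattr(getattr(self, component), sub_key, value)
            else:                                # plain parameter on this object
                setattr(self, key, value)
        return self

model = Model(Scaler())
model.set_params(grace_period=100, scaler__factor=2.0)
print(model.grace_period, model.scaler.factor)  # 100 2.0
```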