skmultiflow.trees.StackedSingleTargetHoeffdingTreeRegressor

class skmultiflow.trees.StackedSingleTargetHoeffdingTreeRegressor(max_byte_size=33554432, memory_estimate_period=1000000, grace_period=200, split_confidence=1e-07, tie_threshold=0.05, binary_split=False, stop_mem_management=False, remove_poor_atts=False, leaf_prediction='perceptron', no_preprune=False, nb_threshold=0, nominal_attributes=None, learning_ratio_perceptron=0.02, learning_ratio_decay=0.001, learning_ratio_const=True, random_state=None)

Stacked Single-target Hoeffding Tree regressor.

Implementation of the Stacked Single-target Hoeffding Tree (SST-HT) method for multi-target regression as proposed by S. M. Mastelini, S. Barbon Jr., and A. C. P. L. F. de Carvalho [1].

Parameters
max_byte_size: int (default=33554432)

Maximum memory, in bytes, consumed by the tree.

memory_estimate_period: int (default=1000000)

Number of instances between memory consumption checks.

grace_period: int (default=200)

Number of instances a leaf should observe between split attempts.

split_confidence: float (default=0.0000001)

Allowed error in the split decision; a value closer to 0 takes longer to decide.

tie_threshold: float (default=0.05)

Threshold below which a split will be forced to break ties.

binary_split: boolean (default=False)

If True, only allow binary splits.

stop_mem_management: boolean (default=False)

If True, stop growing as soon as the memory limit is hit.

remove_poor_atts: boolean (default=False)

If True, disable poor attributes.

no_preprune: boolean (default=False)

If True, disable pre-pruning.

leaf_prediction: string (default=‘perceptron’)

Prediction mechanism used at leaves:

‘perceptron’ - Stacked perceptron
‘adaptive’ - Adaptively chooses the best predictor among the mean, the perceptron, and the stacked perceptron

nb_threshold: int (default=0)

Number of instances a leaf should observe before allowing Naive Bayes.

nominal_attributes: list, optional

List of nominal attributes. If empty, all attributes are assumed to be numerical.

learning_ratio_perceptron: float (default=0.02)

The learning rate of the perceptron.

learning_ratio_decay: float (default=0.001)

Decay multiplier for the learning rate of the perceptron.

learning_ratio_const: boolean (default=True)

If False, the learning rate decays with the number of examples seen.

random_state: int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when leaf_prediction is ‘perceptron’.
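The stacked perceptron leaf model (leaf_prediction=‘perceptron’) and the learning-rate parameters can be illustrated with a small NumPy sketch. It is written under assumptions, not copied from the library: a first layer holds one linear model per target over the features plus a bias, a meta layer holds one linear model per target over all first-layer outputs plus a bias, and both are trained with the delta rule. The decay schedule `base_rate / (1 + samples_seen * decay)` is likewise an assumed reading of learning_ratio_decay and learning_ratio_const. `StackedLeafSketch` and `learning_rate` are hypothetical names.

```python
import numpy as np


def learning_rate(base_rate, decay, samples_seen, constant):
    """Assumed schedule: constant, or base_rate / (1 + samples_seen * decay)."""
    if constant:
        return base_rate
    return base_rate / (1.0 + samples_seen * decay)


class StackedLeafSketch:
    """Hypothetical stacked single-target linear model for one leaf (not the library's class)."""

    def __init__(self, n_features, n_targets, random_state=None):
        rng = np.random.default_rng(random_state)
        # First layer: one perceptron per target over the features plus a bias input.
        self.w_base = rng.uniform(-1.0, 1.0, size=(n_targets, n_features + 1))
        # Meta layer: one perceptron per target over all first-layer outputs plus a bias.
        self.w_meta = rng.uniform(-1.0, 1.0, size=(n_targets, n_targets + 1))

    def _base(self, x_b):
        return self.w_base @ x_b

    def predict(self, x):
        x_b = np.append(np.asarray(x, dtype=float), 1.0)
        base_b = np.append(self._base(x_b), 1.0)
        return self.w_meta @ base_b

    def update(self, x, y, lr):
        x_b = np.append(np.asarray(x, dtype=float), 1.0)
        y = np.asarray(y, dtype=float)
        # Delta rule on the first layer: each row fits its own target.
        self.w_base += lr * np.outer(y - self._base(x_b), x_b)
        # Meta layer is fitted on the refreshed first-layer outputs (ordering assumed).
        base_b = np.append(self._base(x_b), 1.0)
        self.w_meta += lr * np.outer(y - self.w_meta @ base_b, base_b)
```

The meta layer sees every target's first-layer prediction, which is what lets one target's model exploit information about the others.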

References

[1] Mastelini, S. M., Barbon Jr, S., de Carvalho, A. C. P. L. F. (2019). “Online Multi-target regression trees with stacked leaf models”. arXiv preprint arXiv:1903.12483.

Examples

>>> # Imports
>>> from skmultiflow.data import RegressionGenerator
>>> from skmultiflow.trees import StackedSingleTargetHoeffdingTreeRegressor
>>> import numpy as np
>>>
>>> # Setup a data stream
>>> n_targets = 3
>>> stream = RegressionGenerator(n_targets=n_targets, random_state=1, n_samples=200)
>>>
>>> # Setup the Stacked Single-target Hoeffding Tree Regressor
>>> sst_ht = StackedSingleTargetHoeffdingTreeRegressor()
>>>
>>> # Auxiliary variables to control loop and track performance
>>> n_samples = 0
>>> max_samples = 200
>>> y_pred = np.zeros((max_samples, n_targets))
>>> y_true = np.zeros((max_samples, n_targets))
>>>
>>> # Run test-then-train loop for max_samples and while there is data
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_true[n_samples] = y[0]
...     y_pred[n_samples] = sst_ht.predict(X)[0]
...     sst_ht.partial_fit(X, y)
...     n_samples += 1
>>>
>>> # Display results
>>> print('Stacked Single-target Hoeffding Tree regressor example')
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Mean absolute error: {}'.format(np.mean(np.abs(y_true - y_pred))))

Methods

compute_hoeffding_bound(range_val, confidence, n)

Compute the Hoeffding bound, used to decide how many samples are necessary at each node.

deactivate_all_leaves(self)

Deactivate all leaves.

enforce_tracker_limit(self)

Track the size of the tree and disable/enable nodes if required.

estimate_model_byte_size(self)

Calculate the size of the model and trigger tracker function if the actual model size exceeds the max size in the configuration.

fit(self, X, y[, sample_weight])

Fit the model.

get_info(self)

Collects and returns the information about the configuration of the estimator.

get_model_description(self)

Walk the tree and return its structure in a buffer.

get_model_rules(self)

Returns a list of rules describing the tree.

get_params(self[, deep])

Get parameters for this estimator.

get_rules_description(self)

Prints the description of the tree using rules.

get_votes_for_instance(self, X)

Get class votes for a single instance.

get_weights_for_instance(self, X)

Get class votes for a single instance.

measure_byte_size(self)

Calculate the size of the tree.

measure_tree_depth(self)

Calculate the depth of the tree.

new_split_node(self, split_test, …)

Create a new split node.

normalize_sample(self, X)

Normalize the features so that they all have the same influence during training.

normalize_target_value(self, y)

Normalize the targets so that they all have the same influence during training.

partial_fit(self, X, y[, sample_weight])

Incrementally trains the model.

predict(self, X)

Predicts the target values using the mean or the perceptron.

predict_proba(self, X)

Not implemented for this method.

reset(self)

Reset the Hoeffding Tree to default values.

score(self, X, y[, sample_weight])

Returns the coefficient of determination R^2 of the prediction.

set_params(self, **params)

Set the parameters of this estimator.

Attributes

binary_split

classes

get_model_measurements

Collect metrics corresponding to the current status of the tree.

grace_period

leaf_prediction

max_byte_size

memory_estimate_period

nb_threshold

no_preprune

nominal_attributes

remove_poor_atts

split_confidence

split_criterion

stop_mem_management

tie_threshold

static compute_hoeffding_bound(range_val, confidence, n)

Compute the Hoeffding bound, used to decide how many samples are necessary at each node.

Parameters
range_val: float

Range value.

confidence: float

Confidence of choosing the correct attribute.

n: int or float

Number of samples.

Returns
float

The Hoeffding bound.

Notes

The Hoeffding bound is defined as:

\[\epsilon = \sqrt{\frac{R^2\ln(1/\delta)}{2n}}\]

where:

\(\epsilon\): Hoeffding bound.

\(R\): Range of a random variable. For a probability the range is 1, and for an information gain the range is log c, where c is the number of classes.

\(\delta\): Confidence. 1 minus the desired probability of choosing the correct attribute at any given node.

\(n\): Number of samples.
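The bound, its inverse (how many samples a node needs), and the split decision it drives can be sketched in plain Python. The bound and its inverse follow directly from the formula above. `should_split` encodes the classic Hoeffding tree (VFDT) rule, in which split_confidence plays the role of δ and tie_threshold forces a split once ε becomes small; regression trees in this library may compare split merits differently (for example, as a ratio), so treat it as illustrative. All function names are hypothetical.

```python
import math


def hoeffding_bound(range_val, confidence, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(range_val ** 2 * math.log(1.0 / confidence) / (2.0 * n))


def samples_needed(range_val, confidence, epsilon):
    """Smallest integer n whose bound is at most epsilon (the formula solved for n)."""
    return math.ceil(range_val ** 2 * math.log(1.0 / confidence) / (2.0 * epsilon ** 2))


def should_split(best_merit, second_best_merit, epsilon, tie_threshold):
    """Classic VFDT rule (assumed here): a clear winner, or a tie once epsilon is small."""
    clear_winner = best_merit - second_best_merit > epsilon
    forced_tie_break = epsilon < tie_threshold
    return clear_winner or forced_tie_break
```

With R = 1 and the default split_confidence of 1e-07, the bound after 200 samples is about 0.2007, and 3224 samples are needed before it drops to 0.05; at that point the default tie_threshold of 0.05 would force a split.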

deactivate_all_leaves(self)

Deactivate all leaves.

enforce_tracker_limit(self)

Track the size of the tree and disable/enable nodes if required.

estimate_model_byte_size(self)

Calculate the size of the model and trigger tracker function if the actual model size exceeds the max size in the configuration.

fit(self, X, y, sample_weight=None)

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the target values of all samples in X.

sample_weight: numpy.ndarray, optional (default=None)

Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

get_info(self)

Collects and returns the information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_model_description(self)

Walk the tree and return its structure in a buffer.

Returns
string

The description of the model.

property get_model_measurements

Collect metrics corresponding to the current status of the tree.

Returns
string

A string buffer containing the measurements of the tree.

get_model_rules(self)

Returns a list of rules describing the tree.

Returns
list (Rule)

List of the rules describing the tree.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

get_rules_description(self)

Prints the description of the tree using rules.

get_votes_for_instance(self, X)

Get class votes for a single instance.

Parameters
X: numpy.ndarray of length equal to the number of features.

Instance attributes.

Returns
dict (class_value, weight)

get_weights_for_instance(self, X)

Get class votes for a single instance.

Parameters
X: numpy.ndarray of length equal to the number of features.

Instance attributes.

Returns
dict (class_value, weight)

measure_byte_size(self)

Calculate the size of the tree.

Returns
int

Size of the tree in bytes.

measure_tree_depth(self)

Calculate the depth of the tree.

Returns
int

Depth of the tree.

new_split_node(self, split_test, class_observations)

Create a new split node.

normalize_sample(self, X)

Normalize the features so that they all have the same influence during training.

Parameters
X: np.array

Features.

Returns
np.array

Normalized samples.

normalize_target_value(self, y)

Normalize the targets so that they all have the same influence during training.

Parameters
y: np.array

Targets.

Returns
np.array

Normalized target values.
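Both normalization methods can be illustrated with running sums of values and squared values, from which a mean and standard deviation are derived without storing past samples. The sketch below is a hypothetical helper, not the library's code: it uses the sample (n - 1) variance, maps zero-variance dimensions to 0, and omits any extra scaling constant or special handling of nominal attributes that the library may apply. `denormalize` shows how a prediction made in normalized target space could be mapped back to the original scale.

```python
import numpy as np


class RunningStandardizer:
    """Hypothetical helper: standardizes vectors with a running mean and std."""

    def __init__(self, n_dims):
        self.n = 0
        self.sum = np.zeros(n_dims)
        self.sum_sq = np.zeros(n_dims)

    def update(self, v):
        v = np.asarray(v, dtype=float)
        self.n += 1
        self.sum += v
        self.sum_sq += v * v

    def _mean_std(self):
        if self.n == 0:
            return np.zeros_like(self.sum), np.zeros_like(self.sum)
        mean = self.sum / self.n
        if self.n < 2:
            return mean, np.zeros_like(mean)
        # Sample variance from running sums; clip tiny negatives caused by rounding.
        var = (self.sum_sq - self.sum ** 2 / self.n) / (self.n - 1)
        return mean, np.sqrt(np.maximum(var, 0.0))

    def normalize(self, v):
        mean, std = self._mean_std()
        v = np.asarray(v, dtype=float)
        safe = np.where(std > 0, std, 1.0)  # avoid division by zero
        return np.where(std > 0, (v - mean) / safe, 0.0)

    def denormalize(self, z):
        mean, std = self._mean_std()
        return np.asarray(z, dtype=float) * std + mean
```

One instance would track the features and another the targets, each updated with every training sample.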

partial_fit(self, X, y, sample_weight=None)

Incrementally trains the model. Train samples (instances) are composed of X attributes and their corresponding targets y.

Tasks performed before training:

  • Verify instance weight. If not provided, uniform weights (1.0) are assumed.

  • If more than one instance is passed, loop through X and pass instances one at a time.

  • Update weight seen by model.

Training tasks:

  • If the tree is empty, create a leaf node as the root.

  • If the tree is already initialized, find the corresponding leaf for the instance and update the leaf node statistics.

  • If growth is allowed and the number of instances the leaf has observed since the last split attempt reaches the grace period, attempt to split.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Instance attributes.

y: numpy.ndarray of shape (n_samples, n_targets)

Target values.

sample_weight: float or array-like, optional (default=None)

Sample weights. If not provided, uniform weights are assumed.
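The weight bookkeeping and grace-period check in the training tasks above can be sketched for a single leaf. `LeafStats` and `partial_fit_sketch` are hypothetical: the sketch processes instances one at a time, assumes unit weights when none are given, tracks only the weight a leaf has seen, and records where a split attempt would happen once the weight accumulated since the last attempt reaches grace_period. Routing instances to leaves and evaluating split candidates are omitted.

```python
from dataclasses import dataclass


@dataclass
class LeafStats:
    """Hypothetical per-leaf bookkeeping used to schedule split attempts."""
    weight_seen: float = 0.0
    weight_at_last_split_eval: float = 0.0


def partial_fit_sketch(leaf, X, sample_weight=None, grace_period=200, growth_allowed=True):
    """Return the indices of instances after which a split attempt would be made."""
    n = len(X)
    # Uniform weights (1.0) when none are provided.
    weights = [1.0] * n if sample_weight is None else list(sample_weight)
    attempts = []
    for i in range(n):  # instances are passed one at a time; X[i] itself is unused here
        leaf.weight_seen += weights[i]  # stands in for the full leaf-statistics update
        since_last = leaf.weight_seen - leaf.weight_at_last_split_eval
        if growth_allowed and since_last >= grace_period:
            attempts.append(i)  # a real tree would evaluate split candidates here
            leaf.weight_at_last_split_eval = leaf.weight_seen
    return attempts
```

With grace_period=200 and unit weights, a leaf that receives 450 instances attempts to split after indices 199 and 399.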

predict(self, X)

Predicts the target values using the mean or the perceptron.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Samples for which we want to predict the target values.

Returns
list

Predicted target values.

predict_proba(self, X)

Not implemented for this method.

reset(self)

Reset the Hoeffding Tree to default values.

score(self, X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y: array-like, shape = (n_samples) or (n_samples, n_outputs)

True values for X.

sample_weight: array-like, shape = [n_samples], optional

Sample weights.

Returns
score: float

R^2 of self.predict(X) with respect to y.

Notes

The R2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').
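The R^2 definition above extends to several targets by scoring each target column separately and averaging the scores with equal weight, which corresponds to the multioutput='uniform_average' behaviour mentioned in the note. The NumPy sketch below is a hypothetical helper, not metrics.r2_score; unlike the scikit-learn function, it does not handle targets with zero variance and ignores sample weights.

```python
import numpy as np


def r2_score_sketch(y_true, y_pred):
    """R^2 = 1 - u/v per target column, averaged uniformly across targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.ndim == 1:  # treat a 1-D input as a single target
        y_true = y_true[:, None]
        y_pred = y_pred[:, None]
    u = ((y_true - y_pred) ** 2).sum(axis=0)                # residual sum of squares
    v = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)   # total sum of squares
    # Assumes every column of y_true has nonzero variance (v > 0).
    return float(np.mean(1.0 - u / v))
```

For y_true = [3, -0.5, 2, 7] and y_pred = [2.5, 0.0, 2, 8], u = 1.5 and v = 29.1875, giving R^2 of about 0.9486; predicting the column mean everywhere gives 0.0.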

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self
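The <component>__<parameter> naming convention can be illustrated with a hypothetical helper that groups keys by component. It splits at the first double underscore so that deeper paths are left for the component itself to resolve. This is not the library's implementation; for a simple estimator such as this one, a call like `sst_ht.set_params(grace_period=100)` uses plain parameter names.

```python
def split_nested_params(params):
    """Group '<component>__<parameter>' keys by component; plain keys go under None."""
    grouped = {}
    for key, value in params.items():
        component, sep, name = key.partition("__")
        if sep:
            # Nested key: route the remainder of the name to the component.
            grouped.setdefault(component, {})[name] = value
        else:
            # Plain key: a parameter of the estimator itself.
            grouped.setdefault(None, {})[key] = value
    return grouped
```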