skmultiflow.trees.LabelCombinationHoeffdingTreeClassifier
Label Combination Hoeffding Tree for multi-label classification.
Label combination transforms the problem from multi-label to multi-class: each unique combination of labels is assigned a class, and the Hoeffding tree is then trained normally.
The transformation treats the label set as a binary number and converts it to an int, which represents the class; after prediction, the int is converted back to a binary number, which is the predicted label set.
The number of labels needs to be provided for the transformation to work.
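The round-trip described above can be sketched in plain Python. The helper names below are hypothetical, not the library's internal API; the sketch also shows why the number of labels is needed (to pad leading zeros when decoding):

```python
# Label-combination transform (a sketch): read the label set as a binary
# number to get an int class, then decode the int back into a label set.

def labels_to_class(label_set):
    """Encode a binary label vector, e.g. [1, 0, 1], as a single int class."""
    return int("".join(str(int(label)) for label in label_set), 2)

def class_to_labels(cls, n_labels):
    """Decode the int class back into a binary label vector of length n_labels."""
    return [int(bit) for bit in format(cls, "0{}b".format(n_labels))]

print(labels_to_class([1, 0, 1]))   # 5
print(class_to_labels(5, 3))        # [1, 0, 1]
print(class_to_labels(1, 5))        # [0, 0, 0, 0, 1]  (n_labels sets the padding)
```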
Parameters
max_byte_size: Maximum memory consumed by the tree.
memory_estimate_period: Number of instances between memory consumption checks.
grace_period: Number of instances a leaf should observe between split attempts.
split_confidence: Allowed error in a split decision; a value closer to 0 takes longer to decide.
tie_threshold: Threshold below which a split will be forced to break ties.
binary_split: If True, only allow binary splits.
stop_mem_management: If True, stop growing as soon as the memory limit is hit.
remove_poor_atts: If True, disable poor attributes.
no_preprune: If True, disable pre-pruning.
nb_threshold: Number of instances a leaf should observe before allowing Naive Bayes.
nominal_attributes: List of nominal attributes. If empty, all attributes are assumed to be numerical.
n_labels: The number of labels the problem has.
Examples
>>> # Imports
>>> from skmultiflow.data import MultilabelGenerator
>>> from skmultiflow.trees import LabelCombinationHoeffdingTreeClassifier
>>> from skmultiflow.metrics import hamming_score
>>>
>>> # Setting up a data stream
>>> stream = MultilabelGenerator(random_state=1, n_samples=200,
>>>                              n_targets=5, n_features=10)
>>>
>>> # Setup Label Combination Hoeffding Tree classifier
>>> lc_ht = LabelCombinationHoeffdingTreeClassifier(n_labels=stream.n_targets)
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> max_samples = 200
>>> true_labels = []
>>> predicts = []
>>>
>>> # Train the estimator with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
>>>     X, y = stream.next_sample()
>>>     y_pred = lc_ht.predict(X)
>>>     lc_ht.partial_fit(X, y)
>>>     predicts.extend(y_pred)
>>>     true_labels.extend(y)
>>>     n_samples += 1
>>>
>>> # Display results
>>> perf = hamming_score(true_labels, predicts)
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Label Combination Hoeffding Tree Hamming score: ' + str(perf))
Methods
compute_hoeffding_bound(range_val, confidence, n)
compute_hoeffding_bound
Compute the Hoeffding bound, used to decide how many samples are necessary at each node.
deactivate_all_leaves(self)
deactivate_all_leaves
Deactivate all leaves.
enforce_tracker_limit(self)
enforce_tracker_limit
Track the size of the tree and disable/enable nodes if required.
estimate_model_byte_size(self)
estimate_model_byte_size
Calculate the size of the model and trigger tracker function if the actual model size exceeds the max size in the configuration.
fit(self, X, y[, classes, sample_weight])
fit
Fit the model.
get_info(self)
get_info
Collects and returns the information about the configuration of the estimator.
get_model_description(self)
get_model_description
Walk the tree and return its structure in a buffer.
get_model_rules(self)
get_model_rules
Returns a list of lists describing the tree.
get_params(self[, deep])
get_params
Get parameters for this estimator.
get_rules_description(self)
get_rules_description
Prints the description of the tree using rules.
get_votes_for_instance(self, X)
get_votes_for_instance
Get class votes for a single instance.
measure_byte_size(self)
measure_byte_size
Calculate the size of the tree.
measure_tree_depth(self)
measure_tree_depth
Calculate the depth of the tree.
new_split_node(self, split_test, …)
new_split_node
Create a new split node.
partial_fit(self, X, y[, classes, sample_weight])
partial_fit
Incrementally trains the model. Train samples (instances) are composed of X attributes and their corresponding targets y.
predict(self, X)
predict
Predicts the label of the X instance(s).
predict_proba(self, X)
predict_proba
Predicts the probabilities of all labels for the X instance(s).
reset(self)
reset
Reset the Hoeffding Tree to default values.
score(self, X, y[, sample_weight])
score
Returns the mean accuracy on the given test data and labels.
set_params(self, **params)
set_params
Set the parameters of this estimator.
Attributes
binary_split
classes
get_model_measurements
Collect metrics corresponding to the current status of the tree.
grace_period
leaf_prediction
max_byte_size
memory_estimate_period
n_labels
nb_threshold
no_preprune
nominal_attributes
remove_poor_atts
split_confidence
split_criterion
stop_mem_management
tie_threshold
compute_hoeffding_bound(range_val, confidence, n) parameters:
range_val: Range value.
confidence: Confidence of choosing the correct attribute.
n: Number of samples.
Returns: The Hoeffding bound.
Notes
The Hoeffding bound is defined as:
\(\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}\)
where:
\(\epsilon\): Hoeffding bound.
\(R\): Range of a random variable. For a probability the range is 1, and for an information gain the range is log c, where c is the number of classes.
\(\delta\): Confidence. 1 minus the desired probability of choosing the correct attribute at any given node.
\(n\): Number of samples.
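As a sketch, the bound above can be computed directly from its definition; assuming the standard formula, this should agree with what compute_hoeffding_bound(range_val, confidence, n) returns:

```python
import math

def hoeffding_bound(range_val, confidence, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt((range_val ** 2) * math.log(1.0 / confidence) / (2.0 * n))

# For information gain with 2 classes, R = log2(2) = 1.
eps_small = hoeffding_bound(1.0, 0.05, 100)
eps_large = hoeffding_bound(1.0, 0.05, 400)
# The bound shrinks as a node observes more samples, so splits become
# easier to justify with more data.
```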
fit(X, y[, classes, sample_weight]) parameters:
X: The features to train the model.
y: An array-like with the class labels of all samples in X.
classes: Contains all possible/known class labels. Usage varies depending on the learning method.
sample_weight: Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.
get_info returns: Configuration of the estimator.
get_model_description returns: The description of the model.
get_model_measurements returns: A string buffer containing the measurements of the tree.
get_model_rules returns: List of the rules describing the tree.
get_params(deep) parameters:
deep: If True, will return the parameters for this estimator and contained subobjects that are estimators.
get_params returns: Parameter names mapped to their values.
get_votes_for_instance parameters:
X: Instance attributes.
measure_byte_size returns: Size of the tree in bytes.
measure_tree_depth returns: Depth of the tree.
partial_fit(X, y[, classes, sample_weight]) parameters:
y: The corresponding targets y for all samples in X.
classes: Classes (targets) for all samples in X.
sample_weight: Samples weight. If not provided, uniform weights are assumed.
predict parameters:
X: Samples for which we want to predict the labels.
predict returns: Predicted labels for all instances in X.
predict_proba returns: Predicted probabilities of all the labels for all instances in X.
score(X, y[, sample_weight]):
In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be correctly predicted.
X: Test samples.
y: True labels for X.
sample_weight: Sample weights.
Returns: Mean accuracy of self.predict(X) with respect to y.
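Subset accuracy can be illustrated with a small self-contained sketch (a hypothetical helper, not the library's implementation): a sample only counts as correct when its whole predicted label set matches the true label set exactly.

```python
# Subset accuracy (a sketch): the fraction of samples whose entire
# predicted label set matches the true label set exactly.
def subset_accuracy(y_true, y_pred):
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)

y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
# The second sample gets one of three labels wrong, so the whole
# sample counts as incorrect: 2 of 3 label sets match exactly.
print(subset_accuracy(y_true, y_pred))
```

This is why subset accuracy is called a harsh metric: a single wrong label in a set makes the whole sample count as a miss, unlike the Hamming score used in the example above.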
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.