skmultiflow.trees.LabelCombinationHoeffdingTreeClassifier

class skmultiflow.trees.LabelCombinationHoeffdingTreeClassifier(max_byte_size=33554432, memory_estimate_period=1000000, grace_period=200, split_criterion='info_gain', split_confidence=1e-07, tie_threshold=0.05, binary_split=False, stop_mem_management=False, remove_poor_atts=False, no_preprune=False, leaf_prediction='nba', nb_threshold=0, nominal_attributes=None, n_labels=None)

Label Combination Hoeffding Tree for multi-label classification.

Label combination transforms the multi-label problem into a multi-class one. Each unique combination of labels is assigned its own class, and the Hoeffding tree is then trained on these classes as usual.

The transformation treats each label set as a binary number and converts it to an integer that represents the class. After prediction, the integer is converted back into a binary number, which is the predicted label set.

The number of labels (n_labels) must be provided for the transformation to work.
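
The mapping can be sketched in plain Python as follows. This is a minimal illustration, not the library's internal code: the helper names are hypothetical, and reading the first label as the most significant bit is an assumption about bit order.

```python
def label_set_to_class(label_set):
    """Read a binary label vector as a base-2 integer.

    Assumption: the first label is the most significant bit.
    """
    value = 0
    for bit in label_set:
        value = (value << 1) | int(bit)
    return value


def class_to_label_set(class_value, n_labels):
    """Expand an integer class back into a binary label vector of length n_labels."""
    return [(class_value >> (n_labels - 1 - i)) & 1 for i in range(n_labels)]


y = [1, 0, 1, 1, 0]
c = label_set_to_class(y)                       # 0b10110 == 22
restored = class_to_label_set(c, n_labels=5)    # [1, 0, 1, 1, 0]
```

The decoder needs n_labels because the integer alone does not record leading zeros: class 1 could be the label set [1] or [0, 0, 0, 0, 1].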

Parameters
max_byte_size: int (default=33554432)

Maximum memory, in bytes, consumed by the tree.

memory_estimate_period: int (default=1000000)

Number of instances between memory consumption checks.

grace_period: int (default=200)

Number of instances a leaf should observe between split attempts.

split_criterion: string (default='info_gain')

Split criterion to use:

'gini' - Gini
'info_gain' - Information Gain

split_confidence: float (default=0.0000001)

Allowed error in the split decision; a value closer to 0 takes longer to decide.

tie_threshold: float (default=0.05)

Threshold below which a split will be forced to break ties.

binary_split: boolean (default=False)

If True, only allow binary splits.

stop_mem_management: boolean (default=False)

If True, stop growing as soon as the memory limit is hit.

remove_poor_atts: boolean (default=False)

If True, disable poor attributes.

no_preprune: boolean (default=False)

If True, disable pre-pruning.

leaf_prediction: string (default='nba')

Prediction mechanism used at leaves:

'mc' - Majority Class
'nb' - Naive Bayes
'nba' - Naive Bayes Adaptive

nb_threshold: int (default=0)

Number of instances a leaf should observe before allowing Naive Bayes.

nominal_attributes: list, optional

List of nominal attributes. If empty, all attributes are assumed to be numerical.

n_labels: int (default=None)

The number of labels in the problem.
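
The interplay of leaf_prediction and nb_threshold can be sketched as below. This is a hypothetical helper, not part of the library: it assumes that nb_threshold gates both 'nb' and 'nba', and that 'nba' keeps running counts of how often the majority-class and Naive Bayes predictions would have been correct at the leaf and picks the better one (ties go to Naive Bayes here, which is also an assumption).

```python
def choose_leaf_predictor(leaf_prediction, weight_seen, nb_threshold,
                          mc_correct=0.0, nb_correct=0.0):
    """Return 'mc' or 'nb': which predictor a leaf would use (hypothetical helper)."""
    if leaf_prediction == 'mc':
        return 'mc'
    # Assumption: Naive Bayes is only allowed once the leaf has seen nb_threshold instances.
    if weight_seen < nb_threshold:
        return 'mc'
    if leaf_prediction == 'nb':
        return 'nb'
    if leaf_prediction == 'nba':
        # Adaptive: use whichever predictor has been correct more often at this leaf.
        return 'mc' if mc_correct > nb_correct else 'nb'
    raise ValueError("leaf_prediction must be 'mc', 'nb' or 'nba'")
```

For example, with the default nb_threshold=0 and leaf_prediction='nba', a leaf whose majority-class guesses have been right 30 times and whose Naive Bayes guesses have been right 20 times would predict with the majority class.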

Examples

>>> # Imports
>>> from skmultiflow.data import MultilabelGenerator
>>> from skmultiflow.trees import LabelCombinationHoeffdingTreeClassifier
>>> from skmultiflow.metrics import hamming_score
>>>
>>> # Setting up a data stream
>>> stream = MultilabelGenerator(random_state=1, n_samples=200,
...                              n_targets=5, n_features=10)
>>>
>>> # Set up the Label Combination Hoeffding Tree classifier
>>> lc_ht = LabelCombinationHoeffdingTreeClassifier(n_labels=stream.n_targets)
>>>
>>> # Set up variables to control the loop and track performance
>>> n_samples = 0
>>> max_samples = 200
>>> true_labels = []
>>> predicts = []
>>>
>>> # Train the estimator with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_pred = lc_ht.predict(X)
...     lc_ht.partial_fit(X, y)
...     predicts.extend(y_pred)
...     true_labels.extend(y)
...     n_samples += 1
>>>
>>> # Display results
>>> perf = hamming_score(true_labels, predicts)
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Label Combination Hoeffding Tree Hamming score: ' + str(perf))

Methods

compute_hoeffding_bound(range_val, confidence, n)

Compute the Hoeffding bound, used to decide how many samples are necessary at each node.

deactivate_all_leaves(self)

Deactivate all leaves.

enforce_tracker_limit(self)

Track the size of the tree and disable/enable nodes if required.

estimate_model_byte_size(self)

Calculate the size of the model and trigger the tracker function if the actual model size exceeds the max size in the configuration.

fit(self, X, y[, classes, sample_weight])

Fit the model.

get_info(self)

Collects and returns information about the configuration of the estimator.

get_model_description(self)

Walk the tree and return its structure in a buffer.

get_model_rules(self)

Returns a list of rules describing the tree.

get_params(self[, deep])

Get parameters for this estimator.

get_rules_description(self)

Prints the description of the tree using rules.

get_votes_for_instance(self, X)

Get class votes for a single instance.

measure_byte_size(self)

Calculate the size of the tree.

measure_tree_depth(self)

Calculate the depth of the tree.

new_split_node(self, split_test, …)

Create a new split node.

partial_fit(self, X, y[, classes, sample_weight])

Incrementally trains the model. Train samples (instances) are composed of X attributes and their corresponding targets y.

predict(self, X)

Predicts the labels of the X instance(s).

predict_proba(self, X)

Predicts the probabilities of all labels of the X instance(s).

reset(self)

Reset the Hoeffding Tree to default values.

score(self, X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

set_params(self, **params)

Set the parameters of this estimator.

Attributes

binary_split

classes

get_model_measurements

Collect metrics corresponding to the current status of the tree.

grace_period

leaf_prediction

max_byte_size

memory_estimate_period

n_labels

nb_threshold

no_preprune

nominal_attributes

remove_poor_atts

split_confidence

split_criterion

stop_mem_management

tie_threshold

static compute_hoeffding_bound(range_val, confidence, n)

Compute the Hoeffding bound, used to decide how many samples are necessary at each node.

Parameters
range_val: float

Range value.

confidence: float

Confidence of choosing the correct attribute.

n: int or float

Number of samples.

Returns
float

The Hoeffding bound.

Notes

The Hoeffding bound is defined as:

\[\epsilon = \sqrt{\frac{R^2\ln(1/\delta)}{2n}}\]

where:

\(\epsilon\): Hoeffding bound.

\(R\): Range of a random variable. For a probability the range is 1, and for information gain the range is \(\log c\), where \(c\) is the number of classes.

\(\delta\): Confidence. 1 minus the desired probability of choosing the correct attribute at any given node.

\(n\): Number of samples.
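
As a worked example, the formula can be evaluated directly. The sketch below also shows how the bound is typically combined with tie_threshold in Hoeffding trees: split when the best candidate beats the runner-up by more than \(\epsilon\), or when \(\epsilon\) has shrunk below the tie threshold. The helper names are hypothetical, and taking \(R = \log_2 c\) (information gain measured in bits) is an assumption.

```python
import math


def hoeffding_bound(range_val, confidence, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(range_val ** 2 * math.log(1.0 / confidence) / (2.0 * n))


def should_split(best_merit, second_merit, range_val, confidence, n, tie_threshold):
    """Hypothetical split test combining the Hoeffding bound with tie breaking."""
    eps = hoeffding_bound(range_val, confidence, n)
    # Split if the best attribute is clearly better, or if the two are effectively tied.
    return (best_merit - second_merit > eps) or (eps < tie_threshold)


# Two classes with information gain in bits: R = log2(2) = 1.
R = math.log2(2)
eps_200 = hoeffding_bound(R, 1e-7, 200)   # about 0.2007 after one default grace period

# Smallest n at which eps drops below the default tie_threshold of 0.05.
n_tie = next(n for n in range(1, 10_000) if hoeffding_bound(R, 1e-7, n) < 0.05)
```

With R = 1 and the default split_confidence of 1e-07, \(\epsilon\) is about 0.20 after 200 instances and falls below the default tie_threshold of 0.05 only at n = 3224, so a split between two nearly equal attributes is not forced before a leaf has seen that many instances.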

deactivate_all_leaves(self)

Deactivate all leaves.

enforce_tracker_limit(self)

Track the size of the tree and disable/enable nodes if required.

estimate_model_byte_size(self)

Calculate the size of the model and trigger the tracker function if the actual model size exceeds the max size in the configuration.

fit(self, X, y, classes=None, sample_weight=None)

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Contains all possible/known class labels. Usage varies depending on the learning method.

sample_weight: numpy.ndarray, optional (default=None)

Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self

get_info(self)

Collects and returns information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_model_description(self)

Walk the tree and return its structure in a buffer.

Returns
string

The description of the model.

property get_model_measurements

Collect metrics corresponding to the current status of the tree.

Returns
string

A string buffer containing the measurements of the tree.

get_model_rules(self)

Returns a list of rules describing the tree.

Returns
list (Rule)

List of the rules describing the tree.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

get_rules_description(self)

Prints the description of the tree using rules.

get_votes_for_instance(self, X)

Get class votes for a single instance.

Parameters
X: numpy.ndarray of length equal to the number of features.

Instance attributes.

Returns
dict (class_value, weight)
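
A vote dictionary of this form can be turned into a prediction by normalizing the weights and taking the heaviest class; for this estimator, the winning class would then be decoded back into a label set. The sketch assumes the dictionary maps each encoded label combination to a non-negative weight; the helper name is hypothetical, and returning class 0 when there are no votes is an arbitrary choice made here.

```python
def votes_to_prediction(votes):
    """Normalize class votes into probabilities and pick the heaviest class.

    votes: dict mapping class value (an encoded label combination) to a weight.
    """
    total = sum(votes.values())
    if total <= 0:
        return 0, {}  # arbitrary fallback for an empty or all-zero vote
    proba = {cls: w / total for cls, w in votes.items()}
    best = max(votes, key=votes.get)
    return best, proba


best, proba = votes_to_prediction({22: 6.0, 3: 2.0})
# best == 22, proba == {22: 0.75, 3: 0.25}
```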

measure_byte_size(self)

Calculate the size of the tree.

Returns
int

Size of the tree in bytes.

measure_tree_depth(self)

Calculate the depth of the tree.

Returns
int

Depth of the tree.

new_split_node(self, split_test, class_observations)

Create a new split node.

partial_fit(self, X, y, classes=None, sample_weight=None)

Incrementally trains the model. Train samples (instances) are composed of X attributes and their corresponding targets y.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Instance attributes.

y: array_like

Classes (targets) for all samples in X.

classes: Not used (default=None)

sample_weight: float or array-like, optional (default=None)

Samples weight. If not provided, uniform weights are assumed.

Returns
self

predict(self, X)

Predicts the labels of the X instance(s).

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Samples for which we want to predict the labels.

Returns
numpy.array

Predicted labels for all instances in X.

predict_proba(self, X)

Predicts the probabilities of all labels of the X instance(s).

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Samples for which we want to predict the labels.

Returns
numpy.array

Predicted probabilities of all the labels for all instances in X.

reset(self)

Reset the Hoeffding Tree to default values.

score(self, X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric because it requires the entire label set of each sample to be predicted correctly.
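
To see why subset accuracy is harsh, the sketch below compares it with a label-wise accuracy (the fraction of individual labels predicted correctly, i.e. one minus the Hamming loss) on a small toy example. Both helpers are hypothetical plain-Python illustrations, not the library's metric implementations, and the arrays are invented for illustration only.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire label set is predicted exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def label_wise_accuracy(y_true, y_pred):
    """Fraction of individual (sample, label) positions predicted correctly."""
    total = sum(len(t) for t in y_true)
    correct = sum(a == b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return correct / total


y_true = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]
subset = subset_accuracy(y_true, y_pred)        # 1/3: only the first sample is fully correct
label_wise = label_wise_accuracy(y_true, y_pred)  # 7/9: 7 of 9 individual labels are correct
```

A single wrong label costs a whole sample under subset accuracy, which is why the two scores can diverge sharply.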

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples.

y: array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight: array-like, shape = [n_samples], optional

Sample weights.

Returns
score: float

Mean accuracy of self.predict(X) wrt. y.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
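
The <component>__<parameter> convention can be illustrated with a small parser that groups keys by their first component, which is how a nested object would route each setting to the right sub-estimator. The helper and the component names in the example are hypothetical and only mimic the naming scheme; this estimator has no nested components, so its own parameters are passed with plain names such as grace_period=100.

```python
def split_nested_params(params):
    """Separate own parameters from '<component>__<parameter>' ones (hypothetical helper)."""
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition('__')
        if sep:
            # Only the first '__' is split here; deeper levels stay in sub_key
            # and would be handled recursively by the component itself.
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params({'grace_period': 100, 'clf__tie_threshold': 0.1,
                                   'pipe__clf__nb_threshold': 5})
# own == {'grace_period': 100}
# nested == {'clf': {'tie_threshold': 0.1}, 'pipe': {'clf__nb_threshold': 5}}
```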

Returns
self