skmultiflow.trees.LabelCombinationHoeffdingTreeClassifier¶

class skmultiflow.trees.LabelCombinationHoeffdingTreeClassifier(max_byte_size=33554432, memory_estimate_period=1000000, grace_period=200, split_criterion='info_gain', split_confidence=1e-07, tie_threshold=0.05, binary_split=False, stop_mem_management=False, remove_poor_atts=False, no_preprune=False, leaf_prediction='nba', nb_threshold=0, nominal_attributes=None, n_labels=None)[source]¶

Label Combination Hoeffding Tree for multi-label classification.

Label combination transforms the problem from multi-label to multi-class. For each unique combination of labels it assigns a class and proceeds with training the hoeffding tree normally.

The transformation is done by changing the label set which could be seen as a binary number to an int which will represent the class, and after the prediction the int is converted back to a binary number which is the predicted label-set.

The number of labels need to be provided for the transformation to work.

Parameters

max_byte_size: int (default=33554432): Maximum memory consumed by the tree.
memory_estimate_period: int (default=1000000): Number of instances between memory consumption checks.
grace_period: int (default=200): Number of instances a leaf should observe between split attempts.
split_criterion: string (default=’info_gain’): Split criterion to use.

‘gini’ - Gini

‘info_gain’ - Information Gain
split_confidence: float (default=0.0000001): Allowed error in split decision, a value closer to 0 takes longer to decide.
tie_threshold: float (default=0.05): Threshold below which a split will be forced to break ties.
binary_split: boolean (default=False): If True, only allow binary splits.
stop_mem_management: boolean (default=False): If True, stop growing as soon as memory limit is hit.
remove_poor_atts: boolean (default=False): If True, disable poor attributes.
no_preprune: boolean (default=False): If True, disable pre-pruning.
leaf_prediction: string (default=’nba’): Prediction mechanism used at leafs.

‘mc’ - Majority Class

‘nb’ - Naive Bayes

‘nba’ - Naive Bayes Adaptive
nb_threshold: int (default=0): Number of instances a leaf should observe before allowing Naive Bayes.
nominal_attributes: list, optional: List of Nominal attributes. If emtpy, then assume that all attributes are numerical.
n_labels: int (default=None): the number of labels the problem has.

Examples

>>> # Imports
>>> from skmultiflow.data import MultilabelGenerator
>>> from skmultiflow.trees import LabelCombinationHoeffdingTreeClassifier
>>> from skmultiflow.metrics import hamming_score
>>>
>>> # Setting up a data stream
>>> stream = MultilabelGenerator(random_state=1, n_samples=200,
>>>                              n_targets=5, n_features=10)
>>>
>>> # Setup Label Combination Hoeffding Tree classifier
>>> lc_ht = LabelCombinationHoeffdingTreeClassifier(n_labels=stream.n_targets)
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> max_samples = 200
>>> true_labels = []
>>> predicts = []
>>>
>>> # Train the estimator with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
>>>     X, y = stream.next_sample()
>>>     y_pred = lc_ht.predict(X)
>>>     lc_ht.partial_fit(X, y)
>>>     predicts.extend(y_pred)
>>>     true_labels.extend(y)
>>>     n_samples += 1
>>>
>>> # Display results
>>> perf = hamming_score(true_labels, predicts)
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Label Combination Hoeffding Tree Hamming score: ' + str(perf))

Methods

`compute_hoeffding_bound`(range_val, confidence, n)	Compute the Hoeffding bound, used to decide how many samples are necessary at each node.
`deactivate_all_leaves`(self)	Deactivate all leaves.
`enforce_tracker_limit`(self)	Track the size of the tree and disable/enable nodes if required.
`estimate_model_byte_size`(self)	Calculate the size of the model and trigger tracker function if the actual model size exceeds the max size in the configuration.
`fit`(self, X, y[, classes, sample_weight])	Fit the model.
`get_info`(self)	Collects and returns the information about the configuration of the estimator
`get_model_description`(self)	Walk the tree and return its structure in a buffer.
`get_model_rules`(self)	Returns list of list describing the tree.
`get_params`(self[, deep])	Get parameters for this estimator.
`get_rules_description`(self)	Prints the the description of tree using rules.
`get_votes_for_instance`(self, X)	Get class votes for a single instance.
`measure_byte_size`(self)	Calculate the size of the tree.
`measure_tree_depth`(self)	Calculate the depth of the tree.
`new_split_node`(self, split_test, …)	Create a new split node.
`partial_fit`(self, X, y[, classes, sample_weight])	Incrementally trains the model. Train samples (instances) are composed of X attributes and their
`predict`(self, X)	Predicts the label of the X instance(s)
`predict_proba`(self, X)	Predicts probabilities of all label of the X instance(s)
`reset`(self)	Reset the Hoeffding Tree to default values.
`score`(self, X, y[, sample_weight])	Returns the mean accuracy on the given test data and labels.
`set_params`(self, **params)	Set the parameters of this estimator.

Attributes

`binary_split`
`classes`
`get_model_measurements`	Collect metrics corresponding to the current status of the tree.
`grace_period`
`leaf_prediction`
`max_byte_size`
`memory_estimate_period`
`n_labels`
`nb_threshold`
`no_preprune`
`nominal_attributes`
`remove_poor_atts`
`split_confidence`
`split_criterion`
`stop_mem_management`
`tie_threshold`

static compute_hoeffding_bound(range_val, confidence, n)[source]¶

Compute the Hoeffding bound, used to decide how many samples are necessary at each node.

Parameters

range_val: float: Range value.
confidence: float: Confidence of choosing the correct attribute.
n: int or float: Number of samples.

Returns

float: The Hoeffding bound.

Notes

The Hoeffding bound is defined as:

\[\epsilon = \sqrt{\frac{R^2\ln(1/\delta))}{2n}}\]

where:

\(\epsilon\): Hoeffding bound.

\(R\): Range of a random variable. For a probability the range is 1, and for an information gain the range is log c, where c is the number of classes.

\(\delta\): Confidence. 1 minus the desired probability of choosing the correct attribute at any given node.

\(n\): Number of samples.

deactivate_all_leaves(self)[source]¶: Deactivate all leaves.

enforce_tracker_limit(self)[source]¶: Track the size of the tree and disable/enable nodes if required.

estimate_model_byte_size(self)[source]¶: Calculate the size of the model and trigger tracker function if the actual model size exceeds the max size in the configuration.

fit(self, X, y, classes=None, sample_weight=None)[source]¶

Fit the model.

Parameters

Xnumpy.ndarray of shape (n_samples, n_features): The features to train the model.
y: numpy.ndarray of shape (n_samples, n_targets): An array-like with the class labels of all samples in X.
classes: numpy.ndarray, optional (default=None): Contains all possible/known class labels. Usage varies depending on the learning method.
sample_weight: numpy.ndarray, optional (default=None): Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns

self

get_info(self)[source]¶

Collects and returns the information about the configuration of the estimator

Returns

string: Configuration of the estimator.

get_model_description(self)[source]¶

Walk the tree and return its structure in a buffer.

Returns

string: The description of the model.

property get_model_measurements¶

Collect metrics corresponding to the current status of the tree.

Returns

string: A string buffer containing the measurements of the tree.

get_model_rules(self)[source]¶

Returns list of list describing the tree.

Returns

list (Rule): list of the rules describing the tree

get_params(self, deep=True)[source]¶

Get parameters for this estimator.

Parameters

deepboolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

paramsmapping of string to any: Parameter names mapped to their values.

get_rules_description(self)[source]¶: Prints the the description of tree using rules.

get_votes_for_instance(self, X)[source]¶

Get class votes for a single instance.

Parameters

X: numpy.ndarray of length equal to the number of features.: Instance attributes.

Returns

dict (class_value, weight)

measure_byte_size(self)[source]¶

Calculate the size of the tree.

Returns

int: Size of the tree in bytes.

measure_tree_depth(self)[source]¶

Calculate the depth of the tree.

Returns

int: Depth of the tree.

new_split_node(self, split_test, class_observations)[source]¶: Create a new split node.

partial_fit(self, X, y, classes=None, sample_weight=None)[source]¶

Incrementally trains the model. Train samples (instances) are composed of X attributes and their: corresponding targets y.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): Instance attributes.
y: array_like: Classes (targets) for all samples in X.
classes: Not used (default=None)
sample_weight: float or array-like, optional (default=None): Samples weight. If not provided, uniform weights are assumed.

Returns

self

predict(self, X)[source]¶

Predicts the label of the X instance(s)

Parameters

X: numpy.ndarray of shape (n_samples, n_features): Samples for which we want to predict the labels.

Returns

numpy.array: Predicted labels for all instances in X.

predict_proba(self, X)[source]¶

Predicts probabilities of all label of the X instance(s)

Parameters

X: numpy.ndarray of shape (n_samples, n_features): Samples for which we want to predict the labels.

Returns

numpy.array: Predicted the probabilities of all the labels for all instances in X.

reset(self)[source]¶: Reset the Hoeffding Tree to default values.

score(self, X, y, sample_weight=None)[source]¶

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

Xarray-like, shape = (n_samples, n_features): Test samples.
yarray-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X.
sample_weightarray-like, shape = [n_samples], optional: Sample weights.

Returns

scorefloat: Mean accuracy of self.predict(X) wrt. y.

set_params(self, **params)[source]¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self