skmultiflow.trees.LabelCombinationHoeffdingTreeClassifier

class skmultiflow.trees.LabelCombinationHoeffdingTreeClassifier(max_byte_size=33554432, memory_estimate_period=1000000, grace_period=200, split_criterion='info_gain', split_confidence=1e-07, tie_threshold=0.05, binary_split=False, stop_mem_management=False, remove_poor_atts=False, no_preprune=False, leaf_prediction='nba', nb_threshold=0, nominal_attributes=None, n_labels=None)

Label Combination Hoeffding Tree for multi-label classification.

Label combination transforms the problem from multi-label to multi-class: each unique combination of labels is assigned its own class, and the Hoeffding tree is then trained as usual.

The transformation treats the label set as a binary number and converts it to an integer that represents the class. After prediction, the integer is converted back to a binary number, which is the predicted label set.

The number of labels must be provided for the transformation to work.
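The encoding described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names `labels_to_class` and `class_to_labels` are hypothetical, and the most-significant-bit-first order is an assumption. It also shows why the number of labels is needed: without a fixed width, leading zeros would be lost when decoding.

```python
def labels_to_class(label_set):
    """Read a binary label vector as a base-2 integer.

    Assumption: the first label is the most significant bit.
    """
    value = 0
    for bit in label_set:
        value = (value << 1) | int(bit)
    return value


def class_to_labels(class_id, n_labels):
    """Expand an integer class back into a fixed-width binary label vector."""
    return [(class_id >> shift) & 1 for shift in range(n_labels - 1, -1, -1)]


label_set = [0, 1, 1, 0, 1]
class_id = labels_to_class(label_set)            # binary 01101
decoded = class_to_labels(class_id, n_labels=5)  # round trip back to the label set
print(class_id, decoded)
```

With n labels there are up to 2^n distinct label combinations, so the number of possible classes grows quickly as labels are added.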

Parameters

max_byte_size: int (default=33554432): Maximum memory consumed by the tree, in bytes (the default is 32 MiB).
memory_estimate_period: int (default=1000000): Number of instances between memory consumption checks.
grace_period: int (default=200): Number of instances a leaf should observe between split attempts.
split_criterion: string (default='info_gain'): Split criterion to use.
    'gini' - Gini
    'info_gain' - Information Gain
split_confidence: float (default=0.0000001): Allowed error in the split decision; a value closer to 0 takes longer to decide.
tie_threshold: float (default=0.05): Threshold below which a split will be forced to break ties.
binary_split: boolean (default=False): If True, only allow binary splits.
stop_mem_management: boolean (default=False): If True, stop growing as soon as memory limit is hit.
remove_poor_atts: boolean (default=False): If True, disable poor attributes.
no_preprune: boolean (default=False): If True, disable pre-pruning.
leaf_prediction: string (default='nba'): Prediction mechanism used at the leaves.
    'mc' - Majority Class
    'nb' - Naive Bayes
    'nba' - Naive Bayes Adaptive
nb_threshold: int (default=0): Number of instances a leaf should observe before allowing Naive Bayes.
nominal_attributes: list, optional: List of nominal attributes. If empty, all attributes are assumed to be numerical.
n_labels: int (default=None): The number of labels in the problem.
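To illustrate how `split_confidence` and `tie_threshold` interact, the sketch below applies the standard Hoeffding bound, epsilon = sqrt(R^2 * ln(1/delta) / (2n)), to a split decision. It is a simplified illustration rather than the library's code: the function names are hypothetical, and the range R = log2(n_classes) assumes the information gain criterion.

```python
import math


def hoeffding_bound(value_range, confidence, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / confidence) / (2.0 * n))


def should_split(best_merit, second_merit, n, n_classes,
                 split_confidence=1e-7, tie_threshold=0.05):
    """Split when the best attribute wins by more than epsilon, or on a tie.

    Hypothetical helper: merits are the split-criterion values of the two
    best candidate attributes after n observed instances.
    """
    value_range = math.log2(n_classes)  # assumed range for information gain
    epsilon = hoeffding_bound(value_range, split_confidence, n)
    clear_winner = (best_merit - second_merit) > epsilon
    tie = epsilon < tie_threshold  # candidates too close to ever separate
    return clear_winner or tie


# A smaller split_confidence widens the bound, so more instances are needed.
print(hoeffding_bound(1.0, 1e-7, 200))
print(hoeffding_bound(1.0, 1e-3, 200))
print(should_split(0.30, 0.05, n=200, n_classes=2))
```

Because epsilon shrinks as n grows, two nearly equal attributes would otherwise delay a split indefinitely; the tie threshold caps that wait.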

Examples

>>> # Imports
>>> from skmultiflow.data import MultilabelGenerator
>>> from skmultiflow.trees import LabelCombinationHoeffdingTreeClassifier
>>> from skmultiflow.metrics import hamming_score
>>>
>>> # Setting up a data stream
>>> stream = MultilabelGenerator(random_state=1, n_samples=200,
...                              n_targets=5, n_features=10)
>>>
>>> # Setup Label Combination Hoeffding Tree classifier
>>> lc_ht = LabelCombinationHoeffdingTreeClassifier(n_labels=stream.n_targets)
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> max_samples = 200
>>> true_labels = []
>>> predicts = []
>>>
>>> # Test-then-train: predict each sample before training on it
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_pred = lc_ht.predict(X)
...     lc_ht.partial_fit(X, y)
...     predicts.extend(y_pred)
...     true_labels.extend(y)
...     n_samples += 1
...
>>> # Display results
>>> perf = hamming_score(true_labels, predicts)
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Label Combination Hoeffding Tree Hamming score: ' + str(perf))

Methods

`fit`(X, y[, classes, sample_weight])	Fit the model.
`get_info`()	Collects and returns the information about the configuration of the estimator.
`get_model_description`()	Walk the tree and return its structure in a buffer.
`get_model_rules`()	Returns a list of rules describing the tree.
`get_params`([deep])	Get parameters for this estimator.
`get_rules_description`()	Prints the description of the tree using rules.
`measure_byte_size`()	Calculate the size of the tree.
`partial_fit`(X, y[, classes, sample_weight])	Incrementally trains the model.
`predict`(X)	Predicts the label of the X instance(s).
`predict_proba`(X)	Predicts the probabilities of all labels for the X instance(s).
`reset`()	Reset the Hoeffding Tree to default values.
`score`(X, y[, sample_weight])	Returns the mean accuracy on the given test data and labels.
`set_params`(**params)	Set the parameters of this estimator.

Attributes

`leaf_prediction`
`model_measurements`	Collect metrics corresponding to the current status of the tree.
`n_labels`
`split_criterion`

fit(X, y, classes=None, sample_weight=None)

Fit the model.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y: numpy.ndarray of shape (n_samples, n_targets): An array-like with the class labels of all samples in X.
classes: numpy.ndarray, optional (default=None): Contains all possible/known class labels. Usage varies depending on the learning method.
sample_weight: numpy.ndarray, optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns

self

get_info()

Collects and returns the information about the configuration of the estimator.

Returns

string: Configuration of the estimator.

get_model_description()

Walk the tree and return its structure in a buffer.

Returns

string: The description of the model.

get_model_rules()

Returns a list of rules describing the tree.

Returns

list (Rule): List of the rules describing the tree.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

get_rules_description()

Prints the description of the tree using rules.

measure_byte_size()

Calculate the size of the tree.

Returns

int: Size of the tree in bytes.

property model_measurements

Collect metrics corresponding to the current status of the tree.

Returns

string: A string buffer containing the measurements of the tree.

partial_fit(X, y, classes=None, sample_weight=None)

Incrementally trains the model. Train samples (instances) are composed of X attributes and their corresponding targets y.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): Instance attributes.
y: array_like: Classes (targets) for all samples in X.
classes: Not used (default=None)
sample_weight: float or array-like, optional (default=None): Sample weights. If not provided, uniform weights are assumed.

Returns

self

predict(X)

Predicts the label of the X instance(s).

Parameters

X: numpy.ndarray of shape (n_samples, n_features): Samples for which we want to predict the labels.

Returns

numpy.array: Predicted labels for all instances in X.

predict_proba(X)

Predicts the probabilities of all labels for the X instance(s).

Parameters

X: numpy.ndarray of shape (n_samples, n_features): Samples for which we want to predict the labels.

Returns

numpy.array: Predicted probabilities of all the labels for all instances in X.

reset()

Reset the Hoeffding Tree to default values.

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because it requires the entire label set of each sample to be predicted correctly.

Parameters

X: array-like, shape = (n_samples, n_features): Test samples.
y: array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X.
sample_weight: array-like, shape = [n_samples], optional: Sample weights.

Returns

score: float: Mean accuracy of self.predict(X) with respect to y.
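To make the difference between subset accuracy and a per-label measure concrete, the sketch below computes both on a small hand-made example. The helper names and the data are illustrative assumptions, not part of the library; the per-label function follows the idea behind the Hamming score used in the example earlier on this page.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire label set is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual labels predicted correctly (Hamming-style)."""
    correct = 0
    total = 0
    for t, p in zip(y_true, y_pred):
        for t_bit, p_bit in zip(t, p):
            correct += int(t_bit == p_bit)
            total += 1
    return correct / total


# Illustrative data: two samples each have exactly one wrong label.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(subset_accuracy(y_true, y_pred))
print(per_label_accuracy(y_true, y_pred))
```

A single wrong label makes the whole sample count as an error under subset accuracy, while the per-label measure only penalizes that one label.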

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
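The `<component>__<parameter>` naming convention can be illustrated with a small stand-alone sketch. The helper `split_param_name` and the name `tree__grace_period` are hypothetical and only show how such a key separates into a component and a parameter; they are not part of the estimator's API.

```python
def split_param_name(name):
    """Split a nested parameter key at the first double underscore.

    Returns (component, parameter); component is None for a plain key.
    """
    component, separator, parameter = name.partition("__")
    if not separator:
        return None, name
    return component, parameter


# A plain key addresses the estimator itself.
print(split_param_name("grace_period"))

# A nested key addresses a parameter of a named component.
print(split_param_name("tree__grace_period"))
```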