skmultiflow.meta.AdaptiveRandomForestClassifier

class skmultiflow.meta.AdaptiveRandomForestClassifier(n_estimators=10, max_features='auto', disable_weighted_vote=False, lambda_value=6, performance_metric='acc', drift_detection_method=ADWIN(delta=0.001), warning_detection_method=ADWIN(delta=0.01), max_byte_size=33554432, memory_estimate_period=2000000, grace_period=50, split_criterion='info_gain', split_confidence=0.01, tie_threshold=0.05, binary_split=False, stop_mem_management=False, remove_poor_atts=False, no_preprune=False, leaf_prediction='nba', nb_threshold=0, nominal_attributes=None, random_state=None)

Adaptive Random Forest classifier.

Parameters
n_estimators: int, optional (default=10)

Number of trees in the ensemble.

max_features: int, float, string or None, optional (default=”auto”)

Max number of attributes for each node split.

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=sqrt(n_features).

  • If “sqrt”, then max_features=sqrt(n_features) (same as “auto”).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.
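The rules above can be sketched as a small function. This is an illustration only: resolve_max_features is a hypothetical helper, not part of the library, and rounding the square root and logarithm to the nearest integer is an assumption of the sketch.

```python
import math


def resolve_max_features(max_features, n_features):
    """Hypothetical helper: map a max_features setting to a feature count."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            # Rounding to the nearest integer is an assumption of this sketch.
            return round(math.sqrt(n_features))
        if max_features == "log2":
            return round(math.log2(n_features))
        raise ValueError("unknown max_features: {!r}".format(max_features))
    if isinstance(max_features, float):
        # A float is read as a fraction of the available features.
        return int(max_features * n_features)
    return int(max_features)


for setting in (3, 0.5, "auto", "sqrt", "log2", None):
    print(setting, "->", resolve_max_features(setting, n_features=16))
```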

disable_weighted_vote: bool, optional (default=False)

Weighted vote option. If True, disable weighted voting.

lambda_value: int, optional (default=6)

The lambda value for bagging (lambda=6 corresponds to Leveraging Bagging).
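In online bagging of the kind described in [1], each tree is assumed to weight an incoming instance by a draw from a Poisson(lambda) distribution. The sketch below shows only that resampling idea with the standard library; the poisson function is a hypothetical helper, not how the library draws its weights.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(1)
# Each tree would train on the incoming instance with this weight;
# a weight of 0 means the tree skips the instance.
weights = [poisson(6, rng) for _ in range(10)]
print(weights)

# With lambda=6 the average weight is close to 6, so most trees see
# every instance several times.
mean_weight = sum(poisson(6, rng) for _ in range(20000)) / 20000
print(round(mean_weight, 1))
```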

performance_metric: string, optional (default=”acc”)

Metric used to track each tree's performance within the ensemble.

  • ‘acc’ - Accuracy

  • ‘kappa’ - Cohen's Kappa
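To make the two metrics concrete, the following sketch computes accuracy and Cohen's kappa for a small batch of predictions. The function names are illustrative and the library tracks these metrics incrementally rather than in batch.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Accuracy corrected for the agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement, from the two marginal label distributions.
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        return 0.0  # degenerate case: one label everywhere (sketch convention)
    return (p_observed - p_chance) / (1.0 - p_chance)


y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(accuracy(y_true, y_pred))
print(cohen_kappa(y_true, y_pred))
```

Kappa is lower than accuracy here because a predictor that favours the label 1 already agrees with the true labels half of the time by chance.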

drift_detection_method: BaseDriftDetector or None, optional (default=ADWIN(0.001))

Drift Detection method. Set to None to disable Drift detection.

warning_detection_method: BaseDriftDetector or None, optional (default=ADWIN(0.01))

Warning Detection method. Set to None to disable warning detection.

max_byte_size: int, optional (default=33554432)

(ARFHoeffdingTreeClassifier parameter) Maximum memory consumed by the tree.

memory_estimate_period: int, optional (default=2000000)

(ARFHoeffdingTreeClassifier parameter) Number of instances between memory consumption checks.

grace_period: int, optional (default=50)

(ARFHoeffdingTreeClassifier parameter) Number of instances a leaf should observe between split attempts.

split_criterion: string, optional (default=’info_gain’)

(ARFHoeffdingTreeClassifier parameter) Split criterion to use.

  • ‘gini’ - Gini

  • ‘info_gain’ - Information Gain

split_confidence: float, optional (default=0.01)

(ARFHoeffdingTreeClassifier parameter) Allowed error in split decision. A value closer to 0 takes longer to decide.

tie_threshold: float, optional (default=0.05)

(ARFHoeffdingTreeClassifier parameter) Threshold below which a split will be forced to break ties.
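The interaction between split_confidence and tie_threshold can be sketched with the standard Hoeffding bound used by Hoeffding trees. This is a simplified illustration, assuming the merit of each candidate split lies in a known range (1.0 for information gain with two classes); hoeffding_bound and should_split are hypothetical names.

```python
import math


def hoeffding_bound(value_range, confidence, n):
    """Hoeffding bound for a metric with the given range after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / confidence) / (2.0 * n))


def should_split(best_merit, second_merit, value_range, n,
                 split_confidence=0.01, tie_threshold=0.05):
    """Split when the best candidate clearly wins or the bound signals a tie."""
    epsilon = hoeffding_bound(value_range, split_confidence, n)
    clear_winner = best_merit - second_merit > epsilon
    tie = epsilon < tie_threshold
    return clear_winner or tie


# Clear winner after 200 instances: the gap exceeds the bound.
print(should_split(0.30, 0.10, value_range=1.0, n=200))
# Close candidates after 200 instances: keep waiting.
print(should_split(0.30, 0.25, value_range=1.0, n=200))
# Same candidates after 2000 instances: the bound drops below tie_threshold.
print(should_split(0.30, 0.25, value_range=1.0, n=2000))
```

A smaller split_confidence enlarges the bound, so more instances are needed before a split is accepted.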

binary_split: bool, optional (default=False)

(ARFHoeffdingTreeClassifier parameter) If True, only allow binary splits.

stop_mem_management: bool, optional (default=False)

(ARFHoeffdingTreeClassifier parameter) If True, stop growing as soon as memory limit is hit.

remove_poor_atts: bool, optional (default=False)

(ARFHoeffdingTreeClassifier parameter) If True, disable poor attributes.

no_preprune: bool, optional (default=False)

(ARFHoeffdingTreeClassifier parameter) If True, disable pre-pruning.

leaf_prediction: string, optional (default=’nba’)

(ARFHoeffdingTreeClassifier parameter) Prediction mechanism used at leaves.

  • ‘mc’ - Majority Class

  • ‘nb’ - Naive Bayes

  • ‘nba’ - Naive Bayes Adaptive
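The adaptive option can be pictured as a leaf that tracks how often each of the two simpler predictors has been right and uses whichever has the better record. The class below is a hypothetical sketch of that idea, not the library's leaf implementation; the tie-breaking rule in particular is an assumption.

```python
class AdaptiveLeafChooser:
    """Hypothetical tracker that picks between two leaf predictors."""

    def __init__(self):
        self.mc_correct = 0
        self.nb_correct = 0

    def update(self, y_true, mc_pred, nb_pred):
        # Score both predictors on the labelled instance.
        self.mc_correct += int(mc_pred == y_true)
        self.nb_correct += int(nb_pred == y_true)

    def choose(self, mc_pred, nb_pred):
        # Ties go to Naive Bayes here; that choice is an assumption of this sketch.
        return mc_pred if self.mc_correct > self.nb_correct else nb_pred


chooser = AdaptiveLeafChooser()
chooser.update(y_true=1, mc_pred=1, nb_pred=0)
chooser.update(y_true=1, mc_pred=1, nb_pred=1)
# Majority Class has been right more often so far, so its prediction is used.
print(chooser.choose(mc_pred=0, nb_pred=1))
```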

nb_threshold: int, optional (default=0)

(ARFHoeffdingTreeClassifier parameter) Number of instances a leaf should observe before allowing Naive Bayes.

nominal_attributes: list, optional

(ARFHoeffdingTreeClassifier parameter) List of nominal attributes. If empty, then assume that all attributes are numerical.

random_state: int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Notes

The 3 most important aspects of Adaptive Random Forest [1] are: (1) inducing diversity through re-sampling; (2) inducing diversity through randomly selecting subsets of features for node splits (see skmultiflow.classification.trees.arf_hoeffding_tree); (3) drift detectors per base tree, which cause selective resets in response to drifts. It also allows training background trees, which start training if a warning is detected and replace the active tree if the warning escalates to a drift.
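The background-tree mechanism described above can be sketched as a small state machine per ensemble member. TreeSlot and make_tree are hypothetical names, the trees are stand-in objects, and the sketch leaves out training and the detectors themselves.

```python
import itertools


class TreeSlot:
    """One ensemble member: an active tree plus an optional background tree."""

    def __init__(self, make_tree):
        self.make_tree = make_tree  # factory returning a fresh tree
        self.active = make_tree()
        self.background = None

    def on_signals(self, warning, drift):
        if warning and self.background is None:
            # A warning starts a background tree that trains alongside the active one.
            self.background = self.make_tree()
        if drift:
            # A drift replaces the active tree, preferring the background tree
            # if one exists, and falls back to a fresh tree otherwise.
            if self.background is not None:
                self.active = self.background
            else:
                self.active = self.make_tree()
            self.background = None


counter = itertools.count()
slot = TreeSlot(make_tree=lambda: "tree-{}".format(next(counter)))
slot.on_signals(warning=True, drift=False)
print(slot.active, slot.background)
slot.on_signals(warning=False, drift=True)
print(slot.active, slot.background)
```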

References

1

Heitor Murilo Gomes, Albert Bifet, Jesse Read, Jean Paul Barddal, Fabricio Enembreck, Bernhard Pfahringer, Geoff Holmes, Talel Abdessalem. Adaptive random forests for evolving data stream classification. In Machine Learning, DOI: 10.1007/s10994-017-5642-8, Springer, 2017.

Examples

>>> # Imports
>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import AdaptiveRandomForestClassifier
>>>
>>> # Setting up a data stream
>>> stream = SEAGenerator(random_state=1)
>>>
>>> # Setup Adaptive Random Forest Classifier
>>> arf = AdaptiveRandomForestClassifier()
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Train the estimator with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_pred = arf.predict(X)
...     if y[0] == y_pred[0]:
...         correct_cnt += 1
...     arf.partial_fit(X, y)
...     n_samples += 1
...
>>>
>>> # Display results
>>> print('Adaptive Random Forest ensemble classifier example')
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Accuracy: {}'.format(correct_cnt / n_samples))

Methods

fit(self, X, y[, classes, sample_weight])

Fit the model.

get_info(self)

Collects and returns the information about the configuration of the estimator.

get_params(self[, deep])

Get parameters for this estimator.

get_votes_for_instance(self, X)

partial_fit(self, X, y[, classes, sample_weight])

Partially (incrementally) fit the model.

predict(self, X)

Predict classes for the passed data.

predict_proba(self, X)

Estimates the probability of each sample in X belonging to each of the class-labels.

reset(self)

Reset ARF.

score(self, X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

set_params(self, **params)

Set the parameters of this estimator.

fit(self, X, y, classes=None, sample_weight=None)

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Contains all possible/known class labels. Usage varies depending on the learning method.

sample_weight: numpy.ndarray, optional (default=None)

Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self
get_info(self)

Collects and returns the information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

partial_fit(self, X, y, classes=None, sample_weight=None)

Partially (incrementally) fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, list, optional (default=None)

Array with all possible/known class labels. This is an optional parameter, except for the first partial_fit call where it is compulsory.

sample_weight: numpy.ndarray of shape (n_samples), optional (default=None)

Sample weights. If not provided, uniform weights are assumed.

Returns
self
predict(self, X)

Predict classes for the passed data.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The set of data samples to predict the class labels for.

Returns
A numpy.ndarray with all the predictions for the samples in X.
predict_proba(self, X)

Estimates the probability of each sample in X belonging to each of the class-labels.

Class probabilities are calculated as the mean predicted class probabilities per base estimator.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Samples for which we want to predict the class probabilities.

Returns
numpy.ndarray of shape (n_samples, n_classes)

Predicted class probabilities for all instances in X. If class labels were specified in a partial_fit call, the order of the columns matches self.classes. If classes were not specified, they are assumed to be 0-indexed. Class probabilities for a sample sum to 1 as long as at least one estimator has non-zero predictions. If no estimator can predict probabilities, probabilities of 0 are returned.
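The averaging and normalization behaviour described above can be sketched for a single sample as follows. average_probabilities is a hypothetical helper that takes one list of class probabilities per base estimator; it is not the library's implementation.

```python
def average_probabilities(per_tree_proba):
    """Combine per-tree class probabilities for one sample."""
    n_classes = len(per_tree_proba[0])
    # Summing and then normalizing gives the same result as normalizing the mean.
    totals = [sum(tree[c] for tree in per_tree_proba) for c in range(n_classes)]
    norm = sum(totals)
    if norm == 0:
        # No estimator produced a non-zero prediction.
        return [0.0] * n_classes
    return [t / norm for t in totals]


print(average_probabilities([[0.2, 0.8], [0.6, 0.4]]))
print(average_probabilities([[0.0, 0.0], [0.0, 0.0]]))
```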

reset(self)

Reset ARF.

score(self, X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples.

y: array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight: array-like, shape = (n_samples), optional

Sample weights.

Returns
score: float

Mean accuracy of self.predict(X) with respect to y.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self
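
The <component>__<parameter> key format mentioned above can be illustrated by splitting each key on its first double underscore. split_param_keys is a hypothetical helper that shows only how such keys are read; it does not set anything on an estimator, and the example keys are illustrative.

```python
def split_param_keys(params):
    """Hypothetical helper: separate top-level keys from nested ones."""
    top_level, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            # "component__parameter" addresses a parameter of a sub-object.
            nested.setdefault(component, {})[sub_key] = value
        else:
            top_level[key] = value
    return top_level, nested


top_level, nested = split_param_keys(
    {"n_estimators": 20, "drift_detection_method__delta": 0.002}
)
print(top_level)
print(nested)
```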