skmultiflow.meta.AdaptiveRandomForestClassifier
Adaptive Random Forest classifier.
Number of trees in the ensemble.
Max number of attributes for each node split.
If int, then consider max_features features at each split.
If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features) (same as “auto”).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
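The resolution rules above can be collected into a small helper. This is a sketch of the documented behavior only: resolve_max_features is a hypothetical name, not part of the skmultiflow API, which performs this resolution internally.

```python
import math

def resolve_max_features(max_features, n_features):
    """Map the documented max_features options to a feature count (sketch)."""
    if max_features is None:
        return n_features
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # Interpreted as a percentage of the total number of features.
        return int(max_features * n_features)
    if max_features in ("auto", "sqrt"):
        return int(round(math.sqrt(n_features)))
    if max_features == "log2":
        return int(round(math.log2(n_features)))
    raise ValueError("unknown max_features: {!r}".format(max_features))
```

For example, with 9 features, both "auto" and "sqrt" yield 3 candidate attributes per split.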
Weighted vote option.
The lambda value for bagging (lambda=6 corresponds to Leverage Bagging).
Metric used to track trees' performance within the ensemble.
‘acc’ - Accuracy
‘kappa’ - Kappa
Drift detection method. Set to None to disable drift detection.
Warning detection method. Set to None to disable warning detection.
(ARFHoeffdingTreeClassifier parameter) Maximum memory consumed by the tree.
(ARFHoeffdingTreeClassifier parameter) Number of instances between memory consumption checks.
(ARFHoeffdingTreeClassifier parameter) Number of instances a leaf should observe between split attempts.
(ARFHoeffdingTreeClassifier parameter) Split criterion to use.
‘gini’ - Gini
‘info_gain’ - Information Gain
(ARFHoeffdingTreeClassifier parameter) Allowed error in split decision; a value closer to 0 takes longer to decide.
(ARFHoeffdingTreeClassifier parameter) Threshold below which a split will be forced to break ties.
(ARFHoeffdingTreeClassifier parameter) If True, only allow binary splits.
(ARFHoeffdingTreeClassifier parameter) If True, stop growing as soon as memory limit is hit.
(ARFHoeffdingTreeClassifier parameter) If True, disable poor attributes.
(ARFHoeffdingTreeClassifier parameter) If True, disable pre-pruning.
(ARFHoeffdingTreeClassifier parameter) Prediction mechanism used at leaves.
‘mc’ - Majority Class
‘nb’ - Naive Bayes
‘nba’ - Naive Bayes Adaptive
(ARFHoeffdingTreeClassifier parameter) Number of instances a leaf should observe before allowing Naive Bayes.
(ARFHoeffdingTreeClassifier parameter) List of Nominal attributes. If empty, then assume that all attributes are numerical.
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
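The allowed split error and tie-break threshold above govern the standard Hoeffding-bound test used by Hoeffding trees: a node splits once the merit gap between the two best attributes exceeds the bound, or once the bound shrinks below the tie threshold. A minimal sketch, assuming the standard formula ε = sqrt(R² ln(1/δ) / (2n)); the helper names are hypothetical, not skmultiflow API.

```python
import math

def hoeffding_bound(value_range, split_confidence, n_observed):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)): with probability
    1 - delta, the true mean lies within epsilon of the observed mean."""
    return math.sqrt(
        (value_range ** 2) * math.log(1.0 / split_confidence) / (2.0 * n_observed)
    )

def should_split(best_merit, second_merit, value_range,
                 split_confidence, tie_threshold, n_observed):
    # Split when the best attribute clearly beats the runner-up, or when
    # the bound is so tight that the tie is forcibly broken.
    eps = hoeffding_bound(value_range, split_confidence, n_observed)
    return (best_merit - second_merit > eps) or (eps < tie_threshold)
```

The bound shrinks as more instances are observed, which is why a split confidence closer to 0 makes the tree wait longer before deciding.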
Notes
The 3 most important aspects of Adaptive Random Forest [1] are: (1) inducing diversity through re-sampling; (2) inducing diversity through randomly selecting subsets of features for node splits (see skmultiflow.classification.trees.arf_hoeffding_tree); (3) drift detectors per base tree, which cause selective resets in response to drifts. It also allows training background trees, which start training if a warning is detected and replace the active tree if the warning escalates to a drift.
References
Heitor Murilo Gomes, Albert Bifet, Jesse Read, Jean Paul Barddal, Fabricio Enembreck, Bernhard Pfahringer, Geoff Holmes, Talel Abdessalem. Adaptive random forests for evolving data stream classification. Machine Learning, DOI: 10.1007/s10994-017-5642-8, Springer, 2017.
Examples
>>> # Imports
>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import AdaptiveRandomForestClassifier
>>>
>>> # Setting up a data stream
>>> stream = SEAGenerator(random_state=1)
>>>
>>> # Setup Adaptive Random Forest Classifier
>>> arf = AdaptiveRandomForestClassifier()
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Train the estimator with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
>>>     X, y = stream.next_sample()
>>>     y_pred = arf.predict(X)
>>>     if y[0] == y_pred[0]:
>>>         correct_cnt += 1
>>>     arf.partial_fit(X, y)
>>>     n_samples += 1
>>>
>>> # Display results
>>> print('Adaptive Random Forest ensemble classifier example')
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Accuracy: {}'.format(correct_cnt / n_samples))
Methods
fit(self, X, y[, classes, sample_weight])
Fit the model.
get_info(self)
Collects and returns information about the configuration of the estimator.
get_params(self[, deep])
Get parameters for this estimator.
get_votes_for_instance(self, X)
Obtain the class votes of the ensemble for a given instance.
partial_fit(self, X, y[, classes, sample_weight])
Partially (incrementally) fit the model.
predict(self, X)
Predict classes for the passed data.
predict_proba(self, X)
Estimates the probability of each sample in X belonging to each of the class-labels.
reset(self)
Reset ARF.
score(self, X, y[, sample_weight])
Returns the mean accuracy on the given test data and labels.
set_params(self, **params)
Set the parameters of this estimator.
The features to train the model.
An array-like with the class labels of all samples in X.
Contains all possible/known class labels. Usage varies depending on the learning method.
Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.
Configuration of the estimator.
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Parameter names mapped to their values.
Array with all possible/known class labels. This is an optional parameter, except for the first partial_fit call where it is compulsory.
Samples weight. If not provided, uniform weights are assumed.
The set of data samples to predict the class labels for.
Class probabilities are calculated as the mean predicted class probabilities per base estimator.
Samples for which we want to predict the class probabilities.
Predicted class probabilities for all instances in X. If class labels were specified in a partial_fit call, the order of the columns matches self.classes. If classes were not specified, they are assumed to be 0-indexed. Class probabilities for a sample shall sum to 1 as long as at least one estimator has non-zero predictions. If no estimator can predict probabilities, probabilities of 0 are returned.
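The averaging rule above can be illustrated with a small NumPy sketch; the per-tree probability rows below are made-up numbers, not output of a real ensemble.

```python
import numpy as np

# Each row holds one base tree's estimated [P(class 0), P(class 1)].
per_tree_proba = np.array([
    [0.9, 0.1],   # tree 1
    [0.6, 0.4],   # tree 2
    [0.3, 0.7],   # tree 3
])

# Ensemble probabilities are the mean over base estimators.
ensemble_proba = per_tree_proba.mean(axis=0)   # -> [0.6, 0.4]
predicted_class = int(np.argmax(ensemble_proba))
```

Since each row sums to 1, the mean also sums to 1, matching the guarantee stated above.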
In multi-label classification, this is the subset accuracy, which is a harsh metric since you require for each sample that each label set be correctly predicted.
Test samples.
True labels for X.
Sample weights.
Mean accuracy of self.predict(X) wrt. y.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.