skmultiflow.meta.AdaptiveRandomForestClassifier
Adaptive Random Forest classifier.
Number of trees in the ensemble.
Max number of attributes for each node split.
If int, then consider max_features features at each split.
If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features) (same as “auto”).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
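The resolution rules above can be collected into a small helper. This is a sketch of the documented behavior only: resolve_max_features is a hypothetical name, not part of the skmultiflow API, which performs this resolution internally.

```python
import math

def resolve_max_features(max_features, n_features):
    """Map the documented max_features options to a feature count (sketch)."""
    if max_features is None:
        return n_features
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # Interpreted as a percentage of the total number of features.
        return int(max_features * n_features)
    if max_features in ("auto", "sqrt"):
        return int(round(math.sqrt(n_features)))
    if max_features == "log2":
        return int(round(math.log2(n_features)))
    raise ValueError("unknown max_features: {!r}".format(max_features))
```

For example, with 9 features, both "auto" and "sqrt" yield 3 candidate attributes per split.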
Weighted vote option.
The lambda value for bagging (lambda=6 corresponds to Leverage Bagging).
Metric used to track trees' performance within the ensemble.
‘acc’ - Accuracy
‘kappa’ - Kappa
Drift detection method. Set to None to disable drift detection.
Warning detection method. Set to None to disable warning detection.
(ARFHoeffdingTreeClassifier parameter) Maximum memory consumed by the tree.
(ARFHoeffdingTreeClassifier parameter) Number of instances between memory consumption checks.
(ARFHoeffdingTreeClassifier parameter) Number of instances a leaf should observe between split attempts.
(ARFHoeffdingTreeClassifier parameter) Split criterion to use.
‘gini’ - Gini
‘info_gain’ - Information Gain
(ARFHoeffdingTreeClassifier parameter) Allowed error in split decision; a value closer to 0 takes longer to decide.
(ARFHoeffdingTreeClassifier parameter) Threshold below which a split will be forced to break ties.
(ARFHoeffdingTreeClassifier parameter) If True, only allow binary splits.
(ARFHoeffdingTreeClassifier parameter) If True, stop growing as soon as memory limit is hit.
(ARFHoeffdingTreeClassifier parameter) If True, disable poor attributes.
(ARFHoeffdingTreeClassifier parameter) If True, disable pre-pruning.
(ARFHoeffdingTreeClassifier parameter) Prediction mechanism used at leaves.
‘mc’ - Majority Class
‘nb’ - Naive Bayes
‘nba’ - Naive Bayes Adaptive
(ARFHoeffdingTreeClassifier parameter) Number of instances a leaf should observe before allowing Naive Bayes.
(ARFHoeffdingTreeClassifier parameter) List of Nominal attributes. If empty, then assume that all attributes are numerical.
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
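The allowed split error and tie-break threshold above govern the standard Hoeffding-bound test used by Hoeffding trees: a node splits once the merit gap between the two best attributes exceeds the bound, or once the bound shrinks below the tie threshold. A minimal sketch, assuming the standard formula ε = sqrt(R² ln(1/δ) / (2n)); the helper names are hypothetical, not skmultiflow API.

```python
import math

def hoeffding_bound(value_range, split_confidence, n_observed):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)): with probability
    1 - delta, the true mean lies within epsilon of the observed mean."""
    return math.sqrt(
        (value_range ** 2) * math.log(1.0 / split_confidence) / (2.0 * n_observed)
    )

def should_split(best_merit, second_merit, value_range,
                 split_confidence, tie_threshold, n_observed):
    # Split when the best attribute clearly beats the runner-up, or when
    # the bound is so tight that the tie is forcibly broken.
    eps = hoeffding_bound(value_range, split_confidence, n_observed)
    return (best_merit - second_merit > eps) or (eps < tie_threshold)
```

The bound shrinks as more instances are observed, which is why a split confidence closer to 0 makes the tree wait longer before deciding.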
Notes
The 3 most important aspects of Adaptive Random Forest [1] are: (1) inducing diversity through re-sampling; (2) inducing diversity through randomly selecting subsets of features for node splits (see skmultiflow.classification.trees.arf_hoeffding_tree); (3) drift detectors per base tree, which cause selective resets in response to drifts. It also allows training background trees, which start training if a warning is detected and replace the active tree if the warning escalates to a drift.
References
Heitor Murilo Gomes, Albert Bifet, Jesse Read, Jean Paul Barddal, Fabricio Enembreck, Bernhard Pfahringer, Geoff Holmes, Talel Abdessalem. Adaptive random forests for evolving data stream classification. Machine Learning, DOI: 10.1007/s10994-017-5642-8, Springer, 2017.
Examples
>>> # Imports
>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.meta import AdaptiveRandomForestClassifier
>>>
>>> # Setting up a data stream
>>> stream = SEAGenerator(random_state=1)
>>>
>>> # Setup Adaptive Random Forest Classifier
>>> arf = AdaptiveRandomForestClassifier()
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Train the estimator with the samples provided by the data stream
>>> while n_samples < max_samples and stream.has_more_samples():
>>>     X, y = stream.next_sample()
>>>     y_pred = arf.predict(X)
>>>     if y[0] == y_pred[0]:
>>>         correct_cnt += 1
>>>     arf.partial_fit(X, y)
>>>     n_samples += 1
>>>
>>> # Display results
>>> print('Adaptive Random Forest ensemble classifier example')
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Accuracy: {}'.format(correct_cnt / n_samples))
Methods
fit(self, X, y[, classes, sample_weight])
Fit the model.
get_info(self)
Collects and returns information about the configuration of the estimator.
get_params(self[, deep])
Get parameters for this estimator.
get_votes_for_instance(self, X)
Obtain the class votes of the ensemble for a given instance.
partial_fit(self, X, y[, classes, sample_weight])
Partially (incrementally) fit the model.
predict(self, X)
Predict classes for the passed data.
predict_proba(self, X)
Estimates the probability of each sample in X belonging to each of the class-labels.
reset(self)
Reset ARF.
score(self, X, y[, sample_weight])
Returns the mean accuracy on the given test data and labels.
set_params(self, **params)
Set the parameters of this estimator.
The features to train the model.
An array-like with the class labels of all samples in X.
Contains all possible/known class labels. Usage varies depending on the learning method.
Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.
Configuration of the estimator.
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Parameter names mapped to their values.
Array with all possible/known class labels. This is an optional parameter, except for the first partial_fit call where it is compulsory.
Samples weight. If not provided, uniform weights are assumed.
The set of data samples to predict the class labels for.
Class probabilities are calculated as the mean predicted class probabilities per base estimator.
Samples for which we want to predict the class probabilities.
Predicted class probabilities for all instances in X. If class labels were specified in a partial_fit call, the order of the columns matches self.classes. If classes were not specified, they are assumed to be 0-indexed. Class probabilities for a sample shall sum to 1 as long as at least one estimator has non-zero predictions. If no estimator can predict probabilities, probabilities of 0 are returned.
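The averaging rule above can be illustrated with a small NumPy sketch; the per-tree probability rows below are made-up numbers, not output of a real ensemble.

```python
import numpy as np

# Each row holds one base tree's estimated [P(class 0), P(class 1)].
per_tree_proba = np.array([
    [0.9, 0.1],   # tree 1
    [0.6, 0.4],   # tree 2
    [0.3, 0.7],   # tree 3
])

# Ensemble probabilities are the mean over base estimators.
ensemble_proba = per_tree_proba.mean(axis=0)   # -> [0.6, 0.4]
predicted_class = int(np.argmax(ensemble_proba))
```

Since each row sums to 1, the mean also sums to 1, matching the guarantee stated above.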
In multi-label classification, this is the subset accuracy, which is a harsh metric since you require for each sample that each label set be correctly predicted.
Test samples.
True labels for X.
Sample weights.
Mean accuracy of self.predict(X) wrt. y.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.