skmultiflow.meta.StreamingRandomPatchesClassifier

class skmultiflow.meta.StreamingRandomPatchesClassifier(base_estimator=HoeffdingTreeClassifier(binary_split=False, grace_period=50, leaf_prediction='nba', max_byte_size=33554432, memory_estimate_period=1000000, nb_threshold=0, no_preprune=False, nominal_attributes=None, remove_poor_atts=False, split_confidence=0.01, split_criterion='info_gain', stop_mem_management=False, tie_threshold=0.05), n_estimators=100, subspace_mode='percentage', subspace_size=60, training_method='randompatches', lam=6.0, drift_detection_method=ADWIN(delta=1e-05), warning_detection_method=ADWIN(delta=0.0001), disable_weighted_vote=False, disable_drift_detection=False, disable_background_learner=False, nominal_attributes=None, random_state=None)[source]

Streaming Random Patches ensemble classifier.

Parameters
base_estimator: BaseSKMObject or sklearn.base.BaseEstimator, (default=HoeffdingTreeClassifier)

The base estimator.

n_estimators: int, (default=100)

Number of members in the ensemble.

subspace_mode: str, (default=’percentage’)
Indicates how m, defined by subspace_size, is interpreted, where M represents the total number of features.
Only applies when the training method is random subspaces or random patches.
‘m’ - Use subspace_size features, as specified
‘sqrtM1’ - Use sqrt(M)+1 features
‘MsqrtM1’ - Use M-(sqrt(M)+1) features
‘percentage’ - Interpret subspace_size as a percentage of M
subspace_size: int, (default=60)

Number of features per subset for each classifier. A negative value means total_features - subspace_size.

training_method: str, (default=’randompatches’)
The training method to use.
‘randomsubspaces’ - Random subspaces
‘resampling’ - Resampling (bagging)
‘randompatches’ - Random patches
lam: float, (default=6.0)

Lambda value for the Poisson distribution used in the bagging-style resampling.

drift_detection_method: BaseDriftDetector, (default=ADWIN(delta=1e-5))

Drift detection method.

warning_detection_method: BaseDriftDetector, (default=ADWIN(delta=1e-4))

Warning detection method.

disable_weighted_vote: bool (default=False)

If True, disables weighted voting.

disable_drift_detection: bool (default=False)

If True, disables drift detection and background learner.

disable_background_learner: bool (default=False)

If True, disables the background learner and ensemble members are reset immediately when drift is detected.

nominal_attributes: list, optional

List of nominal attributes. If empty, all attributes are assumed to be numerical.

random_state: int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
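To make the interaction between subspace_mode and subspace_size above concrete, here is a rough sketch of how the two could map to a per-learner feature count. The helper function is hypothetical (it is not part of the skmultiflow API) and only mirrors the option descriptions given in this section:

```python
import math

def effective_subspace_size(mode, subspace_size, n_features):
    # Hypothetical helper mirroring the documented subspace_mode options;
    # n_features plays the role of M in the descriptions above.
    if mode == 'm':
        k = subspace_size                     # use the value as specified
    elif mode == 'sqrtM1':
        k = round(math.sqrt(n_features)) + 1  # sqrt(M)+1
    elif mode == 'MsqrtM1':
        k = n_features - (round(math.sqrt(n_features)) + 1)  # M-(sqrt(M)+1)
    elif mode == 'percentage':
        k = round(n_features * subspace_size / 100)  # percentage of M
    else:
        raise ValueError(mode)
    if k < 0:  # a negative value means total_features - subspace_size
        k = n_features + k
    return max(1, min(k, n_features))

# With M=9 features:
print(effective_subspace_size('percentage', 60, 9))  # default settings -> 5
print(effective_subspace_size('sqrtM1', 60, 9))      # sqrt(9)+1 -> 4
```

With the defaults (‘percentage’ and 60), each ensemble member would therefore train on roughly 60% of the available features.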

Notes

The Streaming Random Patches (SRP) [1] ensemble method simulates bagging or random subspaces. The default algorithm uses both bagging and random subspaces, namely Random Patches. The default base estimator is a Hoeffding Tree, but the method can be used with any other base estimator (unlike random forest variations, which are restricted to decision trees).
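The bagging simulation mentioned above is what the lam parameter controls: in online bagging, each incoming sample is presented to a given ensemble member k times, with k drawn from a Poisson distribution. A minimal NumPy sketch (assuming the Poisson-weighting scheme described in the SRP paper, where the default lam=6.0 over-weights samples relative to classical online bagging's lam=1.0):

```python
import numpy as np

rng = np.random.default_rng(1)
# Number of times each of 1000 incoming samples would be shown to one learner:
k_default = rng.poisson(lam=6.0, size=1000)  # SRP default
k_classic = rng.poisson(lam=1.0, size=1000)  # classical online bagging
print(k_default.mean())  # close to 6
print(k_classic.mean())  # close to 1
```

Larger lam values thus make repeated presentations of the same sample far more likely, which accelerates learning on streams at the cost of ensemble diversity.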

References

[1] Heitor Murilo Gomes, Jesse Read, Albert Bifet. Streaming Random Patches for Evolving Data Stream Classification. IEEE International Conference on Data Mining (ICDM), 2019.

Examples

>>> from skmultiflow.data import AGRAWALGenerator
>>> from skmultiflow.meta import StreamingRandomPatchesClassifier
>>>
>>> stream = AGRAWALGenerator(random_state=1)
>>> srp = StreamingRandomPatchesClassifier(random_state=1,
>>>                                        n_estimators=3)
>>>
>>> # Variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Run test-then-train loop for max_samples
>>> # or while there is data in the stream
>>> while n_samples < max_samples and stream.has_more_samples():
>>>     X, y = stream.next_sample()
>>>     y_pred = srp.predict(X)
>>>     if y[0] == y_pred[0]:
>>>         correct_cnt += 1
>>>     srp.partial_fit(X, y)
>>>     n_samples += 1
>>>
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Accuracy: {}'.format(correct_cnt / n_samples))

Methods

fit(self, X, y[, classes, sample_weight])

Fit the model.

get_info(self)

Collects and returns the information about the configuration of the estimator.

get_params(self[, deep])

Get parameters for this estimator.

partial_fit(self, X, y[, classes, sample_weight])

Partially (incrementally) fit the model.

predict(self, X)

Predict classes for the passed data.

predict_proba(self, X)

Estimate the probability of X belonging to each class label.

reset(self)

Resets the estimator to its initial state.

score(self, X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

set_params(self, **params)

Set the parameters of this estimator.

fit(self, X, y, classes=None, sample_weight=None)[source]

Fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples, n_targets)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Contains all possible/known class labels. Usage varies depending on the learning method.

sample_weight: numpy.ndarray, optional (default=None)

Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self
get_info(self)[source]

Collects and returns the information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

partial_fit(self, X, y, classes=None, sample_weight=None)[source]

Partially (incrementally) fit the model.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The features to train the model.

y: numpy.ndarray of shape (n_samples)

An array-like with the class labels of all samples in X.

classes: numpy.ndarray, optional (default=None)

Not used.

sample_weight: numpy.ndarray of shape (n_samples), optional (default=None)

Samples weight. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns
self
predict(self, X)[source]

Predict classes for the passed data.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The set of data samples to predict the class labels for.

Returns
A numpy.ndarray with all the predictions for the samples in X.
predict_proba(self, X)[source]

Estimate the probability of X belonging to each class label.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

Samples one wants to predict the class probabilities for.

Returns
A numpy.ndarray of shape (n_samples, n_labels), in which each row is associated with the X entry of the same index and contains len(self.target_values) elements, each representing the probability that the i-th sample of X belongs to a certain class label.
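As an illustration of the return shape described above (the probability values below are made up, not produced by the classifier), predict() corresponds to taking the most probable label per row:

```python
import numpy as np

# Hypothetical predict_proba output: one row per sample in X,
# one column per known class label, each row summing to 1.
proba = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.5, 0.5]])
assert proba.shape == (3, 2)  # (n_samples, n_labels)
# The most probable label per row (ties resolve to the lower index):
print(proba.argmax(axis=1))  # [1 0 0]
```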
reset(self)[source]

Resets the estimator to its initial state.

Returns
self
score(self, X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for each sample.

Parameters
X: array-like, shape = (n_samples, n_features)

Test samples.

y: array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight: array-like, shape = [n_samples], optional

Sample weights.

Returns
score: float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
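A small sketch of the ‘<component>__<parameter>’ naming convention described above. The routing below is illustrative (base_estimator is a real parameter of this class, but the dispatch logic itself is hypothetical, not the library's implementation):

```python
# Keys with '__' address a parameter of a nested sub-estimator;
# plain keys address the ensemble itself.
params = {'n_estimators': 10, 'base_estimator__grace_period': 200}

for name, value in params.items():
    if '__' in name:
        component, param = name.split('__', 1)
        print(f"{component}.{param} = {value}")  # routed to the sub-estimator
    else:
        print(f"{name} = {value}")               # set on the ensemble itself
```

So set_params(base_estimator__grace_period=200) would update the grace_period of the underlying Hoeffding Tree rather than the ensemble.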

Returns
self