skmultiflow.drift_detection.
ADWIN
Adaptive Windowing method for concept drift detection.
Parameters
delta: float (default=0.002)
The delta parameter for the ADWIN algorithm.
Notes
ADWIN [1] (ADaptive WINdowing) is an adaptive sliding window algorithm for detecting change and keeping updated statistics about a data stream. It allows algorithms that were not designed for drifting data to become resistant to drift.
The general idea is to keep statistics from a window of variable size while detecting concept drift.
The algorithm decides the size of the window by cutting the statistics window at different points and comparing the averages of a statistic over the two resulting sub-windows. If the absolute value of the difference between the two averages exceeds a pre-defined threshold, change is detected at that point and all data before it is discarded.
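The cut test described above can be sketched in a few lines. This is a naive illustration, not the library's implementation: it keeps every value in a plain list and rescans all split points, whereas ADWIN itself compresses the window into buckets. The helper names `find_cut` and `shrink_window` are hypothetical, the threshold is the Hoeffding-based bound given in [1] for values in [0, 1], and `delta=0.002` is an assumed setting.

```python
import math


def find_cut(window, delta=0.002):
    """Return the first split index whose two sub-window means differ
    by more than the threshold, or None if no such split exists.

    Hypothetical helper: assumes every value lies in [0, 1].
    """
    n = len(window)
    total = sum(window)
    head_sum = 0.0
    for split in range(1, n):
        head_sum += window[split - 1]
        n0, n1 = split, n - split
        mean0 = head_sum / n0
        mean1 = (total - head_sum) / n1
        # Harmonic-mean term of the two sub-window sizes.
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        # Hoeffding-based threshold with delta' = delta / n.
        eps_cut = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))
        if abs(mean0 - mean1) > eps_cut:
            return split
    return None


def shrink_window(window, delta=0.002):
    """Discard data before each detected cut until no cut remains."""
    cut = find_cut(window, delta)
    while cut is not None:
        window = window[cut:]
        cut = find_cut(window, delta)
    return window
```

Calling `shrink_window([0] * 200 + [1] * 200)` drops most of the leading zeros, because the old and new halves disagree by far more than the threshold. A window of constant values is returned unchanged.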
References
[1] Bifet, Albert, and Ricard Gavaldà. “Learning from time-changing data with adaptive windowing.” In Proceedings of the 2007 SIAM international conference on data mining, pp. 443-448. Society for Industrial and Applied Mathematics, 2007.
Examples
>>> # Imports
>>> import numpy as np
>>> from skmultiflow.drift_detection.adwin import ADWIN
>>> adwin = ADWIN()
>>> # Simulating a data stream of uniformly random 0's and 1's
>>> data_stream = np.random.randint(2, size=2000)
>>> # Changing the data concept from index 999 to 1999
>>> for i in range(999, 2000):
...     data_stream[i] = np.random.randint(4, high=8)
>>> # Adding stream elements to ADWIN and verifying if drift occurred
>>> for i in range(2000):
...     adwin.add_element(data_stream[i])
...     if adwin.detected_change():
...         print('Change detected in data: ' + str(data_stream[i]) + ' - at index: ' + str(i))
Methods
add_element(self, value)
Add a new element to the sample window.
bucket_size(row)
delete_element(self)
Delete an item from the bucket list.
detected_change(self)
Detects concept change in a drifting data stream.
detected_warning_zone(self)
If the change detector supports the warning zone, this function will return whether it’s inside the warning zone or not.
get_change(self)
Get drift
get_info(self)
Collects and returns the information about the configuration of the estimator
get_length_estimation(self)
Returns the length estimation.
get_params(self[, deep])
Get parameters for this estimator.
reset(self)
Reset detectors
reset_change(self)
set_clock(self, clock)
set_params(self, **params)
Set the parameters of this estimator.
Attributes
MAX_BUCKETS
estimation
estimator_type
n_detections
total
variance
width
width_t
add_element(self, value)
Apart from adding the element value to the window by inserting it in the correct bucket, this method also updates the relevant statistics: the total sum of all values, the window width and the total variance.
The value parameter can be any numeric value relevant to the analysis of concept change. The learners in this framework use either 0 or 1, interpreted as follows:
0: the learner’s prediction was wrong
1: the learner’s prediction was correct
This function should be called for every new sample analysed.
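The 0/1 encoding above can be produced from labels and predictions as in the following sketch. The helper names `prediction_outcomes` and `feed_outcomes` are hypothetical, and `detector` is assumed to be any object exposing the `add_element` method documented here.

```python
def prediction_outcomes(y_true, y_pred):
    """Yield 1 for a correct prediction and 0 for a wrong one."""
    for truth, guess in zip(y_true, y_pred):
        yield 1 if truth == guess else 0


def feed_outcomes(detector, y_true, y_pred):
    """Pass one 0/1 outcome per sample to the detector.

    Returns the number of outcomes that were fed.
    """
    count = 0
    for outcome in prediction_outcomes(y_true, y_pred):
        detector.add_element(outcome)
        count += 1
    return count
```

With scikit-multiflow installed, `feed_outcomes(ADWIN(), y_true, y_pred)` would feed one outcome per sample, so a drop in accuracy shows up as a shift in the mean of the monitored window.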
delete_element(self)
Deletes the last item and updates the relevant statistics kept by ADWIN.
Returns: the bucket size from the updated bucket.
detected_change(self)
The ADWIN algorithm is described in Bifet and Gavaldà’s ‘Learning from Time-Changing Data with Adaptive Windowing’. The general idea is to keep statistics from a window of variable size while detecting concept drift.
This function analyses different cutting points in the sliding window to verify whether there is a significant change in concept.
Returns: whether change was detected or not.
If change was detected, one should check the new window size by reading the width property.
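A minimal sketch of that pattern follows: it records the window width each time a change is reported. The helper name `detect_changes` is hypothetical, and `detector` is assumed to expose `add_element`, `detected_change` and the `width` property as documented on this page.

```python
def detect_changes(detector, values):
    """Feed values one by one and collect (index, width) for each change.

    `width` is read right after the detection, so it reflects the
    window size once older data has been dropped.
    """
    changes = []
    for index, value in enumerate(values):
        detector.add_element(value)
        if detector.detected_change():
            changes.append((index, detector.width))
    return changes
```

With scikit-multiflow installed, `detect_changes(ADWIN(), data_stream)` would return one pair per reported drift. A width much smaller than the number of samples seen so far indicates that a large part of the old window was discarded.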
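detected_warning_zone(self)
Returns: whether the change detector is in the warning zone or not.
get_change(self)
Returns: whether or not a drift occurred.
get_info(self)
Returns: the configuration of the estimator.
get_length_estimation(self)
Returns: the length estimation.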
get_params(self[, deep])
deep: if True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: parameter names mapped to their values.
reset(self)
Resets the statistics and ADWIN’s window.
Returns: self
set_params(self, **params)
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that each component of a nested object can be updated.
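The naming convention can be illustrated by splitting parameter names on the first double underscore. This is only a sketch of the convention, not the library's code: `split_param_name` and `group_params` are hypothetical helpers, and `detector` in the usage note is a made-up component name.

```python
def split_param_name(name):
    """Split 'component__parameter' into (component, parameter).

    A name without '__' belongs to the estimator itself, so the
    component part is returned as None.
    """
    component, separator, parameter = name.partition("__")
    if not separator:
        return None, name
    return component, parameter


def group_params(params):
    """Group a flat parameter dict by the component each entry targets."""
    grouped = {}
    for name, value in params.items():
        component, parameter = split_param_name(name)
        grouped.setdefault(component, {})[parameter] = value
    return grouped
```

For example, `group_params({"delta": 0.01, "detector__delta": 0.002})` separates the estimator's own `delta` (under the key `None`) from the `delta` addressed to the nested `detector` component.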