skmultiflow.drift_detection.
KSWIN
Kolmogorov-Smirnov Windowing method for concept drift detection.
alpha: Probability for the test statistic of the Kolmogorov-Smirnov test. The detector is very sensitive to alpha, so it should be set below 0.01.
window_size: Size of the sliding window.
stat_size: Size of the statistic window.
data: Already collected data, used to avoid a cold start.
Notes
KSWIN (Kolmogorov-Smirnov Windowing) [1] is a concept change detection method based on the Kolmogorov-Smirnov (KS) statistical test. The KS-test is a statistical test that makes no assumption about the underlying data distribution. KSWIN can monitor data or performance distributions. Note that the detector accepts one-dimensional input as an array.
KSWIN maintains a sliding window \(\Psi\) of fixed size \(n\) (window_size). The last \(r\) (stat_size) samples of \(\Psi\) are assumed to represent the last concept, denoted \(R\). From the first \(n-r\) samples of \(\Psi\), \(r\) samples are drawn uniformly at random, representing an approximated last concept \(W\).
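The window split described above can be sketched in plain Python. This is an illustration only, not the library's code: the helper name split_window is hypothetical, and drawing without replacement is an assumption, since the text only says the samples are drawn uniformly.

```python
import random

def split_window(window, stat_size, rng=None):
    """Split a full sliding window into (R, W).

    R: the most recent `stat_size` samples.
    W: `stat_size` samples drawn uniformly from the older
       `len(window) - stat_size` samples (assumption: without replacement).
    """
    rng = rng or random.Random()
    n = len(window)
    if not 0 < stat_size <= n - stat_size:
        raise ValueError("stat_size must be positive and at most half the window")
    recent = list(window[n - stat_size:])   # last r samples -> R
    older = list(window[:n - stat_size])    # first n - r samples
    sampled = rng.sample(older, stat_size)  # uniform draw -> W
    return recent, sampled

# Hypothetical window of n = 100 values with r = 30
window = list(range(100))
R, W = split_window(window, stat_size=30, rng=random.Random(0))
print(len(R), len(W))  # 30 30
```

Both windows end up with the same size \(r\), which is what the KS-test on \(R\) and \(W\) expects.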
The KS-test is performed on the windows \(R\) and \(W\), which have the same size. The KS-test compares the distance between their empirical cumulative distribution functions, \(dist(R,W)\).
A concept drift is detected by KSWIN if:
\(dist(R,W) > \sqrt{-\frac{\ln\alpha}{r}}\)
-> The difference in the empirical data distributions between the windows \(R\) and \(W\) is too large for \(R\) and \(W\) to come from the same distribution.
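As a rough illustration of the decision rule, the sketch below computes the two-sample KS distance as the largest gap between the two empirical CDFs and compares it with the threshold from the inequality above. The function names ks_distance, drift_threshold and is_drift are hypothetical, and the code follows the inequality as written in these notes; it is not necessarily how the library implements the test internally.

```python
import math
from bisect import bisect_right

def ks_distance(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def drift_threshold(alpha, r):
    """Right-hand side of the inequality: sqrt(-ln(alpha) / r)."""
    return math.sqrt(-math.log(alpha) / r)

def is_drift(R, W, alpha):
    """True if dist(R, W) exceeds the threshold for windows of size len(R)."""
    return ks_distance(R, W) > drift_threshold(alpha, len(R))

# Hypothetical windows with r = 30
W = [i / 30 for i in range(30)]
R_shifted = [1.0 + i / 30 for i in range(30)]
print(is_drift(W, W, alpha=0.005))          # False: identical samples
print(is_drift(R_shifted, W, alpha=0.005))  # True: distributions do not overlap
```

With alpha = 0.005 and r = 30 the threshold is about 0.42, so a smaller alpha or a smaller statistic window raises the bar for reporting a drift.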
References
[1] Christoph Raab, Moritz Heusinger, Frank-Michael Schleif. Reactive Soft Prototype Computing for Concept Drift Streams. Neurocomputing, 2020.
Examples
>>> # Imports
>>> import numpy as np
>>> from skmultiflow.data.sea_generator import SEAGenerator
>>> from skmultiflow.drift_detection import KSWIN
>>> # Initialize KSWIN and a data stream
>>> kswin = KSWIN(alpha=0.01)
>>> stream = SEAGenerator(classification_function=2, random_state=112,
...                       balance_classes=False, noise_percentage=0.28)
>>> # Store detections
>>> detections = []
>>> # Process stream via KSWIN and print detections
>>> for i in range(1000):
...     data = stream.next_sample(10)
...     batch = data[0][0][0]  # first feature of the first sample in the batch
...     kswin.add_element(batch)
...     if kswin.detected_change():
...         print("\rIteration {}".format(i))
...         print("\r KSWIN: Reject Null Hypothesis")
...         detections.append(i)
>>> print("Number of detections: " + str(len(detections)))
Methods
add_element(self, input_value): Add element to sliding window.
detected_change(self): Get detected change.
detected_warning_zone(self): If the change detector supports the warning zone, this function will return whether it’s inside the warning zone or not.
get_info(self): Collects and returns the information about the configuration of the estimator.
get_length_estimation(self): Returns the length estimation.
get_params(self[, deep]): Get parameters for this estimator.
reset(self): Resets the change detector parameters.
set_params(self, **params): Set the parameters of this estimator.
Attributes
estimator_type
add_element(self, input_value)
Adds an element on top of the sliding window and removes the oldest one from the window. Afterwards, the KS-test is performed.
input_value: New data sample to add to the sliding window.
detected_change(self)
Returns whether or not a drift occurred.
detected_warning_zone(self)
Returns whether the change detector is in the warning zone or not.
get_info(self)
Returns the configuration of the estimator.
get_length_estimation(self)
Returns the length estimation.
get_params(self[, deep])
deep: If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns the parameter names mapped to their values.
set_params(self, **params)
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
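To make the <component>__<parameter> naming concrete, here is a minimal sketch of how such flat names could be grouped by component. The helper split_nested_params and the component name detector are hypothetical and only illustrate the convention; this is not the library's own set_params code.

```python
def split_nested_params(params):
    """Separate plain parameters from '<component>__<parameter>' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        name, delim, sub_key = key.partition("__")
        if delim:
            # 'detector__window_size' -> nested['detector']['window_size']
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested

own, nested = split_nested_params({"alpha": 0.005, "detector__window_size": 200})
print(own)     # {'alpha': 0.005}
print(nested)  # {'detector': {'window_size': 200}}
```

Plain names apply to the estimator itself, while prefixed names are routed to the named component of a nested object.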