skmultiflow.drift_detection.KSWIN

class skmultiflow.drift_detection.KSWIN(alpha=0.005, window_size=100, stat_size=30, data=None)

Kolmogorov-Smirnov Windowing method for concept drift detection.

Parameters

alpha: float (default=0.005): Probability for the test statistic of the Kolmogorov-Smirnov test. The detector is very sensitive to alpha, so it should be set below 0.01.
window_size: int (default=100): Size of the sliding window.
stat_size: int (default=30): Size of the statistic window.
data: numpy.ndarray of shape (n_samples, 1) (default=None, optional): Already collected data, used to avoid a cold start.

Notes

KSWIN (Kolmogorov-Smirnov Windowing) [1] is a concept change detection method based on the Kolmogorov-Smirnov (KS) statistical test. The KS-test is a statistical test that makes no assumption about the underlying data distribution. KSWIN can monitor data or performance distributions. Note that the detector accepts one-dimensional input as an array.

KSWIN maintains a sliding window \(\Psi\) of fixed size \(n\) (window_size). The last \(r\) (stat_size) samples of \(\Psi\) are assumed to represent the last concept and are denoted \(R\). From the first \(n-r\) samples of \(\Psi\), \(r\) samples are drawn uniformly at random to form \(W\), an approximation of the last concept.
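The split of \(\Psi\) into \(R\) and \(W\) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `split_window`, the use of `numpy.random.default_rng`, and sampling without replacement are all choices made for this example.

```python
import numpy as np


def split_window(psi, stat_size, rng=None):
    """Return (R, W) for a full sliding window psi (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    psi = np.asarray(psi, dtype=float)
    # Assumption of this sketch: sampling without replacement, which
    # requires at least stat_size samples in the older part of the window.
    if stat_size <= 0 or 2 * stat_size > psi.size:
        raise ValueError("need 0 < stat_size <= len(psi) / 2")
    recent = psi[-stat_size:]   # R: the last r samples
    older = psi[:-stat_size]    # the first n - r samples
    sampled = rng.choice(older, size=stat_size, replace=False)  # W
    return recent, sampled


psi = np.arange(100)  # toy window with n = 100
R, W = split_window(psi, stat_size=30, rng=np.random.default_rng(0))
```

Here `R` holds the values 70 to 99 and `W` holds 30 distinct values taken from 0 to 69.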

The KS-test is performed on the windows \(R\) and \(W\), which have the same size. The test measures the distance between their empirical cumulative distribution functions, \(dist(R,W)\).
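For illustration, the two-sample KS distance can be computed directly from the empirical cumulative distribution functions. The function name `ks_distance` is hypothetical. This is a sketch of the statistic itself, and the library may compute it differently.

```python
import numpy as np


def ks_distance(a, b):
    """Two-sample KS statistic: the largest gap between two empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate both of them on the pooled sample.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


same = ks_distance([1, 2, 3], [1, 2, 3])            # identical samples
shifted = ks_distance([1, 2, 3, 4], [3, 4, 5, 6])   # partially overlapping
disjoint = ks_distance([0, 0, 0], [1, 1, 1])        # no overlap
print(same, shifted, disjoint)  # 0.0 0.5 1.0
```

The distance ranges from 0 (identical empirical distributions) to 1 (no overlap between the two samples).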

A concept drift is detected by KSWIN if:

\(dist(R,W) > \sqrt{-\frac{\ln\alpha}{r}}\)

That is, the difference between the empirical data distributions of the windows \(R\) and \(W\) is too large for \(R\) and \(W\) to come from the same distribution.
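The detection criterion can be evaluated numerically. The sketch below computes the threshold \(\sqrt{-\ln\alpha / r}\) for the default parameters and compares a given distance against it. The function names `drift_threshold` and `is_drift` are hypothetical and only mirror the inequality stated in these notes.

```python
import math


def drift_threshold(alpha, r):
    """Threshold sqrt(-ln(alpha) / r) from the criterion above."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if r <= 0:
        raise ValueError("r must be positive")
    return math.sqrt(-math.log(alpha) / r)


def is_drift(dist, alpha=0.005, r=30):
    """True if the KS distance exceeds the threshold."""
    return dist > drift_threshold(alpha, r)


threshold = drift_threshold(alpha=0.005, r=30)
print(round(threshold, 2))  # 0.42
print(is_drift(0.5))        # True
print(is_drift(0.3))        # False
```

A smaller alpha raises the threshold and makes detections rarer, while a larger stat_size lowers it.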

References

1: Christoph Raab, Moritz Heusinger, Frank-Michael Schleif, Reactive Soft Prototype Computing for Concept Drift Streams, Neurocomputing, 2020.

Examples

>>> # Imports
>>> import numpy as np
>>> from skmultiflow.data.sea_generator import SEAGenerator
>>> from skmultiflow.drift_detection import KSWIN
>>> # Initialize KSWIN and a data stream
>>> kswin = KSWIN(alpha=0.01)
>>> stream = SEAGenerator(classification_function=2, random_state=112,
...                       balance_classes=False, noise_percentage=0.28)
>>> # Store detections
>>> detections = []
>>> # Process stream via KSWIN and print detections
>>> for i in range(1000):
...     data = stream.next_sample(10)
...     batch = data[0][0][0]  # first feature of the first sample in the batch
...     kswin.add_element(batch)
...     if kswin.detected_change():
...         print("\rIteration {}".format(i))
...         print("\r KSWIN: reject null hypothesis")
...         detections.append(i)
>>> print("Number of detections: " + str(len(detections)))

Methods

`add_element`(self, input_value)	Add element to sliding window
`detected_change`(self)	Get detected change
`detected_warning_zone`(self)	If the change detector supports the warning zone, this function will return whether it’s inside the warning zone or not.
`get_info`(self)	Collects and returns the information about the configuration of the estimator
`get_length_estimation`(self)	Returns the length estimation.
`get_params`(self[, deep])	Get parameters for this estimator.
`reset`(self)	Resets the change detector parameters.
`set_params`(self, **params)	Set the parameters of this estimator.

Attributes

estimator_type

add_element(self, input_value)

Add element to sliding window

Adds an element on top of the sliding window and removes the oldest one from the window. Afterwards, the KS-test is performed.

Parameters

input_value: ndarray: New data sample to add to the sliding window.
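The add-and-evict behaviour described for `add_element` can be sketched with `collections.deque`. The class `SlidingWindow` is hypothetical and shows only the window update. It does not run the KS-test and is not the library's implementation.

```python
from collections import deque


class SlidingWindow:
    """Fixed-size window that drops the oldest value when full (sketch)."""

    def __init__(self, window_size=100):
        # A deque with maxlen evicts from the left on append once it is full.
        self._values = deque(maxlen=window_size)

    def add_element(self, value):
        self._values.append(float(value))

    def is_full(self):
        return len(self._values) == self._values.maxlen

    def values(self):
        return list(self._values)


window = SlidingWindow(window_size=3)
for x in [1, 2, 3, 4]:
    window.add_element(x)
print(window.values())  # [2.0, 3.0, 4.0]
```

Once the window is full, each new element displaces the oldest one, so the window length stays constant.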

detected_change(self)

Get detected change

Returns

bool: Whether or not a drift occurred

detected_warning_zone(self)

If the change detector supports the warning zone, this function will return whether it’s inside the warning zone or not.

Returns

bool: Whether the change detector is in the warning zone or not.

get_info(self)

Collects and returns the information about the configuration of the estimator

Returns

string: Configuration of the estimator.

get_length_estimation(self)

Returns the length estimation.

Returns

int: The length estimation

get_params(self, deep=True)

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

reset(self)

Resets the change detector parameters.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
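The `<component>__<parameter>` naming convention used by `set_params` can be illustrated with a small helper that splits such a key. The function `split_nested_key` and the component name `detector` are hypothetical. This sketches the convention only, not the code that `set_params` runs.

```python
def split_nested_key(key):
    """Split 'component__parameter' into its two parts (hypothetical helper)."""
    component, separator, parameter = key.partition("__")
    if not separator:
        # No double underscore: the key names a parameter of the estimator itself.
        return None, key
    return component, parameter


print(split_nested_key("alpha"))            # (None, 'alpha')
print(split_nested_key("detector__alpha"))  # ('detector', 'alpha')
```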