skmultiflow.drift_detection.KSWIN

class skmultiflow.drift_detection.KSWIN(alpha=0.005, window_size=100, stat_size=30, data=None)

Kolmogorov-Smirnov Windowing method for concept drift detection.

Parameters
alpha: float (default=0.005)

Probability for the test statistic of the Kolmogorov-Smirnov test. The detector is very sensitive to alpha, so it should be set below 0.01.

window_size: float (default=100)

Size of the sliding window

stat_size: float (default=30)

Size of the statistic window

data: numpy.ndarray of shape (n_samples, 1) (default=None, optional)

Already collected data to avoid cold start.

Notes

KSWIN (Kolmogorov-Smirnov Windowing) [1] is a concept change detection method based on the Kolmogorov-Smirnov (KS) statistical test. The KS test is a statistical test that makes no assumption about the underlying data distribution. KSWIN can monitor data or performance distributions. Note that the detector accepts one-dimensional input as an array.

KSWIN maintains a sliding window \(\Psi\) of fixed size \(n\) (window_size). The last \(r\) (stat_size) samples of \(\Psi\) are assumed to represent the last concept and are denoted \(R\). From the first \(n-r\) samples of \(\Psi\), \(r\) samples are drawn uniformly at random; they form \(W\), an approximation of the last concept.
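The window split described above can be sketched in a few lines of plain Python. This is only an illustration of the description, not the library's code: the helper name split_window is hypothetical, and drawing without replacement is an assumption.

```python
import random

def split_window(window, stat_size, rng):
    """Return (R, W) for a full sliding window (hypothetical helper)."""
    # R: the most recent `stat_size` samples.
    recent = list(window[-stat_size:])
    # W: `stat_size` samples drawn uniformly from the older n - r samples
    # (without replacement here, which is an assumption of this sketch).
    older = list(window[:-stat_size])
    reference = rng.sample(older, stat_size)
    return recent, reference

rng = random.Random(0)
window = list(range(100))  # n = 100 stand-in values, oldest first
R, W = split_window(window, stat_size=30, rng=rng)
print(len(R), len(W))
```

Both windows end up with \(r\) = 30 elements, so the KS test compares samples of equal size.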

The KS test is performed on the windows \(R\) and \(W\), which have the same size. It measures the distance between their empirical cumulative distributions, \(dist(R,W)\).
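As an illustration of the distance between two empirical cumulative distributions, the following minimal sketch computes the two-sample KS statistic in plain Python. The function name ks_distance is hypothetical, and the library may rely on a different routine internally.

```python
from bisect import bisect_right

def ks_distance(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

print(ks_distance([1, 2, 3, 4], [1, 2, 3, 4]))  # identical samples: 0.0
print(ks_distance([1, 2, 3, 4], [5, 6, 7, 8]))  # disjoint samples: 1.0
```

The statistic always lies between 0 (identical empirical distributions) and 1 (no overlap).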

A concept drift is detected by KSWIN if:

  • \(dist(R,W) > \sqrt{-\frac{\ln\alpha}{r}}\)

That is, the difference between the empirical data distributions of the windows \(R\) and \(W\) is too large for \(R\) and \(W\) to come from the same distribution.
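To get a feeling for the detection criterion above, the threshold can be evaluated directly. This is a sketch of the formula only; the helper names are hypothetical, and the library's internal decision rule may be implemented differently.

```python
import math

def kswin_threshold(alpha, stat_size):
    """Right-hand side of the criterion: sqrt(-ln(alpha) / r)."""
    return math.sqrt(-math.log(alpha) / stat_size)

def exceeds_threshold(distance, alpha=0.005, stat_size=30):
    """True when dist(R, W) is larger than the threshold."""
    return distance > kswin_threshold(alpha, stat_size)

# With the default alpha=0.005 and stat_size=30 the threshold is roughly 0.42.
print(kswin_threshold(0.005, 30))
# A smaller alpha raises the threshold, so fewer windows are flagged.
print(kswin_threshold(0.001, 30) > kswin_threshold(0.01, 30))
print(exceeds_threshold(0.50), exceeds_threshold(0.30))
```

Because the threshold depends on alpha only through its logarithm and on \(r\) through a square root, increasing stat_size lowers the threshold faster than decreasing alpha raises it.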

References

1

Christoph Raab, Moritz Heusinger, Frank-Michael Schleif, Reactive Soft Prototype Computing for Concept Drift Streams, Neurocomputing, 2020.

Examples

>>> # Imports
>>> import numpy as np
>>> from skmultiflow.data.sea_generator import SEAGenerator
>>> from skmultiflow.drift_detection import KSWIN
>>> # Initialize KSWIN and a data stream
>>> kswin = KSWIN(alpha=0.01)
>>> stream = SEAGenerator(classification_function=2,
...     random_state=112, balance_classes=False, noise_percentage=0.28)
>>> # Store detections
>>> detections = []
>>> # Process stream via KSWIN and print detections
>>> for i in range(1000):
...     data = stream.next_sample(10)
...     # Use the first feature of the first sample in the batch
...     batch = data[0][0][0]
...     kswin.add_element(batch)
...     if kswin.detected_change():
...         print("\rIteration {}".format(i))
...         print("\r KSWIN rejected the null hypothesis")
...         detections.append(i)
>>> print("Number of detections: "+str(len(detections)))

Methods

add_element(self, input_value)

Add element to sliding window

detected_change(self)

Get detected change

detected_warning_zone(self)

If the change detector supports the warning zone, this function will return whether it’s inside the warning zone or not.

get_info(self)

Collects and returns the information about the configuration of the estimator

get_length_estimation(self)

Returns the length estimation.

get_params(self[, deep])

Get parameters for this estimator.

reset(self)

Resets the change detector parameters.

set_params(self, **params)

Set the parameters of this estimator.

Attributes

estimator_type

add_element(self, input_value)

Add element to sliding window

Adds an element on top of the sliding window and removes the oldest one from the window. Afterwards, the KS-test is performed.

Parameters
input_value: ndarray

New data sample the sliding window should add.
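The add-and-evict behaviour described for add_element can be pictured with a bounded deque from the standard library. This is a sketch of the idea under the assumption of a tiny window of five elements; the library's internal storage may differ.

```python
from collections import deque

window = deque(maxlen=5)  # small window_size, chosen only for illustration
for value in range(8):
    # Once the window is full, appending drops the oldest value automatically.
    window.append(value)

print(list(window))  # [3, 4, 5, 6, 7]
```

After eight insertions only the five most recent values remain, which is the state on which a test would then be run.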

detected_change(self)

Get detected change

Returns
bool

Whether or not a drift occurred

detected_warning_zone(self)

If the change detector supports the warning zone, this function will return whether it’s inside the warning zone or not.

Returns
bool

Whether the change detector is in the warning zone or not.

get_info(self)

Collects and returns the information about the configuration of the estimator

Returns
string

Configuration of the estimator.

get_length_estimation(self)

Returns the length estimation.

Returns
int

The length estimation

get_params(self, deep=True)

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

reset(self)

Resets the change detector parameters.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self
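The <component>__<parameter> naming used by set_params can be illustrated with a small stand-alone helper. This is a sketch of how such a name splits into its two parts, not the library's code; split_param_name and the component name "detector" are hypothetical.

```python
def split_param_name(name):
    """Split 'component__parameter' into its parts (hypothetical helper)."""
    component, separator, parameter = name.partition("__")
    if not separator:
        # No double underscore: the parameter belongs to the estimator itself.
        return None, name
    return component, parameter

print(split_param_name("alpha"))            # (None, 'alpha')
print(split_param_name("detector__alpha"))  # ('detector', 'alpha')
```

A plain name such as alpha addresses the estimator directly, while a prefixed name addresses a parameter of a contained component.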