skmultiflow.data.STAGGERGenerator

class skmultiflow.data.STAGGERGenerator(classification_function=0, random_state=None, balance_classes=False)

STAGGER concepts stream generator.

This generator is an implementation of the data stream with abrupt concept drift, as described in Gama et al. [1].

The STAGGER concepts are boolean functions of three features encoding objects: size (small, medium and large), shape (circle, square and triangle) and color (red, blue and green). A classification function is chosen among three possible ones:

0. Function that returns 1 if the size is small and the color is red.
1. Function that returns 1 if the color is green or the shape is a circle.
2. Function that returns 1 if the size is medium or large.
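The three concepts can be written as plain predicates. The sketch below is illustrative rather than the library's source: the function names are hypothetical, and the integer encoding of each feature is an assumption made for the example.

```python
# Assumed integer encoding for illustration (not stated in the text above):
# size:  0 = small,  1 = medium, 2 = large
# color: 0 = red,    1 = blue,   2 = green
# shape: 0 = circle, 1 = square, 2 = triangle
SMALL, MEDIUM, LARGE = 0, 1, 2
RED, BLUE, GREEN = 0, 1, 2
CIRCLE, SQUARE, TRIANGLE = 0, 1, 2


def concept_zero(size, color, shape):
    """Return 1 if the object is small and red, else 0."""
    return int(size == SMALL and color == RED)


def concept_one(size, color, shape):
    """Return 1 if the object is green or a circle, else 0."""
    return int(color == GREEN or shape == CIRCLE)


def concept_two(size, color, shape):
    """Return 1 if the object is medium or large, else 0."""
    return int(size in (MEDIUM, LARGE))


# A small red square is positive only under the first concept.
obj = (SMALL, RED, SQUARE)
labels = [f(*obj) for f in (concept_zero, concept_one, concept_two)]
print(labels)  # [1, 0, 0]
```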

Concept drift can be introduced by changing the classification function. This can be done manually or using ConceptDriftStream.

The generator can optionally balance classes, in which case the class distribution tends toward a uniform one.
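One way to obtain a balanced stream is rejection sampling: draw objects until one with the desired label appears, then alternate the desired label. The sketch below assumes this strategy and uses Python's `random.Random` in place of the library's generator; `balanced_labels` is a hypothetical helper.

```python
import random


def concept_zero(size, color, shape):
    """Small and red (assumed encoding: size 0 = small, color 0 = red)."""
    return int(size == 0 and color == 0)


def balanced_labels(n_samples, seed=0):
    """Draw labels for `n_samples` objects, alternating 0 and 1."""
    rng = random.Random(seed)
    want_zero = True  # the first accepted sample must have label 0
    labels = []
    while len(labels) < n_samples:
        size, color, shape = (rng.randrange(3) for _ in range(3))
        label = concept_zero(size, color, shape)
        if (label == 0) == want_zero:
            labels.append(label)
            want_zero = not want_zero  # ask for the other class next
    return labels


labels = balanced_labels(10)
print(labels.count(0), labels.count(1))  # 5 5
```

Without balancing, this concept is positive for only one of the nine size and color combinations, so positives would make up roughly 11% of the stream.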

Parameters

classification_function: int (Default: 0): Which of the three classification functions to use for the generation. The value can vary from 0 to 2.
random_state: int, RandomState instance or None, optional (default=None): If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
balance_classes: bool (Default: False): Whether to balance classes or not. If balanced, the class distribution will converge to a uniform distribution.

References

1: Gama, Joao, et al. "Learning with drift detection." Advances in Artificial Intelligence–SBIA 2004. Springer Berlin Heidelberg, 2004. 286-295.

Examples

>>> # Imports
>>> from skmultiflow.data.stagger_generator import STAGGERGenerator
>>> # Setting up the stream
>>> stream = STAGGERGenerator(classification_function = 2, random_state = 112,
...  balance_classes = False)
>>> # Retrieving one sample
>>> stream.next_sample()
(array([[0., 0., 2.]]), array([0.]))
>>> stream.next_sample(10)
(array([[1., 0., 1.],
       [0., 0., 0.],
       [1., 2., 0.],
       [1., 0., 2.],
       [0., 2., 1.],
       [0., 1., 2.],
       [0., 1., 1.],
       [0., 1., 2.],
       [1., 2., 2.],
       [1., 2., 0.]]), array([1., 0., 1., 1., 0., 0., 0., 0., 1., 1.]))
>>> stream.n_remaining_samples()
-1
>>> stream.has_more_samples()
True

Methods

`classification_function_one`(size, color, shape)	Decides the sample class label as positive if the color is green or shape is a circle.
`classification_function_two`(size, color, shape)	Decides the sample class label as positive if the size is medium or large.
`classification_function_zero`(size, color, shape)	Decides the sample class label as positive if the color is red and size is small.
`generate_drift`(self)	Generate drift by switching the classification function randomly.
`get_data_info`(self)	Retrieves minimum information from the stream
`get_info`(self)	Collects and returns the information about the configuration of the estimator
`get_params`(self[, deep])	Get parameters for this estimator.
`has_more_samples`(self)	Checks if stream has more samples.
`is_restartable`(self)	Determine if the stream is restartable.
`last_sample`(self)	Retrieves last batch_size samples in the stream.
`n_remaining_samples`(self)	Returns the estimated number of remaining samples.
`next_sample`(self[, batch_size])	Returns next sample from the stream.
`prepare_for_use`()	Prepare the stream for use.
`reset`(self)	Resets the estimator to its initial state.
`restart`(self)	Restart the stream.
`set_params`(self, **params)	Set the parameters of this estimator.

Attributes

`balance_classes`	Retrieve the value of the option: Balance classes
`classification_function`	Retrieve the index of the current classification function.
`feature_names`	Retrieve the names of the features.
`n_cat_features`	Retrieve the number of categorical (integer-encoded) features.
`n_features`	Retrieve the number of features.
`n_num_features`	Retrieve the number of numerical features.
`n_targets`	Retrieve the number of targets
`target_names`	Retrieve the names of the targets
`target_values`	Retrieve all target_values in the stream for each target.

property balance_classes

Retrieve the value of the option: Balance classes

Returns

Boolean: True if the classes are balanced.

property classification_function

Retrieve the index of the current classification function.

Returns

int: index of the classification function from 0 to 2.

static classification_function_one(size, color, shape)

Decides the sample class label as positive if the color is green or shape is a circle.

Parameters

size: int: First categorical attribute (size).
color: int: Second categorical attribute (color).
shape: int: Third categorical attribute (shape).

Returns

int: Returns the sample class label, either 0 or 1.

static classification_function_two(size, color, shape)

Decides the sample class label as positive if the size is medium or large.

Parameters

size: int: First categorical attribute (size).
color: int: Second categorical attribute (color).
shape: int: Third categorical attribute (shape).

Returns

int: Returns the sample class label, either 0 or 1.

static classification_function_zero(size, color, shape)

Decides the sample class label as positive if the color is red and size is small.

Parameters

size: int: First categorical attribute (size).
color: int: Second categorical attribute (color).
shape: int: Third categorical attribute (shape).

Returns

int: Returns the sample class label, either 0 or 1.

property feature_names

Retrieve the names of the features.

Returns

list: names of the features

generate_drift(self): Generate drift by switching the classification function randomly.
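A minimal sketch of such a switch, assuming the new function index is drawn uniformly from the indices other than the current one. Whether the library excludes the current index is an assumption here, and `pick_new_function` is a hypothetical helper.

```python
import random


def pick_new_function(current, rng, n_functions=3):
    """Return a random function index that differs from `current`."""
    candidates = [i for i in range(n_functions) if i != current]
    return rng.choice(candidates)


rng = random.Random(42)
current = 0
history = [current]
for _ in range(5):
    current = pick_new_function(current, rng)
    history.append(current)
print(history)  # consecutive entries always differ
```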

get_data_info(self)

Retrieves minimum information from the stream

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns

string: Stream data information
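For illustration, a string in that format could be assembled as follows. The helper `format_data_info`, the stream name, and the exact wording of the output are assumptions; the values shown (1 target, 2 classes, 3 features) follow from the generator's description.

```python
def format_data_info(name, n_targets, n_classes, n_features):
    """Build an identifier of the form 'name - n_targets, n_classes, n_features'."""
    return "{} - {} target(s), {} classes, {} features".format(
        name, n_targets, n_classes, n_features
    )


info = format_data_info("STAGGER Generator", n_targets=1, n_classes=2, n_features=3)
print(info)  # STAGGER Generator - 1 target(s), 2 classes, 3 features
```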

get_info(self)

Collects and returns the information about the configuration of the estimator

Returns

string: Configuration of the estimator.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

has_more_samples(self)

Checks if stream has more samples.

Returns

Boolean: True if stream has more samples.

is_restartable(self)

Determine if the stream is restartable.

Returns

Bool: True if stream is restartable.

last_sample(self)

Retrieves last batch_size samples in the stream.

Returns

tuple or tuple list: A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

property n_cat_features

Retrieve the number of categorical (integer-encoded) features.

Returns

int: The number of categorical features in the stream.

property n_features

Retrieve the number of features.

Returns

int: The total number of features.

property n_num_features

Retrieve the number of numerical features.

Returns

int: The number of numerical features in the stream.

n_remaining_samples(self)

Returns the estimated number of remaining samples.

Returns

int: Remaining number of samples. -1 if infinite (e.g. generator)

property n_targets

Retrieve the number of targets

Returns

int: the number of targets in the stream.

next_sample(self, batch_size=1)

Returns next sample from the stream.

The sample generation works as follows: The three attributes are generated with the random int generator, initialized with the seed passed by the user. Then, the classification function decides whether to classify the instance as class 0 or class 1. The next step is to verify if the classes should be balanced, and if so, balance the classes.

The generated sample will have three features and one label (it has one classification task).

Parameters

batch_size: int (optional, default=1): The number of samples to return.

Returns

tuple or tuple list: Return a tuple with the features matrix and the labels matrix for the batch_size samples that were requested.
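The generation steps described above, without class balancing, can be sketched in plain Python. This is not the library's implementation: `next_batch` is a hypothetical helper, `random.Random` stands in for the NumPy generator, and nested lists stand in for arrays.

```python
import random


def concept_two(size, color, shape):
    """Medium or large (assumed encoding: size 1 = medium, 2 = large)."""
    return int(size in (1, 2))


def next_batch(rng, batch_size=1, concept=concept_two):
    """Return (X, y): `batch_size` rows of 3 features and their labels."""
    X, y = [], []
    for _ in range(batch_size):
        # Step 1: draw the three attributes uniformly from {0, 1, 2}.
        features = [rng.randrange(3) for _ in range(3)]
        # Step 2: let the classification function assign the label.
        X.append(features)
        y.append(concept(*features))
    return X, y


# The same seed reproduces the same batch.
X1, y1 = next_batch(random.Random(112), batch_size=5)
X2, y2 = next_batch(random.Random(112), batch_size=5)
print(X1 == X2 and y1 == y2)  # True
print(len(X1), len(X1[0]), len(y1))  # 5 3 5
```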

static prepare_for_use()

Prepare the stream for use.

Deprecated in v0.5.0 and will be removed in v0.7.0

reset(self)

Resets the estimator to its initial state.

Returns

self

restart(self): Restart the stream.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
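The `<component>__<parameter>` convention amounts to splitting each key at the first double underscore. The sketch below shows only that splitting step; `split_nested_params` is a hypothetical helper, and the parameter names in the usage lines are made up for the example.

```python
def split_nested_params(params):
    """Separate top-level parameters from nested '<component>__<parameter>' ones."""
    top_level, nested = {}, {}
    for key, value in params.items():
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            # Group nested parameters by the component they belong to.
            nested.setdefault(component, {})[sub_key] = value
        else:
            top_level[key] = value
    return top_level, nested


top, nested = split_nested_params(
    {"balance_classes": True, "stream__random_state": 7}
)
print(top)     # {'balance_classes': True}
print(nested)  # {'stream': {'random_state': 7}}
```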

property target_names

Retrieve the names of the targets

Returns

list: the names of the targets in the stream.

property target_values

Retrieve all target_values in the stream for each target.

Returns

list: list of lists of all target_values for each target