skmultiflow.data.MIXEDGenerator

class skmultiflow.data.MIXEDGenerator(classification_function=0, random_state=None, balance_classes=False)

Mixed data stream generator.

This generator implements a data stream with abrupt concept drift and noise-free boolean examples, as described by Gama et al. [1].

Each example has four relevant attributes: two boolean attributes \(v, w\) and two numeric attributes \(x, y\), the latter uniformly distributed between 0 and 1. Examples are labeled according to the classification function chosen from the list below.

function 0:
if \(v\) and \(w\) are true or \(v\) and \(z\) are true or \(w\) and \(z\) are true then 0 else 1, where \(z\) is \(y < 0.5 + 0.3 \sin(3 \pi x)\)
function 1:
The opposite of function 0.
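The two labeling rules can be sketched in plain Python. This is a minimal illustration rather than the library's own code: the helper names `z_condition`, `function_0` and `function_1` are hypothetical, and the sketch assumes function 0 is the "at least two of \(v\), \(w\), \(z\)" rule, with function 1 as its exact complement.

```python
import math


def z_condition(x, y):
    """Third condition z: the point (x, y) lies below the sine curve."""
    return y < 0.5 + 0.3 * math.sin(3 * math.pi * x)


def function_0(v, w, x, y):
    """Return 0 when at least two of v, w, z are true, otherwise 1 (assumed rule)."""
    z = z_condition(x, y)
    return 0 if (v and w) or (v and z) or (w and z) else 1


def function_1(v, w, x, y):
    """The opposite of function 0."""
    return 1 - function_0(v, w, x, y)


# First feature row shown in the Examples section (generated with
# classification_function=1): v=0, w=1, x=0.95001658, y=0.0756772.
label = function_1(0, 1, 0.95001658, 0.0756772)
```

For that row, \(w\) and \(z\) are both true, so function 0 yields 0 and function 1 yields 1, which agrees with the label printed in the Examples section.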

Concept drift can be introduced by changing the classification function. This can be done manually or using ConceptDriftStream.
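A manual drift amounts to switching the function index partway through the stream. The sketch below is a self-contained stand-in that uses Python's `random` module instead of the library: the helper `label`, the seed, and `drift_position` are all hypothetical, and the labeling rule assumes the two-of-three reading of function 0.

```python
import math
import random


def label(function_index, v, w, x, y):
    """Label a sample with function 0 or with its opposite, function 1."""
    z = y < 0.5 + 0.3 * math.sin(3 * math.pi * x)
    label_0 = 0 if (v and w) or (v and z) or (w and z) else 1
    return label_0 if function_index == 0 else 1 - label_0


rng = random.Random(112)   # illustrative seed, unrelated to NumPy's generator
function_index = 0         # start with classification function 0
drift_position = 500       # hypothetical point at which the concept changes

# Draw (v, w, x, y) tuples: two fair booleans and two uniform numerics.
samples = [
    (rng.random() < 0.5, rng.random() < 0.5, rng.random(), rng.random())
    for _ in range(1000)
]

stream_labels = []
for i, (v, w, x, y) in enumerate(samples):
    if i == drift_position:
        # Abrupt drift: switch to the other classification function.
        function_index = 1 - function_index
    stream_labels.append(label(function_index, v, w, x, y))
```

Because function 1 is the opposite of function 0, every sample after `drift_position` receives the label that the original concept would have rejected, which is what makes the drift abrupt.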

Parameters

classification_function: int (default: 0): Which of the two classification functions to use for the generation. Valid options are 0 or 1.
random_state: int, RandomState instance or None, optional (default=None): If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
balance_classes: bool (default: False): Whether to balance classes or not. If balanced, the class distribution will converge to a uniform distribution.

References

1: Gama, João, et al. “Learning with drift detection.” Advances in Artificial Intelligence–SBIA 2004. Springer Berlin Heidelberg, 2004. 286-295.

Examples

>>> # Imports
>>> from skmultiflow.data.mixed_generator import MIXEDGenerator
>>> # Setting up the stream
>>> stream = MIXEDGenerator(classification_function=1, random_state=112,
...                         balance_classes=False)
>>> # Retrieving one sample
>>> stream.next_sample()
(array([[0.        , 1.        , 0.95001658, 0.0756772 ]]), array([1.]))

>>> stream.next_sample(10)
(array([[1.        , 1.        , 0.05480574, 0.81767738],
       [1.        , 1.        , 0.00255603, 0.98119928],
       [0.        , 0.        , 0.39464259, 0.00494492],
       [1.        , 1.        , 0.82060937, 0.344983  ],
       [0.        , 1.        , 0.08623151, 0.54607394],
       [0.        , 0.        , 0.04500817, 0.33218776],
       [1.        , 1.        , 0.70936161, 0.18840112],
       [1.        , 0.        , 0.50315448, 0.76353033],
       [1.        , 1.        , 0.21415209, 0.76309258],
       [0.        , 1.        , 0.42563042, 0.23435109]]),
       array([1., 1., 0., 1., 1., 0., 1., 0., 1., 1.]))

>>> stream.n_remaining_samples()
-1
>>> stream.has_more_samples()
True

Methods

`generate_drift`(self)	Generate drift by switching the classification function.
`get_data_info`(self)	Retrieves minimum information from the stream
`get_info`(self)	Collects and returns the information about the configuration of the estimator
`get_params`(self[, deep])	Get parameters for this estimator.
`has_more_samples`(self)	Checks if stream has more samples.
`is_restartable`(self)	Determine if the stream is restartable.
`last_sample`(self)	Retrieves last batch_size samples in the stream.
`n_remaining_samples`(self)	Returns the estimated number of remaining samples.
`next_sample`(self[, batch_size])	Returns next sample from the stream.
`prepare_for_use`()	Prepare the stream for use.
`reset`(self)	Resets the estimator to its initial state.
`restart`(self)	Restart the stream.
`set_params`(self, **params)	Set the parameters of this estimator.

Attributes

`balance_classes`	Retrieve the value of the option: Balance classes
`classification_function`	Retrieve the index of the current classification function.
`feature_names`	Retrieve the names of the features.
`n_cat_features`	Retrieve the number of integer features.
`n_features`	Retrieve the number of features.
`n_num_features`	Retrieve the number of numerical features.
`n_targets`	Retrieve the number of targets
`target_names`	Retrieve the names of the targets
`target_values`	Retrieve all target_values in the stream for each target.

property balance_classes

Retrieve the value of the option: Balance classes

Returns

Boolean: True if the classes are balanced

property classification_function

Retrieve the index of the current classification function.

Returns

int: index of the classification function [0,1]

property feature_names

Retrieve the names of the features.

Returns

list: names of the features

generate_drift(self): Generate drift by switching the classification function.

get_data_info(self)

Retrieves minimum information from the stream.

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns

string: Stream data information

get_info(self)

Collects and returns the information about the configuration of the estimator

Returns

string: Configuration of the estimator.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

has_more_samples(self)

Checks if stream has more samples.

Returns

Boolean: True if stream has more samples.

is_restartable(self)

Determine if the stream is restartable.

Returns

Boolean: True if stream is restartable.

last_sample(self)

Retrieves last batch_size samples in the stream.

Returns

tuple or tuple list: A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

property n_cat_features

Retrieve the number of integer features.

Returns

int: The number of integer features in the stream.

property n_features

Retrieve the number of features.

Returns

int: The total number of features.

property n_num_features

Retrieve the number of numerical features.

Returns

int: The number of numerical features in the stream.

n_remaining_samples(self)

Returns the estimated number of remaining samples.

Returns

int: Remaining number of samples. -1 if infinite (e.g. generator)

property n_targets

Retrieve the number of targets

Returns

int: the number of targets in the stream.

next_sample(self, batch_size=1)

Returns next sample from the stream.

The sample generation works as follows: the two numeric attributes are drawn from the random generator, which is initialized with the seed passed by the user. Each boolean attribute is set to 0 or 1 by comparing a draw from the random generator with 0.5. The classification function then decides whether to classify the instance as class 0 or class 1. Finally, if the classes should be balanced, the generator balances them.

The generated sample will have 4 relevant features and 1 label (it has one classification task).

Parameters

batch_size: int (optional, default=1): The number of samples to return.

Returns

tuple or tuple list: Return a tuple with the features matrix and the labels matrix for the batch_size samples requested.
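The generation steps described for next_sample can be sketched as a small stand-in class. This is not the skmultiflow implementation: `ToyMixedStream` and `classify` are hypothetical names, Python's `random` module replaces NumPy's generator (so the values will not match the Examples section), and the balancing strategy shown here, redrawing until the label matches an alternating target class, is an assumption about one way to make the class distribution converge to uniform.

```python
import math
import random


def classify(function_index, v, w, x, y):
    """Assumed labeling rule: function 0 returns 0 when two of v, w, z hold."""
    z = y < 0.5 + 0.3 * math.sin(3 * math.pi * x)
    label_0 = 0 if (v and w) or (v and z) or (w and z) else 1
    return label_0 if function_index == 0 else 1 - label_0


class ToyMixedStream:
    """Illustrative stand-in for the generator described above."""

    def __init__(self, classification_function=0, seed=None, balance_classes=False):
        self.classification_function = classification_function
        self.balance_classes = balance_classes
        self._rng = random.Random(seed)
        self._next_class = 0  # class wanted next when balancing (assumption)

    def _draw(self):
        # Numeric attributes: uniform draws from the seeded generator.
        x, y = self._rng.random(), self._rng.random()
        # Boolean attributes: compare a uniform draw with 0.5.
        v = 1 if self._rng.random() < 0.5 else 0
        w = 1 if self._rng.random() < 0.5 else 0
        return v, w, x, y

    def next_sample(self, batch_size=1):
        features, labels = [], []
        for _ in range(batch_size):
            while True:
                v, w, x, y = self._draw()
                label = classify(self.classification_function, v, w, x, y)
                # Without balancing every draw is accepted; with balancing,
                # redraw until the label matches the class wanted next.
                if not self.balance_classes or label == self._next_class:
                    break
            if self.balance_classes:
                self._next_class = 1 - self._next_class
            features.append([v, w, x, y])
            labels.append(label)
        return features, labels


stream = ToyMixedStream(classification_function=1, seed=112, balance_classes=True)
X, y = stream.next_sample(10)
```

With balancing enabled in this sketch, the labels alternate strictly between the two classes, so any even-sized batch contains both classes in equal numbers.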

static prepare_for_use()

Prepare the stream for use.

Deprecated in v0.5.0 and will be removed in v0.7.0

reset(self)

Resets the estimator to its initial state.

Returns

self

restart(self): Restart the stream.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
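The `<component>__<parameter>` naming used by set_params can be illustrated with a small helper. This is only a sketch of the naming convention, not the library's parsing code: `split_param` and the component name `stream` are hypothetical.

```python
def split_param(name):
    """Split 'component__parameter' into (component, parameter).

    A name without a double underscore refers to a parameter of the
    object itself, so the component is returned as None.
    """
    component, delimiter, parameter = name.partition("__")
    if not delimiter:
        return None, name
    return component, parameter


# A simple parameter set directly on the generator.
simple = split_param("classification_function")

# A nested parameter addressed through a hypothetical component named "stream".
nested = split_param("stream__classification_function")
```

Only the first double underscore is split off, so deeper nesting stays in the parameter part and can be resolved by the component in turn.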

property target_names

Retrieve the names of the targets

Returns

list: the names of the targets in the stream.

property target_values

Retrieve all target_values in the stream for each target.

Returns

list: list of lists of all target_values for each target