skmultiflow.data.HyperplaneGenerator¶

class skmultiflow.data.HyperplaneGenerator(random_state=None, n_features=10, n_drift_features=2, mag_change=0.0, noise_percentage=0.05, sigma_percentage=0.1)[source]¶

Hyperplane stream generator.

Generates a binary classification problem in which the task is to predict the class of points relative to a rotating hyperplane. It was used as a testbed for CVFDT and VFDT in [1].

A hyperplane in d-dimensional space is the set of points \(x\) that satisfy \(\sum^{d}_{i=1} w_i x_i = w_0 = \sum^{d}_{i=1} w_i\), where \(x_i\) is the ith coordinate of \(x\). Examples for which \(\sum^{d}_{i=1} w_i x_i > w_0\), are labeled positive, and examples for which \(\sum^{d}_{i=1} w_i x_i \leq w_0\), are labeled negative.
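The labeling rule above can be illustrated with a minimal sketch in plain Python. It follows the formula exactly as written in this description (threshold \(w_0 = \sum w_i\)); the function name `label_point` and the example weights are hypothetical and are not part of the library.

```python
def label_point(x, w):
    """Return 1 if sum(w_i * x_i) > w_0, else 0, with w_0 = sum(w_i)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    w0 = sum(w)  # threshold as defined in the description above
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > w0 else 0


weights = [0.5, 0.25, 0.25]  # hypothetical weights, so w_0 = 1.0
print(label_point([2.0, 1.0, 1.0], weights))  # score 1.5 is above w_0: positive
print(label_point([1.0, 1.0, 1.0], weights))  # score 1.0 equals w_0: negative
```

A point lying exactly on the hyperplane is labeled negative, because the positive class requires a strict inequality.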

Hyperplanes are useful for simulating time-changing concepts, because we can change the orientation and position of the hyperplane in a smooth manner by changing the relative size of the weights. We introduce change to this dataset by adding drift to each weight feature \(w_i = w_i + d \sigma\), where \(\sigma\) is the probability that the direction of change is reversed and \(d\) is the change applied to every example.
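One plausible reading of the drift step is sketched here; it is not the library's implementation. Each drifting weight moves by \(d\) in its current direction, and that direction is reversed with probability \(\sigma\). The function name `drift_weights`, the `directions` list of +1/-1 signs, and the choice to drift the first `n_drift_features` weights are assumptions made for illustration.

```python
import random


def drift_weights(weights, directions, n_drift_features, mag_change, sigma, rng):
    """Apply one drift step in place to the first n_drift_features weights."""
    for i in range(n_drift_features):
        # Move the weight by mag_change in its current direction (+1 or -1).
        weights[i] += directions[i] * mag_change
        # With probability sigma, reverse the direction for later steps.
        if rng.random() < sigma:
            directions[i] = -directions[i]


rng = random.Random(0)
weights = [0.5, 0.5, 0.5]
directions = [1, 1, 1]
for _ in range(4):
    drift_weights(weights, directions, 2, 0.125, 0.0, rng)
print(weights)  # with sigma = 0.0 the direction never reverses
```

With `sigma = 0.0` the drifting weights grow steadily, while a larger `sigma` makes them oscillate around their starting values.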

Parameters

random_state: int, RandomState instance or None, optional (default=None): If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
n_features: int (default=10): The number of attributes to generate. Higher than 2.
n_drift_features: int (default=2): The number of attributes with drift. Higher than 2.
mag_change: float (default=0.0): Magnitude of the change for every example. From 0.0 to 1.0.
noise_percentage: float (default=0.05): Percentage of noise to add to the data. From 0.0 to 1.0.
sigma_percentage: float (default=0.1): Probability that the direction of change is reversed. From 0.0 to 1.0.

References

1: G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In KDD’01, pages 97–106, San Francisco, CA, 2001. ACM Press.

Methods

`get_data_info`(self)	Retrieves minimal information about the stream.
`get_info`(self)	Collects and returns information about the configuration of the estimator.
`get_params`(self[, deep])	Get parameters for this estimator.
`has_more_samples`(self)	Checks if stream has more samples.
`is_restartable`(self)	Determine if the stream is restartable.
`last_sample`(self)	Retrieves last batch_size samples in the stream.
`n_remaining_samples`(self)	Returns the estimated number of remaining samples.
`next_sample`(self[, batch_size])	Returns next sample from the stream.
`prepare_for_use`()	Prepare the stream for use.
`reset`(self)	Resets the estimator to its initial state.
`restart`(self)	Restart the stream.
`set_params`(self, **params)	Set the parameters of this estimator.

Attributes

`feature_names`	Retrieve the names of the features.
`mag_change`	Retrieve the magnitude of change.
`n_cat_features`	Retrieve the number of categorical (integer-valued) features.
`n_drift_features`	Retrieve the number of drift features.
`n_features`	Retrieve the number of features.
`n_num_features`	Retrieve the number of numerical features.
`n_targets`	Retrieve the number of targets.
`noise_percentage`	Retrieve the noise percentage.
`sigma_percentage`	Retrieve the sigma percentage.
`target_names`	Retrieve the names of the targets.
`target_values`	Retrieve all target_values in the stream for each target.

property feature_names¶

Retrieve the names of the features.

Returns

list: names of the features

get_data_info(self)[source]¶

Retrieves minimal information about the stream.

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns

string: Stream data information

get_info(self)[source]¶

Collects and returns information about the configuration of the estimator.

Returns

string: Configuration of the estimator.

get_params(self, deep=True)[source]¶

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

has_more_samples(self)[source]¶

Checks if stream has more samples.

Returns

bool: True if stream has more samples.

is_restartable(self)[source]¶

Determine if the stream is restartable.

Returns

bool: True if stream is restartable.

last_sample(self)[source]¶

Retrieves last batch_size samples in the stream.

Returns

tuple or tuple list: A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

property mag_change¶

Retrieve the magnitude of change.

Returns

float: magnitude of change

property n_cat_features¶

Retrieve the number of categorical (integer-valued) features.

Returns

int: The number of categorical (integer-valued) features in the stream.

property n_drift_features¶

Retrieve the number of drift features.

Returns

int: The total number of drift features.

property n_features¶

Retrieve the number of features.

Returns

int: The total number of features.

property n_num_features¶

Retrieve the number of numerical features.

Returns

int: The number of numerical features in the stream.

n_remaining_samples(self)[source]¶

Returns the estimated number of remaining samples.

Returns

int: Remaining number of samples. -1 if infinite (e.g. generator)
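Because a generator reports -1 rather than a finite count, a consumer typically has to cap the number of samples itself. A minimal sketch of that check follows; the helper `samples_to_draw` and its `max_samples` argument are hypothetical and not part of the library.

```python
def samples_to_draw(n_remaining, max_samples):
    """Decide how many samples to request given a remaining-samples estimate."""
    if n_remaining == -1:
        # Infinite stream (e.g. a generator): only the caller's cap applies.
        return max_samples
    # Finite stream: never request more than what is left.
    return min(n_remaining, max_samples)


print(samples_to_draw(-1, 1000))   # infinite stream, capped by the caller
print(samples_to_draw(250, 1000))  # finite stream with 250 samples left
```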

property n_targets¶

Retrieve the number of targets.

Returns

int: the number of targets in the stream.

next_sample(self, batch_size=1)[source]¶

Returns next sample from the stream.

The sample generation works as follows: the features are generated with the random generator, initialized with the seed passed by the user. The classification function then decides, as a function of the weighted sum of the features and the sum of the weights, whether the instance belongs to class 0 or class 1. Finally, noise is added if requested by the user, and then drift is generated.

Parameters

batch_size: int (optional, default=1): The number of samples to return.

Returns

tuple or tuple list: Return a tuple with the features matrix and the labels matrix for the batch_size samples that were requested.
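The generation steps described for next_sample can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: features are drawn uniformly from [0, 1), the decision threshold is the weights' sum scaled by a hypothetical `threshold_scale` parameter (0.5 here, so that both classes occur), noise flips the label, and drift is applied to the first `n_drift_features` weights. All names are hypothetical.

```python
import random


def next_samples(weights, directions, batch_size, n_drift_features,
                 mag_change, noise, sigma, rng, threshold_scale=0.5):
    """Generate batch_size (features, label) pairs, drifting weights in place."""
    X, y = [], []
    for _ in range(batch_size):
        # 1. Draw one feature per weight, uniformly from [0, 1) (assumption).
        x = [rng.random() for _ in weights]
        # 2. Compare the weighted sum with a threshold based on the weights' sum.
        score = sum(wi * xi for wi, xi in zip(weights, x))
        label = 1 if score >= threshold_scale * sum(weights) else 0
        # 3. With probability `noise`, flip the label.
        if rng.random() < noise:
            label = 1 - label
        # 4. Drift the weights; reverse a direction with probability `sigma`.
        for i in range(n_drift_features):
            weights[i] += directions[i] * mag_change
            if rng.random() < sigma:
                directions[i] = -directions[i]
        X.append(x)
        y.append(label)
    return X, y


X, y = next_samples([1.0] * 10, [1] * 10, batch_size=5, n_drift_features=2,
                    mag_change=0.001, noise=0.05, sigma=0.1,
                    rng=random.Random(42))
print(len(X), len(X[0]), len(y))  # batch size, number of features, labels
```

Passing the random generator explicitly keeps the sketch reproducible: the same seed yields the same batch.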

property noise_percentage¶

Retrieve the noise percentage.

Returns

float: percentage of the noise

static prepare_for_use()[source]¶

Prepare the stream for use.

Deprecated in v0.5.0; will be removed in v0.7.0.

reset(self)[source]¶

Resets the estimator to its initial state.

Returns

self

restart(self)[source]¶

Restart the stream.

set_params(self, **params)[source]¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
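The `<component>__<parameter>` naming convention can be illustrated with a small sketch that separates direct parameters from nested ones. It only demonstrates the naming scheme and is not the library's implementation; the helper `split_nested_params` and the component name `model` with its `max_depth` parameter are hypothetical.

```python
def split_nested_params(params):
    """Split parameters into direct ones and per-component nested ones."""
    direct, nested = {}, {}
    for key, value in params.items():
        # "model__max_depth" -> component "model", parameter "max_depth".
        name, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(name, {})[sub_key] = value
        else:
            direct[name] = value
    return direct, nested


direct, nested = split_nested_params(
    {"mag_change": 0.001, "model__max_depth": 3}
)
print(direct)  # parameters set on the object itself
print(nested)  # parameters forwarded to each named component
```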

property sigma_percentage¶

Retrieve the sigma percentage.

Returns

float: probability that the direction of change is reversed

property target_names¶

Retrieve the names of the targets.

Returns

list: the names of the targets in the stream.

property target_values¶

Retrieve all target_values in the stream for each target.

Returns

list: list of lists of all target_values for each target