skmultiflow.data.DataStream

class skmultiflow.data.DataStream(data, y=None, target_idx=-1, n_targets=1, cat_features=None, name=None, allow_nan=False)[source]

Creates a stream from a data source.

DataStream accepts either a single data set containing both X (features) and Y (targets), or X and Y passed separately. In the first case, target_idx and n_targets must be provided; in the second case, they are not needed.

Parameters
data: np.ndarray or pd.DataFrame

The feature and target columns together, or the feature columns only if the targets are passed separately through y.

y: np.ndarray or pd.DataFrame, optional (default=None)

The targets’ columns.

target_idx: int, optional (default=-1)

The column index from which the targets start.

n_targets: int, optional (default=1)

The number of targets.

cat_features: list, optional (default=None)

A list of indices corresponding to the location of categorical features.

name: str, optional (default=None)

A string that identifies the data.

allow_nan: bool, optional (default=False)

If True, allows NaN values in the data. Otherwise, an error is raised.
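
To make the role of target_idx and n_targets concrete, here is a minimal pure-Python sketch of how a combined table might be split into features and targets. It is not the library's implementation: split_features_targets is a hypothetical helper, and it assumes the targets occupy n_targets consecutive columns starting at target_idx, with negative indices counted from the last column.

```python
def split_features_targets(rows, target_idx=-1, n_targets=1):
    """Split each row into (features, targets).

    Hypothetical helper for illustration only. Assumes the targets are
    the n_targets consecutive columns starting at target_idx.
    """
    n_cols = len(rows[0])
    # Normalize a negative index so that -1 refers to the last column.
    start = target_idx if target_idx >= 0 else n_cols + target_idx
    stop = start + n_targets
    if start < 0 or stop > n_cols:
        raise ValueError("target columns fall outside the data")
    # Everything outside [start, stop) is treated as a feature column.
    X = [row[:start] + row[stop:] for row in rows]
    y = [row[start:stop] for row in rows]
    return X, y


# Three samples: two feature columns followed by one target column.
data = [
    [0.1, 0.2, 0],
    [0.3, 0.4, 1],
    [0.5, 0.6, 0],
]
X, y = split_features_targets(data)  # defaults: the last column is the target
```

With the defaults, the last column becomes y and the remaining columns become X. When X and y are already separate, no such split is needed, which is why the two arguments can be omitted in that case.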

Notes

The stream object provides a requested number of samples at a time, and samples that have already been served cannot be accessed again later. This is done to correctly simulate a streaming context.
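
The forward-only behaviour described in this note can be pictured with a small stand-in class. This is a sketch under assumptions, not scikit-multiflow code: ForwardOnlyStream is hypothetical, it serves one sample at a time, and it only borrows the method names listed on this page to show how a cursor that never moves backwards simulates a stream.

```python
class ForwardOnlyStream:
    """Toy stand-in for a stream; not part of scikit-multiflow."""

    def __init__(self, samples):
        self._samples = list(samples)
        self._cursor = 0  # index of the next unread sample

    def has_more_samples(self):
        return self._cursor < len(self._samples)

    def n_remaining_samples(self):
        return len(self._samples) - self._cursor

    def next_sample(self):
        # Hand out the sample at the cursor and advance. No method
        # moves the cursor backwards, so old samples stay out of reach.
        sample = self._samples[self._cursor]
        self._cursor += 1
        return sample

    def restart(self):
        # The only way to see earlier samples again is to start over.
        self._cursor = 0


stream = ForwardOnlyStream([("a", 0), ("b", 1), ("c", 0)])
seen = []
while stream.has_more_samples():
    seen.append(stream.next_sample())
remaining_after_loop = stream.n_remaining_samples()

stream.restart()
remaining_after_restart = stream.n_remaining_samples()
```

The loop consumes each sample exactly once. After it finishes, nothing remains until restart() rewinds the cursor to the beginning.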

Methods

get_data_info(self)

Retrieves minimal information about the stream.

get_info(self)

Collects and returns information about the configuration of the estimator.

get_params(self[, deep])

Get parameters for this estimator.

has_more_samples(self)

Checks if stream has more samples.

is_restartable(self)

Determine if the stream is restartable.

last_sample(self)

Retrieves the last batch_size samples in the stream.

n_remaining_samples(self)

Returns the estimated number of remaining samples.

next_sample(self[, batch_size])

Returns next sample from the stream.

prepare_for_use()

Prepare the stream for use.

print_df(self)

Prints all the samples in the stream.

reset(self)

Resets the estimator to its initial state.

restart(self)

Restarts the stream.

set_params(self, **params)

Set the parameters of this estimator.

Attributes

X

Return the features’ columns.

cat_features_idx

Get the list of the categorical features index.

data

Return the data set used to generate the stream.

feature_names

Retrieve the names of the features.

n_cat_features

Retrieve the number of categorical features.

n_features

Retrieve the number of features.

n_num_features

Retrieve the number of numerical features.

n_targets

Get the number of targets.

target_idx

Get the index of the column where Y begins.

target_names

Retrieve the names of the targets.

target_values

Retrieve all target_values in the stream for each target.

y

Return the targets’ columns.

property X

Return the features’ columns.

Returns
np.ndarray:

the features’ columns

property cat_features_idx

Get the list of the categorical features index.

Returns
list:

List of categorical features index.

property data

Return the data set used to generate the stream.

Returns
pd.DataFrame:

Data set.

property feature_names

Retrieve the names of the features.

Returns
list

names of the features

get_data_info(self)[source]

Retrieves minimal information about the stream.

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns
string

Stream data information

get_info(self)[source]

Collects and returns information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

has_more_samples(self)[source]

Checks if stream has more samples.

Returns
Boolean

True if stream has more samples.

is_restartable(self)[source]

Determine if the stream is restartable.

Returns
Bool

True if stream is restartable.

last_sample(self)[source]

Retrieves the last batch_size samples in the stream.

Returns
tuple or tuple list

A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

property n_cat_features

Retrieve the number of categorical features.

Returns
int

The number of categorical features in the stream.

property n_features

Retrieve the number of features.

Returns
int

The total number of features.

property n_num_features

Retrieve the number of numerical features.

Returns
int

The number of numerical features in the stream.
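
The three feature counts are related. The sketch below shows one plausible way they fit together, assuming that every feature not listed in cat_features is treated as numerical. That assumption and the helper count_feature_kinds are illustrative only and are not taken from the library source.

```python
def count_feature_kinds(n_features, cat_features=None):
    """Return (n_cat_features, n_num_features).

    Hypothetical helper. Assumes a feature is numerical unless its
    index appears in cat_features.
    """
    cat_idx = sorted(set(cat_features or []))
    if any(i < 0 or i >= n_features for i in cat_idx):
        raise ValueError("categorical index outside the feature range")
    n_cat = len(cat_idx)
    return n_cat, n_features - n_cat


# Five features, of which columns 0 and 3 are categorical.
n_cat, n_num = count_feature_kinds(5, cat_features=[0, 3])
```

Under this assumption, n_cat_features and n_num_features always sum to n_features.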

n_remaining_samples(self)[source]

Returns the estimated number of remaining samples.

Returns
int

Remaining number of samples.

property n_targets

Get the number of targets.

Returns
int:

The number of targets.

next_sample(self, batch_size=1)[source]

Returns next sample from the stream.

If there are enough instances to supply at least batch_size samples, those are returned. If there are not, a tuple of (None, None) is returned.

Parameters
batch_size: int (optional, default=1)

The number of instances to return.

Returns
tuple or tuple list

Returns the next batch_size instances. For general purposes the return can be treated as a numpy.ndarray.
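
The batch rule stated above can be sketched in a few lines of plain Python. This follows the rule as documented here rather than the library source: next_batch is a hypothetical helper, and leaving the cursor unchanged after a failed request is an assumption made for the illustration.

```python
def next_batch(X, y, cursor, batch_size=1):
    """Return ((X_batch, y_batch), new_cursor).

    Hypothetical helper. A full batch is returned when enough samples
    remain; otherwise the batch is (None, None).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    end = cursor + batch_size
    if end > len(X):
        # Not enough samples left for a full batch.
        return (None, None), cursor
    return (X[cursor:end], y[cursor:end]), end


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
y = [0, 1, 0]

# The first request succeeds; the second asks for more than remains.
first, cursor = next_batch(X, y, cursor=0, batch_size=2)
second, cursor = next_batch(X, y, cursor=cursor, batch_size=2)
```

Callers that unpack the result as X_batch, y_batch should therefore check for None before using either value.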

static prepare_for_use()[source]

Prepare the stream for use.

Deprecated in v0.5.0 and will be removed in v0.7.0.

print_df(self)[source]

Prints all the samples in the stream.

reset(self)[source]

Resets the estimator to its initial state.

Returns
self

restart(self)[source]

Restarts the stream.

It serves to reinitialize the stream to its initial state.

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
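
The <component>__<parameter> naming convention can be illustrated with two toy classes. This is a sketch of the convention only, not the library's routine: Component and Wrapper are hypothetical, and a real implementation would typically also validate parameter names, which is omitted here.

```python
class Component:
    """Toy object with a single parameter; hypothetical."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class Wrapper:
    """Toy nested object holding a Component; hypothetical."""

    def __init__(self, scaler=None, n_steps=10):
        self.scaler = scaler if scaler is not None else Component()
        self.n_steps = n_steps

    def set_params(self, **params):
        for key, value in params.items():
            # "scaler__alpha" splits into name="scaler", sub_key="alpha".
            name, sep, sub_key = key.partition("__")
            if sep:
                # Delegate the nested parameter to the named component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


wrapper = Wrapper()
result = wrapper.set_params(n_steps=5, scaler__alpha=0.25)
```

Here n_steps is set directly on the wrapper, while scaler__alpha is routed to the nested scaler object. Returning self allows calls to be chained.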

Returns
self

property target_idx

Get the index of the column where Y begins.

Returns
int:

The index of the column where Y begins.

property target_names

Retrieve the names of the targets.

Returns
list

the names of the targets in the stream.

property target_values

Retrieve all target_values in the stream for each target.

Returns
list

list of lists of all target_values for each target

property y

Return the targets’ columns.

Returns
np.ndarray:

the targets’ columns