skmultiflow.data.TemporalDataStream

class skmultiflow.data.TemporalDataStream(data, y=None, time=None, sample_weight=None, sample_delay=0, target_idx=-1, n_targets=1, cat_features=None, name=None, allow_nan=False, ordered=True)[source]

Create a temporal stream from a data source.

TemporalDataStream takes the whole data set containing the X (features), time (timestamps) and Y (targets).

Parameters
data: numpy.ndarray or pandas.DataFrame

The features and targets, or only the features if the targets are passed in the y parameter.

time: numpy.ndarray(dtype=datetime64) or pandas.Series (Default=None)

The timestamp of each instance. If it is a pandas.Series, it will be converted into a numpy.ndarray. If None, the delay is measured in number of samples and sample_delay must be an int.

sample_weight: numpy.ndarray or pandas.Series, optional (Default=None)

Sample weights.

sample_delay: numpy.ndarray, pandas.Series, numpy.timedelta64 or int, optional (Default=0)

The delay before each sample becomes available. Its meaning depends on the data type used:
numpy.timedelta64: the delay in time, i.e. the offset between the event time and the time when the label becomes available (e.g., numpy.timedelta64(1, "D") for a 1-day delay).
numpy.ndarray[numpy.datetime64]: array with the timestamps at which each sample becomes available.
pandas.Series: series with the timestamps at which each sample becomes available.
int: the delay in number of samples.
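
To make the delay options concrete, the following is a minimal sketch in plain NumPy. It does not call TemporalDataStream; the timestamps and variable names are illustrative, and the arithmetic reflects an assumed reading of the descriptions above (availability equals event time plus delay), not the library's internals.

```python
import numpy as np

# Event timestamps for four samples (illustrative values, not from the docs).
time = np.array(
    ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
    dtype="datetime64[D]",
)

# Time-based delay: a fixed offset such as numpy.timedelta64(1, "D").
delay = np.timedelta64(1, "D")
available_time = time + delay  # assumed: availability = event time + delay

# Sample-based delay: an int counts positions in the stream instead of time.
delay_in_samples = 2
sample_positions = np.arange(len(time))
available_position = sample_positions + delay_in_samples
```

With the time-based delay, the result is itself an array of timestamps, which is the same shape of information that the numpy.ndarray and pandas.Series options supply directly.
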
y: numpy.ndarray or pandas.DataFrame, optional (Default=None)

The targets.

target_idx: int, optional (default=-1)

The column index from which the targets start.

n_targets: int, optional (default=1)

The number of targets.
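
As an illustration of how target_idx and n_targets can select the target columns when y is not given, here is a small NumPy sketch. The helper split_features_targets is hypothetical and only mirrors the parameter descriptions; the library's own handling may differ in detail.

```python
import numpy as np


def split_features_targets(data, target_idx=-1, n_targets=1):
    """Hypothetical helper: split a 2-D array into features X and targets y."""
    n_cols = data.shape[1]
    start = target_idx % n_cols  # map a negative index to a positive one
    target_cols = list(range(start, start + n_targets))
    y = data[:, target_cols]
    X = np.delete(data, target_cols, axis=1)  # everything else is a feature
    return X, y


# Two features followed by one target column (illustrative values).
data = np.array([[1.0, 2.0, 0.0],
                 [3.0, 4.0, 1.0]])
X, y = split_features_targets(data)  # defaults: last column is the target
```
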

cat_features: list, optional (default=None)

A list of indices corresponding to the location of categorical features.

name: str, optional (default=None)

A string to identify the data.

ordered: bool, optional (default=True)

If True, data, y, and time are assumed to be already ordered by timestamp. Otherwise, the data is sorted by the timestamps in time (in which case time cannot be None).
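
The reordering described for ordered=False can be pictured with the NumPy sketch below. It is a standalone illustration with made-up values, assuming a stable sort on the timestamps; it is not taken from the library's source.

```python
import numpy as np

# Unordered samples (illustrative values).
time = np.array(["2020-01-03", "2020-01-01", "2020-01-02"], dtype="datetime64[D]")
X = np.array([[30.0], [10.0], [20.0]])
y = np.array([3, 1, 2])

# Sort indices by timestamp; a stable sort keeps ties in their original order.
order = np.argsort(time, kind="stable")

# Apply the same permutation to features, targets and timestamps.
X_sorted, y_sorted, time_sorted = X[order], y[order], time[order]
```
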

allow_nan: bool, optional (default=False)

If True, allows NaN values in the data. Otherwise, an error is raised.
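
A check of this kind could look like the sketch below. The helper check_nan is hypothetical, and the choice of ValueError is an assumption, since the description above does not name the exception type.

```python
import numpy as np


def check_nan(data, allow_nan=False):
    """Hypothetical validation helper mirroring the allow_nan description."""
    if not allow_nan and np.isnan(data).any():
        # The exception type is an assumption; the docs only say "an error".
        raise ValueError("NaN values found; pass allow_nan=True to accept them.")
    return data


clean = np.array([[1.0, 2.0]])
dirty = np.array([[1.0, np.nan]])

check_nan(clean)                  # passes: no NaN present
check_nan(dirty, allow_nan=True)  # passes: NaN explicitly allowed
```
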

Notes

The stream object provides upon request a number of samples, in a way such that old samples cannot be accessed at a later time. This is done to correctly simulate the stream context.
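
The forward-only access pattern can be sketched with a small stand-in class. ForwardOnlyStream is hypothetical and far simpler than TemporalDataStream; it only borrows the method names has_more_samples, next_sample and restart to show why a consumed sample is not handed out again unless the stream is restarted.

```python
class ForwardOnlyStream:
    """Hypothetical stand-in: samples are handed out once, in order."""

    def __init__(self, samples):
        self._samples = list(samples)
        self._position = 0  # index of the next unseen sample

    def has_more_samples(self):
        return self._position < len(self._samples)

    def next_sample(self, batch_size=1):
        batch = self._samples[self._position:self._position + batch_size]
        self._position += len(batch)  # the cursor only moves forward
        return batch

    def restart(self):
        self._position = 0  # the only way to see earlier samples again


stream = ForwardOnlyStream([10, 20, 30])
seen = []
while stream.has_more_samples():
    seen.extend(stream.next_sample())
```

Unlike the documented next_sample, this stand-in returns a shorter batch when fewer samples remain; that simplification is deliberate and not a claim about the library.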

Methods

get_data_info(self)

Retrieves minimum information from the stream

get_info(self)

Collects and returns the information about the configuration of the estimator

get_params(self[, deep])

Get parameters for this estimator.

has_more_samples(self)

Checks if stream has more samples.

is_restartable(self)

Determine if the stream is restartable.

last_sample(self)

Retrieves last batch_size samples in the stream.

n_remaining_samples(self)

Returns the estimated number of remaining samples.

next_sample(self[, batch_size])

Get next sample.

prepare_for_use()

Prepare the stream for use.

print_df(self)

Prints all the samples in the stream.

reset(self)

Resets the estimator to its initial state.

restart(self)

Restarts the stream.

set_params(self, **params)

Set the parameters of this estimator.

Attributes

X

Return the features’ columns.

cat_features_idx

Get the list of the categorical features index.

data

Return the data set used to generate the stream.

feature_names

Retrieve the names of the features.

n_cat_features

Retrieve the number of categorical features.

n_features

Retrieve the number of features.

n_num_features

Retrieve the number of numerical features.

n_targets

Get the number of targets.

target_idx

Get the number of the column where Y begins.

target_names

Retrieve the names of the targets

target_values

Retrieve all target_values in the stream for each target.

y

Return the targets’ columns.

property X

Return the features’ columns.

Returns
np.ndarray:

the features’ columns

property cat_features_idx

Get the list of the categorical features index.

Returns
list:

List of categorical features index.

property data

Return the data set used to generate the stream.

Returns
pd.DataFrame:

Data set.

property feature_names

Retrieve the names of the features.

Returns
list

names of the features

get_data_info(self)[source]

Retrieves minimum information from the stream

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns
string

Stream data information

get_info(self)[source]

Collects and returns the information about the configuration of the estimator

Returns
string

Configuration of the estimator.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

has_more_samples(self)[source]

Checks if stream has more samples.

Returns
Boolean

True if stream has more samples.

is_restartable(self)[source]

Determine if the stream is restartable.

Returns
Bool

True if stream is restartable.

last_sample(self)[source]

Retrieves last batch_size samples in the stream.

Returns
tuple or tuple list

A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

property n_cat_features

Retrieve the number of categorical features.

Returns
int

The number of categorical features in the stream.

property n_features

Retrieve the number of features.

Returns
int

The total number of features.

property n_num_features

Retrieve the number of numerical features.

Returns
int

The number of numerical features in the stream.

n_remaining_samples(self)[source]

Returns the estimated number of remaining samples.

Returns
int

Remaining number of samples.

property n_targets

Get the number of targets.

Returns
int:

The number of targets.

next_sample(self, batch_size=1)[source]

Get next sample.

If there are enough instances to supply at least batch_size samples, those are returned. Otherwise, a tuple of (None, None) is returned.

Parameters
batch_size: int

The number of instances to return.

Returns
tuple or tuple list

Returns the next batch_size instances (sample_x, sample_y, sample_time, sample_delay (if available), sample_weight (if available)). For general purposes the return can be treated as a numpy.ndarray.
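
The "enough instances or (None, None)" rule can be sketched as follows. The helper take_batch is hypothetical and returns only features and targets; it leaves out the time, delay and weight entries listed above and is not the library's implementation.

```python
import numpy as np


def take_batch(X, y, start, batch_size=1):
    """Hypothetical helper: next batch, or (None, None) if too few samples remain."""
    stop = start + batch_size
    if stop > len(X):
        return None, None
    return X[start:stop], y[start:stop]


# Five samples with two features each (illustrative values).
X = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 0, 1, 0])

first_X, first_y = take_batch(X, y, start=0, batch_size=2)  # two full rows
missing = take_batch(X, y, start=4, batch_size=2)           # only one row left
```

Callers that loop over a stream would therefore check for None before using a batch.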

static prepare_for_use()[source]

Prepare the stream for use.

Deprecated in v0.5.0 and will be removed in v0.7.0

print_df(self)[source]

Prints all the samples in the stream.

reset(self)[source]

Resets the estimator to its initial state.

Returns
self

restart(self)[source]

Restarts the stream.

It serves the purpose of reinitializing the stream to its initial state.

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
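
The <component>__<parameter> naming can be illustrated with a short pure-Python sketch. The helper split_nested_param and the example keys are hypothetical; the sketch assumes the name is split at the first double underscore, which is how such keys are commonly interpreted.

```python
def split_nested_param(key):
    """Hypothetical helper: split '<component>__<parameter>' at the first '__'."""
    component, delimiter, parameter = key.partition("__")
    if not delimiter:
        # No double underscore: the key names a parameter of the object itself.
        return None, key
    return component, parameter


nested = split_nested_param("scaler__with_mean")
plain = split_nested_param("name")
deep = split_nested_param("pipe__clf__alpha")  # the remainder can be split again
```
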

Returns
self

property target_idx

Get the number of the column where Y begins.

Returns
int:

The number of the column where Y begins.

property target_names

Retrieve the names of the targets

Returns
list

the names of the targets in the stream.

property target_values

Retrieve all target_values in the stream for each target.

Returns
list

list of lists of all target_values for each target
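
For a classification setting, a "list of lists" of this kind might be built as in the sketch below. It assumes the values of interest are the distinct labels of each target column; that interpretation and the sample data are assumptions, not taken from the library.

```python
import numpy as np

# Three samples with two target columns (illustrative values).
y = np.array([[0, 2],
              [1, 2],
              [0, 3]])

# One inner list per target, holding that target's distinct values.
target_values = [np.unique(y[:, j]).tolist() for j in range(y.shape[1])]
```
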

property y

Return the targets’ columns.

Returns
np.ndarray:

the targets’ columns