skmultiflow.data.TemporalDataStream

class skmultiflow.data.TemporalDataStream(data, y=None, time=None, sample_weight=None, sample_delay=0, target_idx=-1, n_targets=1, cat_features=None, name=None, allow_nan=False, ordered=True)

Create a temporal stream from a data source.

TemporalDataStream takes the whole data set containing the X (features), time (timestamps) and Y (targets).

Parameters

data: numpy.ndarray or pandas.DataFrame: The features and targets, or only the features if the targets are passed in the y parameter.
time: numpy.ndarray(dtype=datetime64) or pandas.Series (Default=None): The timestamp of each instance. If it is a pandas.Series, it will be converted into a numpy.ndarray. If None, the delay is counted in number of samples and sample_delay must be an int.
sample_weight: numpy.ndarray or pandas.Series, optional (Default=None): Sample weights.
sample_delay: numpy.ndarray, pandas.Series, numpy.timedelta64 or int, optional (Default=0): The meaning depends on the data type used:

numpy.timedelta64: The delay expressed as a time offset between the event time and the moment the label becomes available, e.g., numpy.timedelta64(1, "D") for a 1-day delay.

numpy.ndarray[numpy.datetime64]: An array with the timestamps at which each sample becomes available.

pandas.Series: A series with the timestamps at which each sample becomes available.

int: The delay in number of samples.
y: numpy.ndarray or pandas.DataFrame, optional (Default=None): The targets.
target_idx: int, optional (default=-1): The column index from which the targets start.
n_targets: int, optional (default=1): The number of targets.
cat_features: list, optional (default=None): A list of indices corresponding to the location of categorical features.
name: str, optional (default=None): A string to identify the data.
ordered: bool, optional (default=True): If True, data, y, and time are assumed to be already ordered by timestamp. Otherwise, the data is sorted by the timestamps in time (time cannot be None).
allow_nan: bool, optional (default=False): If True, allows NaN values in the data. Otherwise, an error is raised.
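
As a minimal sketch of what ordered=False and a numpy.timedelta64 sample_delay mean in practice, the following uses plain NumPy only. It does not call TemporalDataStream and is not the library's implementation; the array names and values are made up for illustration.

```python
import numpy as np

# Hypothetical event timestamps, deliberately out of order.
time = np.array(["2021-01-03", "2021-01-01", "2021-01-02"], dtype="datetime64[D]")
X = np.array([[3.0], [1.0], [2.0]])
y = np.array([1, 0, 1])

# Analogue of ordered=False: sort every array by timestamp.
# A stable sort keeps samples with equal timestamps in input order.
order = np.argsort(time, kind="stable")
time, X, y = time[order], X[order], y[order]

# Analogue of a numpy.timedelta64 delay: each label becomes available
# a fixed offset after the event time of its sample.
delay = np.timedelta64(1, "D")
available_at = time + delay

print(time)
print(available_at)
```

Passing an array of timestamps as sample_delay corresponds to supplying something like available_at directly, rather than deriving it from a fixed offset.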

Notes

On request, the stream object provides a number of samples in such a way that old samples cannot be accessed again later. This correctly simulates a streaming context.

Methods

`get_data_info`(self)	Retrieves minimum information from the stream.
`get_info`(self)	Collects and returns the information about the configuration of the estimator.
`get_params`(self[, deep])	Get parameters for this estimator.
`has_more_samples`(self)	Checks if stream has more samples.
`is_restartable`(self)	Determine if the stream is restartable.
`last_sample`(self)	Retrieves the last batch_size samples in the stream.
`n_remaining_samples`(self)	Returns the estimated number of remaining samples.
`next_sample`(self[, batch_size])	Get next sample.
`prepare_for_use`()	Prepare the stream for use.
`print_df`(self)	Prints all the samples in the stream.
`reset`(self)	Resets the estimator to its initial state.
`restart`(self)	Restarts the stream.
`set_params`(self, **params)	Set the parameters of this estimator.

Attributes

`X`	Return the features’ columns.
`cat_features_idx`	Get the list of categorical feature indices.
`data`	Return the data set used to generate the stream.
`feature_names`	Retrieve the names of the features.
`n_cat_features`	Retrieve the number of categorical features.
`n_features`	Retrieve the number of features.
`n_num_features`	Retrieve the number of numerical features.
`n_targets`	Get the number of targets.
`target_idx`	Get the number of the column where Y begins.
`target_names`	Retrieve the names of the targets.
`target_values`	Retrieve all target_values in the stream for each target.
`y`	Return the targets’ columns.

property X

Return the features’ columns.

Returns

np.ndarray: the features’ columns

property cat_features_idx

Get the list of categorical feature indices.

Returns

list: List of categorical feature indices.

property data

Return the data set used to generate the stream.

Returns

pd.DataFrame: Data set.

property feature_names

Retrieve the names of the features.

Returns

list: names of the features

get_data_info(self)

Retrieves minimum information from the stream.

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns

string: Stream data information

get_info(self)

Collects and returns the information about the configuration of the estimator.

Returns

string: Configuration of the estimator.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

has_more_samples(self)

Checks if stream has more samples.

Returns

bool: True if stream has more samples.

is_restartable(self)

Determine if the stream is restartable.

Returns

bool: True if stream is restartable.

last_sample(self)

Retrieves the last batch_size samples in the stream.

Returns

tuple or tuple list: A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

property n_cat_features

Retrieve the number of categorical features.

Returns

int: The number of categorical features in the stream.

property n_features

Retrieve the number of features.

Returns

int: The total number of features.

property n_num_features

Retrieve the number of numerical features.

Returns

int: The number of numerical features in the stream.

n_remaining_samples(self)

Returns the estimated number of remaining samples.

Returns

int: Remaining number of samples.

property n_targets

Get the number of targets.

Returns

int: The number of targets.

next_sample(self, batch_size=1)

Get next sample.

If there are enough instances to supply at least batch_size samples, those are returned. Otherwise, a tuple of (None, None) is returned.

Parameters

batch_size: int: The number of instances to return.

Returns

tuple or tuple list: Returns the next batch_size instances (sample_x, sample_y, sample_time, sample_delay (if available), sample_weight (if available)). For general purposes the return can be treated as a numpy.ndarray.
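
The batch behaviour described above can be sketched with a small stand-in class. ForwardOnlyCursor is hypothetical and is not part of skmultiflow; it reduces the return value to (X, y) and leaves out the time, delay, and weight elements that the real method also returns. It only illustrates the forward-only cursor and the (None, None) result when fewer than batch_size instances remain.

```python
import numpy as np


class ForwardOnlyCursor:
    """Hypothetical stand-in for a stream: each row is handed out at most once."""

    def __init__(self, X, y):
        self.X, self.y = X, y
        self.idx = 0  # position of the next unread row

    def has_more_samples(self):
        return self.idx < len(self.X)

    def next_sample(self, batch_size=1):
        end = self.idx + batch_size
        if end > len(self.X):
            # Not enough rows left to fill a whole batch.
            return None, None
        X_batch, y_batch = self.X[self.idx:end], self.y[self.idx:end]
        self.idx = end  # advance; earlier rows are no longer reachable
        return X_batch, y_batch


stream = ForwardOnlyCursor(np.arange(10).reshape(5, 2), np.array([0, 1, 0, 1, 0]))
batches = []
while stream.has_more_samples():
    X_batch, y_batch = stream.next_sample(batch_size=2)
    if X_batch is None:  # one row is left, which cannot fill a batch of two
        break
    batches.append((X_batch, y_batch))

print(len(batches))
```

Checking the result for None inside the loop matters here: has_more_samples can still be True when fewer than batch_size instances remain.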

static prepare_for_use()

Prepare the stream for use.

Deprecated in v0.5.0 and will be removed in v0.7.0.

print_df(self)

Prints all the samples in the stream.

reset(self)

Resets the estimator to its initial state.

Returns

self

restart(self)

Restarts the stream.

It serves the purpose of reinitializing the stream to its initial state.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
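
The <component>__<parameter> naming convention can be illustrated with a short, self-contained sketch. The helper split_nested_params and the parameter names below are hypothetical and are not the library's implementation; the sketch only shows how a double underscore separates a component name from the parameter that belongs to it.

```python
def split_nested_params(params):
    """Group 'component__parameter' keys by component.

    Keys without a double underscore are collected under the empty
    string, meaning they belong to the outer object itself.
    """
    grouped = {}
    for key, value in params.items():
        # partition splits on the first "__" only, so deeper nesting
        # (a__b__c) stays intact in sub_key for the component to handle.
        component, sep, sub_key = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


grouped = split_nested_params({"name": "demo", "clf__alpha": 0.1, "clf__tol": 1e-3})
print(grouped)
```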

property target_idx

Get the number of the column where Y begins.

Returns

int: The number of the column where Y begins.
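
To make the roles of target_idx and n_targets concrete, here is a minimal NumPy sketch of splitting one array into feature and target columns. The function split_features_targets is hypothetical, and it is an assumption that the library separates the columns in this way; the sketch only shows the indexing idea, including the default target_idx=-1.

```python
import numpy as np


def split_features_targets(data, target_idx=-1, n_targets=1):
    """Split a 2-D array into (X, y), with targets starting at target_idx."""
    n_cols = data.shape[1]
    start = target_idx % n_cols  # turn a negative index into a positive one
    stop = start + n_targets
    if stop > n_cols:
        raise ValueError("target_idx + n_targets exceeds the number of columns")
    y = data[:, start:stop]
    # Everything that is not a target column is treated as a feature.
    X = np.delete(data, np.arange(start, stop), axis=1)
    return X, y


data = np.arange(12).reshape(3, 4)

# Default: the last column is the single target.
X_last, y_last = split_features_targets(data)

# Two targets starting at column 1; columns 0 and 3 remain as features.
X_mid, y_mid = split_features_targets(data, target_idx=1, n_targets=2)

print(X_last.shape, y_last.ravel())
print(X_mid)
```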

property target_names

Retrieve the names of the targets.

Returns

list: the names of the targets in the stream.

property target_values

Retrieve all target_values in the stream for each target.

Returns

list: list of lists of all target_values for each target

property y

Return the targets’ columns.

Returns

np.ndarray: the targets’ columns