skmultiflow.evaluation.EvaluatePrequentialDelayed

class skmultiflow.evaluation.EvaluatePrequentialDelayed(n_wait=200, max_samples=100000, batch_size=1, pretrain_size=200, max_time=inf, metrics=None, output_file=None, show_plot=False, restart_stream=True, data_points_for_classification=False)[source]

The delayed prequential evaluation method.

Delayed prequential evaluation is designed specifically for stream settings, in the sense that each sample serves two purposes and samples are analysed sequentially, in order of arrival. The model is updated with a sample only once its label becomes available, as determined by the sample's timestamps (arrival and label-availability times).

This method consists of using each sample first to test the model, that is, to make a prediction, and then to train the model (partial fit) on that same sample once its label becomes available after a certain delay. This way the model is always tested on samples that it has not seen yet and is updated only on samples whose labels are available.
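
The control flow can be pictured with the following minimal sketch. It is illustrative only, not the library's internal implementation; the tuple layout yielded by stream and the update_metrics helper are hypothetical:

>>> pending = []  # samples whose labels have not yet arrived
>>> for X, y, arrival_time, available_time in stream:
>>>     # 1. Train on buffered samples whose labels are available by now
>>>     ready = [s for s in pending if s[2] <= arrival_time]
>>>     pending = [s for s in pending if s[2] > arrival_time]
>>>     for Xb, yb, _ in ready:
>>>         model.partial_fit(Xb, yb)
>>>     # 2. Test on the new sample, which the model has never seen
>>>     y_pred = model.predict(X)
>>>     update_metrics(y, y_pred)  # hypothetical metric tracker
>>>     # 3. Buffer the sample until its label becomes available
>>>     pending.append((X, y, available_time))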

Parameters
n_wait: int (Default: 200)

The number of samples to process between each test. Also defines when to update the plot if show_plot=True. Note that setting n_wait too small can significantly slow the evaluation process.

max_samples: int (Default: 100000)

The maximum number of samples to process during the evaluation.

batch_size: int (Default: 1)

The number of samples to pass at a time to the model(s).

pretrain_size: int (Default: 200)

The number of samples to use to train the model before starting the evaluation. Used to enforce a ‘warm’ start.

max_time: float (Default: float(“inf”))

The maximum duration of the simulation (in seconds).

metrics: list, optional (Default: [‘accuracy’, ‘kappa’])

The list of metrics to track during the evaluation. Also defines the metrics that will be displayed in plots and/or logged into the output file. Valid options are:

Classification: ‘accuracy’, ‘kappa’, ‘kappa_t’, ‘kappa_m’, ‘true_vs_predicted’, ‘precision’, ‘recall’, ‘f1’, ‘gmean’

Multi-target Classification: ‘hamming_score’, ‘hamming_loss’, ‘exact_match’, ‘j_index’

Regression: ‘mean_square_error’, ‘mean_absolute_error’, ‘true_vs_predicted’

Multi-target Regression: ‘average_mean_squared_error’, ‘average_mean_absolute_error’, ‘average_root_mean_square_error’

General purpose (no plot generated): ‘running_time’, ‘model_size’
output_file: string, optional (Default: None)

File name to save the summary of the evaluation.

show_plot: bool (Default: False)

If True, a plot will show the progress of the evaluation. Warning: Plotting can slow down the evaluation process.

restart_stream: bool, optional (Default: True)

If True, the stream is restarted once the evaluation is complete.

data_points_for_classification: bool (Default: False)

If True, the visualization used is a cloud of data points (only works for classification) and default performance metrics are ignored. If specific metrics are required, then they must be explicitly set using the metrics attribute.
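
For instance, a configuration combining the data-points visualization with explicitly chosen metrics could look like the following sketch (parameter values are illustrative):

>>> evaluator = EvaluatePrequentialDelayed(show_plot=True,
>>>                                        data_points_for_classification=True,
>>>                                        metrics=['accuracy'])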

Notes

  1. This evaluator can process a single learner to track its performance; or multiple learners at a time, to compare different models on the same stream.

  2. The metric ‘true_vs_predicted’ is intended to be informative only. It corresponds to evaluations at a specific moment which might not represent the actual learner performance across all instances.

  3. The metrics running_time and model_size are not plotted when the show_plot option is set. Only their current value is displayed at the bottom of the figure. However, their values over the evaluation are written into the resulting csv file if the output_file option is set.

Examples

>>> # The first example demonstrates how to evaluate one model
>>> import numpy as np
>>> import pandas as pd
>>> from skmultiflow.data import TemporalDataStream
>>> from skmultiflow.trees import HoeffdingTreeClassifier
>>> from skmultiflow.evaluation import EvaluatePrequentialDelayed
>>>
>>> # Columns used to get the data, label and time from iris_timestamp dataset
>>> DATA_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
>>> LABEL_COLUMN = "label"
>>> TIME_COLUMN = "timestamp"
>>>
>>> # Read a csv with stream data
>>> data = pd.read_csv("https://raw.githubusercontent.com/scikit-multiflow/streaming-datasets/"
>>>                    "master/iris_timestamp.csv")
>>> # Convert time column to datetime
>>> data[TIME_COLUMN] = pd.to_datetime(data[TIME_COLUMN])
>>> # Sort data by time
>>> data = data.sort_values(by=TIME_COLUMN)
>>> # Get X, y and time
>>> X = data[DATA_COLUMNS].values
>>> y = data[LABEL_COLUMN].values
>>> time = data[TIME_COLUMN].values
>>>
>>> # Set a delay of 1 day
>>> delay_time = np.timedelta64(1, "D")
>>> # Set the stream
>>> stream = TemporalDataStream(X, y, time, sample_delay=delay_time, ordered=False)
>>>
>>> # Set the model
>>> ht = HoeffdingTreeClassifier()
>>>
>>> # Set the evaluator
>>> evaluator = EvaluatePrequentialDelayed(batch_size=1,
>>>                                        pretrain_size=X.shape[0]//2,
>>>                                        max_samples=X.shape[0],
>>>                                        output_file='results_delay.csv',
>>>                                        metrics=['accuracy', 'recall', 'precision', 'f1', 'kappa'])
>>>
>>> # Run evaluation
>>> evaluator.evaluate(stream=stream, model=ht, model_names=['HT'])
>>>
>>> # The second example demonstrates how to compare two models
>>> import numpy as np
>>> import pandas as pd
>>> from skmultiflow.data import TemporalDataStream
>>> from skmultiflow.trees import HoeffdingTreeClassifier
>>> from skmultiflow.bayes import NaiveBayes
>>> from skmultiflow.evaluation import EvaluatePrequentialDelayed
>>>
>>> # Columns used to get the data, label and time from iris_timestamp dataset
>>> DATA_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
>>> LABEL_COLUMN = "label"
>>> TIME_COLUMN = "timestamp"
>>>
>>> # Read a csv with stream data
>>> data = pd.read_csv("../data/datasets/iris_timestamp.csv")
>>> # Convert time column to datetime
>>> data[TIME_COLUMN] = pd.to_datetime(data[TIME_COLUMN])
>>> # Sort data by time
>>> data = data.sort_values(by=TIME_COLUMN)
>>> # Get X, y and time
>>> X = data[DATA_COLUMNS].values
>>> y = data[LABEL_COLUMN].values
>>> time = data[TIME_COLUMN].values
>>>
>>> # Set a delay of 30 minutes
>>> delay_time = np.timedelta64(30, "m")
>>> # Set the stream
>>> stream = TemporalDataStream(X, y, time, sample_delay=delay_time, ordered=False)
>>>
>>> # Set the models
>>> ht = HoeffdingTreeClassifier()
>>> nb = NaiveBayes()
>>>
>>> evaluator = EvaluatePrequentialDelayed(batch_size=1,
>>>                                        pretrain_size=X.shape[0]//2,
>>>                                        max_samples=X.shape[0],
>>>                                        output_file='results_delay.csv',
>>>                                        metrics=['accuracy', 'recall', 'precision', 'f1', 'kappa'])
>>>
>>> # Run evaluation
>>> evaluator.evaluate(stream=stream, model=[ht, nb], model_names=['HT', 'NB'])
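
Once evaluate has finished, the tracked metrics can also be inspected programmatically, e.g. (a sketch continuing the example above):

>>> # Mean measurements over the whole run, and current measurements
>>> # over the last n_wait samples, one entry per model
>>> mean_measurements = evaluator.get_mean_measurements()
>>> current_measurements = evaluator.get_current_measurements()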

Methods

evaluate(self, stream, model[, model_names])
    Evaluates a model or set of models on samples from a stream.

evaluation_summary(self)

get_current_measurements(self[, model_idx])
    Get current measurements from the evaluation (measured on last n_wait samples).

get_info(self)
    Collects and returns the information about the configuration of the estimator.

get_mean_measurements(self[, model_idx])
    Get mean measurements from the evaluation.

get_measurements(self[, model_idx])
    Get measurements from the evaluation.

get_params(self[, deep])
    Get parameters for this estimator.

partial_fit(self, X, y[, classes, sample_weight])
    Partially fit all the models on the given data.

predict(self, X)
    Predicts with the estimator(s) being evaluated.

reset(self)
    Resets the estimator to its initial state.

set_params(self, **params)
    Set the parameters of this estimator.

update_progress_bar(curr, total, steps, time)

evaluate(self, stream, model, model_names=None)[source]

Evaluates a model or set of models on samples from a stream.

Parameters
stream: Stream

The stream from which to draw the samples.

model: skmultiflow.core.BaseSKMObject or sklearn.base.BaseEstimator or list

The model or list of models to evaluate.

model_names: list, optional (Default=None)

A list with the names of the models.

Returns
StreamModel or list

The trained model(s).
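
For example, the trained model(s) can be captured for later use (a sketch reusing the stream and models from the Examples section above):

>>> trained_models = evaluator.evaluate(stream=stream, model=[ht, nb], model_names=['HT', 'NB'])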

get_current_measurements(self, model_idx=None)[source]

Get current measurements from the evaluation (measured on last n_wait samples).

Parameters
model_idx: int, optional (Default=None)

Indicates the index of the model as defined in evaluate(model). If None, returns a list with the measurements for each model.

Returns
measurements or list

Current measurements. If model_idx is None, returns a list with the measurements for each model.

Raises
IndexError: If the index is invalid.

get_info(self)[source]

Collects and returns the information about the configuration of the estimator.

Returns
string

Configuration of the estimator.

get_mean_measurements(self, model_idx=None)[source]

Get mean measurements from the evaluation.

Parameters
model_idx: int, optional (Default=None)

Indicates the index of the model as defined in evaluate(model). If None, returns a list with the measurements for each model.

Returns
measurements or list

Mean measurements. If model_idx is None, returns a list with the measurements for each model.

Raises
IndexError: If the index is invalid.

get_measurements(self, model_idx=None)[source]

Get measurements from the evaluation.

Parameters
model_idx: int, optional (Default=None)

Indicates the index of the model as defined in evaluate(model). If None, returns a list with the measurements for each model.

Returns
tuple (mean, current)

Mean and current measurements. If model_idx is None, each member of the tuple is a list with the measurements for each model.

Raises
IndexError: If the index is invalid.
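
A minimal usage sketch (assumes evaluator.evaluate(...) has already run):

>>> mean_perf, current_perf = evaluator.get_measurements(model_idx=0)
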
get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

partial_fit(self, X, y, classes=None, sample_weight=None)[source]

Partially fit all the models on the given data.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

The data upon which the estimators will be trained.

y: numpy.ndarray of shape (n_samples,)

The classification labels / target values for all samples in X.

classes: list, optional (default=None)

Stores all the classes that may be encountered during the classification task. Not used for regressors.

sample_weight: numpy.ndarray, optional (default=None)

Samples weight. If not provided, uniform weights are assumed.

Returns
EvaluatePrequentialDelayed

self

predict(self, X)[source]

Predicts with the estimator(s) being evaluated.

Parameters
X: numpy.ndarray of shape (n_samples, n_features)

All the samples we want to predict the label for.

Returns
list of numpy.ndarray

Model(s) predictions.

reset(self)[source]

Resets the estimator to its initial state.

Returns
self

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
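
For example (an illustrative sketch using this evaluator's own parameters):

>>> evaluator.set_params(n_wait=100, max_samples=5000)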

Returns
self