Creates a stream from a file source.
Currently only CSV files are supported, but the goal is to support other formats,
as long as there is a function that correctly reads, interprets, and returns
a pandas DataFrame or numpy.ndarray with the data.
Path to the data file.
The column index from which the targets start.
The number of targets.
A list of indices corresponding to the location of categorical features.
If True, allows NaN values in the data. Otherwise, an error is raised.
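The parameters above describe how columns are divided into features and targets. A minimal sketch of that division, assuming a purely numeric table and using only NumPy, might look like the following. The helper name `split_features_targets` and the sample CSV text are hypothetical, and this is not the library's own implementation.

```python
import io

import numpy as np


def split_features_targets(data, target_idx=-1, n_targets=1, allow_nan=False):
    """Split a 2-D array into features X and targets y (illustrative only)."""
    if not allow_nan and np.isnan(data).any():
        raise ValueError("NaN values found in the data.")
    n_cols = data.shape[1]
    # Normalise a negative index so that -1 refers to the last column.
    start = target_idx % n_cols
    target_cols = np.arange(start, start + n_targets)
    y = data[:, target_cols]
    # Every column that is not a target is treated as a feature.
    X = np.delete(data, target_cols, axis=1)
    return X, y


# Hypothetical CSV content: three feature columns followed by one target.
csv_text = "a,b,c,label\n1.0,2.0,3.0,0\n4.0,5.0,6.0,1\n"
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
X, y = split_features_targets(data)
print(X.shape, y.shape)
```

With `allow_nan=False`, the sketch raises a `ValueError` as soon as any NaN is present, mirroring the behaviour described for that parameter.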
On request, the stream object provides a number of samples in such a way that old samples
cannot be accessed later. This correctly simulates the stream context.
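The forward-only behaviour described above can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the `FileStream` implementation: the class name `ForwardOnlyStream` is hypothetical, and only the method names mirror those documented on this page.

```python
import numpy as np


class ForwardOnlyStream:
    """Serve samples from in-memory arrays, moving a cursor forward only."""

    def __init__(self, X, y):
        self._X = np.asarray(X)
        self._y = np.asarray(y)
        self._cursor = 0

    def n_remaining_samples(self):
        return len(self._X) - self._cursor

    def has_more_samples(self):
        return self.n_remaining_samples() > 0

    def next_sample(self, batch_size=1):
        # Not enough samples left for a full batch: signal with (None, None).
        if self.n_remaining_samples() < batch_size:
            return None, None
        start, self._cursor = self._cursor, self._cursor + batch_size
        return self._X[start:self._cursor], self._y[start:self._cursor]

    def restart(self):
        # Reinitialise the stream to its initial state.
        self._cursor = 0


stream = ForwardOnlyStream(np.arange(12).reshape(6, 2), np.array([0, 1, 0, 1, 1, 0]))
X_batch, y_batch = stream.next_sample(4)
print(X_batch.shape, stream.n_remaining_samples())
```

Once a batch has been served, the only way to see those samples again in this sketch is to call `restart()`, which is what makes the object behave like a stream rather than a random-access dataset.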
>>> # Imports
>>> from skmultiflow.data.file_stream import FileStream
>>> # Setup the stream
>>> stream = FileStream("https://raw.githubusercontent.com/scikit-multiflow/"
>>> # Retrieving one sample
>>> stream.next_sample()
(array([[0.080429, 8.397187, 7.074928]]), array())
>>> # Retrieving 10 samples
>>> stream.next_sample(10)
(array([[1.42074 , 7.504724, 6.764101],
       [0.960543, 5.168416, 8.298959],
       [3.367279, 6.797711, 4.857875],
       [9.265933, 8.548432, 2.460325],
       [7.295862, 2.373183, 3.427656],
       [9.289001, 3.280215, 3.154171],
       [0.279599, 7.340643, 3.729721],
       [4.387696, 1.97443 , 6.447183],
       [2.933823, 7.150514, 2.566901],
       [4.303049, 1.471813, 9.078151]]),
 array([0, 0, 1, 1, 1, 1, 0, 0, 1, 0]))
Returns all the samples in the stream.
Retrieves minimum information from the stream.
Collects and returns the information about the configuration of the estimator.
Get parameters for this estimator.
Checks if stream has more samples.
Determine if the stream is restartable.
Retrieves last batch_size samples in the stream.
Returns the estimated number of remaining samples.
Returns next sample from the stream.
Prepare the stream for use.
Resets the estimator to its initial state.
Restarts the stream.
Set the parameters of this estimator.
Get the list of categorical feature indices.
Retrieve the names of the features.
Retrieve the number of integer features.
Retrieve the number of features.
Retrieve the number of numerical features.
Get the number of targets.
Get the number of the column where Y begins.
Retrieve the names of the targets.
Retrieve all target_values in the stream for each target.
List of categorical feature indices.
The names of the features.
The features’ columns.
The targets’ columns.
Used by evaluator methods to identify the stream.
The default format is: ‘Stream name - n_targets, n_classes, n_features’.
Stream data information.
Configuration of the estimator.
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
Parameter names mapped to their values.
True if stream has more samples.
True if stream is restartable.
A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape
(batch_size, n_targets), representing the next batch_size samples.
The number of integer features in the stream.
The total number of features.
The number of numerical features in the stream.
Remaining number of samples.
Get the number of targets.
If there are enough instances to supply at least batch_size samples, those
are returned. If there aren’t, a tuple of (None, None) is returned.
The number of instances to return.
Returns the next batch_size instances.
For general purposes the return can be treated as a numpy.ndarray.
Deprecated in v0.5.0 and will be removed in v0.7.0.
It serves the purpose of reinitializing the stream to
its initial state.
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter> so that it’s possible to update each
component of a nested object.
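To make the `<component>__<parameter>` convention concrete, here is a minimal sketch of how such keys could be separated into an object's own parameters and those destined for nested components. The function `split_nested_params` and the parameter names in the example (`clf`, `max_depth`, `seed`) are hypothetical and only illustrate the naming scheme; they are not taken from the library.

```python
def split_nested_params(params):
    """Group 'component__parameter' keys by component name (illustrative)."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only, so deeper nesting
        # (a__b__c) stays intact for the component to resolve itself.
        name, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"n_targets": 1, "clf__max_depth": 3, "clf__seed": 7}
)
print(own)
print(nested)
```

Keys without a double underscore apply to the object itself, while the rest are forwarded to the named component, which is what allows each part of a nested object to be updated in a single call.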
The number of the column where Y begins.
The names of the targets in the stream.
List of lists of all target_values for each target.
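As an illustration of the "list of lists" shape described above, the following sketch collects the distinct values of each target column with NumPy. The helper name `target_values_per_column` and the sample targets are hypothetical, and the library may compute or order these values differently.

```python
import numpy as np


def target_values_per_column(y):
    """Return one sorted list of distinct values per target column."""
    y = np.asarray(y)
    # Treat a 1-D target array as a single target column.
    if y.ndim == 1:
        y = y.reshape(-1, 1)
    return [np.unique(y[:, j]).tolist() for j in range(y.shape[1])]


# Two hypothetical targets observed over three samples.
values = target_values_per_column([[0, 2], [1, 2], [0, 3]])
print(values)
```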