The Stream class is in charge of “providing” data inside scikit-multiflow. The most important method of the Stream class is next_sample(batch_size).
next_sample(batch_size) returns a feature array \(X\) and its corresponding target array \(Y\).
The shape \((n, m)\) of the \(X\) and \(Y\) arrays depends on batch_size and the type of learning problem.
The number of samples \(n\) is defined by batch_size, which defaults to 1.
The total number of features \(m\) in \(X\) is equal to the number of numerical features plus the number of categorical features: \(X_m = n_{num} + n_{cat}\)
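As a rough illustration of how batch_size and the feature counts determine the shape of \(X\), the sketch below uses a hypothetical `ToyStream` class. It is not part of scikit-multiflow: the class, its `n_num`/`n_cat` parameters, and the `shape` helper are assumptions made for this example, and plain Python lists stand in for the arrays a real stream would return.

```python
import random


class ToyStream:
    """Hypothetical stand-in for a stream (not a scikit-multiflow class)."""

    def __init__(self, n_num=3, n_cat=2, seed=0):
        self.n_num = n_num  # number of numerical features
        self.n_cat = n_cat  # number of categorical features
        self._rng = random.Random(seed)

    def next_sample(self, batch_size=1):
        """Return (X, Y) with batch_size rows each."""
        X, Y = [], []
        for _ in range(batch_size):
            numeric = [self._rng.random() for _ in range(self.n_num)]
            categorical = [self._rng.randrange(3) for _ in range(self.n_cat)]
            X.append(numeric + categorical)     # m = n_num + n_cat columns
            Y.append([self._rng.randrange(2)])  # a single binary target
        return X, Y


def shape(rows):
    """Return (n, m) for a non-empty list of equally long rows."""
    return (len(rows), len(rows[0]))


stream = ToyStream(n_num=3, n_cat=2)

X, Y = stream.next_sample()  # batch_size defaults to 1
default_shapes = (shape(X), shape(Y))

X, Y = stream.next_sample(batch_size=4)
batch_shapes = (shape(X), shape(Y))

print(default_shapes)  # ((1, 5), (1, 1))
print(batch_shapes)    # ((4, 5), (4, 1))
```

In this toy setup, changing batch_size only changes the number of rows; the number of columns in \(X\) stays at 3 + 2 = 5.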
The number of columns \(m\) in \(Y\) determines the number of targets to learn. Consider the following examples:
S_bc: A binary classification stream
Number of targets: Y_m = 1
Unique target values: [0, 1]
S_mc: A multi-class classification stream with 3 classes (0, 1, 2)
Unique target values: [0, 1, 2]
S_mtc: A multi-target classification stream with 2 targets, where classes (0, 1, 2) correspond to the first target and classes (1, 2) to the second target.
Number of targets: Y_m = 2
Unique target values: [[0, 1, 2],[1, 2]]
S_r: A regression stream
Target values indicate the data type: [float]
S_mtr: A multi-target regression stream with 3 targets
Number of targets: Y_m = 3
Target values indicate the data type: [float, float, float]
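The target layouts listed above can be sketched with a small, self-contained simulation. Everything here is hypothetical and only mimics the shape of \(Y\): the `TARGET_SPECS` table and `sample_targets` helper are not scikit-multiflow APIs, `S_mtc` is used as the name of the multi-target classification stream, the single-target streams are taken to have one target column, and the regression value range is arbitrary.

```python
import random

rng = random.Random(42)

# One sampler per target column; the keys mirror the example streams.
TARGET_SPECS = {
    "S_bc": [lambda: rng.choice([0, 1])],
    "S_mc": [lambda: rng.choice([0, 1, 2])],
    "S_mtc": [lambda: rng.choice([0, 1, 2]), lambda: rng.choice([1, 2])],
    "S_r": [lambda: rng.uniform(-1.0, 1.0)],
    "S_mtr": [lambda: rng.uniform(-1.0, 1.0)] * 3,  # 3 float targets
}


def sample_targets(name, batch_size=1):
    """Return a Y with batch_size rows and one column per target."""
    samplers = TARGET_SPECS[name]
    return [[draw() for draw in samplers] for _ in range(batch_size)]


target_counts = {}
for name in TARGET_SPECS:
    Y = sample_targets(name, batch_size=5)
    target_counts[name] = len(Y[0])  # number of columns in Y
    print(name, "Y shape:", (len(Y), len(Y[0])))
```

Each row of `Y` holds one value per target, so only the multi-target streams produce more than one column.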