Stream class

The Stream class is in charge of “providing” data inside scikit-multiflow. The most important method of the Stream class is next_sample(batch_size).

The shape \((n, m)\) of the \(X\) and \(Y\) arrays depends on the batch_size and the type of learning problem.

Supervised learning

next_sample(batch_size) will return a features vector \(X\) and its corresponding target vector \(Y\)

The number of samples \(n\) is defined by batch_size which by default is 1.

The total number of features \(m\) in \(X\) is equal to the number of numerical features plus the number of categorical features: \(X_m = n_{num} + n_{cat}\)

The number of columns \(m\) in \(Y\) determines the number of targets to learn. Consider the following examples:

  • S_bc: A binary classification stream

    • Number of targets: Y_m = 1

    • Unique target values: [0, 1]

  • S_mc: A multi-class classification stream with 3 classes (0, 1, 2)

    • Number of targets: Y_m = 1

    • Unique target values: [0, 1, 2]

  • S_mc: A multi-target classification stream, with 2 targets, where classes (0,1,2) correspond to the first target and classes (1, 2) to the second target.

    • Number of targets: Y_m = 2

    • Unique target values: [[0, 1, 2],[1, 2]]

  • S_r: A regression stream

    • Number of targets: Y_m = 1

    • Target values indicates the data type: [float]

  • S_mtr: A multi-target regression stream with 3 targets

    • Number of targets: Y_m = 3

    • Target values indicates the data type: [float, float, float]