Core Concepts

Consider a continuous stream of data \(A=\{(\vec{x}_t,y_t) \mid t = 1,\ldots,T\}\) where \(T \rightarrow \infty\). Here \(\vec{x}_t\) is a feature vector and \(y_t\) is the corresponding target, which is continuous for regression and discrete for classification. The objective is to predict the target \(y\) for an unseen \(\vec{x}\). Binary classification considers two classes, \(y\in \{0,1\}\), while multi-class classification uses \(K>2\) classes, \(y\in \{1,\ldots,K\}\). In both binary and multi-class classification, exactly one class is assigned per instance. In multi-output learning, on the other hand, \(y\) is a vector of targets, so a single instance \(\vec{x}_t\) can be assigned multiple targets at the same time.

Unlike batch learning, where all data is available for training at once, \(train(X, y)\), stream learning trains the model incrementally as new data becomes available, \(train(\vec{x}_t, y_t)\). The performance \(P\) of a given model is measured by a loss function that evaluates the difference between the set of expected labels \(Y\) and the predicted labels \(\hat{Y}\).
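To make the incremental \(train(\vec{x}_t, y_t)\) step concrete, here is a minimal sketch in plain Python. The class `OnlineLinearRegressor`, the helper `squared_loss`, and the synthetic `stream` generator are hypothetical names introduced only for illustration; they are not part of any library API. The model takes one stochastic gradient step per instance, which is just one of many possible incremental update rules.

```python
from typing import Iterator, List, Tuple


class OnlineLinearRegressor:
    """Hypothetical linear model updated one instance at a time (SGD)."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: List[float]) -> float:
        return self.bias + sum(w * v for w, v in zip(self.weights, x))

    def train(self, x: List[float], y: float) -> None:
        # One gradient step on the squared error of a single instance.
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * v for w, v in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def squared_loss(y_true: float, y_pred: float) -> float:
    """Loss comparing an expected target with a predicted one."""
    return (y_true - y_pred) ** 2


def stream(n_instances: int) -> Iterator[Tuple[List[float], float]]:
    """Synthetic, noise-free stream following y = 2x + 1 (an assumption)."""
    for t in range(n_instances):
        x = [(t % 10) / 10.0]
        yield x, 2.0 * x[0] + 1.0


model = OnlineLinearRegressor(n_features=1)
for x_t, y_t in stream(5000):
    model.train(x_t, y_t)  # the model never sees the whole data set at once

final_loss = squared_loss(3.0, model.predict([1.0]))
```

The training loop holds only the current instance in memory, which is what allows it to run when \(T \rightarrow \infty\); a batch learner would instead need all of \(X\) and \(y\) before training could start.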

Hold-out evaluation is a popular performance evaluation method in which tests are performed on a separate test set. Prequential evaluation, also called interleaved test-then-train evaluation, is a popular performance evaluation method for the stream setting in which tests are performed on new data before that data is used to train the model.
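The test-then-train order of prequential evaluation can be sketched as follows. `MajorityClassClassifier` and `prequential_accuracy` are hypothetical names used only for this illustration, and accuracy stands in for the performance measure \(P\); any other loss or metric could be accumulated in the same way.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Tuple


class MajorityClassClassifier:
    """Hypothetical baseline that predicts the most frequent class seen so far."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self, x: List[float]) -> Optional[Hashable]:
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def train(self, x: List[float], y: Hashable) -> None:
        self.counts[y] += 1


def prequential_accuracy(
    model: MajorityClassClassifier,
    stream: Iterable[Tuple[List[float], Hashable]],
) -> float:
    """Interleaved test-then-train: each instance is tested, then trained on."""
    correct = 0
    total = 0
    for x_t, y_t in stream:
        y_hat = model.predict(x_t)  # 1) test on the new instance
        correct += int(y_hat == y_t)
        total += 1
        model.train(x_t, y_t)       # 2) only then use it for training
    return correct / total if total else 0.0


# Tiny illustrative stream of (features, label) pairs.
data = [([0.0], 1), ([1.0], 1), ([2.0], 0), ([3.0], 1), ([4.0], 1)]
accuracy = prequential_accuracy(MajorityClassClassifier(), data)
```

Because every instance is scored before the model learns from it, no separate test set has to be held back, in contrast to hold-out evaluation.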