Using Streams in scikit-multiflow

Stream generators

Stream generators are a cheap source of data: because samples are generated on demand, there is no need to store the data physically. scikit-multiflow provides multiple stream generators, and all of them work in a similar way.
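To make the "generated on demand" idea concrete, here is a minimal sketch of a toy generator. `ToyStreamGenerator` and its labelling rule are hypothetical and are not part of scikit-multiflow; the point is only that each call draws fresh samples from a seeded random number generator, so the memory cost is one batch, regardless of how long the stream runs.

```python
import numpy as np


class ToyStreamGenerator:
    """Hypothetical stand-in for a stream generator (not scikit-multiflow code).

    Samples are produced on demand from a seeded random number generator,
    so only the batch currently requested is ever held in memory.
    """

    def __init__(self, n_features=3, random_state=42):
        self.n_features = n_features
        self.random_state = random_state
        self._rng = np.random.default_rng(random_state)

    def next_sample(self, batch_size=1):
        # Draw the features, then derive a binary label from a simple rule.
        X = self._rng.random((batch_size, self.n_features))
        y = (X.sum(axis=1) > self.n_features / 2).astype(int)
        return X, y


stream = ToyStreamGenerator()
X, y = stream.next_sample(5)
print(X.shape, y.shape)  # (5, 3) (5,)
```

Real generators such as AGRAWALGenerator follow more elaborate generating rules, but the cost model is the same: data is computed when asked for, not read from storage.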

Here, we use the AGRAWALGenerator to illustrate how to use generators within scikit-multiflow.

Instantiate the Stream generator
```
>>> from skmultiflow.data import AGRAWALGenerator
>>> generator = AGRAWALGenerator()
```
Get data from the stream

Use next_sample() to obtain data (samples) from any Stream object. The Stream returns the requested samples as two arrays: X for the features and y for the classes (classification) or targets (regression).
```
>>> X, y = generator.next_sample()
>>> print(X.shape, y.shape)
(1, 9) (1,)
```
By default, next_sample() returns one sample, but we can pass an arbitrary number of samples as next_sample(n_samples). For example, to get 1000 samples:
```
>>> X, y = generator.next_sample(1000)
>>> print(X.shape, y.shape)
(1000, 9) (1000,)
```
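Requesting samples in batches rather than one at a time is a common pattern when training or evaluating on a stream. The sketch below pulls a fixed total number of samples in chunks from any object that exposes a `next_sample(k)` method. Both `consume_in_batches` and the stub `ConstantStream` are hypothetical helpers; the stub's 9 columns only mimic the AGRAWAL output shape shown above.

```python
import numpy as np


def consume_in_batches(stream, n_total, batch_size):
    """Pull n_total samples from `stream`, batch_size at a time.

    Returns the number of rows each next_sample() call produced.
    """
    sizes = []
    remaining = n_total
    while remaining > 0:
        k = min(batch_size, remaining)  # the last batch may be smaller
        X, y = stream.next_sample(k)
        sizes.append(X.shape[0])
        remaining -= k
    return sizes


class ConstantStream:
    """Hypothetical stub: returns all-zero features and labels."""

    def next_sample(self, k=1):
        return np.zeros((k, 9)), np.zeros(k)


print(consume_in_batches(ConstantStream(), n_total=1000, batch_size=300))
# [300, 300, 300, 100]
```

The same helper should work with a real generator in place of the stub, since it relies only on `next_sample` returning arrays with one row per sample.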

Check if the stream has more data

When working with streams, it is important to know whether there is more data remaining. Use has_more_samples() to query the Stream for this information.
```
>>> generator.has_more_samples()
True
```
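A typical consumption loop checks `has_more_samples()` before each call. Synthetic generators can presumably keep producing data indefinitely, so such loops usually also cap the number of samples. The sketch below shows the pattern on a finite stream; `ArrayStream` and `count_samples` are hypothetical illustrations (scikit-multiflow ships its own `DataStream` class for wrapping arrays, which is not used here).

```python
import numpy as np


class ArrayStream:
    """Hypothetical finite stream over in-memory arrays (illustration only)."""

    def __init__(self, X, y):
        self.X, self.y = X, y
        self.pos = 0

    def has_more_samples(self):
        return self.pos < len(self.X)

    def next_sample(self, batch_size=1):
        end = min(self.pos + batch_size, len(self.X))
        X, y = self.X[self.pos:end], self.y[self.pos:end]
        self.pos = end
        return X, y


def count_samples(stream, max_samples):
    # Stop when the stream runs dry OR the cap is reached, whichever comes
    # first; the cap guards against generators that never run out.
    n_seen = 0
    while stream.has_more_samples() and n_seen < max_samples:
        X, y = stream.next_sample()
        n_seen += len(X)
    return n_seen


data = ArrayStream(np.arange(20).reshape(10, 2), np.arange(10))
print(count_samples(data, max_samples=100))  # 10 (the stream is exhausted first)
```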
Restart the stream

To reset a Stream object to its initial state, use restart():
```
>>> generator.restart()
```
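Conceptually, restarting means resetting whatever internal state drives sample generation. The toy below (`SeededToyStream` is hypothetical, not scikit-multiflow code) illustrates this with a seeded random number generator: after `restart()`, the same sequence is produced again. With a real generator, getting an identical sequence back presumably requires passing a fixed `random_state` at construction; with `random_state=None` the samples after a restart may differ.

```python
import numpy as np


class SeededToyStream:
    """Hypothetical generator whose restart() re-seeds its RNG."""

    def __init__(self, random_state=1):
        self.random_state = random_state
        self.restart()

    def restart(self):
        # Recreate the RNG from the stored seed: this is the "initial state".
        self._rng = np.random.default_rng(self.random_state)

    def next_sample(self, batch_size=1):
        X = self._rng.random((batch_size, 2))
        y = (X[:, 0] > X[:, 1]).astype(int)
        return X, y


stream = SeededToyStream(random_state=7)
X_first, _ = stream.next_sample(3)
stream.next_sample(10)  # advance the stream further
stream.restart()
X_again, _ = stream.next_sample(3)
print(np.array_equal(X_first, X_again))  # True
```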
Save the data into a csv file [Optional]

There might be cases where we want to store the data obtained from a Stream generator. An easy way to do this is with numpy and pandas. First, we concatenate the X and y arrays into a single np.array. Then we create a DataFrame, which is easy to manipulate, for example to name the features or pre-process the data.
```
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.hstack((X, np.array([y]).T)))
```
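The expression `np.array([y]).T` turns the 1-D label array into a column vector so it can be stacked next to `X`. The sketch below, using arbitrary toy arrays in place of stream output, checks that `np.column_stack` gives the same result and shows how to name the columns. The names `feature_0` … `feature_8` and `target` are hypothetical placeholders; scikit-multiflow streams typically carry their own feature and target names, which could be used instead.

```python
import numpy as np
import pandas as pd

# Toy arrays shaped like a batch from a 9-feature stream (values are arbitrary).
rng = np.random.default_rng(0)
X = rng.random((4, 9))
y = rng.integers(0, 2, size=4)

# The tutorial's construction: make y a column vector, then stack horizontally.
stacked = np.hstack((X, np.array([y]).T))
# An equivalent, arguably more readable alternative.
alt = np.column_stack((X, y))
assert np.array_equal(stacked, alt)

# Hypothetical column names; replace them with your stream's real names.
columns = [f"feature_{i}" for i in range(X.shape[1])] + ["target"]
df = pd.DataFrame(stacked, columns=columns)
print(df.shape)  # (4, 10)
```

Note that stacking upcasts the integer labels to floating point; `df["target"].astype(int)` restores integer labels if needed.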
Finally, to write the data to a CSV file:
```
>>> df.to_csv("file.csv")
```
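By default, `DataFrame.to_csv` also writes the row index as an extra, unnamed first column, which comes back as a column called `Unnamed: 0` when the file is read again. The sketch below, using a small toy DataFrame and a temporary directory, compares the default with `index=False`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.column_stack((np.ones((3, 2)), [0, 1, 0])),
                  columns=["a", "b", "target"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.csv")

    # Default: the row index is written as an extra, unnamed column.
    df.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False writes only the data columns.
    df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(list(with_index.columns))     # ['Unnamed: 0', 'a', 'b', 'target']
print(list(without_index.columns))  # ['a', 'b', 'target']
```

Either choice is fine as long as the reading side matches: a file written with the index can be read back with `pd.read_csv(path, index_col=0)`.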

Putting it all together:

```
>>> from skmultiflow.data import AGRAWALGenerator
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # 1. Instantiate the stream generator
>>> generator = AGRAWALGenerator()
>>>
>>> # 2. Get data from the stream
>>> X, y = generator.next_sample()
>>> print(X.shape, y.shape)
(1, 9) (1,)
>>>
>>> X, y = generator.next_sample(1000)
>>> print(X.shape, y.shape)
(1000, 9) (1000,)
>>>
>>> # 3. Check if the stream has more data
>>> generator.has_more_samples()
True
>>>
>>> # 4. Restart the stream
>>> generator.restart()
>>>
>>> # 5. Save data into a csv file [Optional]
>>> df = pd.DataFrame(np.hstack((X, np.array([y]).T)))
>>> df.to_csv("file.csv")
```