Agrawal stream generator.
The generator was introduced by Agrawal et al. (1993) and was a common source
of data for early work on scaling up decision tree learners.
The generator produces a stream containing nine features, six numeric and
three categorical.
There are ten functions defined for generating binary class labels from the
features. Presumably these determine whether the loan should be approved.
The features and functions are listed in the original paper.
salary: uniformly distributed from 20k to 150k
commission: 0 if salary < 75k, otherwise uniformly distributed from 10k to 75k
age: uniformly distributed from 20 to 80
education level: uniformly chosen from 0 to 4
car make: uniformly chosen from 1 to 20
zip code of the town: uniformly chosen from 0 to 8
value of the house: uniformly distributed from 50k x zipcode to 100k x zipcode
years house owned: uniformly distributed from 1 to 30
total loan amount: uniformly distributed from 0 to 500k
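The feature distributions listed above can be sketched with the standard library alone. This is an illustrative reading, not the generator's actual code: the function name `sample_features`, the dictionary keys, the inclusive integer bounds, and the choice to treat age and years owned as integers are all assumptions.

```python
import random


def sample_features(rng):
    """Draw one record following the feature list above (assumed reading)."""
    salary = rng.uniform(20_000, 150_000)
    # Commission is zero below 75k salary, otherwise drawn from 10k to 75k.
    commission = 0.0 if salary < 75_000 else rng.uniform(10_000, 75_000)
    zipcode = rng.randint(0, 8)
    return {
        "salary": salary,
        "commission": commission,
        "age": rng.randint(20, 80),
        "education_level": rng.randint(0, 4),
        "car": rng.randint(1, 20),
        "zipcode": zipcode,
        # House value scales with the zip code, taken literally from the list.
        "house_value": rng.uniform(50_000 * zipcode, 100_000 * zipcode),
        "years_owned": rng.randint(1, 30),
        "loan": rng.uniform(0, 500_000),
    }


record = sample_features(random.Random(42))
```

Passing a seeded `random.Random` keeps the draws reproducible, which mirrors the role of the seed described for the generator.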
Which of the ten classification functions to use for the generation.
The value can vary from 0 to 9.
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by np.random.
Whether to balance classes or not. If balanced, the class
distribution will converge to a uniform distribution.
The probability that noise will happen in the generation. Each newly
generated sample is perturbed by this amount.
Values range from 0.0 to 1.0.
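One plausible way to read the noise option is as a bounded random shift applied to a numeric feature. The sketch below assumes the shift is proportional to the feature's range and that the result is clamped back into that range; the helper `perturb` and its arguments are hypothetical and are not taken from the generator's API.

```python
import random


def perturb(value, low, high, amount, rng):
    """Shift value by a random fraction of the feature range, then clamp."""
    if amount <= 0.0:
        return value  # no noise requested
    span = high - low
    # Random offset in [-amount * span, +amount * span].
    offset = span * (2.0 * rng.random() - 1.0) * amount
    return min(max(value + offset, low), high)


rng = random.Random(3)
noisy_age = perturb(45.0, 20.0, 80.0, 0.1, rng)
```

With an amount of 0.1 and the age range 20 to 80, the value moves by at most 6 in either direction.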
Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. “Database
Mining: A Performance Perspective”, IEEE Transactions on Knowledge and
Data Engineering, 5(6), December 1993.
Generate drift by switching the classification function randomly.
Retrieves minimum information from the stream
Collects and returns the information about the configuration of the estimator
Get parameters for this estimator.
Checks if stream has more samples.
Determine if the stream is restartable.
Retrieves last batch_size samples in the stream.
Returns the estimated number of remaining samples.
Returns next sample from the stream.
Prepare the stream for use.
Resets the estimator to its initial state.
Restart the stream.
Set the parameters of this estimator.
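The drift method summarized above switches the classification function at random. A minimal sketch of that idea follows, assuming the new index must differ from the current one; the function name `switch_function` and its arguments are illustrative only.

```python
import random


def switch_function(current, rng, n_functions=10):
    """Pick a new classification function index different from the current one."""
    candidates = [i for i in range(n_functions) if i != current]
    return rng.choice(candidates)


rng = random.Random(0)
new_index = switch_function(current=2, rng=rng)
```

Excluding the current index guarantees that every call actually changes the labeling concept, so a drift is never a no-op.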
Retrieve the value of the option: Balance classes
Retrieve the index of the current classification function.
Retrieve the names of the features.
Retrieve the number of integer features.
Retrieve the number of features.
Retrieve the number of numerical features.
Retrieve the number of targets
Retrieve the value of the option: Noise percentage
Retrieve the names of the targets
Retrieve all target_values in the stream for each target.
True if the classes are balanced
index of the classification function, from 0 to 9
names of the features
Used by evaluator methods to identify the stream.
The default format is: ‘Stream name - n_targets, n_classes, n_features’.
Stream data information
Configuration of the estimator.
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
Parameter names mapped to their values.
True if stream has more samples.
True if stream is restartable.
A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape
(batch_size, n_targets), representing the next batch_size samples.
The number of integer features in the stream.
The total number of features.
The number of numerical features in the stream.
Remaining number of samples. -1 if infinite (e.g. generator)
the number of targets in the stream.
The sample generation works as follows: The 9 features are generated
with the random generator, initialized with the seed passed by the
user. Then, the classification function decides, as a function of all
the attributes, whether to classify the instance as class 0 or class
1. The next step is to verify if the classes should be balanced, and
if so, balance the classes. The last step is to add noise, if the noise
percentage is higher than 0.0.
The generated sample will have 9 features and 1 label.
The number of samples to return.
Return a tuple with the features matrix and the labels matrix for
the batch_size samples that were requested.
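The first three generation steps described above (draw features, classify, optionally balance) can be sketched as follows. This is a simplified stand-in, not the generator itself: the class `TinyStream` is hypothetical, it draws only two of the nine features, its labeling rule is an assumed example, the noise step is left out, and it returns plain lists rather than arrays to stay dependency-free.

```python
import random


class TinyStream:
    """Hypothetical stand-in for the generator; only salary and age are drawn."""

    def __init__(self, seed=None, balance_classes=False):
        self.rng = random.Random(seed)
        self.balance_classes = balance_classes
        self.want_zero = True  # class required for the next balanced sample

    def _draw(self):
        salary = self.rng.uniform(20_000, 150_000)
        age = self.rng.randint(20, 80)
        # Assumed labeling rule, used only for illustration.
        label = 0 if (age < 40 or age >= 60) else 1
        return [salary, age], label

    def next_sample(self, batch_size=1):
        X, y = [], []
        while len(X) < batch_size:
            features, label = self._draw()
            if self.balance_classes:
                # Reject draws until the wanted class appears, then alternate.
                if (label == 0) != self.want_zero:
                    continue
                self.want_zero = not self.want_zero
            X.append(features)
            y.append(label)
        return X, y


stream = TinyStream(seed=1, balance_classes=True)
X, y = stream.next_sample(batch_size=4)
```

Alternating the required class is one simple way to make the class distribution converge to uniform; rejected draws are simply discarded.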
Deprecated in v0.5.0 and will be removed in v0.7.0
The method works on simple estimators as well as on nested objects
(such as pipelines). The latter have parameters of the form
<component>__<parameter> so that it’s possible to update each
component of a nested object.
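The `<component>__<parameter>` convention can be illustrated with a small recursive setter. The classes `Component` and `Pipeline` and the free function `set_params` below are hypothetical; they only demonstrate how a double-underscore key can be routed to a nested object, and they skip the validation a real estimator would perform.

```python
class Component:
    """Hypothetical inner object with one tunable parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


class Pipeline:
    """Hypothetical outer object holding a nested component."""

    def __init__(self):
        self.scaler = Component()
        self.verbose = False


def set_params(obj, **params):
    """Set plain and nested parameters; nested keys use <component>__<parameter>."""
    for key, value in params.items():
        name, sep, rest = key.partition("__")
        if sep:
            # Recurse into the named component with the remainder of the key.
            set_params(getattr(obj, name), **{rest: value})
        else:
            setattr(obj, name, value)
    return obj


pipe = set_params(Pipeline(), scaler__alpha=0.5, verbose=True)
```

Because the key is split only at the first double underscore, deeper nesting such as `a__b__c` is handled by the recursion.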
the names of the targets in the stream.
list of lists of all target_values for each target