skmultiflow.data.RandomTreeGenerator

class skmultiflow.data.RandomTreeGenerator(tree_random_state=None, sample_random_state=None, n_classes=2, n_cat_features=5, n_num_features=5, n_categories_per_cat_feature=5, max_tree_depth=5, min_leaf_depth=3, fraction_leaves_per_level=0.15)

Random Tree stream generator.

This generator follows the description in Domingos and Hulten’s ‘Knowledge Discovery and Data Mining’ paper. It builds a random tree that splits on randomly chosen features and assigns labels to its leaves.

The tree structure is composed of Node objects, which can be either inner nodes or leaf nodes. Which kind of node is created is a function of the parameters passed to the generator's initializer.

Because instances are generated and labelled according to a tree structure, this stream should, in theory, favour decision tree learners.
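The idea described above can be illustrated with a small, pure-Python sketch. It is not the library's implementation: the helper names (`build_tree`, `classify`, `tree_depth`) are hypothetical, only numeric features in [0, 1) are handled, and the leaf rule is a simplified reading of `max_tree_depth`, `min_leaf_depth` and `fraction_leaves_per_level`. Two separate random generators mirror the roles of `tree_random_state` and `sample_random_state`.

```python
import random


def build_tree(rng, n_features, n_classes, max_depth, min_leaf_depth,
               leaf_fraction, depth=0):
    """Build a random binary tree over numeric features (simplified sketch)."""
    # A node is a leaf at the maximum depth, or with probability
    # leaf_fraction once min_leaf_depth has been reached.
    is_leaf = depth >= max_depth or (
        depth >= min_leaf_depth and rng.random() < leaf_fraction)
    if is_leaf:
        return {"label": rng.randrange(n_classes)}
    # Inner node: pick a random feature and a random split threshold.
    return {
        "feature": rng.randrange(n_features),
        "threshold": rng.random(),
        "left": build_tree(rng, n_features, n_classes, max_depth,
                           min_leaf_depth, leaf_fraction, depth + 1),
        "right": build_tree(rng, n_features, n_classes, max_depth,
                            min_leaf_depth, leaf_fraction, depth + 1),
    }


def classify(node, x):
    """Walk from the root to a leaf and return that leaf's label."""
    while "label" not in node:
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def tree_depth(node):
    """Depth of the deepest leaf (a lone leaf has depth 0)."""
    if "label" in node:
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))


# One seed fixes the concept (the tree), the other fixes the instances.
tree = build_tree(random.Random(8873), n_features=5, n_classes=2,
                  max_depth=5, min_leaf_depth=3, leaf_fraction=0.15)
sample_rng = random.Random(69)
samples = [[sample_rng.random() for _ in range(5)] for _ in range(10)]
labels = [classify(tree, x) for x in samples]
```

Keeping the two seeds separate means the same concept can be resampled with different instances, or different concepts can be compared on the same instance sequence.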

Parameters

tree_random_state: int (Default: None): Seed for random generation of tree.
sample_random_state: int (Default: None): Seed for random generation of instances.
n_classes: int (Default: 2): The number of classes to generate.
n_cat_features: int (Default: 5): The number of categorical features to generate. Categorical features are one-hot (binary) encoded, so the actual number of categorical columns is n_cat_features × n_categories_per_cat_feature.
n_num_features: int (Default: 5): The number of numerical features to generate.
n_categories_per_cat_feature: int (Default: 5): The number of values to generate per categorical feature.
max_tree_depth: int (Default: 5): The maximum depth of the tree concept.
min_leaf_depth: int (Default: 3): The shallowest level of the tree, above max_tree_depth, at which leaves may appear.
fraction_leaves_per_level: float (Default: 0.15): The fraction of leaves per level from min_leaf_depth onwards.
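Because each categorical feature is expanded into one column per category, the width of the generated feature matrix follows from three of the parameters above. The helper below is a hypothetical illustration of that arithmetic, not part of the library; it assumes the numeric columns plus the encoded categorical columns make up the whole row, which matches the 15-column output in the Examples section.

```python
def n_output_columns(n_num_features=5, n_cat_features=5,
                     n_categories_per_cat_feature=5):
    """Width of one generated row: numeric columns plus one-hot columns."""
    return n_num_features + n_cat_features * n_categories_per_cat_feature


# Default constructor arguments: 5 + 5 * 5 columns.
default_width = n_output_columns()

# Arguments used in the Examples section: 5 + 2 * 5 columns.
example_width = n_output_columns(n_num_features=5, n_cat_features=2,
                                 n_categories_per_cat_feature=5)
```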

Examples

>>> # Imports
>>> from skmultiflow.data import RandomTreeGenerator
>>> # Setting up the stream
>>> stream = RandomTreeGenerator(tree_random_state=8873, sample_random_state=69, n_classes=2,
...                              n_cat_features=2, n_num_features=5, n_categories_per_cat_feature=5,
...                              max_tree_depth=6, min_leaf_depth=3, fraction_leaves_per_level=0.15)
>>> # Retrieving one sample
>>> stream.next_sample()
(array([[ 0.16268102,  0.1105941 ,  0.7172657 ,  0.13021257,  0.61664241,
     1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
     0.        ,  0.        ,  0.        ,  1.        ,  0.        ]]), array([ 0.]))
>>> # Retrieving 10 samples
>>> stream.next_sample(10)
(array([[ 0.23752865,  0.58739728,  0.33649431,  0.62104964,  0.85182531,
     0.        ,  0.        ,  0.        ,  0.        ,  1.        ,
     0.        ,  0.        ,  0.        ,  0.        ,  1.        ],
   [ 0.80996022,  0.71970756,  0.49121675,  0.18175096,  0.41738968,
     0.        ,  0.        ,  0.        ,  1.        ,  0.        ,
     0.        ,  0.        ,  0.        ,  0.        ,  1.        ],
   [ 0.3450778 ,  0.27301117,  0.52986614,  0.68253015,  0.79836113,
     0.        ,  0.        ,  1.        ,  0.        ,  0.        ,
     1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
   [ 0.28974746,  0.64385678,  0.11726876,  0.14956833,  0.90919843,
     0.        ,  1.        ,  0.        ,  0.        ,  0.        ,
     0.        ,  0.        ,  0.        ,  1.        ,  0.        ],
   [ 0.85404693,  0.77693923,  0.25851095,  0.13574941,  0.01739845,
     0.        ,  0.        ,  0.        ,  0.        ,  1.        ,
     0.        ,  0.        ,  0.        ,  0.        ,  1.        ],
   [ 0.23404205,  0.67644455,  0.65199858,  0.22742471,  0.01895565,
     1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
     0.        ,  0.        ,  1.        ,  0.        ,  0.        ],
   [ 0.12843591,  0.56112384,  0.08013747,  0.46674409,  0.48333615,
     0.        ,  0.        ,  1.        ,  0.        ,  0.        ,
     1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
   [ 0.52058342,  0.51999097,  0.28294293,  0.11435212,  0.83731519,
     0.        ,  1.        ,  0.        ,  0.        ,  0.        ,
     1.        ,  0.        ,  0.        ,  0.        ,  0.        ],
   [ 0.82455551,  0.3758063 ,  0.02672009,  0.87081727,  0.3165448 ,
     1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
     0.        ,  0.        ,  0.        ,  1.        ,  0.        ],
   [ 0.03012729,  0.30479727,  0.65407304,  0.14532937,  0.47670874,
     0.        ,  1.        ,  0.        ,  0.        ,  0.        ,
     0.        ,  0.        ,  1.        ,  0.        ,  0.        ]]),
    array([ 1.,  1.,  1.,  1.,  0.,  1.,  1.,  0.,  0.,  0.]))
>>> # Generators will have infinite remaining instances, so it returns -1
>>> stream.n_remaining_samples()
-1
>>> stream.has_more_samples()
True
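The rows returned above can be read back into numeric values and category indices. The sketch below uses a hypothetical helper, `decode_row`, and assumes the column layout seen in the output: numeric features first, followed by one group of one-hot columns per categorical feature. The row is the first sample shown above, copied as a plain list.

```python
def decode_row(row, n_num_features, n_cat_features,
               n_categories_per_cat_feature):
    """Split a row into numeric values and per-feature category indices."""
    numeric = list(row[:n_num_features])
    categories = []
    for i in range(n_cat_features):
        start = n_num_features + i * n_categories_per_cat_feature
        group = list(row[start:start + n_categories_per_cat_feature])
        # The active category is the position of the 1 in its group.
        categories.append(group.index(max(group)))
    return numeric, categories


# First sample from the example output (5 numeric + 2 x 5 one-hot columns).
row = [0.16268102, 0.1105941, 0.7172657, 0.13021257, 0.61664241,
       1.0, 0.0, 0.0, 0.0, 0.0,
       0.0, 0.0, 0.0, 1.0, 0.0]
numeric, categories = decode_row(row, n_num_features=5, n_cat_features=2,
                                 n_categories_per_cat_feature=5)
# categories is [0, 3]: category 0 for the first feature, 3 for the second.
```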

Methods

`get_data_info`(self)	Retrieves minimum information from the stream
`get_info`(self)	Collects and returns the information about the configuration of the estimator
`get_params`(self[, deep])	Get parameters for this estimator.
`has_more_samples`(self)	Checks if stream has more samples.
`is_restartable`(self)	Determine if the stream is restartable.
`last_sample`(self)	Retrieves last batch_size samples in the stream.
`n_remaining_samples`(self)	Returns the estimated number of remaining samples.
`next_sample`(self[, batch_size])	Returns next sample from the stream.
`prepare_for_use`()	Prepare the stream for use.
`reset`(self)	Resets the estimator to its initial state.
`restart`(self)	Restart the stream.
`set_params`(self, **params)	Set the parameters of this estimator.

Attributes

`feature_names`	Retrieve the names of the features.
`n_cat_features`	Retrieve the number of categorical features.
`n_features`	Retrieve the number of features.
`n_num_features`	Retrieve the number of numerical features.
`n_targets`	Retrieve the number of targets
`target_names`	Retrieve the names of the targets
`target_values`	Retrieve all target_values in the stream for each target.

property feature_names

Retrieve the names of the features.

Returns

list: names of the features

get_data_info(self)

Retrieves minimum information from the stream

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns

string: Stream data information

get_info(self)

Collects and returns the information about the configuration of the estimator

Returns

string: Configuration of the estimator.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

has_more_samples(self)

Checks if stream has more samples.

Returns

Boolean: True if stream has more samples.

is_restartable(self)

Determine if the stream is restartable.

Returns

Bool: True if stream is restartable.

last_sample(self)

Retrieves last batch_size samples in the stream.

Returns

tuple or tuple list: A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

property n_cat_features

Retrieve the number of categorical features.

Returns

int: The number of categorical features in the stream.

property n_features

Retrieve the number of features.

Returns

int: The total number of features.

property n_num_features

Retrieve the number of numerical features.

Returns

int: The number of numerical features in the stream.

n_remaining_samples(self)

Returns the estimated number of remaining samples.

Returns

int: Remaining number of samples. -1 if infinite (e.g. generator)

property n_targets

Retrieve the number of targets

Returns

int: the number of targets in the stream.

next_sample(self, batch_size=1)

Returns next sample from the stream.

Randomly generates attribute values and then classifies each generated instance.

Parameters

batch_size: int (optional, default=1): The number of samples to return.

Returns

tuple or tuple list: Return a tuple with the features matrix and the labels matrix for the batch_size samples that were requested.
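A common way to consume a stream through `next_sample` is a test-then-train loop: predict on each new sample, score the prediction, then update the learner. The sketch below shows the pattern without depending on the library. `ToyStream` is a hypothetical stand-in that only borrows the method names documented on this page and returns plain lists rather than NumPy arrays; the "learner" is a simple majority-class counter.

```python
import random
from collections import Counter


class ToyStream:
    """Hypothetical stand-in with the method names documented here."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def has_more_samples(self):
        return True  # a generator never runs out of samples

    def n_remaining_samples(self):
        return -1  # -1 signals an infinite stream

    def next_sample(self, batch_size=1):
        X = [[self._rng.random() for _ in range(3)]
             for _ in range(batch_size)]
        y = [int(row[0] > 0.5) for row in X]  # toy labelling rule
        return X, y


stream = ToyStream(seed=69)
seen = Counter()   # class counts observed so far
correct = 0
n_samples = 200

for _ in range(n_samples):
    if not stream.has_more_samples():
        break
    X, y = stream.next_sample()
    # Test first: predict the majority class seen so far (0 if none yet).
    prediction = seen.most_common(1)[0][0] if seen else 0
    correct += int(prediction == y[0])
    # Then train: update the class counts with the true label.
    seen[y[0]] += 1

accuracy = correct / n_samples
```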

static prepare_for_use()

Prepare the stream for use.

Deprecated in v0.5.0 and will be removed in v0.7.0

reset(self)

Resets the estimator to its initial state.

Returns

self

restart(self)

Restart the stream.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
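The `<component>__<parameter>` naming convention mentioned above can be illustrated with a short sketch. The helper `split_param_name` is hypothetical and only shows how such a key separates into a component name and a parameter name; it makes no claim about how the library parses keys internally. The key `clf__max_depth` is likewise an invented example.

```python
def split_param_name(name):
    """Split 'component__parameter' at the first double underscore."""
    component, separator, parameter = name.partition("__")
    if separator:
        return component, parameter
    # No double underscore: the key addresses the object itself.
    return None, name


nested = split_param_name("clf__max_depth")
plain = split_param_name("n_classes")
```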

property target_names

Retrieve the names of the targets

Returns

list: the names of the targets in the stream.

property target_values

Retrieve all target_values in the stream for each target.

Returns

list: list of lists of all target_values for each target