skmultiflow.transform.MissingValuesCleaner
Fill missing values with some defined value.
Provides a simple way to replace missing values in data samples with a replacement value. The replacement is determined by the selected imputation strategy.
missing_value: The value to treat as missing and replace.
strategy: The strategy adopted to find the missing value replacement. It can be one of the following: ‘zero’, ‘mean’, ‘median’, ‘mode’, ‘custom’.
window_size: Defines the window size for the ‘mean’, ‘median’ and ‘mode’ strategies.
new_value: The replacement value used when the chosen strategy is ‘custom’.
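The interplay between the strategy, the window size, and the custom value can be illustrated with a minimal, standard-library sketch for a single feature. This is not scikit-multiflow's implementation: the function names (`replacement_value`, `is_missing`, `clean_stream`), the parameter names, the default window size, the choice to keep only observed (non-missing) values in the window, and the fallback to zero on an empty window are all assumptions made for illustration.

```python
import math
from collections import deque
from statistics import mean, median, mode


def replacement_value(window, strategy, new_value=None):
    """Pick a replacement from the recent observed values of one feature."""
    if strategy == "zero":
        return 0.0
    if strategy == "custom":
        return new_value
    if not window:
        # Assumption: fall back to zero when nothing has been observed yet.
        return 0.0
    if strategy == "mean":
        return mean(window)
    if strategy == "median":
        return median(window)
    if strategy == "mode":
        return mode(window)
    raise ValueError(f"unknown strategy: {strategy!r}")


def is_missing(value, missing_value):
    """Return True if value matches the missing-value marker."""
    # NaN never compares equal to itself, so it needs a dedicated check.
    if isinstance(missing_value, float) and math.isnan(missing_value):
        return isinstance(value, float) and math.isnan(value)
    return value == missing_value


def clean_stream(values, missing_value=float("nan"), strategy="zero",
                 window_size=3, new_value=None):
    """Replace missing entries in a stream of values for one feature."""
    window = deque(maxlen=window_size)  # keeps only the most recent values
    cleaned = []
    for v in values:
        if is_missing(v, missing_value):
            cleaned.append(replacement_value(window, strategy, new_value))
        else:
            window.append(v)
            cleaned.append(v)
    return cleaned


# -47 is the marker; the last three observed values are 5, 2 and 9.
print(clean_stream([1, 5, 2, 9, -47], missing_value=-47,
                   strategy="median", window_size=3))  # [1, 5, 2, 9, 5]
```

In this sketch a smaller window makes the ‘mean’, ‘median’ and ‘mode’ replacements track recent values more closely, while a larger one smooths over more history.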
Notes
A missing value in a sample can be coded in many different ways, but the most common one is numpy’s NaN, which is why it is the default value of the missing value parameter.
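One practical consequence of coding missing values as NaN is that NaN cannot be located with an ordinary equality test. The short sketch below uses only the standard library to show the difference; the variable names are illustrative.

```python
import math

nan = float("nan")
sample = [1.5, nan, 3.0]

# NaN never compares equal to anything, including itself ...
equality_hits = [x == nan for x in sample]

# ... so a dedicated check such as math.isnan is needed to locate it.
isnan_hits = [math.isnan(x) for x in sample]

print(equality_hits)
print(isnan_hits)
```

Any other marker, such as a sentinel integer, can be matched with plain equality.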
The user should choose the substitution strategy that best fits their use case, as each strategy has its pros and cons. The predefined strategies are: ‘zero’, ‘mean’, ‘median’, ‘mode’ and ‘custom’.
Note that MissingValuesCleaner can be used to replace arbitrary values, not only missing ones.
Examples
>>> # Imports
>>> import numpy as np
>>> from skmultiflow.data.file_stream import FileStream
>>> from skmultiflow.transform.missing_values_cleaner import MissingValuesCleaner
>>> # Setting up a stream
>>> stream = FileStream("https://raw.githubusercontent.com/scikit-multiflow/"
...                     "streaming-datasets/master/covtype.csv")
>>> # Setting up the filter to substitute values -47 by the median of the
>>> # last 10 samples
>>> cleaner = MissingValuesCleaner(-47, 'median', 10)
>>> X, y = stream.next_sample(10)
>>> X[9, 0] = -47
>>> # We will use this list to keep track of values
>>> data = []
>>> # Iterate over the first 9 samples, to build a sample window
>>> for i in range(9):
...     X_transf = cleaner.partial_fit_transform([X[i].tolist()])
...     data.append(X_transf[0][0])
...
>>> # Transform last sample. The first feature should be replaced by the list's
>>> # median value
>>> X_transf = cleaner.partial_fit_transform([X[9].tolist()])
>>> np.median(data)
Methods
get_info(self): Collects and returns the information about the configuration of the estimator.
get_params(self[, deep]): Gets the parameters for this estimator.
partial_fit(self, X[, y]): Partially fits the model.
partial_fit_transform(self, X[, y]): Partially fits the model and then applies the transform to the data.
reset(self): Resets the estimator to its initial state.
set_params(self, **params): Sets the parameters of this estimator.
transform(self, X): Transforms the samples in X.
Configuration of the estimator.
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Parameter names mapped to their values.
The sample or set of samples that should be transformed.
The true labels.
self
The transformed data.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
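The `<component>__<parameter>` naming convention can be sketched with two small, hypothetical classes. `Cleaner` and `TinyPipeline` below are illustrative stand-ins written for this example, not scikit-multiflow classes, and the routing logic is only one plausible way to implement the convention.

```python
class Cleaner:
    """Hypothetical simple estimator with two parameters."""

    def __init__(self, strategy="zero", window_size=200):
        self.strategy = strategy
        self.window_size = window_size

    def set_params(self, **params):
        for name, value in params.items():
            if not hasattr(self, name):
                raise ValueError(f"invalid parameter {name!r}")
            setattr(self, name, value)
        return self


class TinyPipeline:
    """Hypothetical nested object holding named components."""

    def __init__(self, **components):
        self.components = components

    def set_params(self, **params):
        for key, value in params.items():
            # Split "cleaner__strategy" into ("cleaner", "__", "strategy").
            component, sep, sub_key = key.partition("__")
            if not sep or component not in self.components:
                raise ValueError(f"invalid parameter {key!r}")
            # Forward the remaining name to the matching component.
            self.components[component].set_params(**{sub_key: value})
        return self


pipe = TinyPipeline(cleaner=Cleaner())
pipe.set_params(cleaner__strategy="median", cleaner__window_size=10)
print(pipe.components["cleaner"].strategy)
print(pipe.components["cleaner"].window_size)
```

Returning `self` from `set_params` allows calls to be chained, which matches the `self` return value listed for the method.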