skmultiflow.bayes.NaiveBayes

class skmultiflow.bayes.NaiveBayes(nominal_attributes=None)

Naive Bayes classifier.

Performs classic Bayesian prediction under the naive assumption that all inputs are independent. Naive Bayes is a classification algorithm known for its simplicity and low computational cost. Given n different classes, the trained Naive Bayes classifier predicts, for every unlabelled instance, the class to which it most likely belongs.
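
The independence assumption means the class posterior factorizes into a prior multiplied by one likelihood term per attribute. The sketch below illustrates this for nominal attributes with hand-picked, hypothetical probabilities. The names `priors`, `likelihoods`, and `posterior` are illustrative only and are not part of skmultiflow.

```python
# Hypothetical probabilities for a two-class problem with two nominal attributes.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{"yes": 0.8, "no": 0.2}, {"short": 0.7, "long": 0.3}],
    "ham": [{"yes": 0.1, "no": 0.9}, {"short": 0.4, "long": 0.6}],
}


def posterior(instance):
    """Return P(class | instance) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        # Independence: multiply in one per-attribute likelihood at a time.
        for attribute_index, value in enumerate(instance):
            score *= likelihoods[label][attribute_index][value]
        scores[label] = score
    total = sum(scores.values())
    # Normalize so the class scores sum to one.
    return {label: score / total for label, score in scores.items()}


probabilities = posterior(["yes", "short"])
predicted = max(probabilities, key=probabilities.get)
print(predicted, probabilities)
```

A numeric attribute would contribute a density value (for example from a Gaussian fitted per class) in place of the table lookup.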

Parameters

nominal_attributes: numpy.ndarray (optional, default=None): List of nominal attributes. If empty, all attributes are assumed to be numerical.

Notes

The scikit-learn implementations of NaiveBayes are compatible with scikit-multiflow with the caveat that they must be partially fitted before use. In the scikit-multiflow evaluators this is done by setting pretrain_size>0.
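
As a rough illustration of the caveat above, the following sketch uses scikit-learn's `GaussianNB` directly, outside any scikit-multiflow evaluator. The data values are made up for the example. It shows that predicting with an unfitted model fails, and that a first `partial_fit` call, which has to declare all class labels, makes the model usable.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.naive_bayes import GaussianNB

# Small hand-made two-class data set (illustrative values only).
X = np.array([
    [0.10, 0.20], [0.20, 0.10], [0.90, 0.80],
    [0.80, 0.90], [0.15, 0.15], [0.85, 0.85],
])
y = np.array([0, 0, 1, 1, 0, 1])

model = GaussianNB()

# Predicting before any fit raises NotFittedError.
try:
    model.predict(X[:1])
    fitted_before = True
except NotFittedError:
    fitted_before = False

# The first partial_fit call must declare every class label up front.
model.partial_fit(X[:4], y[:4], classes=np.array([0, 1]))

# After this warm-up the model can predict on unseen samples.
y_pred = model.predict(X[4:])
print(fitted_before, y_pred)
```

Within the scikit-multiflow evaluators, setting pretrain_size>0 plays the role of this warm-up call.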

Examples

>>> # Imports
>>> from skmultiflow.data import SEAGenerator
>>> from skmultiflow.bayes import NaiveBayes
>>>
>>> # Setup a data stream
>>> stream = SEAGenerator(random_state=1)
>>>
>>> # Setup Naive Bayes estimator
>>> naive_bayes = NaiveBayes()
>>>
>>> # Setup variables to control loop and track performance
>>> n_samples = 0
>>> correct_cnt = 0
>>> max_samples = 200
>>>
>>> # Test-then-train: predict each sample first, then use it to update the model
>>> while n_samples < max_samples and stream.has_more_samples():
...     X, y = stream.next_sample()
...     y_pred = naive_bayes.predict(X)
...     if y[0] == y_pred[0]:
...         correct_cnt += 1
...     naive_bayes.partial_fit(X, y)
...     n_samples += 1
>>>
>>> # Display results
>>> print('{} samples analyzed.'.format(n_samples))
>>> print('Naive Bayes accuracy: {}'.format(correct_cnt / n_samples))

Methods

`fit`(self, X, y[, classes, sample_weight])	Fit the model.
`get_info`(self)	Collects and returns the information about the configuration of the estimator.
`get_params`(self[, deep])	Get parameters for this estimator.
`partial_fit`(self, X, y[, classes, sample_weight])	Partially (incrementally) fit the model.
`predict`(self, X)	Predict classes for the passed data.
`predict_proba`(self, X)	Estimates the probability of each sample in X belonging to each of the class-labels.
`reset`(self)	Resets the estimator to its initial state.
`score`(self, X, y[, sample_weight])	Returns the mean accuracy on the given test data and labels.
`set_params`(self, **params)	Set the parameters of this estimator.

fit(self, X, y, classes=None, sample_weight=None)

Fit the model.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y: numpy.ndarray of shape (n_samples, n_targets): An array-like with the class labels of all samples in X.
classes: numpy.ndarray, optional (default=None): Contains all possible/known class labels. Usage varies depending on the learning method.
sample_weight: numpy.ndarray, optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns

self

get_info(self)

Collects and returns the information about the configuration of the estimator.

Returns

string: Configuration of the estimator.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters

deep: boolean, optional: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

partial_fit(self, X, y, classes=None, sample_weight=None)

Partially (incrementally) fit the model.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The features to train the model.
y: numpy.ndarray of shape (n_samples): An array-like with the labels of all samples in X.
classes: numpy.ndarray, optional (default=None): Array with all possible/known classes. Usage varies depending on the learning method.
sample_weight: numpy.ndarray of shape (n_samples), optional (default=None): Sample weights. If not provided, uniform weights are assumed. Usage varies depending on the learning method.

Returns

NaiveBayes: self
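
To make the idea of incremental fitting concrete, here is a minimal sketch of a running class-prior estimate. `RunningClassPrior` is a hypothetical toy class written for this example. It is not skmultiflow code and makes no claim about how NaiveBayes stores its statistics. It only shows that updating in chunks can reach the same state as a single pass, and that uniform weights are assumed when `sample_weight` is omitted.

```python
from collections import Counter


class RunningClassPrior:
    """Hypothetical incremental estimator of class priors (toy example)."""

    def __init__(self):
        self.weight_per_class = Counter()

    def partial_fit(self, y, sample_weight=None):
        # Uniform weights are assumed when none are given.
        if sample_weight is None:
            sample_weight = [1.0] * len(y)
        for label, weight in zip(y, sample_weight):
            self.weight_per_class[label] += weight
        return self  # returning self allows call chaining

    def priors(self):
        total = sum(self.weight_per_class.values())
        return {label: w / total for label, w in self.weight_per_class.items()}


labels = [0, 1, 1, 0, 1, 1]

# One pass over all labels ...
single_pass = RunningClassPrior().partial_fit(labels)

# ... reaches the same state as two incremental calls on consecutive chunks.
chunked = RunningClassPrior().partial_fit(labels[:3]).partial_fit(labels[3:])

print(single_pass.priors() == chunked.priors())
```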

predict(self, X)

Predict classes for the passed data.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The set of data samples to predict the labels for.

Returns

A numpy.ndarray with all the predictions for the samples in X.

predict_proba(self, X)

Estimates the probability of each sample in X belonging to each of the class-labels.

Parameters

X: numpy.ndarray of shape (n_samples, n_features): The matrix of samples one wants to predict the class probabilities for.

Returns

A numpy.ndarray of shape (n_samples, n_labels), in which each outer entry is associated with the X entry of the same index. The entry at index [i] contains len(self.target_values) elements, each of which represents the probability that the i-th sample of X belongs to a certain class label.
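
The shape of the returned array can be illustrated with a hand-written probability matrix. The values below are hypothetical and were not produced by NaiveBayes. The sketch also assumes that a class prediction corresponds to the column with the highest probability in each row.

```python
import numpy as np

# Hypothetical predict_proba output: 3 samples, 2 class labels.
y_proba = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
    [0.5, 0.5],
])

# Each row holds one probability per class label.
n_samples, n_labels = y_proba.shape

# Pick the column with the highest probability for each sample.
# np.argmax returns the first maximum, so ties resolve to the lower index.
y_pred = np.argmax(y_proba, axis=1)
print(n_samples, n_labels, y_pred)
```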

reset(self)

Resets the estimator to its initial state.

Returns

self

score(self, X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

X: array-like, shape = (n_samples, n_features): Test samples.
y: array-like, shape = (n_samples) or (n_samples, n_outputs): True labels for X.
sample_weight: array-like, shape = (n_samples), optional: Sample weights.

Returns

score: float: Mean accuracy of self.predict(X) with respect to y.
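
The difference between plain accuracy and subset accuracy can be sketched with NumPy alone. The arrays below are made-up labels and predictions, and the computation is an illustration of the metric's definition rather than the library's internal code.

```python
import numpy as np

# Single-label case: accuracy is the fraction of matching labels.
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])
matches = (y_true == y_pred).astype(float)
accuracy = matches.mean()

# Multi-label case: a sample counts only if its whole label set matches.
Y_true = np.array([[1, 0], [0, 1], [1, 1]])
Y_pred = np.array([[1, 0], [0, 0], [1, 1]])
subset_accuracy = np.all(Y_true == Y_pred, axis=1).mean()

# Optional sample weights turn the mean into a weighted average.
weights = np.array([1.0, 1.0, 2.0, 1.0])
weighted_accuracy = np.average(matches, weights=weights)

print(accuracy, subset_accuracy, weighted_accuracy)
```

In the multi-label arrays, the second sample gets one of its two labels right but still counts as a miss, which is why the metric is described as harsh.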

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
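
The <component>__<parameter> naming convention can be sketched as a simple key split. `split_nested_params` is an illustrative helper, not part of skmultiflow or scikit-learn, and the component and parameter names `clf`, `alpha`, `step`, and `depth` are hypothetical.

```python
def split_nested_params(params):
    """Group keys of the form <component>__<parameter> by component.

    Illustrative helper only; not part of skmultiflow or scikit-learn.
    """
    own, nested = {}, {}
    for key, value in params.items():
        # partition splits on the first double underscore only, so deeper
        # levels stay in the sub-key for the component to resolve itself.
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"nominal_attributes": [0, 2], "clf__alpha": 0.5, "clf__step__depth": 3}
)
print(own)
print(nested)
```

Keys without a double underscore apply to the estimator itself, while the remaining keys are grouped under the component they address.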