skmultiflow.meta.ClassifierChain
Classifier Chains for multi-label learning.
Parameters
base_estimator
    Each member of the ensemble is an instance of the base estimator.
order
    None to use the default order, 'random' for a random order.
random_state
    If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
Notes
Classifier Chains [1] is a popular method for multi-label learning. It exploits correlation between labels by incrementally building binary classifiers for each label.
scikit-learn also includes 'ClassifierChain'. A difference is that probabilistic extensions are included here.
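The chaining scheme described above can be sketched with plain scikit-learn pieces. This is a simplified illustration of the idea, not skmultiflow's actual implementation; the helper names fit_chain and predict_chain are hypothetical:

```python
# Minimal sketch of the classifier-chain idea: the binary classifier for
# label j is trained on X augmented with the preceding labels y_1 .. y_{j-1}.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_chain(X, Y):
    """Fit one binary classifier per label, chaining earlier labels as features."""
    chain = []
    for j in range(Y.shape[1]):
        Xj = np.hstack([X, Y[:, :j]])  # augment with preceding true labels
        chain.append(LogisticRegression().fit(Xj, Y[:, j]))
    return chain

def predict_chain(chain, X):
    """Predict labels greedily, feeding each prediction to the next classifier."""
    preds = np.zeros((X.shape[0], len(chain)))
    for j, clf in enumerate(chain):
        Xj = np.hstack([X, preds[:, :j]])  # augment with preceding predictions
        preds[:, j] = clf.predict(Xj)
    return preds

# Toy multi-label data: label 1 is the OR, label 2 the AND, of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 5)
Y = np.column_stack([X.any(axis=1), X.all(axis=1)]).astype(float)
chain = fit_chain(X, Y)
print(predict_chain(chain, X)[:4])
```

Note that at training time each classifier sees the true earlier labels, while at prediction time it sees the chain's own (possibly wrong) earlier predictions; this train/test mismatch is one reason label order matters.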
References
Read, Jesse, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. “Classifier chains for multi-label classification.” In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 254-269. Springer, Berlin, Heidelberg, 2009.
Examples
>>> from skmultiflow.meta import ClassifierChain
>>> from skmultiflow.data import make_logical
>>> from sklearn.linear_model import SGDClassifier
>>>
>>> X, Y = make_logical(random_state=1)
>>>
>>> print("TRUE: ")
>>> print(Y)
>>> print("vs")
>>>
>>> print("CC")
>>> cc = ClassifierChain(SGDClassifier(max_iter=100, loss='log', random_state=1))
>>> cc.fit(X, Y)
>>> print(cc.predict(X))
>>>
>>> print("RCC")
>>> cc = ClassifierChain(SGDClassifier(max_iter=100, loss='log', random_state=1),
...                      order='random', random_state=1)
>>> cc.fit(X, Y)
>>> print(cc.predict(X))
TRUE:
[[1. 0. 1.]
 [1. 1. 0.]
 [0. 0. 0.]
 [1. 1. 0.]]
vs
CC
[[1. 0. 1.]
 [1. 1. 0.]
 [0. 0. 0.]
 [1. 1. 0.]]
RCC
[[1. 0. 1.]
 [1. 1. 0.]
 [0. 0. 0.]
 [1. 1. 0.]]
Methods
fit(self, X, y[, classes, sample_weight])
    Fit the model.
get_info(self)
    Collects and returns the information about the configuration of the estimator.
get_params(self[, deep])
    Get parameters for this estimator.
partial_fit(self, X, y[, classes, sample_weight])
    Partially (incrementally) fit the model.
predict(self, X)
    Predict classes for the passed data.
predict_proba(self, X)
    Estimates the probability of each sample in X belonging to each of the class labels.
reset(self)
    Resets the estimator to its initial state.
score(self, X, y[, sample_weight])
    Returns the mean accuracy on the given test data and labels.
set_params(self, **params)
    Set the parameters of this estimator.
fit / partial_fit
    X : The features to train the model.
    y : An array-like with the labels of all samples in X.
get_info
    Returns: configuration of the estimator.
get_params
    deep : If True, will return the parameters for this estimator and contained subobjects that are estimators.
    Returns: parameter names mapped to their values.
predict
    X : The set of data samples to predict the labels for.
predict_proba
    X : The matrix of samples one wants to predict the class probabilities for.
    Returns: marginals [P(y_1=1|x), ..., P(y_L=1|x, y_1, ..., y_{L-1})], i.e. confidence predictions given inputs, for each instance.
    Note: this function is suitable for multi-label (binary) data only at the moment; it may give an index-out-of-bounds error if uni-target or multi-target (more than 2 values) data is used in training.
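The greedy chain-rule scheme behind these marginals can be sketched as follows. This is an illustrative assumption about how such marginals are computed, using scikit-learn's LogisticRegression; predict_marginals is a hypothetical helper, not part of the library:

```python
# Each classifier in the chain outputs P(y_j = 1 | x, y_1 .. y_{j-1}),
# conditioning on the hard decisions already made for earlier labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
# Two correlated toy labels derived from the first two features.
Y = np.column_stack([X[:, 0] > 0, X[:, 0] + X[:, 1] > 0]).astype(float)

# Fit the chain: the classifier for label j sees X plus the true earlier labels.
chain = [LogisticRegression().fit(np.hstack([X, Y[:, :j]]), Y[:, j])
         for j in range(Y.shape[1])]

def predict_marginals(chain, X):
    preds = np.zeros((X.shape[0], len(chain)))
    proba = np.zeros_like(preds)
    for j, clf in enumerate(chain):
        Xj = np.hstack([X, preds[:, :j]])
        proba[:, j] = clf.predict_proba(Xj)[:, 1]  # P(y_j = 1 | x, y_<j)
        preds[:, j] = proba[:, j] > 0.5            # greedy hard decision
    return proba

P = predict_marginals(chain, X)
print(P.shape)
```

Because each later marginal conditions on hard thresholded decisions rather than summing over all earlier label combinations, these are greedy approximations, not exact posterior marginals.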
score
    X : Test samples.
    y : True labels for X.
    sample_weight : Sample weights.
    Returns: mean accuracy of self.predict(X) with respect to y.
    Note: in multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set to be correctly predicted for each sample.
set_params
    The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so it is possible to update each component of a nested object.
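As a hedged illustration of the <component>__<parameter> convention, here is the same mechanism on a scikit-learn Pipeline (the step names "scale" and "clf" are chosen for this example):

```python
# Nested-parameter syntax: reach into a component of a composite estimator
# by prefixing the parameter name with the component name and '__'.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Update the regularization strength of the nested 'clf' step.
pipe.set_params(clf__C=10.0)
print(pipe.get_params()["clf__C"])  # -> 10.0
```

The same pattern applies to this estimator, e.g. setting a parameter on the wrapped base_estimator through the enclosing chain.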