UFJF - Machine Learning Toolkit  0.51.8
mltk::validation Namespace Reference

Validation methods namespace.

Classes

struct  ValidationReport
 Solution for the validation of an ML method.
 
struct  CrossValidation
 Structure to manage cross validation.
 
struct  TrainTestPair
 A struct representing a pair with training and test data.
 

Functions

template<typename T >
std::vector< std::vector< size_t > > generateConfusionMatrix (Data< T > &samples, Learner< T > &learner)
 Compute the confusion matrix for a given trained classifier.
 
template<typename T >
ValidationReport metricsReport (const Data< T > &data, const std::vector< std::vector< size_t > > &cfm, std::vector< int > positive_labels=std::vector< int >())
 Generates a report with classifier metrics.
 
template<typename T , typename Classifier >
double accuracy (const Data< T > &data, Classifier &model, bool trained=true)
 
double confusionMatrixAccuracy (const std::vector< std::vector< size_t > > &conf_matrix)
 Compute the accuracy based on a confusion matrix.
 
ValidationReport classificationReport (const Point< int > &real, const Point< int > &predicted)
 
template<typename T >
std::vector< TrainTestPair< T > > kfoldsplit (Data< T > &samples, size_t folds=5, bool stratified=true, bool keepIndex=true, size_t seed=0)
 Split the data into k folds.
 
template<typename T >
std::vector< TrainTestPair< T > > kfoldsplit (Data< T > &samples, size_t folds, size_t qtde, bool stratified=true, bool keepIndex=true, size_t seed=0)
 Split the data into k folds, repeated for the given number of executions.
 
template<typename T >
TrainTestPair< T > partTrainTest (Data< T > &data, size_t fold, bool stratified=true, bool keepIndex=true, size_t seed=0)
 Divide the samples into training and test sets.
 
template<typename T >
ValidationReport kfold (Data< T > sample, classifier::Classifier< T > &classifier, size_t fold, bool stratified=true, size_t seed=0, int verbose=0)
 Executes k-fold stratified cross-validation.
 
template<typename T >
ValidationReport kkfold (Data< T > samples, classifier::Classifier< T > &classifier, size_t qtde, size_t fold, bool stratified=true, size_t seed=0, int verbose=0)
 Executes the validation with several executions of the k fold algorithm.
 

Detailed Description

Validation methods namespace.

Function Documentation

◆ confusionMatrixAccuracy()

double mltk::validation::confusionMatrixAccuracy ( const std::vector< std::vector< size_t > > &  conf_matrix)
inline

Compute the accuracy based on a confusion matrix.

Parameters
conf_matrix  A confusion matrix.
Returns
Accuracy based on a confusion matrix.
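
As an illustration of the computation, accuracy can be read off a confusion matrix as the sum of the diagonal (correct predictions) divided by the sum of all entries. The sketch below is a standalone re-implementation under that assumption, not the library's source. The name `accuracyFromConfusion` is hypothetical, and returning a fraction in [0, 1] (rather than a percentage) and 0 for an empty matrix are both assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mltk): accuracy as the share of samples
// that lie on the main diagonal of a confusion matrix.
double accuracyFromConfusion(const std::vector<std::vector<std::size_t>>& cm) {
    std::size_t correct = 0;
    std::size_t total = 0;
    for (std::size_t i = 0; i < cm.size(); ++i) {
        for (std::size_t j = 0; j < cm[i].size(); ++j) {
            total += cm[i][j];
            if (i == j) correct += cm[i][j];  // diagonal entry: prediction matches truth
        }
    }
    // Assumption: an empty matrix yields 0 instead of dividing by zero.
    return total == 0 ? 0.0
                      : static_cast<double>(correct) / static_cast<double>(total);
}
```

For a matrix with rows {2, 1} and {0, 1}, three of the four samples sit on the diagonal, so the function returns 0.75.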

◆ generateConfusionMatrix()

template<typename T >
std::vector< std::vector< size_t > > mltk::validation::generateConfusionMatrix ( Data< T > &  samples,
Learner< T > &  learner 
)

Compute the confusion matrix for a given trained classifier.

Parameters
samples  Data to train the classifier on.
learner  Classifier to be evaluated.
Returns
A matrix where the rows are the true labels and the columns are the misclassifications.
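
To make the matrix layout concrete, the following sketch builds a confusion matrix from two label vectors instead of a `Data< T >` object and a `Learner< T >`. It assumes that columns index the predicted class and that labels are already encoded as 0 .. n_classes-1; `buildConfusionMatrix` is a hypothetical name and the code is not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mltk): rows index the true class and
// columns index the predicted class (an assumption about the layout).
// Labels are assumed to be encoded as 0 .. n_classes-1.
std::vector<std::vector<std::size_t>> buildConfusionMatrix(
        const std::vector<std::size_t>& truth,
        const std::vector<std::size_t>& predicted,
        std::size_t n_classes) {
    assert(truth.size() == predicted.size());
    std::vector<std::vector<std::size_t>> cm(
        n_classes, std::vector<std::size_t>(n_classes, 0));
    for (std::size_t i = 0; i < truth.size(); ++i) {
        assert(truth[i] < n_classes && predicted[i] < n_classes);
        cm[truth[i]][predicted[i]] += 1;  // off-diagonal cells are errors
    }
    return cm;
}
```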

◆ kfold()

template<typename T >
ValidationReport mltk::validation::kfold ( Data< T >  sample,
classifier::Classifier< T > &  classifier,
size_t  fold,
bool  stratified = true,
size_t  seed = 0,
int  verbose = 0 
)

Executes k-fold stratified cross-validation.

Parameters
sample      Data to train the classifier on.
classifier  Classifier to be evaluated.
fold        Number of folds.
stratified  If true, the data will be split in a stratified manner.
seed        Seed to feed the pseudo random number generator.
verbose     Verbose level.
Returns
Struct containing the validation metrics.
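
The procedure can be sketched without the library types. The example below assumes integer class labels, assigns folds in a stratified way by dealing the class-sorted samples round-robin, and evaluates a toy majority-class predictor in place of a `classifier::Classifier< T >`. Everything in it (`stratifiedKFoldAccuracy`, the toy predictor, the absence of shuffling and of a seed) is an assumption made for illustration and does not describe how mltk implements `kfold`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <vector>

// Hypothetical sketch (not mltk code): stratified k-fold cross-validation of a
// toy majority-class predictor. Returns the mean accuracy over the folds.
double stratifiedKFoldAccuracy(const std::vector<int>& labels, std::size_t folds) {
    assert(folds > 1 && folds <= labels.size());

    // Stratification: sort sample indices by class, then deal them round-robin,
    // so each fold keeps roughly the original class proportions.
    std::vector<std::size_t> order(labels.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return labels[a] < labels[b]; });
    std::vector<std::size_t> fold_of(labels.size());
    for (std::size_t p = 0; p < order.size(); ++p) fold_of[order[p]] = p % folds;

    double acc_sum = 0.0;
    for (std::size_t f = 0; f < folds; ++f) {
        // "Train": find the most frequent label outside fold f.
        std::map<int, std::size_t> counts;
        for (std::size_t i = 0; i < labels.size(); ++i)
            if (fold_of[i] != f) counts[labels[i]] += 1;
        int majority = 0;
        std::size_t best = 0;
        for (const auto& kv : counts)
            if (kv.second > best) { best = kv.second; majority = kv.first; }

        // "Test": accuracy of the majority label on fold f.
        std::size_t hits = 0, total = 0;
        for (std::size_t i = 0; i < labels.size(); ++i) {
            if (fold_of[i] != f) continue;
            total += 1;
            if (labels[i] == majority) hits += 1;
        }
        acc_sum += static_cast<double>(hits) / static_cast<double>(total);
    }
    return acc_sum / static_cast<double>(folds);
}
```

Because `folds` is at most the number of samples, every fold receives at least one test sample, so the per-fold division is well defined.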

◆ kfoldsplit() [1/2]

template<typename T >
std::vector< TrainTestPair< T > > mltk::validation::kfoldsplit ( Data< T > &  samples,
size_t  folds,
size_t  qtde,
bool  stratified = true,
bool  keepIndex = true,
size_t  seed = 0 
)

Split the data into k folds, repeated for the given number of executions.

Template Parameters
T  Type of the underlying data.
Parameters
samples     Data to be split.
folds       Number of folds.
qtde        Number of executions. The total number of folds is the folds value multiplied by the number of executions.
stratified  If true, the data will be split in a stratified manner.
seed        Seed to feed the pseudo random number generator.
Returns
Vector containing the folds as TrainTestPair objects.
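
The relation between `folds`, `qtde` and the size of the returned vector can be illustrated with index sets. The sketch below is not the library's implementation: `IndexPair` is a hypothetical stand-in for `TrainTestPair< T >`, stratification is omitted, and reseeding each execution with `seed + r` is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for TrainTestPair: sample indices only.
struct IndexPair {
    std::vector<std::size_t> train;
    std::vector<std::size_t> test;
};

// Repeats a plain (non-stratified) k-fold split `runs` times and returns
// folds * runs train/test pairs.
std::vector<IndexPair> repeatedKFoldIndices(std::size_t n, std::size_t folds,
                                            std::size_t runs, std::size_t seed) {
    assert(folds > 0 && folds <= n);
    std::vector<IndexPair> pairs;
    pairs.reserve(folds * runs);
    for (std::size_t r = 0; r < runs; ++r) {
        std::vector<std::size_t> idx(n);
        std::iota(idx.begin(), idx.end(), std::size_t{0});
        // Assumption: each execution reshuffles with a seed derived as seed + r.
        std::mt19937 gen(static_cast<std::mt19937::result_type>(seed + r));
        std::shuffle(idx.begin(), idx.end(), gen);
        for (std::size_t f = 0; f < folds; ++f) {
            IndexPair p;
            for (std::size_t i = 0; i < n; ++i) {
                // Position i of the shuffled order belongs to test fold i % folds.
                (i % folds == f ? p.test : p.train).push_back(idx[i]);
            }
            pairs.push_back(std::move(p));
        }
    }
    return pairs;
}
```

With 12 samples, 3 folds and 4 executions, the result holds 12 pairs, each with 4 test indices and 8 training indices.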

◆ kfoldsplit() [2/2]

template<typename T >
std::vector< TrainTestPair< T > > mltk::validation::kfoldsplit ( Data< T > &  samples,
size_t  folds = 5,
bool  stratified = true,
bool  keepIndex = true,
size_t  seed = 0 
)

Split the data into k folds.

Template Parameters
T  Type of the underlying data.
Parameters
samples     Data to be split.
folds       Number of folds.
stratified  If true, the data will be split in a stratified manner.
seed        Seed to feed the pseudo random number generator.
Returns
Vector containing the folds as TrainTestPair objects.
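
A minimal sketch of a k-fold split over sample indices follows. It assumes a seeded shuffle followed by contiguous chunks, leaves out stratification and the `keepIndex` option, and uses the hypothetical names `IndexPair` and `kfoldIndices`; it is meant to show the shape of the result, not the library's algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for TrainTestPair: sample indices only.
struct IndexPair {
    std::vector<std::size_t> train;
    std::vector<std::size_t> test;
};

// Shuffle indices 0..n-1 with the given seed, then use each of the k
// contiguous chunks of the shuffled order once as the test set.
std::vector<IndexPair> kfoldIndices(std::size_t n, std::size_t folds, std::size_t seed) {
    assert(folds > 0 && folds <= n);
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::shuffle(idx.begin(), idx.end(), gen);

    std::vector<IndexPair> pairs(folds);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t test_fold = i * folds / n;  // chunk containing position i
        for (std::size_t f = 0; f < folds; ++f) {
            (f == test_fold ? pairs[f].test : pairs[f].train).push_back(idx[i]);
        }
    }
    return pairs;
}
```

Every sample index appears in exactly one test set and in the training sets of the remaining folds.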

◆ kkfold()

template<typename T >
ValidationReport mltk::validation::kkfold ( Data< T >  samples,
classifier::Classifier< T > &  classifier,
size_t  qtde,
size_t  fold,
bool  stratified = true,
size_t  seed = 0,
int  verbose = 0 
)

Executes the validation with several executions of the k fold algorithm.

Parameters
samples     Data to train the classifier on.
classifier  Classifier to be evaluated.
qtde        Number of executions.
fold        Number of folds.
stratified  If true, the data will be split in a stratified manner.
seed        Seed for the random generator engine.
verbose     Verbose level.
Returns
Struct containing the validation metrics.
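
How the results of the individual executions are combined into the returned report is not described here. One common choice, shown below purely as an assumption, is to summarize the per-execution accuracies by their mean and standard deviation; `Summary` and `summarize` are hypothetical names and do not correspond to fields of `ValidationReport`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of repeated runs (not the layout of ValidationReport).
struct Summary {
    double mean;
    double stddev;
};

// Mean and population standard deviation of per-execution accuracies.
Summary summarize(const std::vector<double>& accuracies) {
    assert(!accuracies.empty());
    const double n = static_cast<double>(accuracies.size());
    double sum = 0.0;
    for (double a : accuracies) sum += a;
    const double mean = sum / n;
    double sq = 0.0;
    for (double a : accuracies) sq += (a - mean) * (a - mean);
    // Population form: divides by N rather than N - 1.
    return {mean, std::sqrt(sq / n)};
}
```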

◆ metricsReport()

template<typename T >
ValidationReport mltk::validation::metricsReport ( const Data< T > &  data,
const std::vector< std::vector< size_t > > &  cfm,
std::vector< int >  positive_labels = std::vector<int>() 
)
inline

Generates a report with classifier metrics.

Template Parameters
T  Type of the underlying data.
Parameters
data             Data to be used to compute the metrics.
cfm              Pre-computed confusion matrix.
positive_labels  Classes to be considered as positive labels.
Returns
Struct containing the generated metrics.
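
The role of `positive_labels` can be illustrated by deriving precision, recall and F1 from a confusion matrix. The sketch below makes several assumptions: rows are true classes and columns are predicted classes, positive classes are given as matrix indices rather than label values, and all positive classes are pooled into a single "positive" group. `BinaryMetrics` and `metricsFromConfusion` are hypothetical names, and the metrics actually stored in `ValidationReport` may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type (not the layout of ValidationReport).
struct BinaryMetrics {
    double precision;
    double recall;
    double f1;
};

// Pools the classes listed in `positive` (as matrix indices) into one positive
// group and derives precision, recall and F1 from a square confusion matrix.
BinaryMetrics metricsFromConfusion(const std::vector<std::vector<std::size_t>>& cm,
                                   const std::vector<std::size_t>& positive) {
    auto is_pos = [&](std::size_t c) {
        return std::find(positive.begin(), positive.end(), c) != positive.end();
    };
    std::size_t tp = 0, fp = 0, fn = 0;
    for (std::size_t t = 0; t < cm.size(); ++t) {
        assert(cm[t].size() == cm.size());
        for (std::size_t p = 0; p < cm.size(); ++p) {
            if (is_pos(t) && is_pos(p)) tp += cm[t][p];        // true positives
            else if (!is_pos(t) && is_pos(p)) fp += cm[t][p];  // false positives
            else if (is_pos(t) && !is_pos(p)) fn += cm[t][p];  // false negatives
        }
    }
    // Assumption: undefined ratios (zero denominator) are reported as 0.
    const double precision = (tp + fp) == 0 ? 0.0 : static_cast<double>(tp) / static_cast<double>(tp + fp);
    const double recall = (tp + fn) == 0 ? 0.0 : static_cast<double>(tp) / static_cast<double>(tp + fn);
    const double f1 = (precision + recall) == 0.0
                          ? 0.0
                          : 2.0 * precision * recall / (precision + recall);
    return {precision, recall, f1};
}
```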

◆ partTrainTest()

template<typename T >
TrainTestPair< T > mltk::validation::partTrainTest ( Data< T > &  data,
size_t  fold,
bool  stratified = true,
bool  keepIndex = true,
size_t  seed = 0 
)

Divide the samples into training and test sets.

Parameters
data  Data to be split.
fold  Number of folds.
seed  Seed to feed the pseudo random number generator.
Returns
A pair containing the training and test data.
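
The exact meaning of `fold` for a single split is not spelled out above. The sketch below assumes that the test set receives n / fold samples and the training set the remainder, after a seeded shuffle; stratification and `keepIndex` are left out. `IndexPair` and `trainTestIndices` are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for TrainTestPair: sample indices only.
struct IndexPair {
    std::vector<std::size_t> train;
    std::vector<std::size_t> test;
};

// Assumption: the test set holds n / fold samples and the rest is training.
IndexPair trainTestIndices(std::size_t n, std::size_t fold, std::size_t seed) {
    assert(fold > 0);
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::shuffle(idx.begin(), idx.end(), gen);

    const std::ptrdiff_t n_test = static_cast<std::ptrdiff_t>(n / fold);
    IndexPair pair;
    pair.test.assign(idx.begin(), idx.begin() + n_test);   // first n / fold shuffled indices
    pair.train.assign(idx.begin() + n_test, idx.end());    // everything else
    return pair;
}
```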