UFJF - Machine Learning Toolkit  0.51.8

This project aims to provide researchers and developers with basic tools for manipulating datasets and for implementing and testing ML algorithms. It also includes a set of already implemented methods.
It is not intended to be just a collection of algorithms. It also aims to help establish a common pattern for future ML algorithm implementations through a set of interconnected modules that can be used in most ML projects.

Documentation

You can find the documentation at the project page: UFJF-MLTK.
For examples and other information, see the wiki.

Installation

To make the project available to as many users as possible and keep it cross-platform, it supports both CMake and Meson, two of the most widely used build systems. There are therefore two installation methods, shown below.

Requirements

  • Meson or CMake
  • g++ >= 8
  • C++17 or later
  • gnuplot >= 5 (only for the visualization module)

CMake

mkdir build
cd build
cmake ..
make
sudo make install

Meson

meson setup build
meson compile -C build
meson install -C build

After that, the library is available system-wide and can be used like any other installed library.

Code Example

The framework aims to make machine learning algorithms easier to use in C++. The following example prints the 10-fold cross-validation accuracy of the kNN classifier with 3 neighbors on the Iris dataset, using only a few lines of code.

main.cpp

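#include <iostream>
#include <ufjfmltk/Core.hpp>
#include <ufjfmltk/Validation.hpp>
#include <ufjfmltk/Classifier.hpp>

int main(){
    // Load the dataset from a local file
    mltk::Data<double> data("iris.data");
    // kNN classifier with k = 3 neighbors
    mltk::classifier::KNNClassifier<double> knn(3);

    std::cout << "Dataset size: " << data.size() << std::endl;
    std::cout << "Dataset dimension: " << data.dim() << std::endl;
    std::cout << "KNN accuracy: ";
    // 10 folds, stratified, seed 42, verbosity 0
    std::cout << mltk::validation::kfold(data, knn, 10, true, 42, 0).accuracy
              << "%" << std::endl;
}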

Compiling:

g++ -std=c++17 main.cpp -o main -lufjfmltk

This program outputs the following:

Dataset size: 150
Dataset dimension: 4
KNN accuracy: 100%

Modules status

  • Data manipulation
  • Artificial datasets
  • Data visualization
  • Classifiers (Primal and Dual)
  • Ensemble
  • Regression
  • Validation (K-Fold Cross-Validation)
  • Feature Selection
  • Documentation

Authors

Mateus Coutinho Marim (mateus.marim@ice.ufjf.br)
Saulo Moraes Villela (saulo.moraes@ufjf.edu.br)
Alessandreia Marta de Oliveira Julio (alessandreia.oliveira@ice.ufjf.br)

Universidade Federal de Juiz de Fora
Departamento de Ciência da Computação