UFJF - Machine Learning Toolkit  0.51.8
mltk::datasets Namespace Reference

Namespace for artificial dataset generation.

Classes

struct  RegPair
 
struct  BlobsPair
 

Typedefs

using Centers = std::vector< mltk::Point< double > >
 

Functions

mltk::Data< double > make_spirals (size_t n_samples=100, int n_classes=2, bool shuffle=true, double noise=1.0, size_t n_loops=2, double margin=0.5, size_t seed=0)
 Generate a synthetic dataset composed of interlaced Archimedean spirals [source].

BlobsPair make_blobs (size_t n_samples=100, int n_centers=2, int n_dims=2, double cluster_std=1.0, double center_min=-10.0, double center_max=10.0, bool shuffle=true, bool has_classes=true, size_t seed=0)
 Generate isotropic Gaussian blobs for clustering or classification [source].

BlobsPair make_blobs (const std::vector< size_t > &n_samples, const std::vector< mltk::Point< double >> &centers, std::vector< double > clusters_std, int n_dims=2, bool shuffle=true, bool has_classes=true, size_t seed=0)
 Generate isotropic Gaussian blobs for clustering or classification from the given centers and per-blob sample counts.

RegPair make_regression (size_t n_samples=100, size_t n_dims=100, double bias=0.0, double noise=0.1, double stdev=0.01, size_t n_informative=10, bool shuffle=true, size_t seed=0)
 Generate a random regression problem [source].

Detailed Description

Namespace for artificial dataset generation.

Function Documentation

◆ make_blobs() [1/2]

BlobsPair mltk::datasets::make_blobs ( const std::vector< size_t > &  n_samples,
const std::vector< mltk::Point< double >> &  centers,
std::vector< double >  clusters_std,
int  n_dims = 2,
bool  shuffle = true,
bool  has_classes = true,
size_t  seed = 0 
)

Generate isotropic Gaussian blobs for clustering or classification from the given centers and per-blob sample counts.

Parameters
n_samples  number of samples in each Gaussian blob.
centers  centers of the Gaussian blobs.
clusters_std  standard deviation of each blob.
n_dims  dimensionality of the data.
shuffle  whether the dataset is shuffled after generation.
has_classes  whether the returned data carry labels identifying each blob.
seed  seed for random data generation.
Returns
A pair containing the dataset and the blob centers.
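To illustrate what generating isotropic Gaussian blobs from given centers involves, the following is a minimal, self-contained sketch in standard C++. It is not the mltk implementation: `BlobSample` and `blobs_from_centers` are hypothetical names, points are plain `std::vector<double>` rather than `mltk::Point`, the dimensionality is taken from the centers instead of a separate `n_dims`, and labels are assumed to be the 0-based blob index.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sample type for this sketch: a point plus an integer label.
struct BlobSample {
    std::vector<double> x;
    int label;
};

// Sketch (not mltk's code): draw n_samples[c] points from an isotropic Gaussian
// centred at centers[c], using clusters_std[c] (must be > 0) on every axis.
std::vector<BlobSample> blobs_from_centers(const std::vector<std::size_t>& n_samples,
                                           const std::vector<std::vector<double>>& centers,
                                           const std::vector<double>& clusters_std,
                                           bool shuffle = true,
                                           unsigned seed = 0) {
    if (n_samples.size() != centers.size() || centers.size() != clusters_std.size())
        throw std::invalid_argument("need one sample count and one std per center");
    std::mt19937 gen(seed);  // same seed -> same dataset
    std::vector<BlobSample> data;
    for (std::size_t c = 0; c < centers.size(); ++c) {
        // Isotropic: one standard deviation shared by all dimensions of blob c.
        std::normal_distribution<double> noise(0.0, clusters_std[c]);
        for (std::size_t s = 0; s < n_samples[c]; ++s) {
            BlobSample p{centers[c], static_cast<int>(c)};  // assumed 0-based label
            for (double& v : p.x) v += noise(gen);
            data.push_back(std::move(p));
        }
    }
    if (shuffle) std::shuffle(data.begin(), data.end(), gen);
    return data;
}
```

For example, `blobs_from_centers({3, 5}, {{0.0, 0.0}, {10.0, 10.0}}, {0.5, 1.0})` yields 8 two-dimensional points, 3 around the origin and 5 around (10, 10); calling it again with the same `seed` reproduces the same points.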

◆ make_blobs() [2/2]

BlobsPair mltk::datasets::make_blobs ( size_t  n_samples = 100,
int  n_centers = 2,
int  n_dims = 2,
double  cluster_std = 1.0,
double  center_min = -10.0,
double  center_max = 10.0,
bool  shuffle = true,
bool  has_classes = true,
size_t  seed = 0 
)

Generate isotropic Gaussian blobs for clustering or classification [source].

Parameters
n_samples  number of samples in each Gaussian blob.
n_centers  number of classes (Gaussian blobs).
n_dims  dimensionality of the data.
cluster_std  standard deviation used to generate the blobs.
center_min  lower bound of the range from which the blob center coordinates are drawn.
center_max  upper bound of the range from which the blob center coordinates are drawn.
shuffle  whether the dataset is shuffled after generation.
has_classes  whether the returned data carry labels identifying each blob.
seed  seed for random data generation.
Returns
A pair containing the dataset and the blob centers.
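A comparable sketch for this overload, again standalone and hypothetical (`RandomBlobs` and `random_blobs` are not part of mltk). It assumes the centers are drawn uniformly from the box [center_min, center_max]^n_dims, which is how this sketch interprets those two parameters, and generates `n_samples` points per blob; shuffling and the `has_classes` switch are left out for brevity.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical result type: points, their blob labels, and the drawn centers.
struct RandomBlobs {
    std::vector<std::vector<double>> points;
    std::vector<int> labels;
    std::vector<std::vector<double>> centers;
};

// Sketch (not mltk's code). Assumption: centers are uniform in
// [center_min, center_max) on every axis; each blob gets n_samples points.
RandomBlobs random_blobs(std::size_t n_samples = 100, int n_centers = 2, int n_dims = 2,
                         double cluster_std = 1.0, double center_min = -10.0,
                         double center_max = 10.0, unsigned seed = 0) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> box(center_min, center_max);
    std::normal_distribution<double> noise(0.0, cluster_std);  // shared isotropic std
    RandomBlobs out;
    for (int c = 0; c < n_centers; ++c) {
        // Draw one random center for blob c.
        std::vector<double> center(static_cast<std::size_t>(n_dims));
        for (double& v : center) v = box(gen);
        // Scatter n_samples points around it.
        for (std::size_t s = 0; s < n_samples; ++s) {
            std::vector<double> p = center;
            for (double& v : p) v += noise(gen);
            out.points.push_back(std::move(p));
            out.labels.push_back(c);
        }
        out.centers.push_back(std::move(center));
    }
    return out;
}
```

Returning the centers alongside the points mirrors the documented return value (dataset plus blob centers), which is useful for checking a clustering algorithm against the ground truth.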

◆ make_regression()

RegPair mltk::datasets::make_regression ( size_t  n_samples = 100,
size_t  n_dims = 100,
double  bias = 0.0,
double  noise = 0.1,
double  stdev = 0.01,
size_t  n_informative = 10,
bool  shuffle = true,
size_t  seed = 0 
)

Generate a random regression problem [source].

Parameters
n_samples  number of samples in the dataset.
n_dims  dimensionality of the data.
bias  bias term of the underlying linear model.
noise  standard deviation of the Gaussian noise applied to the output.
stdev  standard deviation of the Gaussian noise applied to the output.
n_informative  number of informative features, i.e., the number of features used to build the linear model that generates the output.
shuffle  whether the samples and the features are shuffled.
seed  seed for random data generation.
Returns
A struct containing the coefficients and the generated regression dataset.
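The kind of problem described here can be sketched as a noisy linear model y = X·w + bias + ε. The code below is an assumption-laden illustration, not the library's implementation: the names are hypothetical, inputs are drawn from a standard normal distribution, the informative coefficients are drawn uniformly from [0, 100) (loosely following scikit-learn's generator), the remaining coefficients are zero, and ε ~ N(0, noise²). It does not model `stdev` or shuffling.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical result type for this sketch.
struct RegressionSketch {
    std::vector<std::vector<double>> X;  // n_samples x n_dims inputs
    std::vector<double> y;               // targets
    std::vector<double> coef;            // true coefficients (0 for non-informative features)
};

RegressionSketch linear_regression_problem(std::size_t n_samples = 100, std::size_t n_dims = 100,
                                           double bias = 0.0, double noise = 0.1,
                                           std::size_t n_informative = 10, unsigned seed = 0) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::uniform_real_distribution<double> weight(0.0, 100.0);  // assumed coefficient range
    RegressionSketch r;
    r.coef.assign(n_dims, 0.0);
    // Only the first min(n_informative, n_dims) features influence the target.
    for (std::size_t j = 0; j < std::min(n_informative, n_dims); ++j) r.coef[j] = weight(gen);
    for (std::size_t i = 0; i < n_samples; ++i) {
        std::vector<double> row(n_dims);
        double target = bias;
        for (std::size_t j = 0; j < n_dims; ++j) {
            row[j] = gauss(gen);
            target += r.coef[j] * row[j];
        }
        target += noise * gauss(gen);  // additive Gaussian output noise, std = noise
        r.X.push_back(std::move(row));
        r.y.push_back(target);
    }
    return r;
}
```

With `noise = 0`, every target equals `bias` plus the dot product of its row with `coef`, which makes the returned coefficients easy to verify.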

◆ make_spirals()

mltk::Data< double > mltk::datasets::make_spirals ( size_t  n_samples = 100,
int  n_classes = 2,
bool  shuffle = true,
double  noise = 1.0,
size_t  n_loops = 2,
double  margin = 0.5,
size_t  seed = 0 
)

Generate a synthetic dataset composed of interlaced Archimedean spirals [source].

Parameters
n_samples  number of samples in the dataset.
n_classes  number of classes (spirals).
shuffle  whether the dataset is shuffled after generation.
noise  amount of noise added to the dataset.
n_loops  number of loops in each spiral.
margin  margin between the spirals.
seed  seed for random data generation.
Returns
The artificial spirals dataset.
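To show how interlaced Archimedean spirals can be produced, here is a minimal standalone sketch. It is hypothetical throughout (`SpiralPoint` and `spirals_sketch` are illustrative names) and makes assumptions the documentation does not state: each arm follows r = margin + θ with θ drawn uniformly from [0, 2π·n_loops), class k is rotated by 2πk/n_classes, classes are assigned round-robin, and `noise` is the standard deviation (must be positive) of Gaussian jitter added to each coordinate. Shuffling is omitted.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical 2-D labelled point for this sketch.
struct SpiralPoint { double x, y; int label; };

std::vector<SpiralPoint> spirals_sketch(std::size_t n_samples = 100, int n_classes = 2,
                                        double noise = 1.0, std::size_t n_loops = 2,
                                        double margin = 0.5, unsigned seed = 0) {
    const double pi = std::acos(-1.0);
    std::mt19937 gen(seed);
    // Angle along the arm: n_loops full turns.
    std::uniform_real_distribution<double> angle(0.0, 2.0 * pi * static_cast<double>(n_loops));
    std::normal_distribution<double> jitter(0.0, noise);  // assumed meaning of `noise`
    std::vector<SpiralPoint> pts;
    pts.reserve(n_samples);
    for (std::size_t i = 0; i < n_samples; ++i) {
        int k = static_cast<int>(i % static_cast<std::size_t>(n_classes));  // round-robin class
        double theta = angle(gen);
        double r = margin + theta;                  // Archimedean spiral r = a + b*theta, b = 1
        double phase = 2.0 * pi * k / n_classes;    // rotate each class's arm
        pts.push_back({r * std::cos(theta + phase) + jitter(gen),
                       r * std::sin(theta + phase) + jitter(gen), k});
    }
    return pts;
}
```

Rotating identical arms by equal phase offsets is what makes the spirals interlace: with two classes, the second arm sits half a turn behind the first, so the two classes alternate along any ray from the origin.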