UFJF - Machine Learning Toolkit
0.51.8
Namespace for artificial dataset generation.
Classes | |
struct | RegPair |
struct | BlobsPair |
Typedefs | |
using | Centers = std::vector< mltk::Point< double > > |
Functions | |
mltk::Data< double > | make_spirals (size_t n_samples=100, int n_classes=2, bool shuffle=true, double noise=1.0, size_t n_loops=2, double margin=0.5, size_t seed=0) |
Generates a synthetic dataset composed of interlaced Archimedean spirals [source].
BlobsPair | make_blobs (size_t n_samples=100, int n_centers=2, int n_dims=2, double cluster_std=1.0, double center_min=-10.0, double center_max=10.0, bool shuffle=true, bool has_classes=true, size_t seed=0) |
Generate isotropic Gaussian blobs for clustering or classification [source].
BlobsPair | make_blobs (const std::vector< size_t > &n_samples, const std::vector< mltk::Point< double >> &centers, std::vector< double > clusters_std, int n_dims=2, bool shuffle=true, bool has_classes=true, size_t seed=0) |
Generate isotropic Gaussian blobs for clustering or classification from given centers and sample distribution.
RegPair | make_regression (size_t n_samples=100, size_t n_dims=100, double bias=0.0, double noise=0.1, double stdev=0.01, size_t n_informative=10, bool shuffle=true, size_t seed=0) |
Generate a random regression problem [source].
BlobsPair mltk::datasets::make_blobs (const std::vector< size_t > &n_samples, const std::vector< mltk::Point< double >> &centers, std::vector< double > clusters_std, int n_dims=2, bool shuffle=true, bool has_classes=true, size_t seed=0)
Generate isotropic Gaussian blobs for clustering or classification from given centers and samples distribution.
n_samples | Number of samples in each Gaussian blob. |
centers | Centers of the Gaussian blobs. |
clusters_std | Vector of standard deviations, one per blob. |
n_dims | Dimensionality of the data. |
shuffle | Whether the dataset is shuffled after generation. |
has_classes | Whether the returned data carries class labels for the blobs. |
seed | Seed for random data generation. |
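The sampling step described above can be sketched in standard C++ without the toolkit. This is a minimal illustration of drawing isotropic Gaussian blobs around given centers, not the library's implementation: `Sample`, `LabeledBlobs` and `sketch_blobs` are hypothetical names, plain `std::vector` stands in for `mltk::Point` and `mltk::Data`, and the shuffle step is left out.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in types; the real API returns a BlobsPair.
using Sample = std::vector<double>;

struct LabeledBlobs {
    std::vector<Sample> points;  // one row per generated sample
    std::vector<int> labels;     // index of the blob each row was drawn from
};

// Draw n_samples[c] points around centers[c]. Every coordinate receives
// independent N(0, stds[c]^2) noise, which makes each blob isotropic.
// Assumes all three vectors have the same length and every stds[c] > 0.
LabeledBlobs sketch_blobs(const std::vector<std::size_t>& n_samples,
                          const std::vector<Sample>& centers,
                          const std::vector<double>& stds,
                          unsigned seed = 0) {
    std::mt19937 gen(seed);
    LabeledBlobs out;
    for (std::size_t c = 0; c < centers.size(); ++c) {
        std::normal_distribution<double> noise(0.0, stds[c]);
        for (std::size_t i = 0; i < n_samples[c]; ++i) {
            Sample p = centers[c];             // start at the blob center
            for (double& x : p) x += noise(gen);
            out.points.push_back(std::move(p));
            out.labels.push_back(static_cast<int>(c));
        }
    }
    return out;
}
```

Calling `sketch_blobs` with sample counts `{3, 5}`, two 2-D centers and two standard deviations yields eight labeled points, grouped by blob because no shuffling is applied.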
BlobsPair mltk::datasets::make_blobs (size_t n_samples=100, int n_centers=2, int n_dims=2, double cluster_std=1.0, double center_min=-10.0, double center_max=10.0, bool shuffle=true, bool has_classes=true, size_t seed=0)
Generate isotropic Gaussian blobs for clustering or classification [source].
n_samples | Number of samples in each Gaussian blob. |
n_centers | Number of classes or Gaussian blobs. |
n_dims | Dimensionality of the data. |
cluster_std | Standard deviation used to generate the blobs. |
center_min | Minimum value of the generated data. |
center_max | Maximum value of the generated data. |
shuffle | Whether the dataset is shuffled after generation. |
has_classes | Whether the returned data carries class labels for the blobs. |
seed | Seed for random data generation. |
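When no centers are supplied, they have to be generated first. The sketch below shows one plausible way to do that, under the assumption that `center_min` and `center_max` bound a box from which centers are drawn uniformly (this mirrors scikit-learn's `center_box`; the page only describes the two values as bounds on the generated data). `sketch_random_centers` is a hypothetical helper, not part of the toolkit.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draw n_centers points uniformly from the box
// [center_min, center_max) in each of the n_dims coordinates.
// Assumes center_min < center_max.
std::vector<std::vector<double>> sketch_random_centers(std::size_t n_centers,
                                                       std::size_t n_dims,
                                                       double center_min,
                                                       double center_max,
                                                       unsigned seed = 0) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> coord(center_min, center_max);
    std::vector<std::vector<double>> centers(n_centers,
                                             std::vector<double>(n_dims));
    for (auto& c : centers)
        for (double& x : c) x = coord(gen);
    return centers;
}
```

Because the generator is seeded explicitly, two calls with the same arguments return the same centers, which is the reproducibility a `seed` parameter is meant to provide.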
RegPair mltk::datasets::make_regression (size_t n_samples=100, size_t n_dims=100, double bias=0.0, double noise=0.1, double stdev=0.01, size_t n_informative=10, bool shuffle=true, size_t seed=0)
Generate a random regression problem [source].
n_samples | Number of samples in the dataset. |
n_dims | Dimensionality of the data. |
bias | The bias term in the underlying linear model. |
noise | The standard deviation of the Gaussian noise applied to the output. |
stdev | The standard deviation of the Gaussian noise applied to the output. |
n_informative | The number of informative features, i.e., the number of features used to build the linear model that generates the output. |
shuffle | Shuffle the samples and the features. |
seed | Seed for random data generation. |
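The linear model behind these parameters can be illustrated with a standalone sketch: the target is `bias` plus a weighted sum of the first `n_informative` features, plus Gaussian noise scaled by `noise`. Everything here is an assumption made for illustration: `RegressionSketch` and `sketch_regression` are hypothetical names, features are drawn from a standard normal, weights from a uniform range of 0 to 100, and the `stdev` and `shuffle` parameters are not modeled.

```cpp
#include <cstddef>
#include <random>
#include <vector>

struct RegressionSketch {
    std::vector<std::vector<double>> X;  // n_samples x n_dims feature matrix
    std::vector<double> y;               // one target per sample
    std::vector<double> coef;            // true weights, 0 for uninformative features
};

RegressionSketch sketch_regression(std::size_t n_samples, std::size_t n_dims,
                                   std::size_t n_informative, double bias,
                                   double noise, unsigned seed = 0) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> feature(0.0, 1.0);
    std::normal_distribution<double> unit_noise(0.0, 1.0);
    std::uniform_real_distribution<double> weight(0.0, 100.0);

    RegressionSketch out;
    // Only the first n_informative features get a non-zero weight.
    out.coef.assign(n_dims, 0.0);
    for (std::size_t j = 0; j < n_informative && j < n_dims; ++j)
        out.coef[j] = weight(gen);

    out.X.assign(n_samples, std::vector<double>(n_dims));
    out.y.assign(n_samples, bias);  // every target starts at the bias term
    for (std::size_t i = 0; i < n_samples; ++i) {
        for (std::size_t j = 0; j < n_dims; ++j) {
            out.X[i][j] = feature(gen);
            out.y[i] += out.coef[j] * out.X[i][j];
        }
        out.y[i] += noise * unit_noise(gen);  // noise == 0 gives exact targets
    }
    return out;
}
```

With `noise` set to 0 the targets follow the linear model exactly, which makes the sketch convenient for checking that a regressor recovers `coef`.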
mltk::Data< double > mltk::datasets::make_spirals (size_t n_samples=100, int n_classes=2, bool shuffle=true, double noise=1.0, size_t n_loops=2, double margin=0.5, size_t seed=0)
Generates a synthetic dataset composed of interlaced Archimedean spirals [source].
n_samples | Number of samples in the dataset. |
n_classes | Number of classes or spirals. |
shuffle | Whether the dataset is shuffled after generation. |
noise | Amount of noise added to the dataset. |
n_loops | Number of loops in each spiral. |
margin | Margin between spirals. |
seed | Seed for random data generation. |
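An Archimedean spiral has radius proportional to angle, r = a·θ, and interlaced arms can be obtained by rotating the same curve by a fixed phase per class. The following sketch illustrates that geometry only and is not the toolkit's code: `SpiralPoint` and `sketch_spirals` are hypothetical names, the sample count is given per class, noise is assumed to be Gaussian jitter on each coordinate, and `margin` and `shuffle` are not modeled.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct SpiralPoint {
    double x;
    double y;
    int label;  // index of the spiral arm
};

std::vector<SpiralPoint> sketch_spirals(std::size_t n_per_class, int n_classes,
                                        std::size_t n_loops, double noise,
                                        unsigned seed = 0) {
    const double pi = std::acos(-1.0);
    std::mt19937 gen(seed);
    std::normal_distribution<double> jitter(0.0, 1.0);

    std::vector<SpiralPoint> out;
    out.reserve(n_per_class * static_cast<std::size_t>(n_classes));
    for (int k = 0; k < n_classes; ++k) {
        // Each arm is the same spiral rotated by an equal share of the circle.
        const double phase = 2.0 * pi * k / n_classes;
        for (std::size_t i = 0; i < n_per_class; ++i) {
            // The angle sweeps n_loops full turns; starting at i + 1 keeps
            // the arms from all beginning at the origin.
            const double theta = 2.0 * pi * n_loops * (i + 1) / n_per_class;
            const double r = theta;  // Archimedean spiral with a = 1
            out.push_back({r * std::cos(theta + phase) + noise * jitter(gen),
                           r * std::sin(theta + phase) + noise * jitter(gen),
                           k});
        }
    }
    return out;
}
```

With `noise` set to 0, the last point of every arm lies at radius 2π·`n_loops` from the origin, which is a quick sanity check on the number of loops.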