Namespace for artificial dataset generation.

Typedefs
using	Centers = std::vector< mltk::Point< double > >

Functions
mltk::Data< double >	make_spirals (size_t n_samples=100, int n_classes=2, bool shuffle=true, double noise=1.0, size_t n_loops=2, double margin=0.5, size_t seed=0)
	Generates a synthetic dataset composed of interlaced Archimedean spirals [source].

BlobsPair	make_blobs (size_t n_samples=100, int n_centers=2, int n_dims=2, double cluster_std=1.0, double center_min=-10.0, double center_max=10.0, bool shuffle=true, bool has_classes=true, size_t seed=0)
	Generate isotropic Gaussian blobs for clustering or classification [source].

BlobsPair	make_blobs (const std::vector< size_t > &n_samples, const std::vector< mltk::Point< double >> &centers, std::vector< double > clusters_std, int n_dims=2, bool shuffle=true, bool has_classes=true, size_t seed=0)
	Generate isotropic Gaussian blobs for clustering or classification from the given centers and per-blob sample counts.

RegPair	make_regression (size_t n_samples=100, size_t n_dims=100, double bias=0.0, double noise=0.1, double stdev=0.01, size_t n_informative=10, bool shuffle=true, size_t seed=0)
	Generate a random regression problem [source].

Detailed Description

Namespace for artificial dataset generation.

Function Documentation

BlobsPair mltk::datasets::make_blobs	(	const std::vector< size_t > &	n_samples,
		const std::vector< mltk::Point< double >> &	centers,
		std::vector< double >	clusters_std,
		int	n_dims = `2`,
		bool	shuffle = `true`,
		bool	has_classes = `true`,
		size_t	seed = `0`
	)

Generate isotropic Gaussian blobs for clustering or classification from the given centers and per-blob sample counts.

Parameters

n_samples	number of samples in each Gaussian blob.
centers	centers of the Gaussian blobs.
clusters_std	standard deviation of each blob, one entry per blob.
n_dims	dimensionality of the data.
shuffle	tells if the dataset must be shuffled after generation.
has_classes	tells if the returned data have labels for the blobs.
seed	seed for random data generation.
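
The parameters above map naturally onto a simple sampling loop. The following is a minimal standard-library sketch of that idea, not the mltk implementation: `LabeledPoint` and `sketch_blobs` are hypothetical names introduced for illustration, and plain `std::vector<double>` stands in for `mltk::Point< double >`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a labeled sample; not an mltk type.
struct LabeledPoint {
    std::vector<double> x;
    int label;
};

// Draws n_samples[c] points around centers[c]. Every coordinate receives
// independent Gaussian noise with standard deviation clusters_std[c],
// which is what makes each blob isotropic.
std::vector<LabeledPoint> sketch_blobs(const std::vector<std::size_t>& n_samples,
                                       const std::vector<std::vector<double>>& centers,
                                       const std::vector<double>& clusters_std,
                                       std::size_t seed = 0) {
    assert(n_samples.size() == centers.size());
    assert(clusters_std.size() == centers.size());
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::vector<LabeledPoint> out;
    for (std::size_t c = 0; c < centers.size(); ++c) {
        assert(clusters_std[c] > 0.0);
        std::normal_distribution<double> noise(0.0, clusters_std[c]);
        for (std::size_t i = 0; i < n_samples[c]; ++i) {
            LabeledPoint p{centers[c], static_cast<int>(c)};  // start at the center
            for (double& v : p.x) v += noise(gen);            // perturb each coordinate
            out.push_back(p);
        }
    }
    return out;
}
```

Because the generator is seeded explicitly, two calls with the same arguments return identical points, which is the usual reason for exposing a `seed` parameter.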

BlobsPair mltk::datasets::make_blobs	(	size_t	n_samples = `100`,
		int	n_centers = `2`,
		int	n_dims = `2`,
		double	cluster_std = `1.0`,
		double	center_min = `-10.0`,
		double	center_max = `10.0`,
		bool	shuffle = `true`,
		bool	has_classes = `true`,
		size_t	seed = `0`
	)

Generate isotropic Gaussian blobs for clustering or classification [source].

Parameters

n_samples	number of samples in each Gaussian blob.
n_centers	number of classes or Gaussian blobs.
n_dims	dimensionality of the data.
cluster_std	standard deviation used to generate the blobs.
center_min	minimum value of the generated data.
center_max	maximum value of the generated data.
shuffle	tells if the dataset must be shuffled after generation.
has_classes	tells if the returned data have labels for the blobs.
seed	seed for random data generation.
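
As a rough illustration of how these parameters could interact, the sketch below draws each blob center uniformly from `[center_min, center_max]` in every dimension, scatters points around it, and optionally shuffles the result. Drawing the centers from that range is an assumption suggested by the parameter names, and `Sample` and `sketch_random_blobs` are hypothetical names, not part of mltk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a labeled sample; not an mltk type.
struct Sample {
    std::vector<double> x;
    int label;
};

std::vector<Sample> sketch_random_blobs(std::size_t n_per_blob, int n_centers, int n_dims,
                                        double cluster_std, double center_min,
                                        double center_max, bool shuffle, std::size_t seed) {
    assert(n_centers > 0 && n_dims > 0);
    assert(cluster_std > 0.0 && center_min < center_max);
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    // Assumption: centers are sampled uniformly inside the given bounds.
    std::uniform_real_distribution<double> pick_center(center_min, center_max);
    std::normal_distribution<double> noise(0.0, cluster_std);

    std::vector<Sample> out;
    for (int c = 0; c < n_centers; ++c) {
        std::vector<double> center(static_cast<std::size_t>(n_dims));
        for (double& v : center) v = pick_center(gen);
        for (std::size_t i = 0; i < n_per_blob; ++i) {
            Sample s{center, c};
            for (double& v : s.x) v += noise(gen);  // isotropic Gaussian scatter
            out.push_back(s);
        }
    }
    // Shuffling reorders the samples but keeps each label attached to its point.
    if (shuffle) std::shuffle(out.begin(), out.end(), gen);
    return out;
}
```

Without the shuffle, samples come out grouped by blob, which can bias algorithms that process the data in order.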

RegPair mltk::datasets::make_regression	(	size_t	n_samples = `100`,
		size_t	n_dims = `100`,
		double	bias = `0.0`,
		double	noise = `0.1`,
		double	stdev = `0.01`,
		size_t	n_informative = `10`,
		bool	shuffle = `true`,
		size_t	seed = `0`
	)

Generate a random regression problem [source].

Parameters

n_samples	number of samples in the dataset.
n_dims	dimensionality of the data.
bias	The bias term in the underlying linear model.
noise	The standard deviation of the Gaussian noise applied to the output.
stdev	The standard deviation of the Gaussian noise applied to the output.
n_informative	The number of informative features, i.e., the number of features used to build the linear model used to generate the output.
shuffle	Shuffle the samples and the features.
seed	seed for random data generation.

Returns: A struct containing the coefficients and the generated regression dataset.
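
To make the roles of `n_informative`, `bias`, and `noise` concrete, here is a minimal sketch of one common way to build such a problem: draw standard normal features, give nonzero coefficients only to the informative features, and add Gaussian noise to the linear output. The struct and function names are hypothetical, the coefficient range is an arbitrary choice, `stdev` and `shuffle` are left out, and none of this is claimed to match what mltk does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical result type; mltk's RegPair is not reproduced here.
struct RegressionSketch {
    std::vector<std::vector<double>> X;  // n_samples rows of n_dims features
    std::vector<double> y;               // one target per sample
    std::vector<double> coef;            // ground-truth coefficients
};

RegressionSketch sketch_regression(std::size_t n_samples, std::size_t n_dims,
                                   std::size_t n_informative, double bias,
                                   double noise, std::size_t seed) {
    assert(n_informative <= n_dims);
    assert(noise >= 0.0);
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::normal_distribution<double> unit(0.0, 1.0);
    std::uniform_real_distribution<double> weight(0.0, 100.0);  // arbitrary range

    RegressionSketch r;
    // Only the first n_informative features influence the target.
    r.coef.assign(n_dims, 0.0);
    for (std::size_t j = 0; j < n_informative; ++j) r.coef[j] = weight(gen);

    r.X.assign(n_samples, std::vector<double>(n_dims));
    r.y.assign(n_samples, bias);
    for (std::size_t i = 0; i < n_samples; ++i) {
        for (std::size_t j = 0; j < n_dims; ++j) {
            r.X[i][j] = unit(gen);
            r.y[i] += r.coef[j] * r.X[i][j];
        }
        r.y[i] += noise * unit(gen);  // Gaussian noise with standard deviation `noise`
    }
    return r;
}
```

Returning the coefficients alongside the data lets a caller check how closely a fitted model recovers the ground truth.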

mltk::Data< double > mltk::datasets::make_spirals	(	size_t	n_samples = `100`,
		int	n_classes = `2`,
		bool	shuffle = `true`,
		double	noise = `1.0`,
		size_t	n_loops = `2`,
		double	margin = `0.5`,
		size_t	seed = `0`
	)

Generates a synthetic dataset composed of interlaced Archimedean spirals [source].

Parameters

n_samples	number of samples in the dataset.
n_classes	number of classes or spirals.
shuffle	tells if the dataset must be shuffled after generation.
noise	value of the noise added to the dataset.
n_loops	how many loops each spiral must have.
margin	margin between spirals.
seed	seed for random data generation.
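
An Archimedean spiral has a radius that grows linearly with the angle, r = b·θ. The sketch below shows one way interlaced spirals can be produced from that formula: each class uses the same spiral rotated by 2πk / n_classes, and `n_loops` sets how far θ runs. It is an illustration under stated assumptions (b = 1, Gaussian noise, a fixed number of points per class, no `margin` or shuffling), with hypothetical names, and is not taken from the mltk source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical 2-D labeled point; not an mltk type.
struct SpiralPoint {
    double x;
    double y;
    int label;
};

std::vector<SpiralPoint> sketch_spirals(std::size_t n_per_class, int n_classes,
                                        std::size_t n_loops, double noise,
                                        std::size_t seed) {
    assert(n_per_class > 0 && n_classes > 0 && noise >= 0.0);
    const double pi = std::acos(-1.0);
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::normal_distribution<double> unit(0.0, 1.0);

    std::vector<SpiralPoint> out;
    for (int k = 0; k < n_classes; ++k) {
        // Rotate each class so the spirals interlace evenly.
        const double offset = 2.0 * pi * k / n_classes;
        for (std::size_t i = 0; i < n_per_class; ++i) {
            // theta sweeps n_loops full turns over the samples of one class.
            const double theta = 2.0 * pi * n_loops * (i + 1) / n_per_class;
            const double r = theta;  // Archimedean spiral r = b * theta with b = 1
            out.push_back({r * std::cos(theta + offset) + noise * unit(gen),
                           r * std::sin(theta + offset) + noise * unit(gen), k});
        }
    }
    return out;
}
```

With two classes the second spiral is the first rotated by π, so the arms wind around each other without crossing when the noise is small.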