simplex_dirichlet_sampling
- mlquantify.utils.simplex_dirichlet_sampling(n_dim: int, n_prev: int, n_iter: int, alpha=1.0, min_val: float = 0.0, max_val: float = 1.0, max_tries: int = 1000, random_state: int | None = None) → ndarray
Sample prevalence vectors from a Dirichlet distribution on the simplex, constrained so that min_val ≤ \(p_i\) ≤ max_val for every class \(i\).
The concentration parameter alpha controls how the probability mass is spread over the simplex (below, \(k\) = n_dim is the number of classes):
- alpha == 1: the flat Dirichlet \(\mathrm{Dir}(\mathbf{1})\), i.e. a uniform distribution over the simplex (every prevalence combination is equally likely).
- alpha > 1: mass is pulled towards the balanced centre \((1/k, \ldots, 1/k)\); extreme prevalences become rare.
- alpha < 1: mass is pushed towards the corners; near-pure (one-class-dominant) prevalences become common.
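The effect of alpha can be checked empirically. The sketch below uses NumPy's `Generator.dirichlet` directly rather than mlquantify, and `mean_max_prevalence` is a hypothetical helper introduced only for illustration: it reports the average prevalence of the dominant class, which rises as alpha pushes mass towards the corners and falls as alpha pulls it towards the centre.

```python
import numpy as np

def mean_max_prevalence(alpha: float, n_dim: int = 3, n_samples: int = 20000, seed: int = 0) -> float:
    """Hypothetical helper: average largest prevalence under a symmetric Dir(alpha)."""
    rng = np.random.default_rng(seed)
    # Symmetric Dirichlet: the same concentration for every class.
    samples = rng.dirichlet(np.full(n_dim, alpha), size=n_samples)
    # Each row is one prevalence vector; take its dominant-class share.
    return float(samples.max(axis=1).mean())

for alpha in (0.2, 1.0, 10.0):
    # Values near 1 mean near-pure vectors; values near 1/n_dim mean balanced ones.
    print(f"alpha={alpha:>4}: mean max prevalence = {mean_max_prevalence(alpha):.3f}")
```

For alpha = 1 and three classes the exact expectation of the largest component is (1 + 1/2 + 1/3) / 3 = 11/18 ≈ 0.611, so the printed estimate should sit close to that value, with alpha = 0.2 above it and alpha = 10 below it.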
- Parameters:
- n_dim : int
Number of dimensions (classes).
- n_prev : int
Number of prevalence vectors to generate.
- n_iter : int
Number of repetitions for each generated vector.
- alpha : float or array-like of shape (n_dim,), default=1.0
Dirichlet concentration parameter. A scalar is broadcast to a symmetric Dirichlet over all classes; an array sets a per-class concentration.
- min_val : float, default=0.0
Minimum allowed prevalence for each class.
- max_val : float, default=1.0
Maximum allowed prevalence for each class.
- max_tries : int, default=1000
Maximum number of sampling iterations allowed for reaching the target number of vectors.
- random_state : int, RandomState instance or None, default=None
Seed or generator controlling the sampling.
- Returns:
- np.ndarray
Array of shape (n_prev * n_iter, n_dim) with valid prevalence vectors.
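To make the parameter and return contract concrete, here is a minimal rejection-sampling sketch written in plain NumPy. It is an assumption about how such a function could work, not mlquantify's actual implementation: the name `sample_bounded_prevalences` is hypothetical, its `random_state` accepts only an int or None (via `np.random.default_rng`), and repeating each accepted vector consecutively with `np.repeat` is an assumed row order.

```python
import numpy as np

def sample_bounded_prevalences(n_dim, n_prev, n_iter, alpha=1.0, min_val=0.0,
                               max_val=1.0, max_tries=1000, random_state=None):
    """Hypothetical rejection-sampling sketch; not mlquantify's code."""
    rng = np.random.default_rng(random_state)
    # A scalar alpha broadcasts to a symmetric Dirichlet; an array sets per-class values.
    alpha_vec = np.ones(n_dim) * np.asarray(alpha, dtype=float)
    accepted = []
    for _ in range(max_tries):
        # Draw a batch and keep only rows whose every entry lies in [min_val, max_val].
        batch = rng.dirichlet(alpha_vec, size=n_prev)
        mask = np.all((batch >= min_val) & (batch <= max_val), axis=1)
        accepted.extend(batch[mask])
        if len(accepted) >= n_prev:
            break
    if len(accepted) < n_prev:
        raise RuntimeError("could not draw enough prevalence vectors within the bounds")
    prevs = np.array(accepted[:n_prev])
    # Repeat each accepted vector n_iter times -> shape (n_prev * n_iter, n_dim).
    return np.repeat(prevs, n_iter, axis=0)

out = sample_bounded_prevalences(3, 5, 2, min_val=0.05, max_val=0.9, random_state=0)
print(out.shape)
```

Rejection keeps the Dirichlet shape restricted to the box [min_val, max_val], but acceptance can become rare or impossible when the bounds are tight (for instance, no vector can satisfy them if n_dim * min_val > 1), which is why a cap like max_tries is needed.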