simplex_dirichlet_sampling

mlquantify.utils.simplex_dirichlet_sampling(n_dim: int, n_prev: int, n_iter: int, alpha=1.0, min_val: float = 0.0, max_val: float = 1.0, max_tries: int = 1000, random_state: int | None = None) -> ndarray

Sample prevalence vectors \(p = (p_1, \ldots, p_k)\), with \(k\) = n_dim classes, from a Dirichlet distribution on the simplex, constrained by min_val ≤ \(p_i\) ≤ max_val for every class \(i\).

The concentration parameter alpha controls how the probability mass is spread over the simplex:

  • alpha == 1 — the flat Dirichlet \(\mathrm{Dir}(\mathbf{1})\), i.e. a uniform distribution over the simplex (every prevalence combination is equally likely).

  • alpha > 1 — mass is pulled towards the balanced centre \((1/k, \ldots, 1/k)\); extreme prevalences become rare.

  • alpha < 1 — mass is pushed towards the corners; near-pure (one-class-dominant) prevalences become common.
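
The three regimes above can be checked numerically. The following is a minimal standard-library sketch, not mlquantify code: the helper names `sample_dirichlet` and `mean_largest_prevalence` are hypothetical, and the Dirichlet draw uses the standard construction of normalising independent Gamma variates. It reports the average size of the dominant class, which should shrink as alpha grows.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one point from a symmetric Dirichlet(alpha, ..., alpha) over k classes."""
    while True:
        # Independent Gamma(alpha, 1) draws, normalised to sum to 1,
        # follow a Dirichlet distribution with concentration alpha.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        if total > 0.0:  # guard against floating-point underflow for tiny alpha
            return [d / total for d in draws]


def mean_largest_prevalence(alpha, k=3, n_samples=5000, seed=0):
    """Average size of the dominant class over n_samples draws.

    Values near 1 indicate near-pure samples; values approaching 1/k
    indicate balanced samples.
    """
    rng = random.Random(seed)
    total = sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(n_samples))
    return total / n_samples


for alpha in (0.1, 1.0, 10.0):
    value = mean_largest_prevalence(alpha)
    print(f"alpha={alpha}: mean largest prevalence = {value:.2f}")
```

With three classes, the printed value should be highest for alpha = 0.1 (corner-heavy), intermediate for alpha = 1 (uniform), and lowest for alpha = 10 (centre-heavy).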

Parameters:
n_dim : int

Number of dimensions (classes).

n_prev : int

Number of prevalence vectors to generate.

n_iter : int

Number of repetitions for each generated vector.

alpha : float or array-like of shape (n_dim,), default=1.0

Dirichlet concentration parameter. A scalar is broadcast to a symmetric Dirichlet over all classes; an array sets a per-class concentration.

min_val : float, default=0.0

Minimum allowed prevalence for each class.

max_val : float, default=1.0

Maximum allowed prevalence for each class.

max_tries : int, default=1000

Maximum number of sampling iterations allowed to collect the target number of valid vectors.

random_state : int, RandomState instance or None, default=None

Seed or generator controlling the sampling.

Returns:
np.ndarray

Array of shape (n_prev * n_iter, n_dim) with valid prevalence vectors.
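
One plausible way to combine the bounds, max_tries, and n_iter is rejection sampling followed by repetition. The sketch below is an assumption for illustration only and is not taken from the mlquantify source: the functions `sample_constrained_prevalences` and `repeat_each` are hypothetical, the batch-per-try loop is a guess at what "sampling iterations" means, and the row ordering of the repeated output is likewise assumed.

```python
import random


def sample_constrained_prevalences(n_dim, n_prev, alpha=1.0, min_val=0.0,
                                   max_val=1.0, max_tries=1000, seed=None):
    """Hypothetical rejection sampler (not the library's implementation).

    Keeps symmetric Dirichlet draws whose components all lie in
    [min_val, max_val] until n_prev vectors have been accepted.
    """
    # The bounds are only satisfiable if n_dim * min_val <= 1 <= n_dim * max_val.
    if min_val * n_dim > 1.0 or max_val * n_dim < 1.0:
        raise ValueError("no point on the simplex satisfies these bounds")
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        # Assumption: each try draws one batch of n_prev candidates.
        for _ in range(n_prev):
            draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_dim)]
            total = sum(draws)
            if total <= 0.0:
                continue  # skip degenerate draws caused by underflow
            candidate = [d / total for d in draws]
            if all(min_val <= x <= max_val for x in candidate):
                accepted.append(candidate)
        if len(accepted) >= n_prev:
            return accepted[:n_prev]
    raise RuntimeError("max_tries reached before collecting n_prev valid vectors")


def repeat_each(vectors, n_iter):
    """Repeat every vector n_iter times, giving len(vectors) * n_iter rows.

    Assumption: repeats of the same vector are placed on consecutive rows.
    """
    return [list(v) for v in vectors for _ in range(n_iter)]


prevs = sample_constrained_prevalences(n_dim=3, n_prev=5, min_val=0.1,
                                       max_val=0.8, seed=42)
rows = repeat_each(prevs, n_iter=2)
print(len(rows), len(rows[0]))
```

Rejection sampling keeps the accepted vectors distributed as the Dirichlet restricted to the feasible region, at the cost of wasted draws when the bounds are tight; that cost is what a cap such as max_tries guards against.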