fetch_lequa2024

mlquantify.datasets.fetch_lequa2024(*, data_home=None, download_if_missing=True, return_X_y=False, as_frame=False, n_retries=3, delay=1.0, protocol=None, n_samples=1000, sample_size=500, random_state=None, task='T1', include_test=False)

LeQua 2024 competition vectors; the task parameter selects among all tasks (text/ordinal).

Official LeQua 2024 data (Zenodo 11661820): 256-dimensional document vectors with controlled prevalence/covariate shift. task='T1' binary (prior-prob shift), 'T2' 28-class, 'T3' ordinal, 'T4' binary covariate shift. Returns the training set as (X, y); .data_dir points at the extracted dev sample files for official-protocol evaluation.

Quantification: LeQua is the field's official shared task, so results on it are directly comparable and citable.

Features: 256 (document vectors)

Classes: 2 (T1/T4) / 28 (T2) / ordinal (T3)

Shift: prior-prob (T1-T3) / covariate (T4)

Source: https://lequa2024.github.io/ (Zenodo 11661820)

Parameters:
data_home : str or path-like, default=None

Folder used to cache the downloaded file(s); defaults to _data/ next to the package.

download_if_missing : bool, default=True

If False, raise instead of downloading when the cache is empty.

return_X_y : bool, default=False

Return (X, y) instead of a Bunch.

as_frame : bool, default=False

Return .data as a DataFrame, .target as a Series, and a combined .frame (features + a "target" column).

n_retries : int, default=3

Number of download attempts before giving up.

delay : float, default=1.0

Seconds to wait between attempts.
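
The interplay of n_retries and delay can be pictured as a plain retry loop. The sketch below is not mlquantify's code: fetch_with_retries and flaky_download are hypothetical names, and it assumes the delay is applied only between failed attempts and that download failures surface as OSError subclasses.

```python
import time


def fetch_with_retries(download, n_retries=3, delay=1.0):
    """Call download() up to n_retries times, sleeping delay seconds between attempts.

    Hypothetical helper for illustration; not part of mlquantify.
    """
    if n_retries < 1:
        raise ValueError("n_retries must be at least 1")
    last_error = None
    for attempt in range(1, n_retries + 1):
        try:
            return download()
        except OSError as exc:  # ConnectionError and TimeoutError derive from OSError
            last_error = exc
            if attempt < n_retries:
                time.sleep(delay)  # wait only if another attempt follows
    raise last_error


# Stand-in download that fails twice before succeeding.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return b"payload"


result = fetch_with_retries(flaky_download, n_retries=3, delay=0.0)
```

With the defaults (n_retries=3, delay=1.0), a loop of this shape would spend at most about two seconds sleeping before the last error is raised.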

protocol : {None, "app", "npp", "upp", "ppp"} or mlquantify protocol, default=None

If set, draw evaluation sample-bags with an mlquantify protocol; the Bunch then also has .samples (index bags into .data), .prevalences and .protocol.

n_samples : int, default=1000

Number of prevalence points (bags) generated by the protocol.

sample_size : int, default=500

Instances per bag (the protocol batch_size).

random_state : int or None, default=None

Seed forwarded to the protocol.
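
To make the roles of n_samples, sample_size and random_state concrete, here is a minimal standard-library sketch of drawing index bags at varying prevalence. It is not the mlquantify protocol implementation: draw_bags is a hypothetical helper, it handles binary labels only, it samples with replacement, and it assumes that each entry of .prevalences describes the class prevalence of the matching bag in .samples.

```python
import random


def draw_bags(y, n_samples, sample_size, random_state=None):
    """Draw index bags with a randomly chosen positive-class prevalence.

    Illustrative only: hypothetical helper for binary labels, not the
    mlquantify protocol implementation.
    """
    rng = random.Random(random_state)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    samples, prevalences = [], []
    for _ in range(n_samples):
        n_pos = rng.randint(0, sample_size)  # target count of positives
        # Sample with replacement so any prevalence is reachable.
        bag = rng.choices(pos, k=n_pos) + rng.choices(neg, k=sample_size - n_pos)
        rng.shuffle(bag)
        samples.append(bag)
        prevalences.append(n_pos / sample_size)
    return samples, prevalences


# Toy labels standing in for the .target array.
y = [0] * 70 + [1] * 30
samples, prevalences = draw_bags(y, n_samples=5, sample_size=20, random_state=0)

# Each bag holds indices into the data; its prevalence can be recomputed from y.
recomputed = [sum(y[i] for i in bag) / len(bag) for bag in samples]
```

Because the bags are index lists, the features of one bag would be obtained by indexing the data with those indices, and fixing random_state makes the drawn bags reproducible.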

task : {'T1', 'T2', 'T3', 'T4'}, default='T1'

Which LeQua-2024 task to load.

include_test : bool, default=False

Also download the large official test bag zip.

Returns:
data : Bunch

Dictionary-like object. Attributes: data (features), target (labels), feature_names, target_names, DESCR; frame when as_frame=True; and samples / prevalences / protocol when protocol is set.

(X, y) : tuple

Returned instead when return_X_y=True.
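
Since the return type depends on return_X_y, downstream code may want to accept either shape. The sketch below shows one way to do that; as_X_y is a hypothetical helper, and SimpleNamespace is used only as a stand-in for the Bunch with toy values, not real LeQua data.

```python
from types import SimpleNamespace


def as_X_y(result):
    """Return (X, y) from either an (X, y) tuple or a Bunch-like object."""
    if isinstance(result, tuple):
        X, y = result
        return X, y
    # Bunch-like: features live in .data and labels in .target.
    return result.data, result.target


# Stand-ins for the two documented return shapes (toy values, not real data).
bunch_like = SimpleNamespace(data=[[0.1, 0.2], [0.3, 0.4]], target=[0, 1])
tuple_like = ([[0.1, 0.2], [0.3, 0.4]], [0, 1])

X1, y1 = as_X_y(bunch_like)
X2, y2 = as_X_y(tuple_like)
```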

References

Esuli, A., Moreo, A. & Sebastiani, F. (2024). LeQua 2024 overview. CLEF 2024.

Examples

>>> from mlquantify.datasets import fetch_lequa2024
>>> b = fetch_lequa2024(task='T3')
>>> b.data.shape[1]
256