fetch_lequa2024
- mlquantify.datasets.fetch_lequa2024(*, data_home=None, download_if_missing=True, return_X_y=False, as_frame=False, n_retries=3, delay=1.0, protocol=None, n_samples=1000, sample_size=500, random_state=None, task='T1', include_test=False)
LeQua 2024 competition vectors, all tasks via task (text/ordinal).
Official LeQua 2024 data (Zenodo 11661820): 256-dimensional document vectors with controlled prevalence/covariate shift.
task='T1' is binary (prior-probability shift), 'T2' is 28-class, 'T3' is ordinal, and 'T4' is binary with covariate shift. Returns the training set as (X, y); .data_dir points at the extracted dev sample files for official-protocol evaluation.
Quantification: this is the field's official shared task, so results are directly comparable and citable.
Features: 256 (document vectors)
Classes: 2 (T1/T4) / 28 (T2) / ordinal (T3)
Shift: prior-prob (T1-T3) / covariate (T4)
Source: https://lequa2024.github.io/ (Zenodo 11661820)
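To make the prior-probability shift named above concrete, the following is a minimal sketch of drawing a bag at a chosen prevalence from a labelled pool, so that class proportions change while each class's items stay the same. The helper sample_at_prevalence and the toy labels are hypothetical and stdlib-only; this is not the competition's sampling procedure nor mlquantify code.

```python
import random


def sample_at_prevalence(target, prevalence, sample_size, seed=None):
    """Return indices of a bag in which a fraction `prevalence` has label 1.

    Hypothetical helper for illustration; assumes binary labels 0/1.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(target) if y == 1]
    neg = [i for i, y in enumerate(target) if y == 0]
    n_pos = round(prevalence * sample_size)
    # Sample with replacement so a small pool cannot run out of items.
    bag = rng.choices(pos, k=n_pos) + rng.choices(neg, k=sample_size - n_pos)
    rng.shuffle(bag)
    return bag


# Toy training pool with 20% positives.
target = [0] * 80 + [1] * 20
# Bag whose positive prevalence is shifted to 70%.
bag = sample_at_prevalence(target, prevalence=0.7, sample_size=50, seed=0)
bag_prevalence = sum(target[i] for i in bag) / len(bag)
```

A quantifier trained on the pool would then be asked to estimate bag_prevalence from the bag's features alone.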
- Parameters:
- data_home : str or path-like, default=None
Folder used to cache the downloaded file(s); defaults to _data/ next to the package.
- download_if_missing : bool, default=True
If False, raise instead of downloading when the cache is empty.
- return_X_y : bool, default=False
Return (X, y) instead of a Bunch.
- as_frame : bool, default=False
Return .data as a DataFrame, .target as a Series, and a combined .frame (features + a "target" column).
- n_retries : int, default=3
Number of download attempts before giving up.
- delay : float, default=1.0
Seconds to wait between attempts.
- protocol : {None, "app", "npp", "upp", "ppp"} or mlquantify protocol, default=None
If set, draw evaluation sample-bags with an mlquantify protocol; the Bunch then also has .samples (index bags into .data), .prevalences and .protocol.
- n_samples : int, default=1000
Number of prevalence points (bags) generated by the protocol.
- sample_size : int, default=500
Instances per bag (the protocol batch_size).
- random_state : int or None, default=None
Seed forwarded to the protocol.
- task : {'T1', 'T2', 'T3', 'T4'}, default='T1'
Which LeQua-2024 task to load.
- include_test : bool, default=False
Also download the large official test bag zip.
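The interaction between n_retries and delay can be pictured with a small retry loop. This is a sketch under assumptions, not the library's implementation: download_with_retries and flaky_download are hypothetical names, and treating OSError as the retryable failure is a choice made for this example.

```python
import time


def download_with_retries(download, n_retries=3, delay=1.0, sleep=time.sleep):
    """Call `download()` up to `n_retries` times, waiting `delay` seconds between attempts."""
    last_error = None
    for attempt in range(1, n_retries + 1):
        try:
            return download()
        except OSError as exc:  # assumed retryable error type for this sketch
            last_error = exc
            if attempt < n_retries:
                sleep(delay)  # no wait after the final attempt
    raise RuntimeError(f"download failed after {n_retries} attempts") from last_error


# Usage with a hypothetical downloader that fails twice, then succeeds.
calls = []
waits = []


def flaky_download():
    calls.append(1)
    if len(calls) < 3:
        raise OSError("temporary network error")
    return "payload"


# Passing waits.append as `sleep` records the delays instead of really sleeping.
result = download_with_retries(flaky_download, n_retries=3, delay=1.0, sleep=waits.append)
```

With n_retries=3 there are at most two waits, since the delay applies between attempts rather than after the last one.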
- Returns:
- data : Bunch
Dictionary-like object. Attributes: data (features), target (labels), feature_names, target_names, DESCR; frame when as_frame=True; and samples / prevalences / protocol when protocol is set.
- (X, y) : tuple
Returned instead when return_X_y=True.
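When protocol is set, samples holds index bags into data. The sketch below shows how such bags relate to per-bag class prevalences, using toy stand-ins rather than a real Bunch. It assumes each bag is a sequence of integer row indices and that a prevalence is the vector of class proportions within the bag; the exact array types the library returns are not stated here, and bag_prevalence is a hypothetical helper.

```python
from collections import Counter


def bag_prevalence(target, bag, classes):
    """Return the proportion of each class among the rows indexed by `bag`."""
    counts = Counter(target[i] for i in bag)
    return [counts[c] / len(bag) for c in classes]


# Toy stand-ins for the Bunch's .target and .samples (assumed shapes).
target = [0, 0, 1, 1, 1, 0, 1, 0]
samples = [[0, 1, 2, 3], [2, 3, 4, 6], [0, 1, 5, 2]]
classes = [0, 1]

# One prevalence vector per bag, ordered like `classes`.
prevalences = [bag_prevalence(target, bag, classes) for bag in samples]
```

A quantifier's estimate for each bag can then be compared against the matching entry of prevalences.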
References
Esuli, A., Moreo, A. & Sebastiani, F. (2024). LeQua 2024 overview. CLEF 2024.
Examples
>>> from mlquantify.datasets import fetch_lequa2024
>>> b = fetch_lequa2024(task='T3')
>>> b.data.shape[1]
256