fetch_sentiment140
- mlquantify.datasets.fetch_sentiment140(*, data_home=None, download_if_missing=True, return_X_y=False, as_frame=False, n_retries=3, delay=1.0, protocol=None, n_samples=1000, sample_size=500, random_state=None)
Sentiment140: 1.6M timestamped tweets (text, binary, temporal).
1.6 million tweets, weakly labelled via emoticons as negative (0) or positive (1), in chronological order (the date field is exposed in the Bunch). Returned as raw tweet text plus 0/1 labels, which makes it well suited to estimating the sentiment-trend curve over time.
Quantification: a real timeline for quantifying sentiment over time.
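The "sentiment-trend curve" is the fraction of positive tweets in each time window, which is the quantity a quantifier is asked to estimate per window. The sketch below computes that ground-truth curve from parallel sequences of timestamps and 0/1 labels. It assumes the dates are already datetime objects (how the Bunch stores the date field is not specified here, so parsing may be needed first); the helper name positive_rate_by_month is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def positive_rate_by_month(dates, labels):
    """Return {(year, month): fraction of label 1}, in chronological order.

    Hypothetical helper: `dates` is a sequence of datetime objects and
    `labels` a parallel sequence of 0/1 sentiment labels.
    """
    if len(dates) != len(labels):
        raise ValueError("dates and labels must have the same length")
    counts = defaultdict(lambda: [0, 0])  # (year, month) -> [n_positive, n_total]
    for d, y in zip(dates, labels):
        bucket = counts[(d.year, d.month)]
        bucket[0] += int(y == 1)
        bucket[1] += 1
    # Sorting the (year, month) keys gives the time axis of the trend curve.
    return {k: pos / total for k, (pos, total) in sorted(counts.items())}


# Toy data standing in for the Bunch's date field and its target.
dates = [datetime(2009, 4, 6), datetime(2009, 4, 20), datetime(2009, 5, 1),
         datetime(2009, 5, 2), datetime(2009, 5, 3), datetime(2009, 6, 1)]
labels = [0, 1, 1, 1, 0, 0]
trend = positive_rate_by_month(dates, labels)
print(trend)
```

With the real data, the dates (parsed if they are strings) and the target would be passed instead; a quantifier's per-month estimates can then be compared against this curve.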
- Documents: 1,600,000
- Features: raw text (+ date)
- Classes: 2, balanced (encoded 0/4 in the source file, returned as 0/1)
- Order: chronological
Source: https://www.kaggle.com/datasets/kazanova/sentiment140 (Stanford mirror)
- Parameters:
- data_home : str or path-like, default=None
Folder used to cache the downloaded file(s); defaults to _data/ next to the package.
- download_if_missing : bool, default=True
If False, raise an error instead of downloading when the cache is empty.
- return_X_y : bool, default=False
Return (X, y) instead of a Bunch.
- as_frame : bool, default=False
Return .data as a DataFrame, .target as a Series, and a combined .frame (features plus a "target" column).
- n_retries : int, default=3
Number of download attempts before giving up.
- delay : float, default=1.0
Seconds to wait between attempts.
- protocol : {None, "app", "npp", "upp", "ppp"} or mlquantify protocol, default=None
If set, draw evaluation sample bags with an mlquantify protocol; the Bunch then also has .samples (index bags into .data), .prevalences and .protocol.
- n_samples : int, default=1000
Number of prevalence points (bags) generated by the protocol.
- sample_size : int, default=500
Instances per bag (the protocol's batch_size).
- random_state : int or None, default=None
Seed forwarded to the protocol.
- Returns:
- data : Bunch
Dictionary-like object with attributes data (features), target (labels), feature_names, target_names and DESCR; frame when as_frame=True; and samples, prevalences and protocol when protocol is set.
- (X, y) : tuple
Returned instead when return_X_y=True.
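Together, protocol, n_samples and sample_size describe a set of evaluation bags: n_samples bags of sample_size indices into .data, each with a known class prevalence. The following is a minimal sketch of an artificial-prevalence-style (APP-like) sampler for the binary case, using only the standard library. It is not mlquantify's implementation; the function name draw_bags and its return layout (a list of index lists plus a list of positive-class prevalences, chosen to mirror .samples and .prevalences) are assumptions.

```python
import random


def draw_bags(labels, n_samples=5, sample_size=10, random_state=None):
    """APP-style sketch for binary 0/1 labels (hypothetical helper).

    Returns (samples, prevalences): samples[i] is a list of indices into the
    dataset and prevalences[i] is the fraction of label-1 items in that bag.
    """
    rng = random.Random(random_state)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    samples, prevalences = [], []
    for k in range(n_samples):
        # Evenly spaced target prevalences from 0 to 1 (a 1-D APP grid).
        p = k / (n_samples - 1) if n_samples > 1 else 0.5
        n_pos = round(p * sample_size)
        # Draw with replacement so every grid point is reachable from any pool size.
        idx = rng.choices(pos, k=n_pos) + rng.choices(neg, k=sample_size - n_pos)
        rng.shuffle(idx)
        samples.append(idx)
        # Record the realised prevalence, which may differ slightly from p after rounding.
        prevalences.append(n_pos / sample_size)
    return samples, prevalences


labels = [0, 1] * 50  # toy balanced labels standing in for the target
samples, prevalences = draw_bags(labels, n_samples=5, sample_size=10, random_state=0)
print(prevalences)
```

An NPP-style variant would instead draw each bag uniformly from the whole pool, so its prevalences stay close to the dataset's own class balance; the APP grid deliberately covers the full range from 0 to 1.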
References
Go, A., Bhayani, R. & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford.
Examples
>>> from mlquantify.datasets import fetch_sentiment140
>>> b = fetch_sentiment140()
>>> len(b.data)
1600000
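When protocol is set, a typical next step is to run a quantifier on every bag and compare its estimate with the recorded prevalence. The sketch below uses plain classify-and-count on precomputed 0/1 predictions and reports the mean absolute error. The helper names, and the assumption that samples is a list of index lists aligned with positive-class prevalences, are illustrative rather than mlquantify's API.

```python
def classify_and_count(predictions, bag):
    """Estimated positive prevalence = share of bag items predicted as 1."""
    return sum(predictions[i] for i in bag) / len(bag)


def mean_absolute_error(samples, prevalences, predictions):
    """Average |estimated - true prevalence| over all bags."""
    errors = [abs(classify_and_count(predictions, bag) - p)
              for bag, p in zip(samples, prevalences)]
    return sum(errors) / len(errors)


# Toy stand-ins for b.target, a classifier's predictions on b.data, and b.samples.
labels = [0, 0, 1, 1]
predictions = [0, 1, 1, 1]          # one false positive, at index 1
samples = [[0, 1], [2, 3], [0, 3]]  # index bags into the dataset
# True positive-class prevalence of each bag, derived from the labels.
prevalences = [sum(labels[i] for i in bag) / len(bag) for bag in samples]
mae = mean_absolute_error(samples, prevalences, predictions)
print(mae)
```

In practice the predictions would come from a classifier trained on a separate, earlier slice of the timeline. Classify-and-count is only the simplest baseline and tends to be biased under prevalence shift, which is what more elaborate quantifiers try to correct.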