fetch_planetoid_cora_citeseer_pubmed
- mlquantify.datasets.fetch_planetoid_cora_citeseer_pubmed(*, data_home=None, download_if_missing=True, return_X_y=False, as_frame=False, n_retries=3, delay=1.0, protocol=None, n_samples=1000, sample_size=500, random_state=None, name='cora')
Planetoid citation graphs: Cora / CiteSeer / PubMed (graph nodes).
The three standard Planetoid citation networks, selected with name. Nodes are papers with bag-of-words/TF-IDF features and a topic label; .graph holds the adjacency (citation links). Cora: 2708 nodes, 1433 features, 7 classes. CiteSeer: 3327 nodes, 3703 features, 6 classes. PubMed: 19717 nodes, 500 features, 3 classes.
Quantification: node-level quantification under covariate/structural graph shift.
Nodes (Cora / CiteSeer / PubMed): 2708 / 3327 / 19717
Features (Cora / CiteSeer / PubMed): 1433 / 3703 / 500 (sparse)
Classes (Cora / CiteSeer / PubMed): 7 / 6 / 3
Source: kimiyoung/planetoid
- Parameters:
- data_home : str or path-like, default=None
Folder used to cache the downloaded file(s); defaults to _data/ next to the package.
- download_if_missing : bool, default=True
If False, raise instead of downloading when the cache is empty.
- return_X_y : bool, default=False
Return (X, y) instead of a Bunch.
- as_frame : bool, default=False
Return .data as a DataFrame, .target as a Series, and a combined .frame (features + a "target" column).
- n_retries : int, default=3
Number of download attempts before giving up.
- delay : float, default=1.0
Seconds to wait between attempts.
- protocol : {None, "app", "npp", "upp", "ppp"} or mlquantify protocol, default=None
If set, draw evaluation sample-bags with an mlquantify protocol; the Bunch then also has .samples (index bags into .data), .prevalences and .protocol.
- n_samples : int, default=1000
Number of prevalence points (bags) generated by the protocol.
- sample_size : int, default=500
Instances per bag (the protocol batch_size).
- random_state : int or None, default=None
Seed forwarded to the protocol.
- name : {'cora', 'citeseer', 'pubmed'}, default='cora'
Which citation network to load.
- Returns:
- data : Bunch
Dictionary-like object. Attributes: data (features), target (labels), feature_names, target_names, DESCR; frame when as_frame=True; and samples / prevalences / protocol when protocol is set.
- (X, y) : tuple
Returned instead when return_X_y=True.
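The relationship between index bags and their prevalences can be illustrated with a minimal standard-library sketch. This is not mlquantify's implementation: draw_bag and bag_prevalence are hypothetical helpers, the labels are synthetic, and sampling with replacement is an assumption made only for the illustration. It shows what "index bags into .data" means: each bag is a list of row indices whose labels follow a chosen class-prevalence vector.

```python
import random
from collections import Counter


def draw_bag(targets, prevalence, sample_size, rng):
    """Return indices into `targets` whose class mix follows `prevalence`.

    Hypothetical helper for illustration; not part of mlquantify.
    """
    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(targets):
        by_class.setdefault(label, []).append(idx)
    classes = sorted(by_class)

    # Convert the prevalence vector into per-class counts summing to sample_size.
    counts = [round(p * sample_size) for p in prevalence]
    counts[-1] = sample_size - sum(counts[:-1])

    bag = []
    for label, count in zip(classes, counts):
        # Sampling with replacement (an assumption of this sketch).
        bag.extend(rng.choices(by_class[label], k=count))
    rng.shuffle(bag)
    return bag


def bag_prevalence(targets, bag, classes):
    """Class-frequency vector of the labels selected by an index bag."""
    freq = Counter(targets[i] for i in bag)
    return [freq[c] / len(bag) for c in classes]


rng = random.Random(0)
targets = [i % 3 for i in range(300)]  # synthetic labels for 3 classes
bag = draw_bag(targets, [0.6, 0.3, 0.1], sample_size=50, rng=rng)
prev = bag_prevalence(targets, bag, classes=[0, 1, 2])
```

With a real Bunch, the same bag_prevalence idea could presumably be applied to one entry of samples together with target, and compared against the matching entry of prevalences; the exact array types returned are not specified here.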
References
Yang, Z. et al. (2016). Revisiting semi-supervised learning with graph embeddings. ICML 2016.
Sen, P. et al. (2008). Collective classification in network data. AI Magazine.
Examples
>>> from mlquantify.datasets import fetch_planetoid_cora_citeseer_pubmed
>>> b = fetch_planetoid_cora_citeseer_pubmed(name='cora')
>>> b.data.shape
(2708, 1433)
b.data is sparse; b.graph holds the edges.
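The n_retries and delay parameters describe a retry-with-pause download pattern. The sketch below shows one way such a loop is commonly written; it is an assumption for illustration, not the library's actual code. fetch_with_retries and flaky_download are hypothetical names, and the choice of OSError as the retried error type is likewise assumed.

```python
import time


def fetch_with_retries(download, n_retries=3, delay=1.0, sleep=time.sleep):
    """Call `download()` up to `n_retries` times, pausing `delay` seconds between tries.

    Hypothetical helper for illustration; not part of mlquantify.
    """
    last_error = None
    for attempt in range(1, n_retries + 1):
        try:
            return download()
        except OSError as exc:  # assumed error type for network failures
            last_error = exc
            if attempt < n_retries:
                sleep(delay)  # wait only when another attempt will follow
    raise RuntimeError(f"download failed after {n_retries} attempts") from last_error


# Stand-in downloader that fails twice, then succeeds.
attempts = []
pauses = []


def flaky_download():
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError("temporary network error")
    return b"payload"


# Record the pauses instead of actually sleeping.
result = fetch_with_retries(flaky_download, n_retries=3, delay=1.0, sleep=pauses.append)
```

Injecting the sleep function keeps the example fast and makes the waiting behaviour observable: with three attempts there are at most two pauses.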