TY - JOUR
T1 - QMSampler: Joint Sampling of Multiple Networks with Quality Guarantee
AU - Shuai, Hong-Han
AU - Yang, De Nian
AU - Shen, Chih Ya
AU - Yu, Philip S.
AU - Chen, Ming Syan
PY - 2018/3
Y1 - 2018/3
N2 - Because Online Social Networks (OSNs) have become increasingly important in the last decade, they have motivated a great deal of research on Social Network Analysis (SNA). Currently, SNA algorithms are evaluated on real datasets obtained from large-scale OSNs, which are usually sampled by Breadth-First-Search (BFS), Random Walk (RW), or some variations of the latter. However, none of the released datasets provides any statistical guarantees on the difference between the sampled datasets and the ground truth. Moreover, all existing sampling algorithms only focus on sampling a single OSN, but each OSN is actually a sampling of a complete social network. Hence, even if the whole dataset from a single OSN is sampled, the results may still be skewed and may not fully reflect the properties of the complete social network. To address the above issues, we have made the first attempt to explore the joint sampling of multiple OSNs and propose an approach called Quality-guaranteed Multi-network Sampler (QMSampler) that can jointly sample multiple OSNs. QMSampler provides a statistical guarantee on the difference between the sampled real dataset and the ground truth (the perfect integration of all OSNs). Our experimental results demonstrate that the proposed approach generates a much smaller bias than any existing method. QMSampler has also been released as a free download.
AB - Because Online Social Networks (OSNs) have become increasingly important in the last decade, they have motivated a great deal of research on Social Network Analysis (SNA). Currently, SNA algorithms are evaluated on real datasets obtained from large-scale OSNs, which are usually sampled by Breadth-First-Search (BFS), Random Walk (RW), or some variations of the latter. However, none of the released datasets provides any statistical guarantees on the difference between the sampled datasets and the ground truth. Moreover, all existing sampling algorithms only focus on sampling a single OSN, but each OSN is actually a sampling of a complete social network. Hence, even if the whole dataset from a single OSN is sampled, the results may still be skewed and may not fully reflect the properties of the complete social network. To address the above issues, we have made the first attempt to explore the joint sampling of multiple OSNs and propose an approach called Quality-guaranteed Multi-network Sampler (QMSampler) that can jointly sample multiple OSNs. QMSampler provides a statistical guarantee on the difference between the sampled real dataset and the ground truth (the perfect integration of all OSNs). Our experimental results demonstrate that the proposed approach generates a much smaller bias than any existing method. QMSampler has also been released as a free download.
U2 - 10.1109/TBDATA.2017.2715847
DO - 10.1109/TBDATA.2017.2715847
M3 - Article
SN - 2332-7790
VL - 4
SP - 90
EP - 104
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
IS - 1
ER -