TY - GEN
T1 - Generalization performance analysis of flow-based peer-to-peer traffic identification
AU - Wang, Yi Hsien
AU - Gau, Victor
AU - Bosaw, Trevor
AU - Hwang, Jenq Neng
AU - Lippman, Alan
AU - Lieberman, Dan
AU - Wu, I-Chen
PY - 2008
Y1 - 2008
N2 - In this paper, we develop a peer-to-peer (P2P) traffic identifier to facilitate quality of service (QoS) control in edge routers. Because P2P applications currently consume a large percentage of Internet bandwidth, network optimization strategies are needed to improve network performance. Traffic identification is the most important component that these optimization strategies can adopt. We focus on developing a machine learning strategy for quick identification and continuous tracking of flows associated with various P2P media streaming and file sharing applications. Using Random Forests (RF), evaluated with 10-fold cross-validation, our method achieves greater than 98% accuracy and 89% precision in identifying P2P flows, with a false positive (FP) rate below 1%. With the help of a winner-take-all strategy, an RF built with data collected from one network can classify flows in other networks with accuracy over 97%, precision over 81%, and an FP rate below 2%.
AB - In this paper, we develop a peer-to-peer (P2P) traffic identifier to facilitate quality of service (QoS) control in edge routers. Because P2P applications currently consume a large percentage of Internet bandwidth, network optimization strategies are needed to improve network performance. Traffic identification is the most important component that these optimization strategies can adopt. We focus on developing a machine learning strategy for quick identification and continuous tracking of flows associated with various P2P media streaming and file sharing applications. Using Random Forests (RF), evaluated with 10-fold cross-validation, our method achieves greater than 98% accuracy and 89% precision in identifying P2P flows, with a false positive (FP) rate below 1%. With the help of a winner-take-all strategy, an RF built with data collected from one network can classify flows in other networks with accuracy over 97%, precision over 81%, and an FP rate below 2%.
UR - http://www.scopus.com/inward/record.url?scp=58049188027&partnerID=8YFLogxK
U2 - 10.1109/MLSP.2008.4685491
DO - 10.1109/MLSP.2008.4685491
M3 - Conference contribution
AN - SCOPUS:58049188027
SN - 9781424423767
T3 - Proceedings of the 2008 IEEE Workshop on Machine Learning for Signal Processing, MLSP 2008
SP - 267
EP - 272
BT - Proceedings of the 2008 IEEE Workshop on Machine Learning for Signal Processing, MLSP 2008
T2 - 2008 IEEE Workshop on Machine Learning for Signal Processing, MLSP 2008
Y2 - 16 October 2008 through 19 October 2008
ER -