TY - GEN
T1 - Replanting Your Forest
T2 - 8th IEEE Non-Volatile Memory Systems and Applications Symposium, NVMSA 2019
AU - Ho, Yu Ting
AU - Wu, Chun Feng
AU - Yang, Ming Chang
AU - Chen, Tseng Yi
AU - Chang, Yuan Hao
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/8
Y1 - 2019/8
N2 - Random forest is effective and accurate in making predictions for classification and regression problems, which constitute the majority of machine learning applications and systems nowadays. However, as data are generated explosively in this big data era, many machine learning algorithms, including the random forest algorithm, may face difficulty in maintaining and processing all the required data in main memory. Instead, intensive data movements (i.e., data swapping) between the faster-but-smaller main memory and the slower-but-larger secondary storage may occur excessively and largely degrade performance. To address this challenge, great hopes have been placed on the emerging non-volatile memory (NVM) technologies to substitute for traditional random access memory (RAM) and to build a larger-than-ever main memory space, owing to their higher cell density, lower power consumption, and read performance comparable to that of traditional RAM. Nevertheless, the limited write endurance of NVM cells and the read-write asymmetry of NVMs may still limit the feasibility of running machine learning algorithms directly on NVMs. This dilemma inspires this study to develop an NVM-friendly bagging strategy for the random forest algorithm, which trades the 'randomness' of the sampled data for reduced data movements in the memory hierarchy without hurting prediction accuracy. The evaluation results show that the proposed design can save up to 72% of the write accesses on representative traces with nearly no degradation in prediction accuracy.
AB - Random forest is effective and accurate in making predictions for classification and regression problems, which constitute the majority of machine learning applications and systems nowadays. However, as data are generated explosively in this big data era, many machine learning algorithms, including the random forest algorithm, may face difficulty in maintaining and processing all the required data in main memory. Instead, intensive data movements (i.e., data swapping) between the faster-but-smaller main memory and the slower-but-larger secondary storage may occur excessively and largely degrade performance. To address this challenge, great hopes have been placed on the emerging non-volatile memory (NVM) technologies to substitute for traditional random access memory (RAM) and to build a larger-than-ever main memory space, owing to their higher cell density, lower power consumption, and read performance comparable to that of traditional RAM. Nevertheless, the limited write endurance of NVM cells and the read-write asymmetry of NVMs may still limit the feasibility of running machine learning algorithms directly on NVMs. This dilemma inspires this study to develop an NVM-friendly bagging strategy for the random forest algorithm, which trades the 'randomness' of the sampled data for reduced data movements in the memory hierarchy without hurting prediction accuracy. The evaluation results show that the proposed design can save up to 72% of the write accesses on representative traces with nearly no degradation in prediction accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85074162801&partnerID=8YFLogxK
U2 - 10.1109/NVMSA.2019.8863525
DO - 10.1109/NVMSA.2019.8863525
M3 - Conference contribution
AN - SCOPUS:85074162801
T3 - Proceedings - 2019 IEEE Non-Volatile Memory Systems and Applications Symposium, NVMSA 2019
BT - Proceedings - 2019 IEEE Non-Volatile Memory Systems and Applications Symposium, NVMSA 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 August 2019 through 21 August 2019
ER -