Accelerating Random Forest on Memory-Constrained Devices Through Data Storage Optimization

Camelia Slimani*, Chun Feng Wu, Stephane Rubini, Yuan Hao Chang, Jalil Boukhobza

*Corresponding author of this work

Research output: Article, peer-reviewed

6 citations (Scopus)

Abstract

Random forest is a widely used classification algorithm. It consists of a set of decision trees, each of which is a classifier built from a random subset of the training data-set. In an environment where the memory work-space is small in comparison to the data-set size, a large proportion of the decision-tree training time is spent on I/O operations. These are caused by data-block transfers between the storage device and the memory work-space (in both directions). Our analysis of random forest training algorithms revealed two major issues: (1) Block under-utilization: data blocks are poorly used when loaded into memory and have to be reloaded multiple times, meaning that the algorithm exhibits poor spatial locality; (2) Data over-read: the data-set is assumed to be fully loaded into memory, whereas a large proportion of the data is not actually useful when building a decision tree. Our proposed solution addresses these two issues. First, we reorganize the data-set so as to enhance spatial locality; second, we remove the assumption that the data-set is entirely loaded into memory and access data only when effectively needed. Our experiments show that this method reduces random forest building time by 51% to 95% in comparison with a state-of-the-art method.
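The two ideas in the abstract (reorganizing the data-set for spatial locality, and reading data only when needed rather than loading the whole data-set) can be sketched as follows. This is an illustrative assumption, not the paper's actual implementation: it stores features contiguously (feature-major) so that evaluating split candidates for one feature touches one contiguous disk region, and memory-maps the file so only the accessed features are read into the memory work-space.

```python
import numpy as np
import os
import tempfile

# Build a small synthetic data-set (rows = samples, columns = features).
rng = np.random.default_rng(0)
n_samples, n_features = 1000, 20
data = rng.random((n_samples, n_features))

# Feature-major layout on disk: all values of feature 0, then feature 1, ...
# so that reading one feature is a single sequential block access.
path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
data.T.astype(np.float64).tofile(path)

# Memory-map the file instead of loading it entirely: only pages that are
# actually touched are transferred from storage into memory.
cols = np.memmap(path, dtype=np.float64, mode="r",
                 shape=(n_features, n_samples))

def feature_column(j):
    """Return one feature's values; reads one contiguous on-disk region."""
    return np.asarray(cols[j])

# When building a tree node, only the randomly sampled features are read.
sampled = rng.choice(n_features, size=4, replace=False)
thresholds = {j: float(np.median(feature_column(j))) for j in sampled}
```

With a row-major layout, reading one feature would instead touch every data block of the file; the feature-major layout is what turns per-feature access into sequential I/O.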

Original language: English
Pages (from-to): 1595-1609
Number of pages: 15
Journal: IEEE Transactions on Computers
Volume: 72
Issue number: 6
DOIs
Publication status: Published - 1 Jun 2023

