Accelerating Random Forest on Memory-Constrained Devices through Data Storage Optimization

Camelia Slimani, Chun Feng Wu, Stephane Rubini, Yuan Hao Chang, Jalil Boukhobza

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

Random forests is a widely used classification algorithm. It consists of a set of decision trees each of which is a classifier built on the basis of a random subset of the training data-set. In an environment where the memory work-space is low in comparison to the data-set size, when training a decision tree, a large proportion of the execution time is related to I/O operations. These are caused by data blocks transfers between the storage device and the memory work-space (in both directions). Our analysis of random forests training algorithms showed that there are two major issues : <bold>(1)</bold> Block Under-utilization: data blocks are poorly used when loaded into memory and have to be reloaded multiple times, meaning that the algorithm exhibits a poor spatial locality; <bold>(2)</bold> Data Over-read: the data-set is supposed to be fully loaded in memory whereas a large proportion of data are not effectively useful when building a decision tree. Our proposed solution is structured to address these two issues. First, we propose to reorganize the data-set in such a way to enhance spatial locality and secondly, to remove the assumption that the data-set is entirely loaded into memory and access data only when effectively needed. Our experiments show that this method made it possible to reduce random forest building time by 51 to 95&#x0025; in comparison to a state-of-the-art method.

Original languageEnglish
Pages (from-to)1-14
Number of pages14
JournalIEEE Transactions on Computers
DOIs
StateAccepted/In press - 2022

Keywords

  • Embedded systems
  • I/O accesses reduction
  • Memory Hierarchy
  • Random Forests

Fingerprint

Dive into the research topics of 'Accelerating Random Forest on Memory-Constrained Devices through Data Storage Optimization'. Together they form a unique fingerprint.

Cite this