Accelerating Random Forest on Memory-Constrained Devices Through Data Storage Optimization

  • Camelia Slimani*
  • , Chun Feng Wu
  • , Stephane Rubini
  • , Yuan Hao Chang
  • , Jalil Boukhobza
  • *Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

11 Scopus citations

Abstract

Random forests is a widely used classification algorithm. It consists of a set of decision trees each of which is a classifier built on the basis of a random subset of the training data-set. In an environment where the memory work-space is low in comparison to the data-set size, when training a decision tree, a large proportion of the execution time is related to I/O operations. These are caused by data blocks transfers between the storage device and the memory work-space (in both directions). Our analysis of random forests training algorithms showed that there are two major issues : (1) Block Under-utilization: data blocks are poorly used when loaded into memory and have to be reloaded multiple times, meaning that the algorithm exhibits a poor spatial locality; (2) Data Over-read: the data-set is supposed to be fully loaded in memory whereas a large proportion of data are not effectively useful when building a decision tree. Our proposed solution is structured to address these two issues. First, we propose to reorganize the data-set in such a way to enhance spatial locality and second, to remove the assumption that the data-set is entirely loaded into memory and access data only when effectively needed. Our experiments show that this method made it possible to reduce random forest building time by 51 to 95% in comparison to a state-of-the-art method.

Original languageEnglish
Pages (from-to)1595-1609
Number of pages15
JournalIEEE Transactions on Computers
Volume72
Issue number6
DOIs
StatePublished - 1 Jun 2023

Keywords

  • I/O accesses reduction
  • Random forests
  • embedded systems
  • memory hierarchy

Fingerprint

Dive into the research topics of 'Accelerating Random Forest on Memory-Constrained Devices Through Data Storage Optimization'. Together they form a unique fingerprint.

Cite this