RaFIO: A random forest I/O-aware algorithm

Camélia Slimani, Chun Feng Wu, Yuan Hao Chang, Stéphane Rubini, Jalil Boukhobza

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

Random forest based classification is a widely used machine learning technique. Training a random forest consists of building several decision trees that classify elements of the input dataset according to their features. This process is memory intensive. When a dataset is larger than the available memory, the number of I/O operations grows significantly, causing a dramatic performance drop. Our experiments showed that, for a dataset 8 times larger than the available memory workspace, training a random forest is 25 times slower than when the dataset fits in memory. In this paper, we revisit the tree building algorithm to optimize performance for datasets larger than the memory workspace. The proposed strategy aims to reduce the number of I/O operations by exploiting the temporal locality exhibited by the random forest building algorithm. Experiments showed that our method reduced the execution time of tree building by up to 90%, and by 60% on average, compared to a state-of-the-art method when the datasets are larger than the main memory workspace.
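To illustrate why temporal locality matters when a dataset exceeds memory, the following minimal sketch counts block reads under a small LRU cache for two access orders. It is not the RaFIO algorithm described in the paper; the function `count_block_reads`, the block counts, and both access patterns are hypothetical and chosen only to show the general effect.

```python
from collections import OrderedDict


def count_block_reads(accesses, cache_blocks):
    """Count storage reads (cache misses) for an LRU cache of `cache_blocks` blocks."""
    cache = OrderedDict()
    reads = 0
    for block in accesses:
        if block in cache:
            # Hit: mark the block as most recently used.
            cache.move_to_end(block)
        else:
            # Miss: the block must be read from storage.
            reads += 1
            cache[block] = True
            if len(cache) > cache_blocks:
                # Evict the least recently used block.
                cache.popitem(last=False)
    return reads


# Hypothetical workload: 8 data blocks, memory holds 2, each block is needed 4 times.
n_blocks, cache_blocks, passes = 8, 2, 4

# Poor temporal locality: sweep the whole dataset once per pass.
cyclic = [b for _ in range(passes) for b in range(n_blocks)]

# Good temporal locality: finish all work on a block while it is resident.
grouped = [b for b in range(n_blocks) for _ in range(passes)]

cyclic_reads = count_block_reads(cyclic, cache_blocks)
grouped_reads = count_block_reads(grouped, cache_blocks)
print(cyclic_reads, grouped_reads)  # 32 8
```

In this toy setting the amount of work is identical in both orders; only the order of accesses changes, and the grouped order needs a quarter of the reads. How the paper reorders tree construction to obtain such reuse is not specified in the abstract.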

Original language: English
Title of host publication: Proceedings of the 36th Annual ACM Symposium on Applied Computing, SAC 2021
Publisher: Association for Computing Machinery
Pages: 521-528
Number of pages: 8
ISBN (Electronic): 9781450381048
State: Published - 22 Mar 2021
Event: 36th Annual ACM Symposium on Applied Computing, SAC 2021 - Virtual, Online, Korea, Republic of
Duration: 22 Mar 2021 to 26 Mar 2021

Publication series

Name: Proceedings of the ACM Symposium on Applied Computing

Conference

Conference: 36th Annual ACM Symposium on Applied Computing, SAC 2021
Country/Territory: Korea, Republic of
City: Virtual, Online
Period: 22/03/21 to 26/03/21
