Background: To facilitate the investigation of the pathogenic roles played by various immune cells in complex tissues such as tumors, a few computational methods for deconvoluting bulk gene expression profiles to predict cell composition have been created. However, available methods were usually developed along with a set of reference gene expression profiles consisting of imbalanced replicates across different cell types. Therefore, the objective of this study was to create a new deconvolution method equipped with a new set of reference gene expression profiles that incorporate more microarray replicates of the immune cells that have been frequently implicated in the poor prognosis of cancers, such as T helper cells, regulatory T cells and macrophage M1/M2 cells. Methods: Our deconvolution method was developed by choosing ϵ-support vector regression (ϵ-SVR) as the core algorithm assigned with a loss function subject to the L1-norm penalty. To construct the reference gene expression signature matrix for regression, a subset of differentially expressed genes were chosen from 148 microarray-based gene expression profiles for 9 types of immune cells by using ANOVA and minimizing condition number. Agreement analyses including mean absolute percentage errors and Bland-Altman plots were carried out to compare the performances of our method and CIBERSORT. Results: In silico cell mixtures, simulated bulk tissues, and real human samples with known immune-cell fractions were used as the test datasets for benchmarking. Our method outperformed CIBERSORT in the benchmarks using in silico breast tissue-immune cell mixtures in the proportions of 30:70 and 50:50, and in the benchmark using 164 human PBMC samples. Our results suggest that the performance of our method was at least comparable to that of a state-of-the-art tool, CIBERSORT. Conclusions: We developed a new cell composition deconvolution method and the implementation was entirely based on the publicly available R and Python packages. In addition, we compiled a new set of reference gene expression profiles, which might allow for a more robust prediction of the immune cell fractions from the expression profiles of cell mixtures. The source code of our method could be downloaded from https://github.com/holiday01/deconvolution-to-estimate-immune-cell-subsets.
- Bulk gene expression profiles
- Immune cells