Modeling the survival of colorectal cancer patients based on colonoscopic features in a feature ensemble vision transformer

Chung Ming Lo, Yi Wen Yang, Jen Kou Lin, Tzu Chen Lin, Wei Shone Chen, Shung Haur Yang, Shih Ching Chang, Huann Sheng Wang, Yuan Tzu Lan, Hung Hsin Lin, Sheng Chieh Huang, Hou Hsuan Cheng, Jeng Kai Jiang, Chun Chi Lin*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Scopus citations


The prognosis of patients with colorectal cancer (CRC) relies mostly on the classic tumor-node-metastasis (TNM) staging classification. A more accurate and convenient prediction model would improve prognostication and assist in treatment planning. Patients who underwent surgery for CRC between May 2014 and December 2017 were enrolled. The proposed feature ensemble vision transformer (FEViT) used ensemble classifiers to combine relevant colonoscopy features from a pretrained vision transformer with clinical features, including sex, age, family history of CRC, and tumor location, to establish the prognostic model. A total of 1729 colonoscopy images were included in this retrospective study. For predicting patient survival, FEViT achieved an accuracy of 94% with an area under the receiver operating characteristic curve of 0.93, outperforming the TNM staging classification (90%, 0.83) in the experiment. FEViT mitigated the limited receptive field and vanishing-gradient problems of conventional convolutional neural networks and was a relatively effective and efficient procedure. The promising accuracy of FEViT in modeling survival makes the prognosis of CRC patients more predictable and practical.
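The fusion step described in the abstract — pretrained vision-transformer features concatenated with clinical features and fed to ensemble classifiers — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ViT embeddings are stood in for by random 768-dimensional vectors (as if precomputed by a pretrained backbone), the survival labels are synthetic, and the scikit-learn soft-voting ensemble of three base classifiers is an assumed stand-in for whatever ensemble FEViT actually uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400

# Placeholder for embeddings from a pretrained vision transformer
# (768 dims is the ViT-Base token size; real features would come from
# running colonoscopy images through the frozen backbone).
vit_features = rng.normal(size=(n, 768))

# Clinical features named in the abstract: sex, age, family history
# of CRC, and tumor location (integer-encoded here for illustration).
clinical = np.column_stack([
    rng.integers(0, 2, n),    # sex
    rng.integers(30, 90, n),  # age
    rng.integers(0, 2, n),    # family history of CRC
    rng.integers(0, 3, n),    # tumor location
])

# Feature-level fusion: concatenate image and clinical features.
X = np.hstack([vit_features, clinical])

# Synthetic binary survival label, correlated with the features
# purely so the demo ensemble has something to learn.
y = (vit_features[:, 0] + 0.02 * clinical[:, 1]
     + rng.normal(scale=1.0, size=n) > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Soft-voting ensemble over heterogeneous base classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=2000)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, ensemble.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.2f}")
```

Soft voting averages each base classifier's predicted probabilities, which tends to be more robust than hard majority voting when the base models are well calibrated; any choice of base classifiers or fusion scheme here is an assumption, not a detail taken from the paper.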

Original language: English
Article number: 102242
Journal: Computerized Medical Imaging and Graphics
State: Published - Jul 2023


  • Colon cancer
  • Colonoscopy
  • Prognosis
  • Vision transformer


