Aim: To assess machine-learning models for colorectal cancer (CRC) risk prediction, evaluate their methodological quality, compare their performance, and highlight their limitations.
Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations were applied. The electronic databases ScienceDirect, MEDLINE (through PubMed and Google Scholar), EBSCO, ERIC, and CINAHL were searched for the period January 2016 to September 2023. The main search terms were big data, risk assessment, colorectal cancer, and artificial intelligence. Review data were extracted using a pre-designed data extraction sheet.
Results: Fifteen studies were included. A total of 3,057,329 CRC health records, including those of adult patients older than 18, were used to generate the results. The area under the curve (AUC) ranged from 0.704 to 0.976. Logistic regression, random forests, and ColonFlag were the most frequently employed techniques. Overall, the included studies report substantial and accurate CRC risk prediction.
Conclusion: This review provides an up-to-date summary of recent research on the use of big data in CRC prediction. The gaps in the literature that it identifies can guide future research. The current obstacles are missing data, a lack of external validation, and the diversity of machine-learning algorithms. Although the AUC has a sound mathematical definition, its application depends on the modelling context.
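The area under the curve reported above can be read through its rank-based definition: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The following is a minimal sketch of that definition in Python, not the procedure used by any of the included studies. The function name `auc_from_scores` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC as the share of case/non-case pairs ranked correctly.

    labels: 1 for a case, 0 for a non-case.
    scores: predicted risk for each record (higher means more likely a case).
    A tie between a case and a non-case counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data (assumption, not from the review):
# 1 = CRC case, 0 = non-case; scores are illustrative model risk estimates.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.4]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(f"Toy AUC: {toy_auc:.3f}")
```

On this toy data, 9.5 of the 12 case/non-case pairs are ranked correctly, giving an AUC of about 0.79. The statistic summarises ranking quality only; it is independent of any decision threshold, which is one reason its interpretation depends on the modelling context.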