Implications of Age-Based Clustering for Survival and Relapse-Free Analysis in METABRIC Breast Cancer

Alif Azhari, Mauliddin Mauliddin

Abstract


Cox proportional hazards models are widely used in breast cancer survival analysis, but their validity is often limited by violations of the proportional hazards assumption. Machine learning techniques offer potential ways to improve model robustness, yet their combined use with Cox models remains underexplored. This study compares how well the proportional hazards assumption is fulfilled, and how well the models discriminate between patients, before and after age-based clustering. K-medoids was selected as the clustering method for its robustness to outliers. The results show that clustering significantly improved adherence to the proportional hazards assumption and increased the concordance index, indicating better predictive performance. The number of variables satisfying the assumption increased from 3 in the global model to 5–6 in the cluster-specific models. Tumor size and the number of positive lymph nodes consistently had a significant effect in all clusters, for both survival time and relapse-free time. These findings suggest that age-based clustering can enhance the robustness and predictive performance of Cox models.
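As background for the proportional hazards assumption discussed in the abstract, the standard Cox model and the property it relies on can be written as follows. The cluster-specific form is an illustrative assumption (a separate Cox model fitted in each age cluster, consistent with the comparison between a global model and cluster models); the abstract does not give the exact specification, covariate set, or number of clusters.

```latex
% Global Cox model: baseline hazard h_0(t), covariate vector x, coefficients beta
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right)

% Proportional hazards: the hazard ratio of patients a and b does not depend on t
\frac{h(t \mid \mathbf{x}_a)}{h(t \mid \mathbf{x}_b)} = \exp\!\left(\boldsymbol{\beta}^{\top}(\mathbf{x}_a - \mathbf{x}_b)\right)

% Assumed cluster-specific models, one per age cluster c = 1, \dots, K
h_c(t \mid \mathbf{x}) = h_{0c}(t)\,\exp\!\left(\boldsymbol{\beta}_c^{\top}\mathbf{x}\right)
```

A covariate violates the assumption when its effect changes over follow-up time, that is, when its coefficient behaves like a time-varying \beta_j(t) rather than a constant. Tests based on scaled Schoenfeld residuals are a common way to check this for each covariate.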

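The abstract does not describe how K-medoids was run. The following is a minimal sketch, assuming patients are clustered on age alone with the absolute age difference as the distance, using a basic PAM procedure (a greedy BUILD step followed by SWAP improvements). The function names, example ages, and value of k are hypothetical and for illustration only; a real analysis would more likely use an optimized library implementation.

```python
from typing import List, Sequence, Tuple


def total_cost(ages: Sequence[float], medoids: Sequence[int]) -> float:
    """Sum of absolute distances from each age to its nearest medoid."""
    return sum(min(abs(a - ages[m]) for m in medoids) for a in ages)


def pam_kmedoids(ages: Sequence[float], k: int) -> Tuple[List[int], List[int]]:
    """Hypothetical 1-D PAM sketch: returns (medoid indices, label per observation)."""
    n = len(ages)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of observations")

    # BUILD: greedily add the observation that most reduces the total cost.
    medoids: List[int] = []
    for _ in range(k):
        best = min(
            (i for i in range(n) if i not in medoids),
            key=lambda i: total_cost(ages, medoids + [i]),
        )
        medoids.append(best)

    # SWAP: replace a medoid with a non-medoid whenever that lowers the cost,
    # and repeat until no single swap improves it.
    current = total_cost(ages, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids.copy()
                trial[pos] = candidate
                cost = total_cost(ages, trial)
                if cost < current:
                    medoids, current, improved = trial, cost, True

    # Assign every observation to the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: abs(a - ages[medoids[j]])) for a in ages]
    return medoids, labels
```

Because each medoid must be an observed age, a single extreme value cannot pull the cluster centre the way it pulls a mean: for ages 30, 31, 32 and 95 with k = 1, this sketch returns the medoid 31, while the mean is 47.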

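The concordance index reported in the abstract measures how well a model's risk scores rank patients by time to event. Below is a minimal sketch of Harrell's C for right-censored data, assuming higher scores mean higher risk (for a Cox model, the linear predictor β'x). The abstract does not state which estimator or tie-handling rule was used, so this O(n²) version, which skips pairs with tied times, is an illustrative assumption rather than the study's own computation; the function name is hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index (sketch) for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and a strictly shorter time than subject j. It is
    concordant when i also has the higher risk score; tied risk scores
    count as half. Pairs with tied times are skipped in this sketch.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Computing this index for the global model and for each cluster-specific model is one way to quantify the kind of improvement the abstract describes.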

DOI: http://dx.doi.org/10.30829/zero.v9i3.26030



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Publisher: Department of Mathematics, Faculty of Science and Technology, Universitas Islam Negeri Sumatera Utara Medan