Family Clustering Based On Y-Chromosome DNA Profile Using Unweighted Pair Group Method with Arithmetic Mean

Meira Parma Dewi, Nurtami Soedarsono

Abstract


Indonesia is a diverse nation composed of numerous ethnic groups, each with distinct physical and genetic characteristics. Genetic similarities within ethnic populations can be examined through DNA profiling, particularly by analyzing Short Tandem Repeat (STR) loci. In Indonesia, DNA profiling has been widely applied in forensic identification and paternity testing. This study focuses on classifying the Javanese population into sub-tribes based on STR profile similarities using divisive hierarchical clustering. The optimal number of clusters was selected using the lowest Sum of Squared Errors (SSE), 72583.12, and the highest Silhouette coefficient, 0.78, which yielded seven sub-tribe clusters. Subsequently, these sub-tribe clusters were further classified into family clusters using Y-chromosome STR (YSTR) data, which traces paternal lineage. The clustering process employed the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), resulting in 21 family clusters. Compared to k-means clustering, divisive clustering produced sub-tribe clusters with more balanced population sizes. The establishment of sub-tribe and family clusters enhances the efficiency of individual identification, as DNA profile matching can be performed at the cluster level rather than across the entire population. This approach provides a more systematic framework for forensic applications and victim identification, particularly in cases involving male individuals, where YSTR data are critical.
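As a rough illustration of the UPGMA step named in the abstract, the sketch below merges the two closest clusters repeatedly and updates distances with a size-weighted arithmetic mean. It is a minimal sketch under stated assumptions, not the authors' implementation: the sample identifiers, the four-locus repeat counts, and the choice of Manhattan distance between haplotypes are all hypothetical, since the abstract does not specify the loci or the distance measure used.

```python
from itertools import combinations

# Hypothetical Y-STR haplotypes: sample id -> repeat counts at four loci.
# These values are invented for illustration and are not from the study.
profiles = {
    "A1": (14, 13, 29, 24),
    "A2": (14, 13, 30, 24),
    "B1": (15, 12, 28, 22),
    "B2": (15, 12, 28, 23),
    "C1": (17, 14, 31, 25),
}


def manhattan(a, b):
    """Sum of absolute repeat-count differences (an assumed distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def upgma(profiles, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain.

    Returns a sorted list of sorted member lists.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = {i: [name] for i, name in enumerate(profiles)}
    # Pairwise cluster distances keyed by (smaller id, larger id).
    dist = {
        (i, j): manhattan(profiles[clusters[i][0]], profiles[clusters[j][0]])
        for i, j in combinations(clusters, 2)
    }
    next_id = len(clusters)
    while len(clusters) > n_clusters:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # UPGMA update: size-weighted mean of the two old distances.
        for k in clusters:
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        del dist[(i, j)]
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


family_clusters = upgma(profiles, 3)
```

With this toy data, requesting three clusters groups A1 with A2 and B1 with B2, and leaves C1 on its own. In practice the stopping point would come from cutting the dendrogram at a chosen distance or from a validity index, rather than from a fixed cluster count as assumed here.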


Keywords


STR DNA Profile; Y-Chromosome; Unweighted Pair Group Method with Arithmetic Mean

DOI: http://dx.doi.org/10.30829/zero.v10i1.28505


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Publisher :
Department of Mathematics
Faculty of Science and Technology
Universitas Islam Negeri Sumatera Utara Medan
Email: zero_journal@uinsu.ac.id
WhatsApp: 085270009767 (Admin Official)