DIACRITIC-AWARE ALIGNMENT AND CLASSIFICATION IN ARABIC SPEECH: A FUSION OF FUZTPI AND ML MODELS

Adel Sabour; Abdeltawab Hendawi; Mohamed Ali

doi:10.30829/jistech.v8i2.17951

Abstract

This paper presents the Quran Speech Recognition (QRSR) system, which achieves alignment and classification accuracies of up to 96%. The system is designed to advance Arabic Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) by focusing on diacritic-annotated Arabic text. We address the limitations of existing Arabic ASR systems and introduce the Fuzzy Text Alignment and Rule-based Classifier (FTARC), which segments audio files and aligns them with the text. The FuzTPI algorithm is integrated with Machine Learning models such as Naïve Bayes, Support Vector Machine, and Random Forest. This research aims to generalize the findings to broader Arabic text and to contribute to an expanded audio dataset, thereby enhancing Arabic NLP and speech recognition capabilities.
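
As a rough illustration of what fuzzy matching over diacritic-annotated text can look like, the following minimal sketch scores a recognized transcript against candidate reference lines, either with diacritics kept or ignored. It is not the FuzTPI algorithm or FTARC described in the paper; the function names (`strip_diacritics`, `similarity`, `best_match`) and the use of Python's standard `difflib` similarity ratio are assumptions made purely for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def strip_diacritics(text: str) -> str:
    """Drop Unicode nonspacing marks (category Mn), which include Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def similarity(a: str, b: str, ignore_diacritics: bool = False) -> float:
    """Return a similarity score in [0, 1] between two strings (hypothetical helper)."""
    if ignore_diacritics:
        a, b = strip_diacritics(a), strip_diacritics(b)
    return SequenceMatcher(None, a, b).ratio()


def best_match(transcript: str, references: list[str],
               ignore_diacritics: bool = True) -> tuple[int, float]:
    """Return (index, score) of the reference line most similar to the transcript."""
    scores = [similarity(transcript, ref, ignore_diacritics) for ref in references]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


if __name__ == "__main__":
    # Illustrative input only: two diacritic-annotated reference lines
    # and an undiacritized transcript such as an ASR system might produce.
    references = ["بِسْمِ اللَّهِ", "الْحَمْدُ لِلَّهِ"]
    transcript = "بسم الله"
    print(best_match(transcript, references))
    print(similarity(transcript, references[0], ignore_diacritics=False))
```

Comparing the score with and without `ignore_diacritics` gives a simple view of how much of a mismatch is due to diacritics alone rather than to the underlying letters.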

Keywords

Arabic annotated text; Machine Learning; Classification algorithms; Audio segmentation; Text-audio alignment; Speech Recognition

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
