A Transformer-Based Framework and Preliminary Baseline Experiment for Uzbek Text Classification in Low-Resource NLP

Shaxzoda Yusupova

Vol. 4, Special Issue (ILMIY YUTUQLAR E'TIROFI) (2026), Articles

Published 2026-05-15

Faculty of Business IT, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan

Keywords

Natural Language Processing
Uzbek language
text classification
low-resource language

How to Cite

A Transformer-Based Framework and Preliminary Baseline Experiment for Uzbek Text Classification in Low-Resource NLP. (2026). "XXI ASRDA INNOVATSION TEXNOLOGIYALAR, FAN VA TAʼLIM TARAQQIYOTIDAGI DOLZARB MUAMMOLAR" Nomli Respublika Ilmiy-Amaliy Konferensiyasi, 4(Maxsus son (ILMIY YUTUQLAR E’TIROFI)), 778-786. https://universalpublishings.com/index.php/itfttdm/article/view/18599

Abstract

Natural Language Processing (NLP) is an important area of artificial intelligence, but many low-resource languages still lack sufficient datasets and optimized models. This paper presents a framework and a preliminary baseline experiment for Uzbek text classification. The framework covers text preprocessing, feature extraction, model selection, and evaluation. Two baseline models, TF-IDF with Logistic Regression and TF-IDF with Support Vector Machine, are compared using accuracy, precision, recall, and F1-score. The proposed framework can support future Uzbek NLP applications in education, media, document classification, and automated text processing.
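
The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the author's code: the function name `macro_metrics`, the toy category labels, and the choice of macro-averaging over classes are all assumptions made for illustration, since the abstract does not state how per-class scores are aggregated.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (an assumption here) gives every class equal weight,
    regardless of how many examples it has.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class received a wrong example
            fn[truth] += 1  # true class missed one of its examples

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels (invented for illustration, not from the paper).
y_true = ["sport", "sport", "talim", "talim", "iqtisod", "iqtisod"]
y_pred = ["sport", "talim", "talim", "talim", "iqtisod", "sport"]
scores = macro_metrics(y_true, y_pred)
```

Here `y_pred` would be the output of either baseline classifier on a held-out test set; running the same function on both sets of predictions gives a like-for-like comparison.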
