An NLP Pipeline for Bangla Text Understanding and Linguistic Data Processing: Framework and Implementation

Ahmad Galib

doi:10.9734/ajl2c/2026/v9i1297

Published: 2026-01-19

Page: 17-38

Issue: 2026 - Volume 9 [Issue 1]

Ahmad Galib *

Research & Development Department, Panjeree Publications Ltd., Bangladesh and Jahangirnagar University, Dhaka, Bangladesh.

*Author to whom correspondence should be addressed.

Abstract

Raw text is the most prevalent form of human language in digital and electronic formats. This research proposes a comprehensive Bangla language processing framework that transforms raw data into structured data and value-added information through clearly defined annotation guidelines. Unlike existing fragmented approaches, this work offers a unified treatment of corpus development and annotation. It details each processing phase and its input-output specifications. The pipeline focuses primarily on text-understanding components and integrates essential tasks such as Part-of-Speech (PoS) tagging, parsing, and Named Entity Recognition (NER). Moreover, to establish the semantic state of its linguistic inputs, the framework includes coreference resolution and word sense disambiguation. This end-to-end pipeline is designed for several uses, including high-precision sentiment analysis, automated content moderation, and the development of gold-standard datasets for advanced Bangla NLP research.
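The idea of phases with explicit input-output specifications can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Document`, `Stage`, `run_pipeline`, and the layer names `tokens` and `pos` are hypothetical, and the two stages are placeholders (a whitespace split and a dummy tagger) rather than real Bangla models.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Document:
    """Raw text plus named annotation layers added by each stage."""
    text: str
    layers: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Stage:
    """One processing phase with an explicit input/output specification."""
    name: str
    requires: List[str]  # layers that must exist before this stage runs
    provides: str        # layer this stage adds to the document
    run: Callable[[Document], Any]


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Run stages in order, checking each stage's declared inputs first."""
    for stage in stages:
        missing = [r for r in stage.requires if r not in doc.layers]
        if missing:
            raise ValueError(f"{stage.name} is missing input layers: {missing}")
        doc.layers[stage.provides] = stage.run(doc)
    return doc


# Placeholder stages (assumptions for illustration only).
def tokenize(doc: Document) -> List[str]:
    # Naive whitespace split; a real Bangla tokenizer would handle punctuation.
    return doc.text.split()


def tag(doc: Document) -> List[tuple]:
    # Dummy tagger: labels every token "UNK" instead of predicting a PoS tag.
    return [(tok, "UNK") for tok in doc.layers["tokens"]]


STAGES = [
    Stage("tokenizer", [], "tokens", tokenize),
    Stage("pos_tagger", ["tokens"], "pos", tag),
]

if __name__ == "__main__":
    result = run_pipeline(Document("আমি ভাত খাই"), STAGES)
    print(result.layers["pos"])
```

Declaring `requires` and `provides` per stage lets the runner reject a misordered pipeline early, for example a tagger scheduled before tokenization. Later phases such as parsing, NER, coreference resolution, or word sense disambiguation would be added as further `Stage` entries under the same assumption.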

Keywords: Bangla text processing system, linguistic corpus, annotated text, NLP pipeline

How to Cite

Galib, Ahmad. 2026. “An NLP Pipeline for Bangla Text Understanding and Linguistic Data Processing: Framework and Implementation”. Asian Journal of Language, Literature and Culture Studies 9 (1):17-38. https://doi.org/10.9734/ajl2c/2026/v9i1297.
