Natural language processing (NLP) is an area of computer science concerned with programming machines to understand and analyse natural, or human, language. It is a set of techniques that lies at the intersection of Artificial Intelligence (AI) and Computational Linguistics (CL), as shown in the figure below.
It involves a wide range of techniques used for the automatic manipulation and understanding of data (text and speech) in natural languages. Linguistic analysis, such as syntactic, semantic and pragmatic analysis, provides a basis for the research and development of NLP applications.
Natural Language Processing in Nepali
Research on Nepali language processing started in 2005 with three organizations: Madan Puraskar Pustakalaya (MPP), the Central Department of Linguistics – Tribhuvan University (TU), and the Department of Computer Science and Engineering – Kathmandu University (KU). The three organizations have collaborated on several language processing projects.
The Bhasa Sanchar, or NeLRaLEC (2005 – 2007), project was funded by the Asia IT & C Programme of the European Commission. It was led by MPP in collaboration with TU. The project focused on the development of an annotated corpus (NNC) and unitag for the Nepali language.
The Nepali National Corpus (NNC) was developed under the Bhasa Sanchar project. NNC contains four different corpora:
Other language processing projects include Dobhase (2005 – 2006), a rule-based English-to-Nepali translation system; Sambad (2006 – 2007), a research project aimed at enabling non-literate people to access computers; and the Nepali Spellchecker.
Only a limited amount of work has been done in natural language processing for Nepali, so few standard toolkits or resources are available. This is a significant setback, but the situation is changing as the number of people interested in working in this field grows.
Watch this space to learn more about Nepali NLP.