Welcome!

The Nepali NLP Group conducts research and development in natural language processing. Our research combines findings from linguistics with methods from machine learning to develop efficient algorithms for processing Nepali text.


Broadly, we work in the following areas:


  • Nepali NLP, morphology, parsing

  • Information extraction, data mining

  • Text analytics, social media analytics

  • Linguistic resource development: corpora, lexicons




    A Brief Overview of Natural Language Processing in Nepali

    by Ingroj Shrestha on July 5, 2017


    Natural language processing is an area of computer science concerned with programming machines to understand and analyse natural (human) language. It is a set of techniques that lie at the intersection of Artificial Intelligence (AI) and Computational Linguistics (CL ...

    Read More

    Tag: NLP


    Part-of-Speech Tagging with Hidden Markov Models

    by Ingroj Shrestha on Oct. 5, 2017


    Part-of-Speech tagging is a common sequence-tagging problem in natural language processing. It is the process of assigning a single word class label to each token in the input sentence. For example, for input ...

    Read More

    Tag: NLP


    Approaches to Predicting Part-of-Speech Tags of Unknown Words

    by Ingroj Shrestha on Nov. 8, 2017


    One of the challenges faced by statistical part-of-speech taggers is the presence of words in test datasets that do not exist in the training dataset. Such words are called unknown words. In this ...

    Read More

    Tag: NLP, Pre Processing


    Nepali Texts Tokenization

    by Shreeya Singh Dhakal on July 16, 2017


    Tokenization is generally the first step in text analysis applications. It is the process of splitting a given string into units called tokens. A token is a sequence of characters, usually a word or sentence, that is semantically significant for text ...

    Read More

    Tag: NLP, Pre Processing
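As a rough illustration of the splitting described in the tokenization excerpt above, here is a minimal sketch in Python. It assumes that sentences end with the Devanagari danda (U+0964), a question mark, or an exclamation mark, and that words are separated by whitespace. The function names `split_sentences` and `tokenize_words` are hypothetical and are not taken from the post.

```python
import re

# Assumption: these characters end a sentence in Nepali text.
# \u0964 is the Devanagari danda.
SENTENCE_END = "\u0964?!"

def split_sentences(text):
    """Split text into sentences, keeping the terminator with each sentence."""
    pattern = r"[^" + SENTENCE_END + r"]+[" + SENTENCE_END + r"]?"
    return [s.strip() for s in re.findall(pattern, text) if s.strip()]

def tokenize_words(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    # Either a run of non-space, non-punctuation characters, or one punctuation mark.
    return re.findall(r"[^\s\u0964?!,]+|[\u0964?!,]", sentence)

if __name__ == "__main__":
    sample = "म घर जान्छु\u0964 तिमी पनि आऊ\u0964"
    for sentence in split_sentences(sample):
        print(tokenize_words(sentence))
```

A real tokenizer would also need to handle abbreviations, numerals, and postpositions attached to words, none of which this sketch attempts.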


    Iterative Rule-based Stemming in Nepali

    by Ingroj Shrestha on Aug. 25, 2017


    Because Nepali is a highly inflectional and derivational language, a single word can take many grammatical forms and meanings. For example, the verb root लेख् (lekh) can appear in forms such as लेख्छु (lekh-chu), लेख्छस् (lekh-chas ...

    Read More

    Tag: Text Analysis, NLP, Pre Processing
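The idea of iterative suffix stripping mentioned in the stemming title above can be sketched as follows. This is only an illustration under stated assumptions, not the rule set from the post: the suffix list holds the two verb endings from the excerpt (छु, छस्) plus two common nominal endings (हरू, ले) added here purely as examples, and the `stem` function and `min_len` guard are hypothetical.

```python
# Assumption: a tiny illustrative suffix list. The first two come from the
# excerpt's example; the last two are added only to show repeated stripping.
SUFFIXES = ["छस्", "छु", "हरू", "ले"]

def stem(word, suffixes=SUFFIXES, min_len=2):
    """Strip known suffixes repeatedly until no rule applies.

    min_len guards against stripping a word down to almost nothing.
    """
    ordered = sorted(suffixes, key=len, reverse=True)  # try longest suffix first
    changed = True
    while changed:
        changed = False
        for suffix in ordered:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
                word = word[: -len(suffix)]
                changed = True
                break  # restart the scan on the shortened word
    return word

if __name__ == "__main__":
    for w in ["लेख्छु", "लेख्छस्", "केटाहरूले"]:
        print(w, "->", stem(w))
```

The loop restarts after every successful strip, so a word carrying several stacked suffixes is reduced one suffix at a time. Lengths are counted in Unicode code points, which is a simplification for Devanagari.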


    NELRALEC Tagset: A Part-of-speech Tagset for Nepali Language

    by Shreeya Singh Dhakal on Sept. 22, 2017


    Part-of-speech tags are word classes or syntactic categories of words. They carry important information about words, their neighbours, and how they relate to each other. A part-of-speech tag also indicates the possible morphological affixes for a given word ...

    Read More

    Tag: NLP


    A New and Reduced Part-of-Speech Tagset for Nepali

    by Shreeya Singh Dhakal on Sept. 29, 2017


    A well-chosen tagset is very important in part-of-speech tagging. The NELRALEC tagset contains 112 tags, which is a large number. Using such a large tagset is not always efficient, especially when only limited annotated data is available. In ...

    Read More

    Tag: NLP