Welcome!

The Nepali NLP Group conducts research and development in natural language processing. Our research combines findings from linguistics with machine learning methods to develop efficient algorithms for processing Nepali text.


Broadly, we work in the following areas:


  • Nepali NLP, morphology, parsing

  • Information extraction, data mining

  • Text analytics, social media analytics

  • Linguistic resource development: corpora, lexicons




    Approaches to Predicting Part-of-Speech Tags of Unknown Words

    by Ingroj Shrestha on Nov. 8, 2017


    One of the challenges faced by statistical part-of-speech taggers is the presence of words in test datasets that do not exist in the training dataset. Such words are called unknown words. In this ...


    Tag: NLP, Pre Processing
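To make the definition of unknown words concrete, here is a minimal sketch that finds the test tokens missing from the training vocabulary. The function name `find_unknown_words` and the sample tokens are illustrative assumptions, not taken from the post.

```python
def find_unknown_words(train_tokens, test_tokens):
    """Return test tokens that never occur in the training data, in order of first appearance."""
    vocabulary = set(train_tokens)
    seen = set()
    unknown = []
    for token in test_tokens:
        if token not in vocabulary and token not in seen:
            seen.add(token)
            unknown.append(token)
    return unknown


# Hypothetical sample data for illustration only.
train = ["म", "घर", "जान्छु"]
test = ["म", "विद्यालय", "जान्छु", "विद्यालय"]
unknown_words = find_unknown_words(train, test)
```

A tagger would then need some fallback strategy for the tokens in `unknown_words`, since it has no training statistics for them.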


    Nepali Texts Tokenization

    by Shreeya Singh Dhakal on July 16, 2017


    Tokenization is generally the first step in text analysis applications. It is the process of splitting a given string into units called tokens. A token is a sequence of characters, usually a word or sentence, that is semantically significant for text ...


    Tag: NLP, Pre Processing
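As a rough sketch of sentence and word tokenization for Nepali, the snippet below assumes that sentences end at a danda (।), a question mark, or an exclamation mark, and that words are separated by whitespace. The function names and the punctuation set are assumptions made for illustration; the post itself may handle more cases.

```python
import re

# Assumption: a sentence ends at a danda, '?' or '!'.
SENTENCE_END = re.compile(r"[।?!]+")
# Assumption: these punctuation marks are not part of any word.
PUNCTUATION = re.compile(r"[।,;:?!\"'()\[\]{}]")


def sentence_tokenize(text):
    """Split text into sentences, dropping the terminators and empty pieces."""
    pieces = SENTENCE_END.split(text)
    return [piece.strip() for piece in pieces if piece.strip()]


def word_tokenize(sentence):
    """Replace punctuation with spaces, then split on whitespace."""
    cleaned = PUNCTUATION.sub(" ", sentence)
    return cleaned.split()


text = "म घर जान्छु। तिमी कहाँ जान्छौ?"
sentences = sentence_tokenize(text)
words = [word_tokenize(s) for s in sentences]
```

Splitting on whitespace keeps each Devanagari word intact, because vowel signs and the virama are combining characters rather than separators.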


    Stop Words Removal (Nepali)

    by Shreeya Singh Dhakal on Aug. 13, 2017


    Removing stop words is a common and important practice when working with text analysis applications. So, what are stop words and why filter them out during pre-processing?


    Stop words are the words used in defining the structure of sentences. These ...


    Tag: Text Analysis, Pre Processing
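Stop-word filtering can be sketched as a simple set lookup over already tokenized text. The stop-word list below is a tiny hypothetical sample chosen for illustration; it is not the list used in the post.

```python
# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"र", "को", "मा", "छ", "हो", "पनि"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [token for token in tokens if token not in stop_words]


tokens = ["राम", "र", "सीता", "घर", "मा", "छन्"]
filtered = remove_stop_words(tokens)
```

Using a set keeps each membership check constant time, which matters when filtering a large corpus.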


    Iterative Rule-based Stemming in Nepali

    by Ingroj Shrestha on Aug. 25, 2017


    Because Nepali is a highly inflectional and derivational language, a single word can take various grammatical forms and meanings. For example, the verb root लेख् (lekh) can take forms such as लेख्छु (lekh-chu), लेख्छस् (lekh-chas ...


    Tag: Text Analysis, NLP, Pre Processing
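One way to picture iterative rule-based stemming is repeated suffix stripping: remove the longest matching suffix, then try again until nothing matches. The sketch below assumes a small hypothetical suffix list and a minimum stem length; the actual rules in the post are not shown here and are likely richer.

```python
# Hypothetical suffix list for illustration; a real stemmer would need many more rules.
SUFFIXES = ["हरू", "लाई", "ले", "को", "मा", "छु", "छस्"]


def stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip matching suffixes repeatedly, longest first, until none applies."""
    ordered = sorted(suffixes, key=len, reverse=True)
    changed = True
    while changed:
        changed = False
        for suffix in ordered:
            # Assumption: never shrink the stem below min_stem_len code points.
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


stems = [stem(w) for w in ["लेख्छु", "लेख्छस्", "घरहरूमा"]]
```

The loop is what makes the approach iterative: घरहरूमा first loses मा and then हरू, which a single pass would miss.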