The Nepali NLP Group conducts research and development in natural language processing. Our research combines findings from linguistics with methods from machine learning to develop efficient algorithms for processing Nepali text.

Broadly, we work in the following areas:

  • Nepali NLP, morphology, parsing

  • Information extraction, data mining

  • Text analytics, social media analytics

  • Linguistic resource development: corpora, lexicons

    Clustering Text Documents: TF-IDF Weighting

    by Ingroj Shrestha on Dec. 13, 2017

    This is the first post in the series "Clustering Text Documents". In it, we define the TF-IDF algorithm mathematically, work through an example, and give a Python implementation.

    TF-IDF ...

    Tags: Data Mining, Information Retrieval, Machine Learning

    Negation in Nepali Verbs

    by Shreeya Singh Dhakal on Nov. 29, 2017

    Negation is used to express the opposite meaning of affirmative sentences. Negation in Nepali verbs takes place through affixation (suffixation and prefixation). The negative case marker न (na) is either prefixed or suffixed to verb roots or verb forms ...

    Tags: Nepali Grammar

    Approaches to Predicting Part-of-Speech Tags of Unknown Words

    by Ingroj Shrestha on Nov. 8, 2017

    One of the challenges faced by statistical part-of-speech taggers is the presence of words in test datasets that do not exist in the training dataset. Such words are called unknown words. In this ...

    Tags: NLP, Pre Processing

    Part-of-Speech Tagging with Hidden Markov Models

    by Ingroj Shrestha on Oct. 5, 2017

    Part-of-Speech tagging is a common sequence-tagging problem in natural language processing. It is the process of assigning a single word class label to each token in the input sentence. For example, for input ...

    Tags: NLP

    A New and Reduced Part-of-Speech Tagset for Nepali

    by Shreeya Singh Dhakal on Sept. 29, 2017

    A well-chosen tagset is very important in part-of-speech tagging. The NELRALEC tagset contains 112 tags, which is a large number. Using such a large tagset is not always efficient, especially when only limited annotated data is available. In ...

    Tags: NLP