Part-of-Speech Tagging with Hidden Markov Models

  Part-of-speech tagging is a common sequence-tagging problem in natural language processing. It assigns a single word-class label to each token in the input sentence. For example, for the input इराक सीमाबाट सेना हटाइने।, the output of the tagger is इराक-कुवेत/NN सीमा/NN बाट/II सेना/NN हटाइने/NN ।/YF.…

A New and Reduced Part-of-Speech Tagset for Nepali

  A well-chosen tagset is very important in part-of-speech tagging. The NELRALEC tagset contains 112 tags, which is a large number. Using such a large tagset is not always efficient, especially when only limited annotated data is available. In this blog post we'll discuss a new tagset for Nepali, which…

NELRALEC Tagset: A Part-of-speech Tagset for Nepali Language

  Part-of-speech tags are word classes or syntactic categories of words. They carry important information about words, their neighbours, and how they relate to each other. A part-of-speech tag also indicates the possible morphological affixes for a given word. Part-of-speech tagging is an important task in natural language…

Iterative Rule-based Stemming in Nepali

Because Nepali is a highly inflectional and derivational language, a single word can take various grammatical forms and meanings. For example, the verb root लेख् (lekh) can take forms such as: लेख्छु (lekh-chu), लेख्छस् (lekh-chas), लेखछेस् (lekh-ches), लेख्छ (lekh-cha), लेखी (lekh-i), लेख्यो (lekh-yo), लेखे (lekh-e). Stemming is the process of reducing inflectional (or sometimes derivational) forms of words to…

Nepali Texts Tokenization

Tokenization is generally the first step in text analysis applications. It is the process of splitting a given string into units called tokens. A token is a sequence of characters, usually a word or a sentence, that is semantically significant for text analysis. Tokenization is a language-specific task; for instance, language…

A Brief Overview of Natural Language Processing in Nepali

Natural language processing is an area of computer science concerned with programming machines to understand and analyse natural (human) language. It is a set of techniques that lie at the intersection of Artificial Intelligence (AI) and Computational Linguistics (CL), as shown in the figure below. It involves a wide range of…
