Iterative Rule-based Stemming in Nepali

Nepali is a highly inflectional and derivational language, so a single word can take many grammatical forms and meanings. For example, the verb root लेख् (lekh) appears in forms such as लेख्छु (lekh-chu), लेख्छस् (lekh-chas), लेख्छेस् (lekh-ches), लेख्छ (lekh-cha), लेखी (lekh-i), लेख्यो (lekh-yo), and लेखे (lekh-e). Stemming is the process of reducing inflectional (or sometimes derivational) forms of words to…
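
To make the idea concrete, here is a minimal sketch of iterative suffix stripping applied to the forms of लेख् listed above. The suffix list, the minimum stem length, and the choice to drop the trailing halant from the result are all assumptions made for illustration. This is not the rule set of any particular Nepali stemmer.

```python
# Hypothetical suffix list covering only the example forms of the root "लेख्".
# Longest suffixes are tried first so "छेस्" wins over shorter candidates.
SUFFIXES = sorted(["छु", "छस्", "छेस्", "छ", "यो", "ी", "े"], key=len, reverse=True)

HALANT = "\u094d"   # Devanagari virama, written "्"
MIN_STEM_LEN = 2    # assumed lower bound, counted in Unicode code points


def stem(word):
    """Strip known suffixes repeatedly until no rule applies."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            # Only strip if enough of the word remains afterwards.
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
                word = word[: -len(suffix)]
                changed = True
                break
    # Assumed normalization: drop a trailing halant so every form
    # of the same root ends up with an identical stem string.
    return word.rstrip(HALANT)


for form in ["लेख्छु", "लेख्छस्", "लेख्छ", "लेखी", "लेख्यो", "लेखे"]:
    print(form, "->", stem(form))
```

A real stemmer would need a much larger rule set and exception handling. The loop only shows why the process is called iterative: rules are reapplied until the word stops changing.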

Stop Words Removal (Nepali)

Removing stop words is a common and important step in text analysis applications. So, what are stop words, and why filter them out during pre-processing? Stop words are words that mainly serve to define the structure of a sentence. These are the most frequent words in a corpus but they do not…
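
As a rough illustration, stop word filtering can be sketched as a set lookup over already-tokenized text. The stop word set below is a tiny hand-picked sample of common Nepali function words, assumed for this example only. It is not a complete or official list, and `remove_stop_words` is a hypothetical helper name.

```python
# Illustrative sample only; a practical list would be far longer.
NEPALI_STOP_WORDS = frozenset({"र", "पनि", "यो", "त्यो", "हो", "छ"})


def remove_stop_words(tokens, stop_words=NEPALI_STOP_WORDS):
    """Return the tokens that are not in the stop word set, preserving order."""
    return [token for token in tokens if token not in stop_words]


tokens = ["राम", "र", "सीता", "पनि", "आए"]
print(remove_stop_words(tokens))
```

Using a `frozenset` keeps each membership check constant-time on average, which matters once the list grows to a few hundred entries.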

Nepali Text Tokenization

Tokenization is generally the first step in text analysis applications. It is the process of splitting a given string into units called tokens. A token is a sequence of characters, usually a word or a sentence, that is semantically significant for text analysis. Tokenization is a language-specific task; for instance, language…
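
The sketch below shows one simple way to split Nepali text into sentences and words. It assumes sentences end at the danda (।), a question mark, or an exclamation mark, and that words are separated by whitespace or common punctuation. The function names and the punctuation set are illustrative choices, not a standard API.

```python
import re

# Assumed sentence terminators: danda (U+0964), "?", and "!".
SENTENCE_END = re.compile(r"[।?!]+")

# A word is any run of characters that is not whitespace or listed punctuation.
# \w is avoided on purpose: in Python's re module it does not match Devanagari
# combining marks such as the halant, so it would break words apart.
WORD = re.compile(r"[^\s।,;:?!\"'()\[\]-]+")


def sentence_tokenize(text):
    """Split text into sentences, dropping empty pieces."""
    return [part.strip() for part in SENTENCE_END.split(text) if part.strip()]


def word_tokenize(sentence):
    """Split a sentence into word tokens."""
    return WORD.findall(sentence)


text = "म घर जान्छु। तिमी कहाँ जान्छौ?"
for sentence in sentence_tokenize(text):
    print(word_tokenize(sentence))
```

Splitting on an explicit punctuation set keeps conjunct consonants and vowel signs attached to their words, which is the main reason a generic word pattern is not reused here.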