Part-of-speech tags are word classes or syntactic categories of words. They carry important information about words, their neighbours and how they relate to each other. A word's part of speech also indicates which morphological affixes it can take. Part-of-speech tagging is an important task in natural language processing. In this blog post we’ll discuss the NELRALEC tagset compiled under the Bhasa Sanchar or NELRALEC project.
In linguistics, word classes fall into two groups: closed and open. A closed class is a part-of-speech label for which it is possible to list all the word-forms belonging to it, such as postposition, pronoun, conjunction, and interjection. In an open class such as noun, verb, adjective and adverb, exhaustive listing of the word-forms is not possible. The NELRALEC tagset contains 112 part-of-speech tags that have been compiled with reference to various publications on Nepali grammar.
Nepali nouns inflect for number and case, but the inflected forms are not marked with separate part-of-speech labels. This is because the case and plural markers are separated from the word and analyzed separately. Although gender distinctions also occur in Nepali nouns, they are not included in the tags because they are not grammatical gender.
The NELRALEC tagset has two part-of-speech labels for nouns: common noun (NN) and proper noun (NP).
Nepali pronouns demonstrate lexical variations for person, number and grade of honorifics. The NELRALEC tagset contains thirty-nine part-of-speech labels for pronouns.
There are eleven labels for lexical variations that Nepali pronouns show for grade of honorifics. Lexical variants of first person pronouns and reflexive pronouns take four different labels each.
Some pronouns are derived by adding -ai. When morpheme -ai is added to a possessive pronoun the category of the possessive pronoun changes. The newly formed pronouns are without number or gender agreement unlike the original possessive pronoun. The tagset has four different categories of such pronouns.
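The effect of -ai on the tag can be sketched in code. Going by the possessive pronoun tags in Table 1, the agreement-marked tags end in M, F or O, while the -ai forms end in X. The helper name `ai_derived_tag` is hypothetical, and reading the last letter as the agreement slot is an inference from the tag names, not a rule stated by the tagset authors.

```python
# Possessive pronoun stems as they appear in Table 1 (PMXK*, PTNK*, PTMK*, PRFK*).
POSSESSIVE_STEMS = ("PMXK", "PTNK", "PTMK", "PRFK")
# Assumed reading of the final letter: masculine, feminine, other agreement.
AGREEMENT_SUFFIXES = ("M", "F", "O")


def ai_derived_tag(tag: str) -> str:
    """Map an agreement-marked possessive pronoun tag to its 'without agreement' tag."""
    stem, agreement = tag[:-1], tag[-1:]
    if stem not in POSSESSIVE_STEMS or agreement not in AGREEMENT_SUFFIXES:
        raise ValueError(f"not an agreement-marked possessive pronoun tag: {tag!r}")
    return stem + "X"


print(ai_derived_tag("PTNKF"))  # PTNKX
```

Whatever agreement the original possessive pronoun carried, the derived form lands on the same X-final tag, which is why four labels are enough for this group.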
Pronoun determiners in Nepali are used at the start of a noun phrase in order to reference or provide context to the noun phrase. The tagset contains sixteen labels for pronoun determiners, four of which are for marked and unmarked demonstratives. Similarly, marked and unmarked interrogatives, relatives and other general determiners take four labels each.
The question word के (ke) is a special case among the interrogative pronouns. के (ke) receives the question marker (QQ) label when it is not used to refer to an unknown entity.
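The sixteen determiner labels follow a regular pattern, which a short sketch can make explicit: four determiner types crossed with four agreement values. The letter meanings below are read off the determiner rows of Table 1, and the dictionary names are illustrative only.

```python
from itertools import product

# Second letter of the tag: determiner type (inferred from Table 1).
DETERMINER_TYPES = {
    "D": "demonstrative",
    "K": "interrogative",
    "J": "relative",
    "G": "general",
}
# Third letter of the tag: agreement (inferred from Table 1).
AGREEMENTS = {
    "M": "masculine",
    "F": "feminine",
    "O": "other-agreement",
    "X": "unmarked",
}

# Build every tag of the form D + type + agreement.
determiner_tags = {
    "D" + kind + agr: f"{AGREEMENTS[agr]} {DETERMINER_TYPES[kind]} determiner"
    for kind, agr in product(DETERMINER_TYPES, AGREEMENTS)
}

print(len(determiner_tags))    # 16
print(determiner_tags["DKF"])  # feminine interrogative determiner
```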
There are five part-of-speech labels for adjectives in the NELRALEC tagset. Two of the tags are for the gendered forms of the adjectives i.e. masculine and feminine adjectives.
There is one label for each of the following categories of adjectives.
- Unmarked adjectives
- Adjectives with other agreements
- Comparative or superlative adjectives derived from Sanskrit
Nepali verbs are highly inflected. Also, compounding of two or more verbs is common in Nepali. When tagging a compound verb, the tagging model assigns a tag to the verb on the basis of the last identifiable verb in the compound word.
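The compound-verb rule can be illustrated with a minimal sketch. It assumes that some earlier analysis step has already broken the compound into components and marked the ones it could identify as verbs. The function name, the component structure and the placeholder forms are all hypothetical; only the tags come from Table 1.

```python
from typing import Optional, Sequence, Tuple

# A component is (surface form, tag). The tag is None when the component
# could not be identified as a verb form.
Component = Tuple[str, Optional[str]]


def tag_compound_verb(components: Sequence[Component]) -> Optional[str]:
    """Return the tag of the last identifiable verb in the compound, if any."""
    for _form, tag in reversed(components):
        if tag is not None:
            return tag
    return None


# Placeholder forms, not real Nepali words.
compound = [("verb_a", "VDX"), ("verb_b", "VVYN1"), ("unknown", None)]
print(tag_compound_verb(compound))  # VVYN1
```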
There are twenty-nine tags for verbs in the NELRALEC tagset. Thirteen of the tags are used for different categories of non-finite verb forms, which lack marking for person. The remaining labels are used for finite verb forms.
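The finite verb tags in Table 1 appear to be built from fixed slots: a mood letter, a person letter, an honorific letter, a number digit and an optional feminine marker. The decoder below is a sketch of that reading. The slot meanings are inferred from the tag names and definitions in the table, and the function name is hypothetical.

```python
import re

# V + mood + person + honorific + number + optional F (slot layout inferred from Table 1).
FINITE_VERB = re.compile(
    r"^V(?P<mood>[VO])(?P<person>[MTY])(?P<honorific>[XNM])(?P<number>[12])(?P<feminine>F?)$"
)
MOOD = {"V": "plain finite", "O": "optative"}
PERSON = {"M": "first", "T": "second", "Y": "third"}
HONORIFIC = {"X": "unspecified", "N": "non-honorific", "M": "medial-honorific"}


def decode_finite_verb_tag(tag: str) -> dict:
    """Split a finite verb tag into its slots; raise ValueError for other tags."""
    match = FINITE_VERB.match(tag)
    if match is None:
        raise ValueError(f"not a finite verb tag: {tag!r}")
    return {
        "mood": MOOD[match["mood"]],
        "person": PERSON[match["person"]],
        "honorific": HONORIFIC[match["honorific"]],
        # 1 = singular; 2 = plural (Table 1 also uses it for medial-honorific singular).
        "number_code": int(match["number"]),
        "feminine": match["feminine"] == "F",
    }


print(decode_finite_verb_tag("VVTN1F")["person"])  # second
```

The pattern is deliberately loose: it accepts any combination of slot values, including combinations that the tagset does not define, so it should not be used to validate tags.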
Adverbs in Nepali are open class words, but a subset of Nepali adverbs falls into a closed category. These closed-class adverbs are the adverb-determiners, which are morphologically related to the pronoun-determiners.
The tagset contains four labels for adverbs, three of which are for adverb-determiners: demonstratives, interrogatives and relatives. One of the labels is for the open class adverbs.
A postposition occurs as an element of a word and not as an independent word. The postposition is separated from the word before tagging.
There are eight labels for postpositions in the NELRALEC tagset. Three of the labels are for the genitive postposition को(ko) and its inflected forms. There is one label for each of the following:
- Plural-collective postposition
- Ergative-instrumental postposition
- Accusative-dative postposition
- Other postpositions
- Unmarked genitive postposition derived using -ai
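Separating postpositions before tagging can be sketched as suffix stripping. The version below is deliberately naive and is not the project's actual tokenizer. Only को (ko) is named in this post; हरू (plural), ले (ergative-instrumental) and लाई (accusative-dative) are added here as common examples, and the list is far from complete.

```python
from typing import List

# Illustrative subset of postpositions; a real tagger needs the full inventory.
POSTPOSITIONS = ("हरू", "लाई", "को", "ले")


def split_postpositions(word: str) -> List[str]:
    """Peel known postpositions off the end of a word, longest match first."""
    suffixes = sorted(POSTPOSITIONS, key=len, reverse=True)
    tail: List[str] = []
    stripped = True
    while stripped:
        stripped = False
        for suffix in suffixes:
            # Never strip the whole word: something must remain as the stem.
            if word.endswith(suffix) and len(word) > len(suffix):
                word = word[: -len(suffix)]
                tail.insert(0, suffix)
                stripped = True
                break
    return [word] + tail


print(split_postpositions("केटाहरूले"))  # ['केटा', 'हरू', 'ले']
print(split_postpositions("रामको"))      # ['राम', 'को']
```

Pure string matching will also split any stem that happens to end in one of these sequences, so a practical tokenizer would need a lexicon or morphological analysis to rule out such cases.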
Numerals and Numeral Classifiers
The tagset contains five labels for numerals. One of the labels is for cardinal numbers: Nepali digits and Devnagari numbers. The remaining four labels are for ordinal numbers, among which three are for marked ordinal numbers and one is for unmarked ordinal numbers.
There are four different categories for numeral classifiers in the tagset. Three of the categories are for different marked numeral classifiers and one is for the unmarked numeral classifiers.
Conjunctions can be of two types: coordinating and subordinating conjunctions. A subordinating conjunction in Nepali can appear before or after the clause it subordinates.
There are three different labels for conjunctions in the NELRALEC tagset, two of which are for the subordinating conjunctions and one is for the coordinating conjunctions.
The NELRALEC tagset contains four labels for different categories of punctuation, among them:
- Punctuation that occurs at the end of sentences
- Punctuation that occurs in the middle of sentences
Particles and Interjections
Particles are a small closed class of uninflected word forms. The NELRALEC tagset contains one label for the particles.
Interjection is also a closed class part-of-speech for independent particles that function as reduced but syntactically complete sentences. There is one part-of-speech label for interjections in the tagset.
There are six labels in the NELRALEC tagset for non-Nepali words.
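As a consistency check, the per-category counts given in this post can be added up and compared with the stated total of 112 tags. The tally below treats the question marker (QQ) as one label outside the thirty-nine pronoun labels. That is an assumption, although it is the reading under which the numbers add up.

```python
# Label counts as stated in the prose of this post.
label_counts = {
    "nouns": 2,
    "pronouns (incl. -ai forms and pronoun determiners)": 39,
    "question marker": 1,  # assumption: QQ is counted separately from the 39
    "adjectives": 5,
    "verbs": 29,
    "adverbs": 4,
    "postpositions": 8,
    "numerals": 5,
    "numeral classifiers": 4,
    "conjunctions": 3,
    "punctuation": 4,
    "particles": 1,
    "interjections": 1,
    "non-Nepali words": 6,
}

# Pronouns: honorific variants + first person + reflexive + -ai forms + determiners.
pronoun_breakdown = 11 + 4 + 4 + 4 + 16
assert pronoun_breakdown == label_counts["pronouns (incl. -ai forms and pronoun determiners)"]

total = sum(label_counts.values())
print(total)  # 112
```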
Table 1 lists all the part-of-speech labels defined in the NELRALEC tagset.
|Category||Category Definition||POS Tag|
|Pronouns||First Person Pronoun||PMX|
|First Person Possessive Pronoun with Masculine Agreement||PMXKM|
|First Person Possessive Pronoun with Feminine Agreement||PMXKF|
|First Person Possessive Pronoun with Other Agreement||PMXKO|
|Non-honorific Second Person Pronoun||PTN|
|Non-honorific Second Person Possessive Pronoun with Masculine Agreement||PTNKM|
|Non-honorific Second Person Possessive Pronoun with Feminine Agreement||PTNKF|
|Non-honorific Second Person Possessive Pronoun with Other Agreement||PTNKO|
|Medial-honorific Second Person Pronoun||PTM|
|Medial-honorific Second Person Possessive Pronoun with Masculine Agreement||PTMKM|
|Medial-honorific Second Person Possessive Pronoun with Feminine Agreement||PTMKF|
|Medial-honorific Second Person Possessive Pronoun with Other Agreement||PTMKO|
|High-honorific Second Person Pronoun||PTH|
|High-honorific Unspecified-person Pronoun||PXH|
|Royal-honorific Unspecified-person Pronoun||PXR|
|Possessive Reflexive Pronoun with Masculine Agreement||PRFKM|
|Possessive Reflexive Pronoun with Feminine Agreement||PRFKF|
|Possessive Reflexive Pronoun with Other Agreement||PRFKO|
|Pronouns Derived using -ai||First Person Possessive Pronoun without Agreement||PMXKX|
|Non-honorific Second Person Possessive Pronoun without Agreement||PTNKX|
|Medial-honorific Second Person Possessive Pronoun without Agreement||PTMKX|
|Possessive Reflexive Pronoun without Agreement||PRFKX|
|Pronoun Determiners||Masculine Demonstrative Determiner||DDM|
|Feminine Demonstrative Determiner||DDF|
|Other-agreement Demonstrative Determiner||DDO|
|Unmarked Demonstrative Determiner||DDX|
|Masculine Interrogative Determiner||DKM|
|Feminine Interrogative Determiner||DKF|
|Other-agreement Interrogative Determiner||DKO|
|Unmarked Interrogative Determiner||DKX|
|Masculine Relative Determiner||DJM|
|Feminine Relative Determiner||DJF|
|Other-agreement Relative Determiner||DJO|
|Unmarked Relative Determiner||DJX|
|Masculine General Determiner-pronoun||DGM|
|Feminine General Determiner-pronoun||DGF|
|Other-agreement General Determiner-pronoun||DGO|
|Unmarked General Determiner-pronoun||DGX|
|Question Marker||Question Marker||QQ|
|Adjectives||Sanskrit-derived Comparative or Superlative Adjective||JT|
|Verbs||Masculine d-participle Verb||VDM|
|Feminine d-participle Verb||VDF|
|Other-agreement d-participle Verb||VDO|
|Unmarked d-participle Verb||VDX|
|Command-form Verb, Non-honorific||VCN|
|Command-form Verb, Mid-honorific||VCM|
|Command-form Verb, High-honorific||VCH|
|Subjunctive/Conditional e-form Verb||VS|
|First Person Singular Verb||VVMX1|
|First Person Plural Verb||VVMX2|
|Second Person Non-honorific Singular Verb||VVTN1|
|Second Person Plural (or Medial-honorific Singular) Verb||VVTX2|
|Third Person Non-honorific Singular Verb||VVYN1|
|Third Person Plural (or Medial-honorific Singular) Verb||VVYX2|
|Feminine Second Person Non-honorific Singular Verb||VVTN1F|
|Feminine Second Person Medial-honorific Singular Verb||VVTM1F|
|Feminine Third Person Non-honorific Singular Verb||VVYN1F|
|Feminine Third Person Medial-honorific Singular Verb||VVYM1F|
|First Person Singular Optative Verb||VOMX1|
|First Person Plural Optative Verb||VOMX2|
|Second Person Non-honorific Singular Optative Verb||VOTN1|
|Second Person Plural (or Medial-honorific Singular) Optative Verb||VOTX2|
|Third Person Non-honorific Singular Optative Verb||VOYN1|
|Third Person Plural (or Medial-honorific Singular) Optative Verb||VOYX2|
|Postpositions||Masculine Genitive Postposition||IKM|
|Feminine Genitive Postposition||IKF|
|Numerals||Masculine Ordinal Number||MOM|
|Feminine Ordinal Number||MOF|
|Other-agreement Ordinal Number||MOO|
|Unmarked Ordinal Number||MOX|
|Numeral Classifiers||Masculine Numeral Classifier||MLM|
|Feminine Numeral Classifier||MLF|
|Other-agreement Numeral Classifier||MLO|
|Unmarked Numeral Classifier||MLX|
|Conjunctions||Subordinating Conjunction appearing after the clause it subordinates||CSA|
|Subordinating Conjunction appearing before the clause it subordinates||CSB|
|Others||Foreign Word in Devnagari||FF|
|Foreign Word not in Devnagari||FS|
|Letter of the Alphabet||FZ|
|Null Tag||Null Tag||NULL|