by Shreeya Singh Dhakal on Sept. 22, 2017


Part-of-speech tags label the word classes, or syntactic categories, of words. They carry important information about a word, its neighbours and how they relate to each other. A part-of-speech tag also indicates which morphological affixes a given word can take. Part-of-speech tagging is an important task in natural language processing. In this blog post we'll discuss the NELRALEC tagset, compiled under the Bhasha Sanchar (NELRALEC) project.


In linguistics, words can be divided into two groups: closed and open classes. A closed class is a part-of-speech for which it is possible to list all the word-forms that belong to it, such as postposition, pronoun, conjunction and interjection. For an open class, such as noun, verb, adjective and adverb, an exhaustive listing of the word-forms is not possible. The NELRALEC tagset contains 112 part-of-speech tags that have been compiled with reference to various publications on Nepali grammar.


Nouns


Nepali nouns inflect for number and case, but the inflected forms are not marked with separate part-of-speech labels. This is because the case and plural markers are separated from the word and tagged separately. Gender distinctions also occur in Nepali nouns, but they are not included in the tags because they are not grammatical gender.


The NELRALEC tagset has two part-of-speech labels for nouns: common noun (NN) and proper noun (NP).


Pronouns


Nepali pronouns show lexical variation for person, number and grade of honorifics. The NELRALEC tagset contains thirty-nine part-of-speech labels for pronouns.


There are eleven labels for the lexical variations that Nepali pronouns show for grade of honorifics. Lexical variants of first person pronouns and of reflexive pronouns take four different labels each.

Some pronouns are derived by adding -ai. When the morpheme -ai is added to a possessive pronoun, the category of the possessive pronoun changes. Unlike the original possessive pronoun, the newly formed pronoun shows no number or gender agreement. The tagset has four different categories for such pronouns.


Pronoun determiners in Nepali are used at the start of a noun phrase to reference the noun phrase or provide context for it. The tagset contains sixteen labels for pronoun determiners, four of which are for marked and unmarked demonstratives. Similarly, marked and unmarked interrogatives, relatives and other general determiners take four labels each.
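
The sixteen determiner labels follow a regular pattern in Table 1: the letter D, one letter for the determiner type and one letter for the agreement. A minimal Python sketch that enumerates them is given here. The names `DETERMINER_TYPES`, `AGREEMENTS` and `determiner_tags` are hypothetical, and the letter meanings are read off the table rather than taken from an official specification.

```python
from itertools import product

# Second letter of the tag: determiner type (letters as they appear in Table 1).
DETERMINER_TYPES = {
    "D": "Demonstrative",
    "K": "Interrogative",
    "J": "Relative",
    "G": "General",
}

# Third letter of the tag: agreement marking.
AGREEMENTS = {
    "M": "Masculine",
    "F": "Feminine",
    "O": "Other-agreement",
    "X": "Unmarked",
}


def determiner_tags():
    """Build a tag -> description mapping for the 4 x 4 determiner labels."""
    tags = {}
    for (t_code, t_name), (a_code, a_name) in product(
        DETERMINER_TYPES.items(), AGREEMENTS.items()
    ):
        # Table 1 calls the general type "Determiner-pronoun".
        head = "Determiner-pronoun" if t_code == "G" else "Determiner"
        tags[f"D{t_code}{a_code}"] = f"{a_name} {t_name} {head}"
    return tags


if __name__ == "__main__":
    for tag, description in determiner_tags().items():
        print(tag, description)
```

Four determiner types combined with four agreement values give the sixteen labels mentioned above.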


The question word के (ke) is a special case among the interrogative pronouns. के (ke) receives the question marker (QQ) label when it is not used to refer to an unknown entity.


Adjectives


There are five part-of-speech labels for adjectives in the NELRALEC tagset. Two of the tags are for the gendered forms of adjectives, i.e. masculine and feminine adjectives.


There is one label for each of the following categories of adjectives:

  • Unmarked adjectives

  • Adjectives with other agreements

  • Comparative or superlative adjectives derived from Sanskrit


Verbs


Nepali verbs are highly inflected, and compounding of two or more verbs is common. When tagging a compound verb, the tagging model assigns a tag to the whole compound on the basis of the last identifiable verb in it.
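
The "last identifiable verb" rule can be sketched in a few lines of Python. This is only an illustration: the function name `compound_verb_tag`, the input format (a list of component/tag pairs produced by some hypothetical analyzer, with `None` for components it could not identify) and the romanized example components are all assumptions, not part of the NELRALEC tooling.

```python
from typing import Optional, Sequence, Tuple


def compound_verb_tag(
    components: Sequence[Tuple[str, Optional[str]]]
) -> Optional[str]:
    """Return the tag of the last component identified as a verb.

    `components` holds (form, tag) pairs in surface order. A tag of None
    means the hypothetical analyzer could not identify that component.
    """
    for _form, tag in reversed(components):
        # Verb tags in Table 1 all start with "V".
        if tag is not None and tag.startswith("V"):
            return tag
    return None


if __name__ == "__main__":
    # Illustrative segmentation of a two-verb compound (assumed, romanized).
    print(compound_verb_tag([("gari", "VR"), ("diyo", "VVYN1")]))
```

Scanning from the right means an unidentified trailing component is skipped, and the compound falls back to the nearest identified verb before it.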


There are twenty-nine tags for verbs in the NELRALEC tagset. Thirteen of the tags are used for different categories of non-finite verb forms, which lack marking for person. The remaining labels are used for finite verb forms.


Adverbs


Adverbs in Nepali are open-class words, but a subset of Nepali adverbs falls into a closed category. This closed category consists of the adverb-determiners, which are morphologically related to the pronoun determiners.


The tagset contains four labels for adverbs. Three of them are for adverb-determiners (demonstratives, interrogatives and relatives) and one is for open-class adverbs.


Postpositions


In Nepali, a postposition occurs as an element of a word and not as an independent word. The postposition is therefore separated from the word before tagging.


There are eight labels for postpositions in the NELRALEC tagset. Three of the labels are for the genitive postposition को (ko) and its inflected forms. There is one label for each of the following:

  • Plural-collective postposition

  • Ergative-instrumental postposition

  • Accusative-dative postposition

  • Other postpositions

  • Unmarked genitive postposition derived using -ai
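
Separating postpositions and plural markers from a word before tagging, as described for nouns and postpositions above, can be sketched as simple suffix stripping. The sketch below is a rough illustration under stated assumptions: the suffix table `POSTPOSITION_TAGS` is a small hand-picked sample, the mapping of को, की and का to the masculine, feminine and other-agreement genitive tags is assumed, and the function name `split_postpositions` is hypothetical. Naive suffix matching will over-segment words that merely happen to end in one of these strings, so a real system would need a lexicon or a morphological analyzer.

```python
# Small illustrative sample of attached markers and their Table 1 tags.
# The selection and the genitive mapping are assumptions made for this sketch.
POSTPOSITION_TAGS = {
    "हरू": "IH",   # plural-collective
    "ले": "IE",    # ergative-instrumental
    "लाई": "IA",   # accusative-dative
    "को": "IKM",   # genitive, masculine agreement (assumed mapping)
    "की": "IKF",   # genitive, feminine agreement (assumed mapping)
    "का": "IKO",   # genitive, other agreement (assumed mapping)
    "मा": "II",    # other postposition
}


def split_postpositions(word):
    """Strip known markers from the end of `word`, longest match first.

    Returns (stem, [(marker, tag), ...]) with markers in surface order.
    """
    markers = []
    stem = word
    stripped = True
    while stripped:
        stripped = False
        for suffix in sorted(POSTPOSITION_TAGS, key=len, reverse=True):
            # Never strip the whole word: a stem must remain.
            if stem.endswith(suffix) and len(stem) > len(suffix):
                markers.insert(0, (suffix, POSTPOSITION_TAGS[suffix]))
                stem = stem[: -len(suffix)]
                stripped = True
                break
    return stem, markers


if __name__ == "__main__":
    # घरहरूमा: "in the houses"
    print(split_postpositions("घरहरूमा"))
```

With this sample table, घरहरूमा is split into the stem घर followed by हरू (IH) and मा (II); the stem would then be tagged on its own, for example as a common noun (NN).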


Numerals and Numeral Classifiers


The tagset contains five labels for numerals. One of the labels is for cardinal numbers, covering both Nepali digits and numbers written in Devnagari. The remaining four labels are for ordinal numbers: three for marked ordinal numbers and one for unmarked ordinal numbers.


There are four different categories for numeral classifiers in the tagset. Three of the categories are for the different marked numeral classifiers and one is for unmarked numeral classifiers.


Conjunctions


Conjunctions are of two types: coordinating and subordinating. A subordinating conjunction in Nepali can appear either before or after the clause it subordinates.


There are three different labels for conjunctions in the NELRALEC tagset: two for subordinating conjunctions and one for coordinating conjunctions.


Punctuation


The NELRALEC tagset contains four labels for the categories of punctuation listed below.

  • Punctuation at the end of sentences

  • Punctuation in the middle of sentences

  • Quotations

  • Brackets


Particles and Interjections


Particles are a small closed class of uninflected word forms. The NELRALEC tagset contains one label for particles.


Interjection is also a closed-class part-of-speech. It covers independent particles that function as reduced but syntactically complete sentences. There is one part-of-speech label for interjections in the tagset.


There are six labels in the NELRALEC tagset for non-Nepali words and other tokens that fall outside the categories above, such as abbreviations and mathematical formulae.


Table 1 lists all the part-of-speech labels defined in the NELRALEC tagset.


Table 1: NELRALEC Tagset

| Category | Category Definition | POS Tag |
|---|---|---|
| Noun | Common Noun | NN |
| | Proper Noun | NP |
| Pronouns | First Person Pronoun | PMX |
| | First Person Possessive Pronoun with Masculine Agreement | PMXKM |
| | First Person Possessive Pronoun with Feminine Agreement | PMXKF |
| | First Person Possessive Pronoun with Other Agreement | PMXKO |
| | Non-honorific Second Person Pronoun | PTN |
| | Non-honorific Second Person Possessive Pronoun with Masculine Agreement | PTNKM |
| | Non-honorific Second Person Possessive Pronoun with Feminine Agreement | PTNKF |
| | Non-honorific Second Person Possessive Pronoun with Other Agreement | PTNKO |
| | Medial-honorific Second Person Pronoun | PTM |
| | Medial-honorific Second Person Possessive Pronoun with Masculine Agreement | PTMKM |
| | Medial-honorific Second Person Possessive Pronoun with Feminine Agreement | PTMKF |
| | Medial-honorific Second Person Possessive Pronoun with Other Agreement | PTMKO |
| | High-honorific Second Person Pronoun | PTH |
| | High-honorific Unspecified-person Pronoun | PXH |
| | Royal-honorific Unspecified-person Pronoun | PXR |
| | Reflexive Pronoun | PRF |
| | Possessive Reflexive Pronoun with Masculine Agreement | PRFKM |
| | Possessive Reflexive Pronoun with Feminine Agreement | PRFKF |
| | Possessive Reflexive Pronoun with Other Agreement | PRFKO |
| Pronouns Derived using -ai | First Person Possessive Pronoun without Agreement | PMXKX |
| | Non-honorific Second Person Possessive Pronoun without Agreement | PTNKX |
| | Medial-honorific Second Person Possessive Pronoun without Agreement | PTMKX |
| | Possessive Reflexive Pronoun without Agreement | PRFKX |
| Pronoun Determiners | Masculine Demonstrative Determiner | DDM |
| | Feminine Demonstrative Determiner | DDF |
| | Other-agreement Demonstrative Determiner | DDO |
| | Unmarked Demonstrative Determiner | DDX |
| | Masculine Interrogative Determiner | DKM |
| | Feminine Interrogative Determiner | DKF |
| | Other-agreement Interrogative Determiner | DKO |
| | Unmarked Interrogative Determiner | DKX |
| | Masculine Relative Determiner | DJM |
| | Feminine Relative Determiner | DJF |
| | Other-agreement Relative Determiner | DJO |
| | Unmarked Relative Determiner | DJX |
| | Masculine General Determiner-pronoun | DGM |
| | Feminine General Determiner-pronoun | DGF |
| | Other-agreement General Determiner-pronoun | DGO |
| | Unmarked General Determiner-pronoun | DGX |
| Question Marker | Question Marker | QQ |
| Adjective | Masculine Adjective | JM |
| | Feminine Adjective | JF |
| | Other-agreement Adjective | JO |
| | Unmarked Adjective | JX |
| | Sanskrit-derived Comparative or Superlative Adjective | JT |
| Verb | Infinitive Verb | VI |
| | Masculine d-participle Verb | VDM |
| | Feminine d-participle Verb | VDF |
| | Other-agreement d-participle Verb | VDO |
| | Unmarked d-participle Verb | VDX |
| | e(ko)-participle Verb | VE |
| | ne-participle Verb | VN |
| | Sequential Participle-converb | VQ |
| | Command-form Verb, Non-honorific | VCN |
| | Command-form Verb, Mid-honorific | VCM |
| | Command-form Verb, High-honorific | VCH |
| | Subjunctive/Conditional e-form Verb | VS |
| | i-form Verb | VR |
| | First Person Singular Verb | VVMX1 |
| | First Person Plural Verb | VVMX2 |
| | Second Person Non-honorific Singular Verb | VVTN1 |
| | Second Person Plural (or Medial-honorific Singular) Verb | VVTX2 |
| | Third Person Non-honorific Singular Verb | VVYN1 |
| | Third Person Plural (or Medial-honorific Singular) Verb | VVYX2 |
| | Feminine Second Person Non-honorific Singular Verb | VVTN1F |
| | Feminine Second Person Medial-honorific Singular Verb | VVTM1F |
| | Feminine Third Person Non-honorific Singular Verb | VVYN1F |
| | Feminine Third Person Medial-honorific Singular Verb | VVYM1F |
| | First Person Singular Optative Verb | VOMX1 |
| | First Person Plural Optative Verb | VOMX2 |
| | Second Person Non-honorific Singular Optative Verb | VOTN1 |
| | Second Person Plural (or Medial-honorific Singular) Optative Verb | VOTX2 |
| | Third Person Non-honorific Singular Optative Verb | VOYN1 |
| | Third Person Plural (or Medial-honorific Singular) Optative Verb | VOYX2 |
| Adverb | Adverb | RR |
| | Demonstrative Adverb | RD |
| | Interrogative Adverb | RK |
| | Relative Adverb | RJ |
| Postposition | Postposition | II |
| | Plural-collective Postposition | IH |
| | Ergative-instrumental Postposition | IE |
| | Accusative-dative Postposition | IA |
| | Masculine Genitive Postposition | IKM |
| | Feminine Genitive Postposition | IKF |
| | Other-agreement Genitive Postposition | IKO |
| Numerals | Cardinal Number | CD |
| | Masculine Ordinal Number | MOM |
| | Feminine Ordinal Number | MOF |
| | Other-agreement Ordinal Number | MOO |
| | Unmarked Ordinal Number | MOX |
| Numeral Classifiers | Masculine Numeral Classifier | MLM |
| | Feminine Numeral Classifier | MLF |
| | Other-agreement Numeral Classifier | MLO |
| | Unmarked Numeral Classifier | MLX |
| Conjunctions | Coordinating Conjunction | CC |
| | Subordinating Conjunction appearing after the clause it subordinates | CSA |
| | Subordinating Conjunction appearing before the clause it subordinates | CSB |
| Punctuation | Sentence-final Punctuation | YF |
| | Sentence-medial Punctuation | YM |
| | Quotation Marks | YQ |
| | Brackets | YB |
| Particle | Particle | TT |
| Interjection | Interjection | UU |
| Others | Foreign Word in Devnagari | FF |
| | Foreign Word not in Devnagari | FS |
| | Abbreviation | FB |
| | Mathematical Formula | FO |
| | Letter of the Alphabet | FZ |
| | Unclassifiable | FU |
| Null Tag | Null Tag | NULL |
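
As a quick sanity check, the per-category label counts quoted in the sections above can be summed and compared with the stated size of the tagset. The sketch below is only a cross-check of the figures in this post. The dictionary name `LABEL_COUNTS` is hypothetical, and counting the question marker as its own label while leaving the null tag out of the total are assumptions made here.

```python
# Label counts per category, as quoted in the prose sections of this post.
# Assumption: the null tag is not counted towards the tagset size.
LABEL_COUNTS = {
    "noun": 2,
    "pronoun (including pronoun determiners)": 39,
    "question marker": 1,
    "adjective": 5,
    "verb": 29,
    "adverb": 4,
    "postposition": 8,
    "numeral": 5,
    "numeral classifier": 4,
    "conjunction": 3,
    "punctuation": 4,
    "particle": 1,
    "interjection": 1,
    "others": 6,
}


def total_labels(counts):
    """Sum the per-category label counts."""
    return sum(counts.values())


if __name__ == "__main__":
    print(total_labels(LABEL_COUNTS))  # prints 112
```

Under these assumptions the counts add up to the 112 tags stated at the start of the post.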


References


  • Nelralec/Bhasha Sanchar Working Paper 2


