by Shreeya Singh Dhakal on Sept. 29, 2017


A well-chosen tagset is very important in part-of-speech tagging. The NELRALEC tagset contains 112 tags, which is a large number. Using such a large tagset is not always efficient, especially when only a limited amount of annotated data is available. In this blog post we'll discuss a new tagset for Nepali, which is a reduced version of the NELRALEC tagset.


The reduced tagset is designed to eliminate the errors that a part-of-speech tagger makes because annotated data is sparse.


In the new tagset:

  • All personal pronouns are grouped together and distinctions in grades of honorifics are not considered.

  • All possessive pronouns are grouped together and distinctions in grades of honorifics, gender and other inflectional forms are not considered.

  • Sixteen different tags for pronoun determiners and three different tags for adverb determiners are re-grouped into three new tag groups.

  • The distinctions for inflected forms such as gender, number, honorifics and person are not considered for verbs.

  • All the inflected forms of adjectives, ordinal numbers, numeral classifiers and genitive postpositions are grouped together.

  • Different categories of foreign words are grouped together.

  • All subordinating conjunctions are given a single label.
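
As a rough illustration of the determiner regrouping in the list above, the sketch below collapses the sixteen pronoun-determiner tags and three adverb-determiner tags into three groups. The group names DTM, DTX and DTO come from Table 1. The function name `reduce_determiner` is hypothetical, and reading the final letter of a tag as its agreement marker is an assumption inferred from the table, not a description of how the original tagger was built.

```python
# Hypothetical helper: collapse NELRALEC determiner tags into the three
# reduced groups listed in Table 1 (DTM, DTX, DTO).

PRONOUN_DETERMINERS = [
    "DDM", "DDF", "DDO", "DDX",
    "DKM", "DKF", "DKO", "DKX",
    "DJM", "DJF", "DJO", "DJX",
    "DGM", "DGF", "DGO", "DGX",
]
ADVERB_DETERMINERS = ["RD", "RK", "RJ"]


def reduce_determiner(tag):
    """Return the reduced tag for a NELRALEC determiner tag."""
    if tag in ADVERB_DETERMINERS:
        return "DTO"
    if tag in PRONOUN_DETERMINERS:
        # Assumption: a final "X" is the unmarked form; M, F and O are marked.
        return "DTX" if tag.endswith("X") else "DTM"
    raise ValueError(f"not a determiner tag: {tag}")


# Group the original tags by the reduced tag they fall under.
groups = {}
for tag in PRONOUN_DETERMINERS + ADVERB_DETERMINERS:
    groups.setdefault(reduce_determiner(tag), []).append(tag)

for reduced, originals in groups.items():
    print(reduced, len(originals), originals)
```

A rule like this only works where the tag names are regular. For the other categories an explicit lookup table is the safer choice.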


Table 1: Reduced Tagset for Nepali

| Category | Category Definition | POS Tag | NELRALEC Tags |
|---|---|---|---|
| Noun | Common Noun | NN | NN |
| | Proper Noun | NP | NP |
| Pronouns | Personal Pronoun | PP | PMX, PTN, PTM, PTH, PXH, PXR |
| | Possessive Pronoun | PPP | PMXKM, PMXKF, PMXKO, PTNKM, PTNKF, PTNKO, PTMKM, PTMKF, PTMKO, PRFKM, PRFKF, PRFKO, PMXKX, PTNKX, PTMKX, PRFKX |
| | Reflexive Pronoun | PRF | PRF |
| Determiner | Marked | DTM | DDM, DDF, DKM, DKF, DJM, DJF, DGM, DGF, DDO, DKO, DJO, DGO |
| | Unmarked | DTX | DDX, DKX, DJX, DGX |
| | Others | DTO | RD, RK, RJ |
| Verb | Finite Verbs | VF | VVMX1, VVMX2, VVTN1, VVTX2, VVYN1, VVYX2, VVTN1F, VVTM1F, VVYN1F, VVYM1F, VOMX1, VOMX2, VOTN1, VOTX2, VOYN1, VOYX2 |
| | Infinitive Verb | VBI | VI |
| | Prospective Verb | VBN | VN |
| | Aspect Verb | VBKO | VDM, VDF, VDO, VDX |
| | Others | VBO | VE, VQ, VCN, VCM, VCH, VS, VR |
| Adjective | Marked | JJM | JM, JF, JO |
| | Unmarked | JJX | JX |
| | Degree | JJD | JT |
| Adverb | Adverb | RR | RR |
| Postposition | Postposition | II | II |
| | Plural-collective Postposition | IH | IH |
| | Ergative-instrumental Postposition | IE | IE |
| | Accusative-dative Postposition | IA | IA |
| | Genitive Postposition | IKO | IKM, IKO, IKF, IKX |
| Numerals | Cardinal Number | MM | MM |
| | Marked Ordinal Number | MOM | MOM, MOF, MOO |
| | Unmarked Ordinal Number | MOX | MOX |
| Classifier | Marked | MLM | MLM, MLF, MLO |
| | Unmarked | MLX | MLX |
| Conjunction | Coordinating Conjunction | CC | CC |
| | Subordinating Conjunction | CS | CSA, CSB |
| Interjection | Interjection | UU | UU |
| Question Marker | Question Marker | QQ | QQ |
| Particle | Particle | TT | TT |
| Punctuation | Sentence-final Punctuation | YF | YF |
| | Sentence-medial Punctuation | YM | YM |
| | Quotation Marks | YQ | YQ |
| | Brackets | YB | YB |
| Foreign Word | Foreign Word | FW | FF, FS, FO, FZ |
| Unclassifiable | Unclassifiable | FU | FU |
| Abbreviation | Abbreviation | FB | FB |
| Null Tag | Null Tag | NULL | NULL |
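
To apply the reduced tagset to data already annotated with NELRALEC tags, Table 1 can be turned into a plain lookup. The following is a minimal sketch, not the code used for this post: the tag lists are transcribed from Table 1, while the names `REDUCED_TO_NELRALEC`, `NELRALEC_TO_REDUCED` and `reduce_tags` are hypothetical, and the example tokens are placeholders rather than real corpus data.

```python
# Reduced tag -> NELRALEC tags it covers, transcribed from Table 1.
REDUCED_TO_NELRALEC = {
    "NN": "NN", "NP": "NP",
    "PP": "PMX PTN PTM PTH PXH PXR",
    "PPP": ("PMXKM PMXKF PMXKO PTNKM PTNKF PTNKO PTMKM PTMKF PTMKO "
            "PRFKM PRFKF PRFKO PMXKX PTNKX PTMKX PRFKX"),
    "PRF": "PRF",
    "DTM": "DDM DDF DKM DKF DJM DJF DGM DGF DDO DKO DJO DGO",
    "DTX": "DDX DKX DJX DGX",
    "DTO": "RD RK RJ",
    "VF": ("VVMX1 VVMX2 VVTN1 VVTX2 VVYN1 VVYX2 VVTN1F VVTM1F VVYN1F "
           "VVYM1F VOMX1 VOMX2 VOTN1 VOTX2 VOYN1 VOYX2"),
    "VBI": "VI", "VBN": "VN",
    "VBKO": "VDM VDF VDO VDX",
    "VBO": "VE VQ VCN VCM VCH VS VR",
    "JJM": "JM JF JO", "JJX": "JX", "JJD": "JT",
    "RR": "RR",
    "II": "II", "IH": "IH", "IE": "IE", "IA": "IA",
    "IKO": "IKM IKO IKF IKX",
    "MM": "MM", "MOM": "MOM MOF MOO", "MOX": "MOX",
    "MLM": "MLM MLF MLO", "MLX": "MLX",
    "CC": "CC", "CS": "CSA CSB",
    "UU": "UU", "QQ": "QQ", "TT": "TT",
    "YF": "YF", "YM": "YM", "YQ": "YQ", "YB": "YB",
    "FW": "FF FS FO FZ",
    "FU": "FU", "FB": "FB", "NULL": "NULL",
}

# Invert the table: NELRALEC tag -> reduced tag.
NELRALEC_TO_REDUCED = {}
for reduced, originals in REDUCED_TO_NELRALEC.items():
    for original in originals.split():
        # Each NELRALEC tag should belong to exactly one reduced tag.
        assert original not in NELRALEC_TO_REDUCED, original
        NELRALEC_TO_REDUCED[original] = reduced


def reduce_tags(tagged_sentence):
    """Map (word, NELRALEC tag) pairs to (word, reduced tag) pairs."""
    return [(word, NELRALEC_TO_REDUCED[tag]) for word, tag in tagged_sentence]


# Toy input with placeholder tokens; not taken from a real corpus.
example = [("w1", "PTN"), ("w2", "NN"), ("w3", "IKM"), ("w4", "VVYN1"), ("w5", "YF")]
print(reduce_tags(example))

# Size of the original and reduced tag inventories as transcribed above.
print(len(NELRALEC_TO_REDUCED), "->", len(REDUCED_TO_NELRALEC))
```

In this sketch an unknown tag raises a `KeyError` instead of passing through unchanged, so that tags missing from the table are noticed early.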


