Saturday, March 16, 2024

Morphology Primer for the Kankanaey Language Spoken in Benguet, Philippines

(Resharing an article that I wrote some years ago)

Morphology Primer for the Kankanaey Language of Benguet

Dalos D. Miguel
Saint Louis University
Baguio City
ddmiguel@slu.edu.ph

Abstract

In order to spearhead a natural language processing (NLP) study that involves the Kankanaey language of the people in the Philippine province of Benguet, this paper establishes the morphology of words in the Kankanaey language. Affixation and reduplication are the morphological phenomena that are evident in Kankanaey. Infix, prefix, suffix and replacive affix strings are used in generating derivations and inflections in Kankanaey. It is expected that the data inferable from the study will be used in developing more encompassing NLP ventures.

Keywords: Natural Language Processing, Kankanaey, Morphology, Affixes, Reduplication

Introduction

Kankanaey is one of the indigenous natural languages spoken in the province of Benguet in the Philippines. Like the other natural languages (NL) of the globe, Kankanaey should be considered when furthering the field of natural language processing (NLP) or computational linguistics (CL). NLP or CL refers to the discipline of utilizing information technology (IT) to deal with languages. It suffices to say that the development of programs (software) as well as hardware components of a computer system for analyzing or synthesizing spoken or written languages (Jackson et al., 2002) is the ultimate goal of NLP.

Inherent in NLP or CL are Lexical Analysis (LA) and Morphological Analysis (MA). The determination of the features of individual words included in a text is the concern of LA (Sproat, 2000). LA is usually performed as an initial step of a broader task, that is, to generate the semantics of a group of words or a document. LA is a broad process, and MA is one element of LA that may be considered independently. The basic constituent of MA is morphology. Morphology is the study of word structure (Spencer, 1998).
MA involves determining the morphemes composing a word. Simply put, the morphemes are the root word and the affixes applied to the root word. Determining the classification, or part of speech (POS), to which a word belongs (noun, verb, adjective or otherwise) is another function of MA.

There are two suggested MA schools for Philippine languages: Stem-based Morphological Analysis (SMA) and Root-based Morphological Analysis (RMA). Dr. Ricardo Ma. Nolasco (2007) of the Komisyon sa Wikang Filipino (KWF), a Philippine government agency that handles concerns pertaining to the Philippine languages, referred to the two approaches as maka-tangkay (SMA) and maka-ugat (RMA). Dr. Nolasco compared the mechanics of SMA and RMA using Filipino words. The commissioner recommended that SMA be explored because SMA seems to offer more possibilities than RMA.

To posit the essentials for studying the applicability of SMA to the Kankanaey language, this study reports the rudiments of the morphology of the Kankanaey language. Samples of Kankanaey words are gathered from actual speakers as well as from secondary sources, and a set of morphological phenomena is inferred from the gathered words.

Although Kankanaey is also spoken in some parts of Mountain Province, Ilocos Sur, La Union and Nueva Vizcaya, this initial study is focused on the Kankanaey speakers of Benguet. However, saying that the study is delimited to Benguet Kankanaey is an understatement, because Kankanaey is not spoken uniformly within the province. There are variants in the enunciations and word semantics among the people from the municipalities of Atok, Bakun, Buguias, Itogon, Kapangan, and Sablan. In linguistic theory, regional variants of a language constitute dialects (Simons and Bird, 2008). Hence, dialects of the Kankanaey language exist in the relatively small province of Benguet. Despite the recognition of Kankanaey dialects, this primer for NLP for Kankanaey makes no distinction among the dialects.
It is expected that the differences among the dialects are minimal and that they will not hurt a single MA and NLP framework for the language. In addition, because no formal orthography (manner of writing) for the Kankanaey language has been defined, the orthography for Filipino is adopted. Therefore, the 28 letters defined for the Filipino language (www.tagaloglang) are potential letters of Kankanaey words. The decision is acceptable because the samples of Kankanaey text from which some data for this study are taken, in the form of song lyrics and a translation of the Holy Bible, are based on the Tagalog orthography. Besides, Kankanaey is one of the Philippine languages.

Derivations and Inflections in Kankanaey

Gregory T. Stump (1998) presented a description of Inflection as compared to Derivation, while Robert Beard (1998) detailed the rudiments of Derivation vis-à-vis Inflection. While the highly analytical exposition of Stump and Beard cannot be replicated in this report, the following broad distinction between Inflection and Derivation is adequate.

Inflection is what happens when a word that has the same classification (part of speech) as a root word is generated. The generation of an English present tense verb for a singular subject by suffixing an s to the verb for a plural subject, such as fears from fear, is an example of an inflection. The generation of the past tense verb talked from the present tense verb talk is also a case of inflection. The words fear and fears are both in the class of verbs, and so are talk and talked. On the other hand, Derivation is what happens when a word that has a different meaning or classification from an originating word is derived. Forming the adjective malicious from the noun malice is an example of a derivation.

Like other languages, the Kankanaey vocabulary is enriched by Derivations and Inflections. Cases believed to be realizations of Incorporation (Gerdts, 1998) and Clitics (Halpern, 1998) may be present in the Kankanaey language.
Incorporation is the concatenation of a word such as a verb with another word such as a noun, pronoun, or adverb in order to realize a combined syntactic function. The Kankanaey word “kinanko” is the result of concatenating “kinan” (ate) and “ko” (I), and it results in a single-word counterpart of the English statement ‘I ate it’. Similarly, the Kankanaey word “edwani” can mean ‘in the present time’ and is formed out of “ed” (in) and “nuwani” (present). The disappearance of the “nu” syllable of “nuwani” when fused with “ed” seems to be a realization of a clitic case.

Compounding, as elucidated in (Fabb, 1998), involves a sequence of two or more words, but the sequence corresponds to a single meaning. The English word ‘Green House’ is an example of a compound. Some Kankanaey words correspond to an English compound. From the root word “talak”, which translates to ‘car’, the Kankanaey word “taltalak” translates to ‘toy car’. In this report, no further elaboration on Incorporation, Clitics and Compounding will be done. Instead, Inflection and Derivation are given utmost attention.

In Kankanaey, nouns and verbs are repositories of several semantics. In fact, one noun can be a root word for a number of words. For example, the word “gabyon”, the Kankanaey term for ‘any hand-operated hoe’ normally used on the farm, is a repository of about 50 words. Table 1 renders some words generated from “gabyon” that may be inflections or derivations, and Table 2 shows additional word formations from the same root. It must be noted that “gabyon” qualifies as a noun and each of the generated words belongs to a category that is not necessarily noun. Furthermore, every generated word has an associated meaning. Every generated word can function as an independent word.

Table 1. Sample Inflections and Derivations from "gabyon"

Kankanaey word | Category | English Translation
gabyon | Noun | hoe, hand-operated soil-digging implement for a vegetable farm
gabyonan | Verb | to use a hoe to dig a portion of an object (typically land area)
gabyonen | Verb | will use a hoe to dig an object
ginabgabyon | Adjective | referring to an object for which a hoe had been used
ginabyon | Adjective | referring to an object for which a hoe was used
ginmabyon | Verb | did the act of using a hoe (the person talking is the object of the action)
ginmanabyon | Verb | had been digging at the same time (referring to the act of many)
gumabgabyon | Verb | doing the act of using a hoe (the person talking is the object of the action)
gumabyon | Verb | will do the act of using a hoe (the person talking is the object of the action)
gumanabyon | Verb | doing the act of using a hoe at the same time (referring to the act of many)
igabyon | Verb | will use the hoe
ingabgabyon | Verb | had been using the hoe
ingabyon | Adjective | referring to the hoe used
kagabyon | Verb | sudden act of digging using the hoe
magabyon | Adjective | referring to an area for which the hoe will be used
maigabyon | Adjective | referring to a hoe that will be used
makigabgabyon | Verb | using a hoe alongside other actors
makigabyon | Verb | will use a hoe alongside other actors
makigagabyon | Verb | will attempt to use a hoe alongside other actors
mangabgabyon | Verb | using the hoe
mangabyon | Verb | to use a hoe / will use a hoe
nakigabyon | Verb | used a hoe alongside other actors
nakigagabyon | Verb | attempted to use a hoe alongside others
nangabgabyon | Verb | had been using a hoe
nangabyon | Verb | used a hoe
ingabgabyonan | Adjective | referring to a reason for a continuous act of using the hoe in the past
ingabyonan | Adjective | referring to a person for whom the act of using the hoe was done
ingabyonan | Adjective | referring to the income/proceeds of the act of using the hoe
ingabyonan | Adjective | referring to the reason why the act of using a hoe was done
pangabyon | Adjective | referring to an instrument for digging (the hoe)
gabyonan | Adjective | referring to a place where a hoe is used
pangabyonan | Adjective | referring to a place where a hoe will be used
kagabgabyon | Adjective | referring to a place on which the act of using the hoe was recently done
kagabyogabyon | Adjective | referring to a fast act of doing work with a hoe
kaigabyogabyon | Adjective | referring to a speedy use of a hoe
magabyon | Adjective | referring to a person who is capable of using a hoe efficiently
magabyon | Adjective | referring to an area/object that is feasible for using a hoe
makagabgabyon | Adjective | referring to the eagerness to use a hoe
makagabyon | Adjective | referring to being a habitual user of the hoe
makigabyonan | Adjective | referring to the place or other people with which using a hoe will be done
nagabyon | Adjective | referring to an object or area for which a hoe was used
naigabyon | Adjective | referring to a hoe that was used
nakigabgabyon | Adjective | referring to a person who had been a companion in using a hoe
nakigabyon | Adjective | referring to a person who was a companion in using a hoe
nakigabyonan | Adjective | referring to a place/person where/whom a hoe was used alongside others

The suffixation of ko, mo, da, na, mi, yo, ka, ak, kayo and kami, among others, to a base word, which may be the root word, an inflected word or a derived word as illustrated in Table 2, is another orthographic assumption that can be made a standard practice. Although there is a practice of writing “mo”, “kayo”, and “kami” as separate words, “ak” is always maintained as a suffix. It looks unnatural to treat “ak” as a separate word when it is only meaningful as a suffix, and the same holds for “mo”, “kami”, “kayo” and the others.

Table 2. Other words (Compounds, Incorporations) generated from the root word "gabyon"

Kankanaey word | Category | English Translation
gabgabyon | Noun | small hoe / miniature hoe / toy hoe
mangabyonda | verb + pronoun | they will use a hoe
mangabyonka | verb + pronoun | you will use a hoe
mangabyonkami | verb + pronoun | we will use a hoe
mangabyonkayo | verb + pronoun | you (plural) will use a hoe
nangabyonak | verb + pronoun | I used a hoe
nangabyonda | verb + pronoun | they used a hoe
nangabyonka | verb + pronoun | you used a hoe
nangabyonkami | verb + pronoun | we used a hoe
pangabyonda | noun + verb + pronoun | hoe for them to use
pangabyonko | noun + verb + pronoun | hoe for me to use
pangabyonmi | noun + verb + pronoun | hoe for us to use
pangabyonmo | noun + verb + pronoun | hoe for you to use
pangabyonna | noun + verb + pronoun | hoe for him/her to use
pangabyonyo | noun + verb + pronoun | hoe for you (pl) to use
gabyonda | noun + pronoun | their hoe
gabyonko | noun + pronoun | my hoe
gabyonmi | noun + pronoun | our hoe
gabyonmo | noun + pronoun | your hoe
gabyonna | noun + pronoun | his hoe / her hoe
gabyonyo | noun + pronoun | your (pl) hoe

If it were not for the purpose of identifying whether the word formation is inflectional or derivational, it is reasonable to consider “gabyon” as simply a root word. After all, the ‘noun-ness’ of “gabyon” does not contribute to the classification of a word constructed from it. In a way, the category (POS) of the generated word depends on the affixes used and not on the classification of the root word. However, tracking the POS of the root word may find relevance for a higher-level NLP function such as machine translation. Besides, the POS of a root word implies the feasibility of subjecting such a word to inflection or derivation.
It is then a good idea to integrate the POS of the root words with the morphology backbone.

Table 3 lists some words that may be formed out of the root word “ali”. The root word “ali” can translate to ‘bring’ or ‘come’. Once again, the table demonstrates the irrelevance of any classification of the root word. Whether “ali” is a verb or not does not matter when classifying any word generated from “ali”. In general, verbs and adjectives may be generated from any root word by applying affixes to the root word as well as to already generated words.

Such a process is what Dr. Ricardo Nolasco (2007) advocated, which he illustrated using the Tagalog language. Dr. Nolasco prescribed affixes for generating Tagalog verbs (verbal affixes), which are the infix -um-, the replacive affix ~m-, the suffix -in, the suffix -an and the prefix i-. Using the affixes -um- and ~m- results in intransitive verbs, while using -in, -an and i- results in transitive verbs. Dr. Nolasco also cited affixes that are not intended for generating meaningful words: the “panlaping pantangkay” (stemming affixes). Stemming affixes are used to produce constructs for further affixation. These affixes include pag-, pang-, paki-, paka-, and ka-. After a stem is constructed by applying any of pag-, pang-, paki-, paka-, and ka- to a root word, a verb may be generated by applying any of the verbal affixes. To reiterate one of his examples, pag- is applied to the root “aral” to derive “pag-aral”. As is, “pag-aral” is meaningless, but applying the replacive affix ~m- results in the intransitive verb “mag-aral”, which connotes ‘learning’. On the other hand, applying the suffix -an to “pag-aral” results in the transitive verb “pag-aralan”, which connotes ‘learning something’ (an object).

For Kankanaey, verbal affixes are also observed, among them -um- and i-. However, the likes of stemming affixes seem not to be present.
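The affix operations mentioned in this discussion can be sketched as plain string functions. The following Python fragment is only an illustration: the function names are hypothetical, the placement of -um- before the first vowel is an assumption inferred from forms such as “gumabyon” and “umali”, and the replacive ~m- is modelled as swapping the first letter of the stem for “m”, which generalizes from the single “pag-aral” example and may not hold elsewhere.

```python
VOWELS = "aeiou"


def add_prefix(affix: str, base: str, sep: str = "") -> str:
    """Attach a prefix; `sep` allows the hyphenated spelling used in "pag-aral"."""
    return affix + sep + base


def add_suffix(base: str, affix: str) -> str:
    """Attach a suffix such as -an or -en."""
    return base + affix


def add_infix_um(base: str) -> str:
    """Insert "um" before the first vowel (assumed placement rule).

    A vowel-initial base therefore simply receives "um" at the front.
    """
    for i, ch in enumerate(base):
        if ch in VOWELS:
            return base[:i] + "um" + base[i:]
    return base  # no vowel found: leave the base unchanged


def replace_initial_with_m(stem: str) -> str:
    """Replacive ~m-, modelled as replacing the first letter with "m" (assumption)."""
    return "m" + stem[1:]


# Tagalog example reported from Nolasco (2007).
stem = add_prefix("pag", "aral", sep="-")
print(stem)                          # pag-aral
print(replace_initial_with_m(stem))  # mag-aral
print(add_suffix(stem, "an"))        # pag-aralan

# Kankanaey forms that appear in the tables of this paper.
print(add_infix_um("gabyon"))        # gumabyon
print(add_infix_um("ali"))           # umali
print(add_prefix("i", "ali"))        # iali
```

Keeping each operation as a separate function mirrors the idea that already generated words can be fed back in for further affixation.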
As illustrated in the examples summarized in Table 1, Table 2 and Table 3, all constructs generated by any affixation are meaningful words. It is not the case that some affixes have the sole purpose of preparing constructs for further affixation. While some constructs are formed for further affixation, every construct conveys a sense. Furthermore, despite the conviction that some, if not all, verbs formed out of affixation correspond to adjectives, Kankanaey words constructed through affixation may not be verbs or adjectives. Most cases listed in Table 2 support this claim.

Table 3. Word formations out of the root word “ali”, which is taken to mean ‘come’

Kankanaey word | Category (POS) | English Translation
ali | root word | to come
inmal-ali | verb, past perfect | had come
inmali | verb, simple past | came
inmaliali | verb, past continuous | had always been coming
kaal-ali | Adjective | referring to an object who/that came recently
kaali | Adjective | referring to an object who came immediately
umal-ali | verb, present continuous | coming
umali | verb, simple present | come
umali | verb, simple future | will come
umaliak | verb + pronoun | I come / I will come
umaliali | Adjective | referring to an object who is always coming
umalida | verb + pronoun | they will come
umalika | verb + pronoun | you will come
umalikami | verb + pronoun | we will come
umalikayo | verb + pronoun | you (pl) will come
umanali | Adjective | referring to many objects who are coming simultaneously
umalian | Adjective | referring to the time of coming
makiali | verb, simple future | will come as a companion
makialian | Adjective | referring to an object whom coming will be done with
nakiali | verb, simple past | came as a companion
nakialian | Adjective | referring to an object whom coming was done with
maki-al-ali | verb, present continuous | coming along with other objects
makialiali | Adjective | referring to an object who is always coming along with others
makial-alian | Adjective | referring to an object that creates an opportunity to come along
kaanali | Adjective | referring to objects who suddenly came

In the case of Kankanaey, it is imperative to depart from the approach of classifying affixes into verbal and stemming. It is sufficient to define a reference list of specific affixation cases and other morphological phenomena. The reference list shall serve as a look-up table when doing morphological analysis. The reference list is developed with the ideal goal that any word, whether an inflection, a derivation or otherwise, must be given its classification (POS) and/or translation. Table 4 shows the list of observed morphological phenomena for Kankanaey words. Table 5 shows the affixes used in the actual words gathered.

Table 4. List of morphological phenomena for Kankanaey

Infixation – application of infixes to root stems
Prefixation – application of prefixes to root stems
Suffixation – application of suffixes to root words
Partial Reduplication – repetition of some letters or syllables of a root stem
Full Reduplication – repetition of root stems
Replacive Affixation – replacement, insertion, or deletion of a letter in the root stem

Table 5. List of Affixes

Infixes: -in-, -n-, -um-
Prefixes: i-, ka-, m-, ma-, maika-, maka-, maki-, man-, manag-, manaka-, mang-, mas-, na-, naka-, naki-, naika-, nan-, ni-, pa-, paki-, pan-, taga-, um-
Suffixes: -ak, -an, -da, -en, -k, -ka, -kayo, -ko, -m, -mi, -mo, -yo, -sisya, -na

(Addendum 1) Verb Derivation from Noun (may be seen as incorporation, verb + noun)

Process | Effect | Example | Remarks
Prefixation of man | Forms a verb (actually an infinitive or a future tense) which means “using the object meant by the noun” | a. mangabyon → will use hoe; b. mangabyon → to use hoe | From the root word (noun) gabyon, which means hoe, a personal property
Prefixation of man | Forms a verb (actually an infinitive or a future tense) which means “having the object meant by the noun” | a. manbeey → to have a house; b. manbeey → will have a house | From the root word (noun) beey, which means house, a real property
Infixation of um | Forms a verb (actually an infinitive or a future tense) which means “using the object meant by the noun” | a. gumabyon → will use hoe; b. gumabyon → to farm | From the root word (noun) gabyon, which means hoe, a personal property
Suffixation of en | Forms a verb (actually an infinitive or a future tense) | a. gabyonen → to use hoe; b. gabyonen → will use hoe |

(Addendum 2) Verb Inflection by Prefixation

Prefix | Effect | Example | Remarks
man | Forms future tense verb | mangabyon → will farm | From gabyon → to farm
nan | Forms past tense verb | nangabyon → did farm | From gabyon → to farm
i | Forms future tense verb | iali → will bring | From infinitive verb ali → to bring; the action has a thing as an object
in | Forms past tense verb | in-ali → brought | From infinitive verb ali → to bring; the action has a thing as an object
um | Forms future tense verb | umila → will see | From infinitive verb ila → to see; the action has a person as an object. May be used with future tense verbs that do not begin with um (i.e. umiali) to incorporate the person speaking as the object of the future action of the root verb (iali): umiali → will bring for me/us
inm | Forms past tense verb | inmila → saw = did see | From infinitive verb ila → to see; the action has a person as an object. May be used with future tense verbs that do not begin with um (i.e. inmiali) to incorporate the person speaking as the object of the past action of the root verb (iali): inmiali → brought for me/us

(Addendum 3) Reduplication generates present tense.

mangabyon → will farm, as mangabgabyon → farming
umali → will come, as umalali → coming
umila → will see me, as umil-ila → seeing me
iali → will bring, as ial-ali → bringing

Table 6. Snapshot of the Kankanaey lexicon

Root word | Category (POS) | English Translation
ad-ado | Adjective | many
akki | Noun | monkey
alagey | Verb | to stand
asis | Particle | it is dirty
ayasak | Noun | whisper
bang-et | Verb | to cook
baro | Adjective | new
dad-an | Verb | to walk
dakami | Pronoun | we
dakayo | Pronoun | you
dakdake | Adjective | big
en | Conjunction | and
etek | Verb | to lie
gabyon | Noun | hoe
gipan | Noun | knife
ipogaw/ipugaw | Noun | person
kanayon | Adverb | always
kitkitoy | Adjective | small
ono | Conjunction | or

The reference list of specific morphological phenomena functions with selected elements of a collection of words. For the purpose of this study, the collection is called the lexicon. Ideally, the lexicon includes all possible root words, their classifications and their translations. A table with the attributes local word, classification (part of speech), and English translation is adequate. It is the case that all nouns, verbs and some adverbs can serve the purpose of being a root word (a.k.a. root stem) for Inflections and Derivations. The category (POS) attributed to every entry in the lexicon can then become the basis for doing stem-based morphology. Entries in the lexicon which are assigned a POS other than noun, verb or adverb should be considered final words that cannot undergo affixation. Such a practice will allow a heuristic for handling words with complex structures. Derived words with complex structures can be collected and made static entries in the lexicon with categories that will not allow affixation. Dr. Ricardo Nolasco (2007) classified words that cannot undergo affixation as particles. Such a classification should be adopted.

The collection of all possible root stems, including all derived words with complex structure, can only happen over time. The limited time allotted for conducting this study is just a small fraction of the ideal time requirement. It is advocated that an incremental lexicon be put in order. As such, an initial list of words that may be included in the conceptualized Kankanaey lexicon has been produced and is presented in another manuscript (Miguel, 2009a).
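The lexicon described here, with its three attributes and the rule that only some categories admit affixation, can be sketched as a small look-up structure. This is a minimal Python sketch under stated assumptions, not the implementation used in the study: the names `Entry`, `LEXICON`, `can_affix` and `split_prefix` are hypothetical, only a handful of entries and prefixes are included, every adverb is treated as affixable although the text says only some are, and a prefix is assumed to attach by plain concatenation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Categories treated here as open to affixation. The text says "some adverbs";
# admitting every adverb is a simplification made for this sketch.
AFFIXABLE_POS = {"Noun", "Verb", "Adverb"}


@dataclass(frozen=True)
class Entry:
    word: str   # local (Kankanaey) word
    pos: str    # classification (part of speech)
    gloss: str  # English translation


# A few lexicon rows for illustration.
LEXICON = {
    e.word: e
    for e in [
        Entry("gabyon", "Noun", "hoe"),
        Entry("alagey", "Verb", "to stand"),
        Entry("kanayon", "Adverb", "always"),
        Entry("baro", "Adjective", "new"),
        Entry("asis", "Particle", "it is dirty"),
    ]
}

# A subset of the observed prefixes, tried longest first.
PREFIXES = sorted(
    ["i", "ka", "ma", "maka", "maki", "man", "na", "naki", "nan", "pan"],
    key=len,
    reverse=True,
)


def can_affix(word: str) -> bool:
    """True when the word is a lexicon entry whose category admits affixation."""
    entry = LEXICON.get(word)
    return entry is not None and entry.pos in AFFIXABLE_POS


def split_prefix(word: str) -> Optional[Tuple[str, Entry]]:
    """Return (prefix, root entry) for the first prefix that leaves an affixable root."""
    for prefix in PREFIXES:
        root = word[len(prefix):]
        if word.startswith(prefix) and can_affix(root):
            return prefix, LEXICON[root]
    return None


print(split_prefix("mangabyon"))   # prefix "man" with the entry for "gabyon"
print(split_prefix("nakigabyon"))  # prefix "naki" with the entry for "gabyon"
print(can_affix("asis"))           # False: particles are final words
```

Storing derived words with complex structures as static entries under a non-affixable category would make `split_prefix` skip them, which is the heuristic the text proposes.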
Table 6 shows a snapshot of the lexicon, which needs to be incremented as time goes by. Surely, the initial lexicon is enough for making experiments leading to the realization of a Kankanaey-based NLP.

Conclusions

Infixation, prefixation, suffixation, reduplication and replacive affixation are the morphological phenomena for the Kankanaey language. The infixes, prefixes and suffixes can be defined as sets. Regular rules that support the morphological phenomena can be formulated.

Future Directions

To validate the plausibility of the defined affix sets and the formulated morphological rules for the Kankanaey language, computer-based systems using the same definitions and formulations will be developed. Furthermore, the same definitions and formulations will be performed for Ibaloi, Kalanguya and the other indigenous languages of the Cordilleras and its neighboring regions.

References

Nolasco, R. (2007). Si Maka-tangkay at si Maka-ugat: Dalawang Tagasuri ng Morpholohiyang Pilipino. In 4th National Natural Language Processing Research Symposium Proceedings. DLSU. Manila.
Spencer, A. and Zwicky, A. (1998). The Handbook of Morphology. Blackwell Publishers.
Stump, G.T. (1998). Inflection. In The Handbook of Morphology. Edited by Andrew Spencer and Arnold M. Zwicky. Blackwell Publishers. UK.
Beard, R. (1998). Derivation. In The Handbook of Morphology. Edited by Andrew Spencer and Arnold M. Zwicky. Blackwell Publishers. UK.
Fabb, N. (1998). Compounding. In The Handbook of Morphology. Edited by Andrew Spencer and Arnold M. Zwicky. Blackwell Publishers. UK.
Gerdts, D.B. (1998). Incorporation. In The Handbook of Morphology. Edited by Andrew Spencer and Arnold M. Zwicky. Blackwell Publishers. UK.
Halpern, A.L. (1998). Clitics. In The Handbook of Morphology. Edited by Andrew Spencer and Arnold M. Zwicky. Blackwell Publishers. UK.
Miguel, D. (2009a). Natural Language Processing (NLP) Resources for Three Indigenous Languages in Benguet. Research Report for URG.CICS01. Saint Louis University.
Modern Filipino Alphabet. http://www.tagaloglang.com/The-Philippines/Language/modern-filipino-alphabet.html. Date Accessed: June 8, 2009.
Simons, G.F. and Bird, S. (2008). Toward a Global Infrastructure for the Sustainability of Language Resources. Proceedings of the 22nd Pacific Asia Conference on Language, Information, and Computation. DLSU. Manila, Phils.
Sproat, R. (2000). Lexical Analysis. In Handbook of Natural Language Processing. Edited by Robert Dale, Hermann Moisl and Harold Somers. Marcel Dekker. New York, USA.
