Source: aclanthology.org

Arabic Morphology Generation Using a Concatenative Strategy

Violetta Cavalli-Sforza
Carnegie Technology Education
4615 Forbes Avenue
Pittsburgh, PA, 15213
violetta@cs.cmu.edu

Abdelhadi Soudi
Computer Science Department
Ecole Nationale de L'Industrie Minerale
Rabat, Morocco
asoudi@enim.ac.ma

Teruko Mitamura
Language Technologies Institute
Carnegie Mellon University
Pittsburgh, PA 15213
teruko@cs.cmu.edu

Abstract

Arabic inflectional morphology requires infixation, prefixation and suffixation, giving rise to a large space of morphological variation. In this paper we describe an approach to reducing the complexity of Arabic morphology generation using discrimination trees and transformational rules. By decoupling the problem of stem changes from that of prefixes and suffixes, we gain a significant reduction in the number of rules required, as much as a factor of three for certain verb types. We focus on hollow verbs but discuss the wider applicability of the approach.

Introduction

Morphologically, Arabic is a non-concatenative language. The basic problem with generating Arabic verbal morphology is the large number of variants that must be generated. Verbal stems are based on triliteral or quadriliteral roots (3- or 4-radicals). Stems are formed by a derivational combination of a root morpheme and a vowel melody; the two are arranged according to canonical patterns. Roots are said to interdigitate with patterns to form stems. For example, the Arabic stem katab (he wrote) is composed of the morpheme ktb (notion of writing) and the vowel melody morpheme 'a-a'. The two are coordinated according to the pattern CVCVC (C=consonant, V=vowel).

There are 15 triliteral patterns, of which at least 9 are in common use, and 4 much rarer quadriliteral patterns. All these patterns undergo some stem changes with respect to voweling in the 2 tenses (perfect and imperfect), the 2 voices (active and passive), and the 5 moods (indicative, subjunctive, jussive, imperative and energetic).¹ The stem used in the conjugation of the verb may differ depending on the person, number, gender, tense, mood, and the presence of certain root consonants. Stem changes combine with suffixes in the perfect indicative (e.g., katab-naa 'we wrote', kutib-a 'it was written') and the imperative (e.g. uktub-uu 'write', plural), and with both prefixes and suffixes for the imperfect tense in the indicative, subjunctive, and jussive moods (e.g. ya-ktub-na 'they write, feminine plural') and in the energetic mood (e.g. ya-ktub-unna or ya-ktub-un 'he certainly writes'). There are a total of 13 person-number-gender combinations. Distinct prefixes are used in the active and passive voices in the imperfect, although in most cases this results in a change in the written form only if diacritic marks are used.²

Most previous computational treatments of Arabic morphology are based on linguistic models that describe Arabic in a non-concatenative way and focus primarily on analysis. Beesley (1991) describes a system that analyzes Arabic words based on Koskenniemi's

¹ The jussive is used in specific constructions, for example, negation in the past with the negative particle lam (e.g., lam aktub 'I didn't write'). The energetic expresses corroboration of an action taking place. The indicative is common to both perfect and imperfect tenses, but the subjunctive and the jussive are restricted to the imperfect tense. The imperative has a special form, and the energetic can be derived from either the imperfect or the imperative.
² Diacritic marks are used in Arabic language textbooks and occasionally in regular texts to resolve ambiguous words (e.g. to mark a passive verb use).
(1983) two-level morphology. In Beesley (1996) the system is reworked into a finite-state lexical transducer to perform analysis and generation. In two-level systems, the lexical level includes short vowels that are typically not realized on the surface level. Kiraz (1994) presents an analysis of Arabic morphology based on the CV-, moraic-, and affixational models. He introduces a multi-tape two-level model and a formalism where three tapes are used for the lexical level (root, pattern, and vocalization) and one tape for the surface level.

In this paper, we propose a computational approach that applies a concatenative treatment to Arabic morphology generation by separating the issue of infixation from other inflectional variations. We are developing an Arabic morphological generator using MORPHE (Leavitt, 1994), a tool for modeling morphology based on discrimination trees and regular expressions. MORPHE is part of a suite of tools developed at the Language Technologies Institute, Carnegie Mellon University, for knowledge-based machine translation. Large systems for MT from English to Spanish, French, German, Portuguese and a prototype for Italian have already been developed. Within this framework, we are exploring English to Arabic translation and Arabic generation for pedagogical purposes. We generate Arabic words including short vowels and diacritic marks, since they are pedagogically useful and can always be stripped before display.

Our approach seeks to reduce the number of rules for generating morphological variants of Arabic verbs by breaking the problem into two parts. We observe that, with the exception of a few verb types, there is very little interaction between stem changes and the processes of prefixation and suffixation. It is therefore possible to decouple, in large part, the problem of stem changes from that of prefixes and suffixes. The gain is a significant reduction in the number of transformational rules, as much as a factor of three for certain verb classes. This improves the space efficiency of the system and its maintainability by reducing duplication of rules, and simplifies the rules by isolating different types of changes.

To illustrate our approach, we focus on a particular type of verb, termed hollow verbs, and show how we integrate their treatment with that of more regular verbs. We also discuss how the approach can be extended to other classes of verbs and other parts of speech.

1 Arabic Verbal Morphology

Verb roots in Arabic can be classified as shown in Figure 1.³ A primary distinction is made between weak and strong verbs. Weak verbs have a weak consonant ('w' or 'y') as one or more of their radicals; strong verbs do not have any weak radicals.

Strong verbs undergo systematic changes in stem voweling from the perfect to the imperfect. The first radical vowel disappears in the imperfect. Verbs whose middle radical vowel in the perfect is 'a' can keep it as 'a' (e.g., qaTa'a 'he cut' -> yaqTa'u 'he cuts'),⁴ or change it to 'i' (e.g., Daraba 'he hit' -> yaDribu 'he hits') or 'u' (e.g., kataba 'he wrote' -> yaktubu 'he writes') in the imperfect. Verbs whose middle radical vowel in the perfect is 'i' can only change it to 'a' (e.g., shariba 'he drank' -> yashrabu 'he drinks') or keep it as 'i' (e.g., Hasiba 'he supposed' -> yaHsibu 'he supposes'). Verbs with middle radical vowel 'u' in the perfect do not change it in the imperfect (e.g., Hasuna 'he was beautiful' -> yaHsunu 'he is beautiful'). For strong verbs, neither perfect nor imperfect stems change with person, gender, or number.

Hollow verbs are those with a weak middle radical. In both perfect and imperfect tenses, the underlying stem is realized by two characteristic allomorphs, one short and one long, whose use depends on the person, number and gender.

³ Grammars of Arabic are not uniform in their classification of "hamzated" verbs, verbs containing the glottal stop as one of the radicals (e.g. [sa?al] 'to ask'). Wright (1968) includes them as weak verbs, but Cowan (1964) doesn't. Hamzated verbs change the written 'seat' of the hamza from 'alif' to 'waaw' or 'yaa?', depending on the phonetic context.
⁴ In the Arabic transcription capital letters indicate emphatic consonants; 'H' is the voiceless pharyngeal fricative; the apostrophe (as in qaTa'a) is the voiced pharyngeal fricative; '?' is the glottal stop 'hamza'.
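
The root-and-melody interdigitation and the strong-verb stem change described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names are hypothetical, roots are passed as tuples of transcribed radicals (so that digraphs such as "sh" count as one consonant), and the imperfect vowel is supplied by the caller because the text presents it as a lexical property of each verb. The sketch also assumes, following the examples in the text, that the first vowel of the active perfect is 'a' and that the third person masculine singular uses the suffix -a in the perfect and the affixes ya- and -u in the imperfect.

```python
def interdigitate(root, melody):
    """Interleave a triliteral root with a two-vowel melody on the CVCVC pattern."""
    c1, c2, c3 = root
    v1, v2 = melody
    return c1 + v1 + c2 + v2 + c3


def imperfect_stem(root, imperfect_vowel):
    """Strong-verb imperfect stem: the first radical loses its vowel (CCVC)."""
    c1, c2, c3 = root
    return c1 + c2 + imperfect_vowel + c3


def third_masc_sg(root, perfect_vowel, imperfect_vowel):
    """Return the (perfect, imperfect) 3rd person masculine singular forms."""
    # Assumption: the first vowel of the active perfect is 'a', as in the examples.
    perfect = interdigitate(root, ("a", perfect_vowel)) + "a"
    imperfect = "ya" + imperfect_stem(root, imperfect_vowel) + "u"
    return perfect, imperfect


print(interdigitate(("k", "t", "b"), ("a", "a")))   # the stem katab
print(third_masc_sg(("k", "t", "b"), "a", "u"))     # kataba / yaktubu
print(third_masc_sg(("sh", "r", "b"), "i", "a"))    # shariba / yashrabu
```

Because the stem is the same for every person, gender and number, a single stem computation per tense is enough for strong verbs; only the affixes vary.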
[Figure 1: Classification of Arabic Verbal Roots and Mood Tense System. Triliteral roots divide into strong (regular, hamzated, doubled radical) and weak (weak initial radical: assimilated; weak middle radical: hollow; weak final radical: defective). Tense: preterit (perfect), present (imperfect), participle. Mood: indicative, imperative, subjunctive, jussive, energetic. Voice: active, passive.]

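
The root classification in Figure 1 can be expressed as a small decision procedure. The following Python sketch is an assumption-laden illustration rather than part of the system described in the paper: the function name is hypothetical, hamzated roots are placed under strong verbs as in Figure 1 (grammars differ on this point), and a "doubled" root is taken to be one whose second and third radicals are identical, which is the usual textbook definition but is not spelled out in the text.

```python
WEAK_CONSONANTS = {"w", "y"}
HAMZA = "?"  # glottal stop, as written in the paper's transcription


def classify_root(radicals):
    """Classify a triliteral root as (strength, [subtypes]) following Figure 1."""
    c1, c2, c3 = radicals
    weak_types = []
    if c1 in WEAK_CONSONANTS:
        weak_types.append("assimilated")   # weak initial radical
    if c2 in WEAK_CONSONANTS:
        weak_types.append("hollow")        # weak middle radical
    if c3 in WEAK_CONSONANTS:
        weak_types.append("defective")     # weak final radical
    if weak_types:
        return ("weak", weak_types)
    if HAMZA in radicals:
        return ("strong", ["hamzated"])
    if c2 == c3:                           # assumption: doubled = identical 2nd and 3rd radical
        return ("strong", ["doubled"])
    return ("strong", ["regular"])


print(classify_root(("k", "t", "b")))
print(classify_root(("z", "w", "r")))
print(classify_root(("s", "?", "l")))
```

A root may have more than one weak radical, so the weak subtypes are returned as a list.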
Hollow verbs fall into four classes:

1. Verbs of the pattern CawaC or CawuC (e.g. [Tawul] 'to be long'), where the middle radical is 'w'. Their characteristic is a long 'uu' between the first and last radical in the imperfect. E.g.,
   From the underlying root [zawar]:
      zaara 'he visited' and yazuuru 'he visits'
   Stem allomorphs:
      Perfect: -zur- and -zaar-
      Imperfect: -zur- and -zuur-

2. Verbs of the pattern CawiC, where the middle radical is 'w'. Their characteristic is a long 'aa' between the first and last radical in the imperfect. E.g.,
   From the underlying root [nawim]:
      naama 'he slept' and yanaamu 'he sleeps'
   Stem allomorphs:
      Perfect: -nim- and -naam-
      Imperfect: -nam- and -naam-

3. Verbs of the pattern CayaC, where the middle radical is 'y'. Their characteristic is a long 'ii' between the first and last radical in the imperfect. E.g.,
   From the underlying root [baya']:
      baa'a 'he sold' and yabii'u 'he sells'
   Stem allomorphs:
      Perfect: -bi'- and -baa'-
      Imperfect: -bi'- and -bii'-

4. Verbs of the pattern CayiC, where the middle radical is 'y'. E.g.,
   From the underlying root [hayib]:
      haaba 'he feared' and yahaabu 'he fears'
   Stem allomorphs:
      Perfect: -hib- and -haab-
      Imperfect: -hab- and -haab-

In the relevant literature (e.g., Beesley, 1998; Kiraz, 1994), verbs belonging to the above classes are all assumed to have the pattern CVCVC. The pattern does not show the verb conjugation class and makes it difficult to predict the type of stem allomorph to use. To avoid these problems, we keep information on the middle radical and vowel in the base form of the verb. In generation, classes 2 and 4 of the verb can be handled as one because they have the same perfect and imperfect stems.⁵

⁵ The only exception is the passive participle. Verbs of classes 1 and 2 behave the same (e.g. Class 1: [zawar] -> mazuwr 'visited'; Class 2: [nawil] -> manuwl 'obtained'), as do verbs of classes 3 and 4 (e.g. Class 3: [baya'] -> mabii' 'sold', Class 4: [hayib] -> mahiib 'feared').
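
The short and long allomorphs listed for the four hollow-verb classes lend themselves to a table lookup. The Python sketch below is illustrative only. The table holds just the four example roots from the text, and the function and table names are hypothetical. The paper says that the choice of allomorph depends on person, number and gender; the selection rule used here (long stem before a vowel-initial suffix, short stem otherwise) is an assumption based on the standard textbook generalization, not something the text states.

```python
# (short, long) stem allomorphs per tense, copied from the four example roots.
HOLLOW_STEMS = {
    "zawar": {"PERF": ("zur", "zaar"), "IMPERF": ("zur", "zuur")},   # class 1
    "nawim": {"PERF": ("nim", "naam"), "IMPERF": ("nam", "naam")},   # class 2
    "baya'": {"PERF": ("bi'", "baa'"), "IMPERF": ("bi'", "bii'")},   # class 3
    "hayib": {"PERF": ("hib", "haab"), "IMPERF": ("hab", "haab")},   # class 4
}
VOWELS = set("aiu")


def hollow_form(root, tense, prefix="", suffix=""):
    """Pick the stem allomorph for a hollow verb and attach the given affixes."""
    short_stem, long_stem = HOLLOW_STEMS[root][tense]
    # Assumption: a vowel-initial suffix selects the long stem; a consonant-initial
    # or empty suffix selects the short stem.
    stem = long_stem if suffix[:1] in VOWELS else short_stem
    return prefix + stem + suffix


print(hollow_form("zawar", "PERF", suffix="tu"))                 # 'I visited'
print(hollow_form("zawar", "PERF", suffix="a"))                  # 'he visited'
print(hollow_form("zawar", "IMPERF", prefix="ya", suffix="u"))   # 'he visits'
```

Keeping stem selection separate from affixation in this way mirrors the decoupling the paper argues for: the affix strings are the same ones a strong verb would take.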
We describe our approach to modeling strong and hollow verbs below, following a description of the implementation framework.

2 The MORPHE System

MORPHE (Leavitt, 1994) is a tool that compiles morphological transformation rules into either a word parsing program or a word generation program.⁶ In this paper we will focus on the use of MORPHE in generation.

Input and Output. MORPHE's output is simply a string. Input is a feature structure (FS) which describes the item that MORPHE must transform. A FS is implemented as a recursive Lisp list. Each element of the FS is a feature-value pair (FVP), where the value can be atomic or complex. A complex value is itself a FS. For example, the FS for generating the Arabic zurtu 'I visited' would be:

   ((ROOT "zawar")
    (CAT V) (PAT CVCVC) (VOW HOL)
    (TENSE PERF) (MOOD IND)
    (VOICE ACT)
    (NUMBER SG) (PERSON 1))

The choice of feature names and values, other than ROOT, which identifies the lexical item to be transformed, is entirely up to the user. The FVPs in a FS come from one of two sources. Static features, such as CAT (part of speech) and ROOT, come from the syntactic lexicon, which, in addition to the base form of words, can contain morphological and syntactic features. Dynamic features, such as TENSE and NUMBER, are set by MORPHE's caller.

The Morphological Form Hierarchy. MORPHE is based on the notion of a morphological form hierarchy (MFH) or tree. Each internal node of the tree specifies a piece of the FS that is common to that entire subtree. The root of the tree is a special node that simply binds all subtrees together. The leaf nodes of the tree correspond to distinct morphological forms in the language. Each node in the tree below the root is built by specifying the parent of the node and the conjunction or disjunction of FVPs that define the node. Portions of the Arabic MFH are shown in Figures 2-4.

Transformational Rules. A rule attached to each leaf node of the MFH effects the desired morphological transformations for that node. A rule consists of one or more mutually exclusive clauses. The 'if' part of a clause is a regular expression pattern, which is matched against the value of the feature ROOT (a string). The 'then' part includes one or more operators, applied in the given order. Operators include addition, deletion, and replacement of prefixes, infixes, and suffixes. The output of the transformation is the transformed ROOT string. An example of a rule attached to a node in the MFH is given in Section 3.1 below.

Process Logic. In generation, the MFH acts as a discrimination network. The specified FS is matched against the features defining each subtree until a leaf is reached. At that point, MORPHE first checks in the irregular forms lexicon for an entry indexed by the name of the leaf node (i.e., the MF) and the value of the ROOT feature in the FS. If an irregular form is not found, the transformation rule attached to the leaf node is tried. If no rule is found or none of the clauses of the applicable rule match, MORPHE returns the value of ROOT unchanged.

3 Handling Arabic Verbal Morphology in MORPHE

Figure 2 sketches the basic MFH and the division of the verb subtree into stem changes and prefix/suffix additions.⁷ The inflected verb is generated in two steps. MORPHE is first called with the feature CHG set to STEM. The required stem is returned and temporarily substituted for the value of the ROOT feature.

⁶ MORPHE is written in Common Lisp and the compiled MFH and transformation rules are themselves a set of Common Lisp functions.
⁷ The use of two parts of the same tree for the two problems is a constraint of MORPHE's implementation, which does not permit multiple trees with separate roots.
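
The generation process described in Sections 2 and 3 can be approximated by the following Python sketch. It is not MORPHE itself, which is written in Common Lisp, and several pieces are assumptions made for illustration: the feature structure is a flat dict, nodes support only a conjunction of feature-value pairs, the node names, the CHG value "AFFIX", and both rule clauses are hypothetical, and returning ROOT unchanged when no leaf matches is a guess about behaviour the text does not specify.

```python
import re


class Node:
    """One MFH node: a name, the FVPs it requires, children, and (for leaves) a rule."""

    def __init__(self, name, features, children=(), rule=None):
        self.name = name
        self.features = features        # conjunction of feature-value pairs
        self.children = list(children)
        self.rule = rule or []          # clauses: (regex on ROOT, [operators])


def find_leaf(node, fs):
    """Descend through subtrees whose features all match the FS; return a leaf or None."""
    if any(fs.get(key) != value for key, value in node.features.items()):
        return None
    if not node.children:
        return node
    for child in node.children:
        leaf = find_leaf(child, fs)
        if leaf is not None:
            return leaf
    return None


def generate(tree, fs, irregular=None):
    """Irregular lexicon first, then the leaf's rule, else ROOT unchanged."""
    irregular = irregular or {}
    root = fs["ROOT"]
    leaf = find_leaf(tree, fs)
    if leaf is None:                       # assumption: treated like "no rule found"
        return root
    if (leaf.name, root) in irregular:
        return irregular[(leaf.name, root)]
    for pattern, operators in leaf.rule:   # first clause whose regex matches ROOT
        if re.fullmatch(pattern, root):
            for operator in operators:     # operators are applied in the given order
                root = operator(root)
            return root
    return root


# Hypothetical two-leaf verb subtree: one leaf for the stem change, one for the suffix.
TREE = Node("ROOT", {}, [
    Node("V", {"CAT": "V"}, [
        Node("V-STEM-HOL-PERF-1SG",
             {"CHG": "STEM", "VOW": "HOL", "TENSE": "PERF", "PERSON": 1, "NUMBER": "SG"},
             rule=[(r".aw[au].", [lambda s: re.sub(r"^(.)aw[au](.)$", r"\1u\2", s)])]),
        Node("V-AFFIX-PERF-1SG",
             {"CHG": "AFFIX", "TENSE": "PERF", "PERSON": 1, "NUMBER": "SG"},
             rule=[(r".+", [lambda s: s + "tu"])]),
    ]),
])


def generate_verb(tree, fs, irregular=None):
    """Two-step generation: get the stem, substitute it for ROOT, then add affixes."""
    stem = generate(tree, {**fs, "CHG": "STEM"}, irregular)
    return generate(tree, {**fs, "ROOT": stem, "CHG": "AFFIX"}, irregular)


FS = {"ROOT": "zawar", "CAT": "V", "PAT": "CVCVC", "VOW": "HOL", "TENSE": "PERF",
      "MOOD": "IND", "VOICE": "ACT", "NUMBER": "SG", "PERSON": 1}
print(generate_verb(TREE, FS))
```

With the feature structure from Section 2 as input, the first call yields the short stem and the second attaches the first person singular suffix. Both leaves live under the same root node, in line with the constraint that MORPHE does not permit multiple trees with separate roots.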