ElixirFM – Implementation of Functional Arabic Morphology

Otakar Smrž
Institute of Formal and Applied Linguistics
Faculty of Mathematics and Physics
Charles University in Prague
otakar.smrz@mff.cuni.cz

Abstract

Functional Arabic Morphology is a formulation of the Arabic inflectional system
seeking the working interface between morphology and syntax. ElixirFM is its
high-level implementation that reuses and extends the Functional Morphology
library for Haskell. Inflection and derivation are modeled in terms of
paradigms, grammatical categories, lexemes and word classes. The computation of
analysis or generation is conceptually distinguished from the general-purpose
linguistic model. The lexicon of ElixirFM is designed with respect to
abstraction, yet is no more complicated than printed dictionaries. It is
derived from the open-source Buckwalter lexicon and is enhanced with
information sourced from the syntactic annotations of the Prague Arabic
Dependency Treebank.

1 Overview

One can observe several different streams both in the computational and the
purely linguistic modeling of morphology. Some are motivated by the need to
analyze word forms as to their compositional structure, others consider word
inflection as being driven by the underlying system of the language and the
formal requirements of its grammar.

In Section 2, before we focus on the principles of ElixirFM, we briefly follow
the characterization of morphological theories presented by Stump (2001) and
extend the classification to the most prominent computational models of Arabic
morphology (Beesley, 2001; Buckwalter, 2002; Habash et al., 2005; El Dada and
Ranta, 2006).

In Section 3, we survey some of the categories of the syntax–morphology
interface in Modern Written Arabic, as described by the Functional Arabic
Morphology. In passing, we will introduce the basic concepts of programming in
Haskell, a modern purely functional language that is an excellent choice for
declarative generative modeling of morphologies, as Forsberg and Ranta (2004)
have shown.

Section 4 will be devoted to describing the lexicon of ElixirFM. We will
develop a so-called domain-specific language embedded in Haskell with which we
will achieve lexical definitions that are simultaneously a source code that can
be checked for consistency, a data structure ready for rather independent
processing, and still an easy-to-read-and-edit document resembling the printed
dictionaries.

In Section 5, we will illustrate how rules of inflection and derivation
interact with the parameters of the grammar and the lexical information. We
will demonstrate, also with reference to the Functional Morphology library
(Forsberg and Ranta, 2004), the reusability of the system in many applications,
including computational analysis and generation in various modes, exploring and
exporting of the lexicon, printing of the inflectional paradigms, etc.

2 Morphological Models

According to Stump (2001), morphological theories can be classified along two
scales. The first one deals with the core or the process of inflection:

lexical: theories associate a word's morphosyntactic properties with affixes

inferential: theories consider inflection as a result of operations on lexemes;
    morphosyntactic properties are expressed by the rules that relate the form
    in a given paradigm to the lexeme

The second opposition concerns the question of inferability of meaning, and
theories divide into:

incremental: words acquire morphosyntactic properties only in connection with
    acquiring the inflectional exponents of those properties

realizational: association of a set of properties with a word licenses the
    introduction of the exponents into the word's morphology

Evidence favoring inferential–realizational theories over the other three
approaches is presented by Stump (2001) as well as Baerman et al. (2006) or
Spencer (2004). In trying to classify the implementations of Arabic
morphological models, let us reconsider this cross-linguistic observation:

    The morphosyntactic properties associated with an inflected word's
    individual inflectional markings may underdetermine the properties
    associated with the word as a whole. (Stump, 2001, p. 7)

How do the current morphological analyzers interpret, for instance, the number
and gender of the Arabic broken masculine plurals ǧudud جدد new ones or quḍāh
قضاة judges, or the case of mustawan مستوى a level? Do they identify the values
of these features that the syntax actually operates with, or is the resolution
hindered by some too generic assumptions about the relation between meaning and
form?

Many of the computational models of Arabic morphology, including in particular
(Beesley, 2001), (Ramsay and Mansur, 2001) or (Buckwalter, 2002), are lexical
in nature. As they are not designed in connection with any syntax–morphology
interface, their interpretations are destined to be incremental.

Some signs of a lexical–realizational system can be found in (Habash, 2004).
The author mentions and fixes the problem of underdetermination of inherent
number with broken plurals, when developing a generative counterpart to
(Buckwalter, 2002).

The computational models in (Soudi et al., 2001) and (Habash et al., 2005)
attempt the inferential–realizational direction. Unfortunately, they implement
only sections of the Arabic morphological system. The Arabic resource grammar
in the Grammatical Framework (El Dada and Ranta, 2006) is perhaps the most
complete inferential–realizational implementation to date. Its style is
compatible with the linguistic description in e.g. (Fischer, 2001) or (Badawi
et al., 2004), but the lexicon is now very limited and some other extensions
for data-oriented computational applications are still needed.

ElixirFM is inspired by the methodology in (Forsberg and Ranta, 2004) and by
functional programming, just like the Arabic GF is (El Dada and Ranta, 2006).
Nonetheless, ElixirFM reuses the Buckwalter lexicon (2002) and the annotations
in the Prague Arabic Dependency Treebank (Hajič et al., 2004), and implements a
yet more refined linguistic model.

3 Morphosyntactic Categories

Functional Arabic Morphology and ElixirFM re-establish the system of
inflectional and inherent morphosyntactic properties (alternatively named
grammatical categories or features) and distinguish precisely the senses of
their use in the grammar.

In Haskell, all these categories can be represented as distinct data types that
consist of uniquely identified values. We can for instance declare that the
category of case in Arabic discerns three values, that we also distinguish
three values for number or person, or two values of the given names for verbal
voice:

    data Case = Nominative | Genitive | Accusative
    data Number = Singular | Dual | Plural
    data Person = First | Second | Third
    data Voice = Active | Passive

All these declarations introduce new enumerated types, and we can use some
easily-defined methods of Haskell to work with them. If we load this (slightly
extended) program into the interpreter,¹ we can e.g. ask what category the
value Genitive belongs to (seen as the :: type signature), or have it evaluate
the list of the values that Person allows:

    ? :type Genitive → Genitive :: Case
    ? enum :: [Person] → [First,Second,Third]

    ¹ http://www.haskell.org/

Lists in Haskell are data types that can be parametrized by the type that they
contain. So, the value [Active, Active, Passive] is a list of three elements of
type Voice, and we can write this if necessary as the signature :: [Voice].
Lists can also
be empty or have just one single element. We denote lists containing some type
a as being of type [a].

Haskell provides a number of useful types already, such as the enumerated
boolean type or the parametric type for working with optional values:

    data Bool = True | False
    data Maybe a = Just a | Nothing

Similarly, we can define a type that couples other values together. In the
general form, we can write

    data Couple a b = a :-: b

which introduces the value :-: as a container for some value of type a and
another of type b.²

Let us return to the grammatical categories. Inflection of nominals is subject
to several formal requirements, which different morphological models decompose
differently into features and values that are not always complete with respect
to the inflectional system, nor mutually orthogonal. We will explain what we
mean by revisiting the notions of state and definiteness in contemporary
written Arabic.

To minimize the confusion of terms, we will depart from the formulation
presented in (El Dada and Ranta, 2006). There, only one category is relevant,
which we can reimplement as State':

    data State' = Def | Indef | Const

Variation of the values of State' would enable generating the forms al-kitābu
الكتاب def., kitābun كتاب indef., and kitābu كتاب const. for the nominative
singular of book. This seems fine until we explore [...] state in the sense of
Fischer, and adding a boolean feature for the presence of the definite
article... However, we would get one unacceptable combination of the values
claiming the presence of the definite article and yet the indefinite state,
i.e. possibly the indefinite article or the diptotic declension.

Functional Arabic Morphology refactors the six different kinds of forms (if we
consider all inflectional situations) depending on two parameters. The first
controls prefixation of the (virtual) definite article, the other reduces some
suffixes if the word is a head of an annexation. In ElixirFM, we define these
parameters as type synonyms to what we recall:

    type Definite = Maybe Bool
    type Annexing = Bool

The Definite values include Just True for forms with the definite article, Just
False for forms in some compounds or after lā لا or yā يا (absolute negatives
or vocatives), and Nothing for forms that reject the definite article for other
reasons.

Functional Arabic Morphology considers state as a result of coupling the two
independent parameters:

    type State = Couple Definite Annexing

Thus, the indefinite state Indef describes a word void of the definite
article(s) and not heading an annexation, i.e. Nothing :-: False. Conversely,
ar-rafīʿū الرفيعو is in the state Just True :-: True. The classical construct
state is Nothing :-: True. The definite state is Just _ :-: False, where _ is
True for El Dada and Ranta and False for Fischer.
                                                                       more inflectional classes. The very variation for the                                                                                                                                                                                         We may discover that now all the values of State
                                                                       nominative plural masculine of the adjective high                                                                                                                                                                                                                                                               3
                                                                                                                                                                      	                                                                                                   	                                      are meaningful.
                                                                                                                                                   	                                                                                                   	                        
                                                                       gets ar-rafı֒una àñªJ¯QË@ def., rafı֒una àñªJ¯P in-
                                                                                                                      ¯ ¯                                   	     
                                                       ¯ ¯                                          
                                                   Type declarations are also useful for defining in
                                                                                                                                                                   
                                                                       def., and rafı֒u ñªJ¯P const. But what value does
                                                                                                                              ¯ ¯                        
          	                                                                                                                                             what categories a given part of speech inflects. For
                                                                                                                                                                         
                                                                       the form ar-rafı֒u ñªJ¯QË@, found in improper annex-
                                                                                                                                        ¯ ¯                      
                                                                                                                                                 verbs, this is a bit more involved, and we leave it for
                                                                       ations such as in al-mas֓uluna ’r-rafı֒u ’l-mustawa
                                                                                                                              	                                                ¯ ¯                                                ¯ ¯                                                       ¯                  Figure 2. For nouns, we set this algebraic data type:
                                                                                                 Ï                                             	                    
            Ï
                                                                        øñJ‚Ü@ ñªJ¯QË@ àñËðñ‚Ü @ the-officials the-highs-
                                                                                                                           
                                                                                                                                                                                       data ParaNoun = NounS Number Case State
                                                                       of the-level, receive?
                                                                                  It is interesting to consult for instance (Fischer,                                                                                                                                                                                         In the interpreter, we can now generate all 54
                                                                       2001), where state has exactly the values of State’,                                                                                                                                                                                         combinations of inflectional parameters for nouns:
                                                                       but where the definite state Def covers even forms                                                                                                                                                                                            ? [ NounS n c s | n <- enum, c <- enum,
                                                                       without the prefixed al- Ë@ article, since also some                                                                                                                                                                                                                                                                                         s <- values ]
                                                                                                                                                                                                                            
                                                                       separate words like la B no or ya AK oh can have the
                                                                                                                                                                 ¯                                                ¯ 
                                                                                               Thefunction values is analogous to enum, and both
                                                                       effects on inflection that the definite article has. To                                                                                                                                                                                        need to know their type before they can evaluate.
                                                                       distinguish all the forms, we might think of keeping                                                                                                                                                                                                     3
                                                                                                                                                                                                                                                                                                                                   WithJust False :-: True,wecanannotatee.g.the
                                                                                   2                                                                                                                                                                                                                                                                                                                                                                            	    
                                                                                       Infixoperators can also be written as prefix functions if en-                                                                                                                                                                  ‘incorrectly’ underdetermined rafı֒uñªJ¯P in hum-u ’l-mas֓ulu-
                                                                                                                                                                                                                                                                                                                                                                                                                                      	¯ ¯ 	               
 
                                                                   ¯ ¯
                                                                                                                                                                                                                                                                                                                                                                                                                     Ï                                                           Ï
                                                                       closed in (). Functions can be written as operators if enclosed                                                                                                                                                                              narafı֒u ’l-mustawa øñJ‚Ö @ ñªJ¯P àñËðñ‚Ö @ Ñë they-are the-
                                                                                                                                                                                                                                                                                                                                        ¯ ¯                                            ¯                                             

                                                                       in ‘‘. Wewillexploitthiswhendefiningthelexicon’snotation.                                                                                                                                                                                     officials highs-of the-level, i.e. they are the high-level officials.
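The enumeration of noun parameters can be tried out with a small self-contained sketch. It is not the ElixirFM source: the constructor names of Number and Case are hypothetical (the text names only the types), and enum and values are assumed stand-ins written so that the count of 54 = 3 × 3 × 6 comes out as stated. State is taken to pair a Maybe Bool with a Bool through the :-: constructor, as the values Nothing :-: False and Just True :-: True suggest.

```haskell
-- Hypothetical value names; the text only names the types Number and Case.
data Number = Singular | Dual | Plural
  deriving (Eq, Show, Enum, Bounded)

data Case = Nominative | Genitive | Accusative
  deriving (Eq, Show, Enum, Bounded)

-- Assumed shape of State: a definiteness flag paired with an annexation flag,
-- matching values such as Nothing :-: False used in the text.
data State = Maybe Bool :-: Bool
  deriving (Eq, Show)

data ParaNoun = NounS Number Case State
  deriving (Eq, Show)

-- Assumed stand-in for enum: every value of a bounded enumeration type.
enum :: (Enum a, Bounded a) => [a]
enum = [minBound .. maxBound]

-- Assumed stand-in for values, specialised here to State (3 * 2 = 6 values).
values :: [State]
values = [ d :-: a | d <- [Nothing, Just False, Just True], a <- [False, True] ]

-- The comprehension from the text; the types of n, c and s are inferred
-- from the argument types of the NounS constructor.
allNounParams :: [ParaNoun]
allNounParams = [ NounS n c s | n <- enum, c <- enum, s <- values ]
```

Evaluating length allNounParams in the interpreter is a quick check that the three parameter types multiply out to the expected number of combinations.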
The 'magic' is that the bound variables n, c, and s have their type determined by the NounS constructor, so we need not type anything explicitly. We used the list comprehension syntax to cycle over the lists that enum and values produce, cf. (Hudak, 2000).

4 ElixirFM Lexicon

Unstructured text is just a list of characters, or string:

    type String = [Char]

Yet words do have structure, particularly in Arabic. We will work with strings as the superficial word forms, but the internal representations will be more abstract (and computationally more efficient, too).

The definition of lexemes can include the derivational root and pattern information if appropriate, cf. (Habash et al., 2005), and our model will encourage this. The surface word kitāb كتاب book can decompose to the triconsonantal root k t b كتب and the morphophonemic pattern FiCAL of type PatternT:

    data PatternT = FaCaL | FAL | FaCY |
                    FiCAL | FuCCAL | {- ... -}
                    MustaFCaL | MustaFaCL
        deriving (Eq, Enum, Show)

The deriving clause associates PatternT with methods for testing equality, enumerating all the values, and turning the names of the values into strings:

    ? show FiCAL → "FiCAL"

We choose to build on morphophonemic patterns rather than CV patterns and vocalisms. Words like istaǧāb استجاب to respond and istaǧwab استجوب to interrogate have the same underlying VstVCCVC pattern, so information on CV patterns alone would not be enough to reconstruct the surface forms. Morphophonemic patterns, in this case IstaFAL and IstaFCaL, can easily be mapped to the hypothetical [...] compact way. Of course, ElixirFM provides functions for properly interlocking the patterns with the roots:

    ? merge "k t b"  FiCAL    → "kitAb"
    ? merge "^g w b" IstaFAL  → "ista^gAb"
    ? merge "^g w b" IstaFCaL → "ista^gwab"
    ? merge "s ' l"  MaFCUL   → "mas'Ul"
    ? merge "z h r"  IFtaCaL  → "izdahar"

The izdahar ازدهر to flourish case exemplifies that exceptionless assimilations need not be encoded in the patterns, but can instead be hidden in rules.

The whole generative model adopts the multi-purpose notation of ArabTeX (Lagally, 2004) as a meta-encoding of both the orthography and phonology. Therefore, instantiation of the "'" hamza carriers or other merely orthographic conventions do not obscure the morphological model. With Encode Arabic⁴ interpreting the notation, ElixirFM can at the surface level process the original Arabic script (non-)vocalized to any degree or work with some kind of transliteration or even transcription thereof.

⁴ http://sf.net/projects/encode-arabic/

Morphophonemic patterns represent the stems of words. The various kinds of abstract prefixes and suffixes can be expressed either as atomic values, or as literal strings wrapped into extra constructors:

    data Prefix = Al | LA | Prefix String
    data Suffix = Iy | AT | At | An | Ayn |
                  Un | In | Suffix String

    al = Al; lA = LA    -- function synonyms
    aT = AT; ayn = Ayn; aN = Suffix "aN"

Affixes and patterns are arranged together via the Morphs a data type, where a is a triliteral pattern PatternT or a quadriliteral PatternQ or a non-templatic word stem Identity of type PatternL:

    data PatternL = Identity
    data PatternQ = KaRDaS | KaRADiS {- ... -}
    data Morphs a = Morphs a [Prefix] [Suffix]

The word lā-silkīy لاسلكي wireless can thus be decomposed as the root s l k سلك and the value Morphs FiCL [LA] [Iy]. Shunning such concrete representations, we define new operators >| and |< that denote prefixes and suffixes, respectively, inside Morphs a:

    ? lA >| FiCL |< Iy → Morphs FiCL [LA] [Iy]

Implementing >| and |< [...]

    class Morphing a b | a -> b where
        morph :: a -> Morphs b

    instance Morphing (Morphs a) a where
        morph = id

    instance Morphing PatternT PatternT where
        morph x = Morphs x [] []

The instance declarations define how the morph method turns values of type a into Morphs b.
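To make the role of the Morphing class concrete, here is a minimal sketch of how the >| and |< operators might be built on top of morph. It is an illustration under stated assumptions, not the ElixirFM implementation: the operator bodies, their fixities, and the order in which suffixes accumulate are all assumed, and PatternT, Prefix and Suffix are cut down to a few values. The class is written with two parameters and a functional dependency, as the instances in the text imply.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

-- Reduced versions of the types from the text (only a few values kept).
data PatternT = FaCaL | FiCL | FiCAL
  deriving (Eq, Show)

data Prefix = Al | LA | Prefix String
  deriving (Eq, Show)

data Suffix = Iy | AT | Suffix String
  deriving (Eq, Show)

data Morphs a = Morphs a [Prefix] [Suffix]
  deriving (Eq, Show)

-- The functional dependency states that the input type fixes the stem type.
class Morphing a b | a -> b where
  morph :: a -> Morphs b

-- A value that is already a Morphs is passed through unchanged.
instance Morphing (Morphs a) a where
  morph = id

-- A bare pattern is lifted into a Morphs with no affixes yet.
instance Morphing PatternT PatternT where
  morph x = Morphs x [] []

-- Assumed fixities: suffixes bind tighter than prefixes, prefixes nest rightwards.
infixr 7 >|
infixl 8 |<

-- Assumed definition: put a prefix in front of the prefixes collected so far.
(>|) :: Morphing a b => Prefix -> a -> Morphs b
p >| x = case morph x of
  Morphs t ps ss -> Morphs t (p : ps) ss

-- Assumed definition: append a suffix after the suffixes collected so far.
(|<) :: Morphing a b => a -> Suffix -> Morphs b
x |< s = case morph x of
  Morphs t ps ss -> Morphs t ps (ss ++ [s])

-- Function synonyms as in the text.
al, lA :: Prefix
al = Al
lA = LA

-- The 'wireless' example from the text.
wireless :: Morphs PatternT
wireless = lA >| FiCL |< Iy
```

Because morph accepts both a bare pattern and an already assembled Morphs value, the same operator can be applied repeatedly, so that al >| lA >| FiCL stacks two prefixes on one pattern.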