A Rule Based Approach for Connective in Malayalam Language

Kumari Sheeja S (1, 2), Lakshmi S (1), Sobha Lalitha Devi (1)
1 AU-KBC Research Centre, Anna University, Chrompet, Chennai
2 KCG College of Technology, Karapakkam, Chennai
sheeja@kcgcollege.com, slakshmi@au-kbc.org, sobha@au-kbc.org

Abstract. Discourse connectives signal the relationship between two coherent spans of text. Connective arguments are the text spans they relate. Discourse relations link clauses in text and compose the overall text structure. Discourse connectives are an important part of modeling Malayalam discourse structure. We present our work on a rule based approach to identifying discourse connectives in the Malayalam language. Discourse connectives may or may not be explicitly present in the relation. In our work we have focused on the rule based identification of a particular connective in Malayalam text, and the results are encouraging.

Keywords: Discourse connectives, rule based approach, Malayalam discourse, connective arguments

1 Introduction

Discourse relations connect clauses and sentences in the text and compose the overall text structure. Discourse analysis is concerned with analyzing how clause or sentence level units of text are related to each other within a larger unit of text. The two basic units of discourse relations are discourse markers and their arguments. Discourse markers are the words or phrases which connect two clauses or sentences and establish a relation between two discourse units.

Kamala went to the hospital but the doctor was not there.

In this example the connective “but” establishes a relation between two clauses or sentences and makes the text coherent. Discourse relations are used in NLP applications and are important for discourse analysis. Identifying discourse relations in natural language processing is a challenging task.
Discourse connectives, despite their common function of connecting the contents of two different clauses, also act as conjunctions [11], so it is difficult to distinguish discourse markers from non-discourse markers. Identifying argument boundaries is even more difficult in long texts. Malayalam is a Dravidian language of South India. It is a free word order language but keeps the verb in final position. Discourse connectives are important for producing and interpreting text in Malayalam.

The paper is organized as follows. Section 2 describes the related work. Section 3 gives an overview of discourse relations and Section 4 explains the rule based approach. The paper ends with the conclusion of the work.

2 Related Work

The annotation of discourse connectives and their arguments has been explored in various languages such as Turkish [12], Arabic [2] and English [7]. The Penn Discourse Treebank (PDTB) is the first to follow the lexically grounded approach to the annotation of discourse relations, and it is unique in adopting a theory-neutral approach to annotation. PDTB provides the argument structure of discourse relations and a sense label for each relation in the text, following a hierarchical classification scheme. Elwell et al. [9] used maximum entropy rankers and achieved a 3.6% improvement over the state of the art in identifying arguments of discourse connectives. Versley [11] tagged German discourse connectives and their arguments using English training data and a German-English parallel corpus. His approach transferred a tagger for English discourse connectives to German by annotation projection, using a freely accessible list of connectives, and achieved an F-score of 68.7% for the identification of discourse connectives. Ghosh [5] used a data driven approach to identify arguments of explicit discourse connectives in the PDTB corpus.
Al Saif [1] used machine learning algorithms to automatically identify explicit discourse connectives and their arguments in Arabic. Wang et al. [12] used sub-trees as features and achieved a significant improvement in identifying arguments and explicit and implicit discourse relations. Published work on discourse relation annotation in Indian languages is available for Hindi, Malayalam and Tamil by Sobha et al. [3]. They have also worked on the automatic identification of discourse relations in these three Indian languages [10] using CRFs. Other published works in Indian languages cover Hindi [6], [7] and Tamil [8]. In this paper we explore various discourse connectives and a rule based approach for a particular connective in Malayalam.

3 Discourse Connectives In Malayalam

Malayalam is a free word order language and its words are agglutinated, hence most of the connectives appear in agglutinated form. A discourse relation in Malayalam can be syntactic (a suffix) or lexical [10]. It can be within a clause, inter-clausal or inter-sentential. Discourse connectives are an important part of modeling discourse structure. In this paper, we describe various connectives present in Malayalam and a rule based approach to identify the connective “pakshe” (but).

3.1 Discourse Relation categorization

Discourse markers can be realized in any of the following ways. There are two major categories, explicit and implicit relations. We also observed other types of relations.

3.2 Explicit connectives

Explicit connectives are morphemes or free words that trigger discourse relations in Malayalam. They signal the presence of discourse relations between sentences or clauses. The connectives can occur at the initial, final or medial position in an argument in Malayalam [12]. Below is an example of an explicit connective in Malayalam.
[prameham  oru  nishabdha  kolayaaLiyaaN.]/arg1
 diabetes  one  silent     killer
ennaal  [niyanthrichu  nirthiyaal  kuzhappamilla]/arg2
but      control       kept if     no problem
(Diabetes is a silent killer. But when kept in control it is not a problem.)

In the above example, the connective “ennaal” occurs inter-sententially, connecting the two sentences. The connective occurs at the initial position in the second argument. We see that the connective explicitly realizes the relation between the two arguments. Four types of explicit connectives have been observed.

3.3 Explicit connective Types

Subordinate Conjunctions. This type of conjunction connects the main clause with an adverbial, noun or adjectival clause. The most commonly observed subordinate conjunctions in all three languages are since, because and when. Consider the following example, which shows a subordinate conjunction in Malayalam.

[pachakkarikaL  vevichu  kazhikkumpoL]/arg1
 Vegetables     boil     when eat
[athiluLLa  poshakam   nashtamaakum]/arg2
 In that    nutrients  loss
(When vegetables are boiled and consumed, the nutrients in it are lost)

In the above examples, both lexical items and morphemes act as connectives.

Co-ordinate Conjunctions. These conjunctions give equal emphasis to two clauses. They connect two words, phrases or clauses. The most commonly observed co-ordinate conjunctions in the corpus are “but” and “and”. The conjunction “pakshe” (but) is a co-ordinate conjunction. An intra-sentential coordinating conjunction can occur between the clauses.

Conjunct Adverbs. These are said to modify the clauses or sentences in which they occur. They join independent clauses together. They are a special type of conjunction, as they are part adverb and part conjunction. Given below is an example of such a relation.
[kazhuth,  mukham,  kaiviralukal  ennivitangalil
 Neck,     face,    fingers       all+these+places
karuthaniramuNtaakaan  kozhuppu  kaaraNamaakum.]/arg1
black+color+come       fat       reason+will+be
athinaal   [eNNayil  varutha  aahaaram,  kozhuppulla  Bakshanam
Therefore   oil      fried    food       fatty        food
enniva     ozhivaakkaNam.]/arg2
all+these  avoid
(Fat can make the neck, face and fingers turn to black color. Therefore we have to avoid oily foods and fatty stuffs.)

In the above example “athinaal” is the adverbial conjunction, which shows a cause and effect relationship where arg1 is the cause and arg2 is the effect.

Correlative conjunction. Correlative conjunctions are paired conjunctions used in a sentence to join different words or groups of words. They are not used to connect sentences themselves, but link two or more words or clauses of equal importance within a sentence. They always occur within a sentence.

[indyayennaal  innu   sachin  maathramalla,]/arg1
 india means   today  sachin  not only
[pakshe  innum       Sachinillaathe  indyaye  sankalppikkaan  prayaasam.]/arg2
 but     also today  sachin without  india    think           cannot
(Today India means not only Sachin, but one also cannot think of an India without Sachin.)

Here “maathramalla-pakshe” is the correlative connective. The “pakshe” may even be dropped in certain cases.

Complementizer clause. This clause is considered a special type of connective. It is a type of conjunction which marks a complement clause.

[avare  vila   kalppikkunnilla]/arg1  ennu  [nethaakkal  abhinayichu]/arg2
 they   value  not given              that   leaders     pretend
(The leaders pretended that they were not given a value.)

3.4 Implicit Connectives

An implicit relation can be inferred if there exists a relationship between an adjacent pair of sentences and no explicit connective is present in the text. We have used the label “IMPLICIT” where an implicit relation was inferred [12].
(7)
[pilkaalath  niravadhi  svadeshikal  bekkarute  paatha  pinthutarnnu.]/arg1
 later       many       people       bekkar's   way     followed
IMPLICIT
[mattu  chilaraakatte  kaayalil   svadesheeyamaaya  Reethiyil
 some   People         backwater  traditional       style
kayal      nikathi  krishi  bhoomi  uNdaakkiyetuthu.]/arg2
backwater  filled   farm    land    made
(Later many people followed bekkar's path. Some people in their traditional style filled up back waters and made their farm land.)

In the above example the two sentences are not explicitly connected, but a relationship can be inferred implicitly.

4 Rule Based Approach

Malayalam is a language of the Dravidian family and its words are agglutinated. In this work, we have collected Malayalam sentences from websites, and the resulting document consists of 3000 sentences.
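The free-word connectives quoted in Section 3 suggest how a purely lexical rule could look. The Python sketch below is not the authors' rule set, which the text above does not spell out. It assumes a hypothetical three-entry lexicon (CONNECTIVES) built only from connectives quoted in this paper, in the paper's romanization, and a hypothetical helper find_connectives that reports where a connective sits in a whitespace-tokenized sentence. Position is measured within the sentence here, a simplification of the argument-relative positions described in Section 3.2.

```python
# Hypothetical lexicon: free-word connectives quoted in this paper,
# in the paper's romanization, mapped to their English glosses.
CONNECTIVES = {
    "ennaal": "but",
    "pakshe": "but",
    "athinaal": "therefore",
}


def find_connectives(sentence):
    """Return (token, gloss, position) for each lexicon hit in a sentence.

    position is "initial", "medial" or "final", measured over the
    whitespace-separated tokens of the sentence.
    """
    # Tokenize on whitespace and strip punctuation around each token.
    tokens = [t.strip(".,;:!?()[]\"'") for t in sentence.split()]
    tokens = [t for t in tokens if t]

    hits = []
    for i, tok in enumerate(tokens):
        gloss = CONNECTIVES.get(tok.lower())
        if gloss is None:
            continue  # not a known free-word connective
        if i == 0:
            position = "initial"
        elif i == len(tokens) - 1:
            position = "final"
        else:
            position = "medial"
        hits.append((tok, gloss, position))
    return hits


if __name__ == "__main__":
    print(find_connectives("ennaal niyanthrichu nirthiyaal kuzhappamilla"))
    print(find_connectives(
        "indyayennaal innu sachin maathramalla, pakshe innum "
        "sachinillaathe indyaye sankalppikkaan prayaasam."
    ))
```

Because the lookup matches whole tokens only, it finds “pakshe” in the correlative example but ignores the agglutinated form “indyayennaal”, and it cannot detect suffixal connectives such as the one in “kazhikkumpoL”. Handling those would need morphological analysis, which this sketch does not attempt.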