Cogprints: no conditions; results ordered by date (descending), then title. Feed generated 2018-01-17.

Sentence syntax trees should be made from morphemes. Semantically ordered trees
A critique of the use of sentence parse trees in modern linguistics, with two propositions on constructing trees, as mentioned in the title; an introduction to an English-to-Tatar translator program being developed by the author; precedence by specificity.
Author: Dinar Qurbanov <qdinar@gmail.com>
http://cogprints.org/id/eprint/9827 (deposited 2015-02-24)

Indonesian Innovations on Information Technology 2013: Between Syntactic and Semantic Textual Network
Network and graph models are a good alternative for analysing huge collections of textual data, since they reduce the dimensionality of the data. Texts can be seen as syntactic and semantic networks among words, with phrases seen as concepts. The model is applied to the proposals of Indonesian innovators for the implementation of information technology, and some interesting insights from the analysis are outlined.
Author: Hokky Situngkir <hs@compsoc.bandungfe.net>
http://cogprints.org/id/eprint/9094 (deposited 2013-11-18)

Improving the quality of Gujarati-Hindi Machine Translation through part-of-speech tagging and stemmer-assisted transliteration
Machine translation for Indian languages is an emerging research area. Transliteration, the mapping of source-language text into the target language, is one of the modules designed as part of a translation system. Simple mapping decreases the efficiency of the overall translation system. We propose the use of stemming and part-of-speech tagging for transliteration: the effectiveness of translation can be improved if transliteration is assisted by part-of-speech tagging and stemming. We show that much of the content in Gujarati gets transliterated while being processed for translation to Hindi.
Authors: Juhi Ameta; Nisheeth Joshi; Iti Mathur
http://cogprints.org/id/eprint/9068 (deposited 2013-11-18)

Development of a Hindi Lemmatizer
We live in a translingual society: to communicate with people from different parts of the world we would need expertise in their respective languages. Learning all these languages is not feasible, so we need a mechanism that can do this task for us. Machine translators have emerged as a tool that can perform this task.
To develop a machine translator we need to develop several different sets of rules. The first module in a machine translation pipeline is morphological analysis, under which stemming and lemmatization fall. In this paper we present a lemmatizer that generates rules for removing affixes, along with rules for restoring a proper root word.
Authors: Snigdha Paul <snigdha.pal18@gmail.com>; Nisheeth Joshi <nisheeth.joshi@rediffmail.com>; Iti Mathur <mathur_iti@rediffmail.com>
http://cogprints.org/id/eprint/9058 (deposited 2013-11-18)

Part of Speech Tagging of Marathi Text Using Trigram Method
In this paper we present a part-of-speech tagger for Marathi, a morphologically rich language spoken by the native people of Maharashtra. The tagger is statistical, using the trigram method: the most likely POS tag for a token is chosen on the basis of the previous two tags, by calculating probabilities to determine the best tag sequence. We describe the development of the tagger and its evaluation.
Authors: Jyoti Singh <jyoti.singh132@gmail.com>; Nisheeth Joshi <nisheeth.joshi@rediffmail.com>; Iti Mathur <mathur_iti@rediffmail.com>
http://cogprints.org/id/eprint/9069 (deposited 2013-11-18)

Rule Based Transliteration Scheme for English to Punjabi
Machine transliteration has emerged as an important research area in the field of machine translation. Transliteration aims to preserve the phonological structure of words; proper transliteration of named entities plays a significant role in improving the quality of machine translation.
In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based approach. We construct rules for syllabification, the process of separating the syllables of a word. Probabilities are calculated for named entities (proper names and locations); for words that are not named entities, probabilities are calculated by relative frequency using the statistical machine translation toolkit MOSES. Using these probabilities we transliterate the input text from English to Punjabi.
Authors: Deepti Bhalla <deeptibhalla0600@gmail.com>; Nisheeth Joshi <nisheeth.joshi@rediffmail.com>; Iti Mathur <mathur_iti@rediffmail.com>
http://cogprints.org/id/eprint/9070 (deposited 2013-11-18)

Getting the most from a surname study: semantics, DNA and computer modelling
We address such questions as: what does a surname mean; is it single-origin; and why do some surnames grow abnormally large? Though most surnames are rare, most people have populous surnames. In 1881, for example, 90% of the population of England and Wales had the most populous 4% of surnames; in 1998, 80% had the most populous 1%. We consider the evidence that some frequent surnames could be single-source, which would imply that a single family has grown abnormally large. Some populous surnames have a geographical distribution that might be thought consistent with a single origin, though as yet such supposition generally lacks support from adequate DNA evidence.
With the onset of DNA testing, some scientists are becoming more active in surname studies, and they may be more reluctant than some traditionalists to infer too much from categories of surname meaning. Instead, they are likely to maintain that statistical analyses of the data should be properly performed. For example, King and Jobling (2009) considered forty English surnames and found no statistically significant correlation between the supposed semantic category of a surname and its degree of DNA matching into single male-line families. As a specific example described here in some detail, little can be deduced about the inter-relatedness of those called Plant from the assumption of a semantic category, such as by arguing that the name is locative and hence single-origin, or occupational and hence multi-origin. More surely, we discuss the DNA evidence that this name's main family grew unusually. Though motivated initially by the evidence of unusual growth for Plant, we extend our deliberations to other surnames.
Guided by the empirical evidence, our computer simulations identify various reasons for a surname family's prolific growth. Chance is a main factor, along with favourable conditions during the Industrial Age, when overall population growth took off, evidently earlier in some regions than in others. The modelling also suggests that some additional factor, such as polygyny, resilience to plague, or favourable economic circumstance after an early start to a hereditary surname, helps see a family through its initially precarious times, sustaining its survival through to a small but real chance of subsequent proliferation in favourable Industrial Age conditions.
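The simulations described above are, in spirit, branching-process models of male-line descent. A minimal sketch (not the authors' actual model; the Galton-Watson setup, the Poisson offspring distribution and the growth rate below are assumptions chosen only for illustration) shows how chance alone makes most single-founder lineages die out while a few proliferate:

```python
import math
import random

def surname_lineage(generations, mean_sons=1.05, seed=None):
    """Count male-line surname carriers after a number of generations.

    Each carrier fathers a Poisson(mean_sons) number of sons; the surname
    is extinct once the count hits zero. Parameters are illustrative only.
    """
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's multiplication method; adequate for small lambda.
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    carriers = 1
    for _ in range(generations):
        carriers = sum(poisson(mean_sons) for _ in range(carriers))
        if carriers == 0:
            break
    return carriers

# Chance dominates: most single-founder lineages die out, a few grow large.
sizes = [surname_lineage(20, seed=s) for s in range(1000)]
extinct = sum(1 for n in sizes if n == 0)
```

Raising `mean_sons` for the first few generations of a run mimics the "additional factor" (polygyny, plague resilience, economic advantage) that the authors invoke to carry a family through its precarious early period.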
Authors: Dr John S. Plant <plant@one-name.org>; Prof Richard E. Plant <replant@ucdavis.edu>
http://cogprints.org/id/eprint/8267 (deposited 2012-11-09)

Human and Automatic Evaluation of English to Hindi Machine Translation Systems
Machine translation evaluation is one of the most formidable activities in machine translation development. We present evaluation results for several machine translators available online for English-Hindi translation. The systems are measured on automatic evaluation metrics and on human subjectivity measures.
Authors: Nisheeth Joshi <nisheeth.joshi@rediffmail.com>; Hemant Darbari <darbari@cdac.in>; Iti Mathur <mathur_iti@rediffmail.com>
http://cogprints.org/id/eprint/9057 (deposited 2013-09-17)

Design of English-Hindi Translation Memory for Efficient Translation
Developing parallel corpora is an important and difficult activity for machine translation, requiring manual annotation by human translators. Translating the same text again is wasted effort. Tools exist to avoid this for European languages, but no such tool has been available for Indian languages. In this paper we present a tool for Indian languages that not only provides automatic translations of previously translated text but also, where a sentence has multiple translations, presents them as a ranked list of suggestions. The tool also gives translators global and local options for saving their work, so that they may share it with others, which further lightens the task.
Authors: Nisheeth Joshi <nisheeth.joshi@rediffmail.com>; Iti Mathur <mathur_iti@rediffmail.com>
http://cogprints.org/id/eprint/9060 (deposited 2013-11-18)

Expectations eclipsed in foreign language education: learners and educators on an ongoing journey / edited by Hülya Görür-Atabaş, Sharon Turner
From June 2 to 4, 2011, Sabancı University School of Languages welcomed colleagues from 21 countries to a collaborative exploration of the challenging and inspiring journey of learners and educators in the field of language education.
The conference provided an opportunity for all stakeholders to share their views on language education. Colleagues met with world-renowned experts and authors in the fields of education and psychology, faculty and administrators from various universities and institutions, teachers from secondary and higher education, and learners, whose voices are often reported rather than directly shared.
The conference name, Eclipsing Expectations, was inspired by two natural phenomena: a solar eclipse immediately before the conference and a lunar eclipse immediately after. Learners and educators were invited to join a journey to observe, learn and exchange ideas.
Editors: Hülya Görür-Atabaş <hulyag@sabanciuniv.edu>; Sharon Turner <shturner@sabanciuniv.edu>
http://cogprints.org/id/eprint/7881 (deposited 2012-11-09)

Some Inquiries to Spontaneous Opinions: A case with Twitter in Indonesia
The paper discusses opportunities to use the series of micro-blog posts provided by Twitter to observe opinion dynamics. Tweets are all the more spontaneous as the service becomes increasingly tied to mobile communications. The extraction of information from series of tweets is demonstrated through a conceptual map and a mention map; from the latter, stylized properties of social networks, i.e. a power-law distribution, are shown. The methodology is exemplified on the 82nd commemoration of the Indonesian Youth Pledge and on a participatory movement in the Indonesian capital, Jakarta.
Authors: Ardian Maulana <ai@compsoc.bandungfe.net>; Hokky Situngkir <hs@compsoc.bandungfe.net>
http://cogprints.org/id/eprint/7133 (deposited 2010-11-22)

Evaluation of Computational Grammar Formalisms for Indian Languages
Natural language parsing has been a prominent research area since the genesis of natural language processing. Probabilistic parsers are being developed to make parser development easier, more accurate and faster. In the Indian context, the question of which computational grammar formalism to use still needs to be answered. In this paper we focus on this problem and analyse different formalisms for Indian languages.
Authors: Nisheeth Joshi <nisheeth.joshi@rediffmail.com>; Iti Mathur <mathur_iti@rediffmail.com>
http://cogprints.org/id/eprint/9061 (deposited 2013-11-18)

Input Scheme for Hindi Using Phonetic Mapping
Written communication on computers requires knowing how to enter text in the desired language. Most people do not use any language besides English on a computer, which creates a barrier. To resolve this we have developed a scheme for inputting Hindi text using phonetic mapping. Using this scheme we generate intermediate code strings and match them against pronunciations of the input text. Our system shows significant success over other available input systems.
Authors: Nisheeth Joshi <nisheeth.joshi@rediffmail.com>; Iti Mathur <mathur_iti@rediffmail.com>
http://cogprints.org/id/eprint/9062 (deposited 2013-11-18)

Exploring the N-th Dimension of Language
This paper is aimed at exploring the hidden fundamental
computational property of natural language, which has been so elusive that all attempts to characterize it have ultimately failed. Natural language was at first thought to be context-free, but it was gradually realized that this does not hold, given the range of natural language phenomena found to be of non-context-free character. It has instead been suggested that natural language is mildly context-sensitive and to some extent context-free. In all, the issue of the exact computational property has not yet been settled. Against this background it is proposed that this exact computational property is perhaps the N-th dimension of language, if by dimension we mean nothing but a universal (computational) property of natural language.
Author: Prakash Mondal <mndlprksh@yahoo.co.in>
http://cogprints.org/id/eprint/8026 (deposited 2012-11-09)

Representation and computation
This is an encyclopedia entry and does not include an abstract.
Authors: Maurizio Tirassa <maurizio.tirassa@unito.it>; Marianna Vallana
http://cogprints.org/id/eprint/6879 (deposited 2010-07-29)

Research on Social Engagement with a Rabbitic User Interface
Companions serving as interfaces to smart rooms need not only to be easy to interact with, but also to maintain long-term relationships with their users. The FP7-funded project SERA (Social Engagement with Robots and Agents) contributes to knowledge about, and modelling of, such relationships. One focal activity is an iterative field study collecting real-life long-term interaction data with a robotic interface. The first stage of this study has been completed; this paper reports on the set-up and first insights.
Authors: Sabine Payr <sabine.payr@ofai.at>; Peter Wallis; Stuart Cunningham; Mark Hawley
http://cogprints.org/id/eprint/6864 (deposited 2010-07-02)

A Constructive Mathematic approach for Natural Language formal grammars
A mathematical description of natural language grammars was first proposed by Leibniz. After Frege's definition of unsaturated expressions and Husserl's foundation of a logical grammar, the application of logic to treating natural language grammars computationally attracted the interest of linguists, for example through Lambek's categorial calculus.
In recent years the most consolidated formal grammars (e.g. Minimalism, HPSG, TAG, CCG, dependency grammars) have begun to seek a strong psychological interpretation of their formalism, and hence of the natural language data to which they are applied. Nevertheless, little attention seems to have been paid to cognitive linguistics, a branch of linguistics that actively uses concepts and results from the cognitive sciences. Apparently unrelated, the study of computational concepts and formalisms has developed in tandem with constructive formal systems, especially in the branch of logic called proof theory; see, e.g., the Curry-Howard isomorphism and typed functional languages. In this paper we bridge these worlds and present our natural language formalism, Adpositional Grammars (AdGrams), founded on both cognitive linguistics and constructive mathematics.
Authors: Dr Federico Gobbo <federico.gobbo@uninsubria.it>; Dr Marco Benini <marco.benini@uninsubria.it>
http://cogprints.org/id/eprint/8772 (deposited 2012-12-22)

LETEC (Learning and Teaching Corpus) Simuligne
Learning and teaching corpus of the online educational experiment Simuligne (2001). Its scenario is based on a global simulation for learning French as a foreign language, and it includes an intercultural activity, "Interculture", based on the Cultura project. The corpus includes the pedagogical scenario (described in several formats), the research protocol, participants' online interactions and productions (structured in XML), the list of participants, and licences of use.
The associated LETEC corpus (mce.simu.all.all-CP.zip) is organized as an IMS-CP archive. We define a learning and teaching corpus as a structured entity containing all the elements resulting from a communicative online learning situation, whose context is described by an educational scenario and a research protocol. The core data collection includes all the interaction data, the productions of the course participants, and the tracks resulting from the participants' actions in the learning environment, stored according to the research protocol. To be shareable, and to respect participant privacy, these data should be anonymised and a licence for their use provided with the corpus. A derived analysis can be linked to the set of data it considers, uses or computes over. An analysis consisting of data annotation, transcription or transformation, accurately connected to its original data, can be merged into the corpus itself, so that other researchers can compare their own results in a concurrent analysis or build complementary analyses upon these results.
The definition of a learning and teaching corpus as a whole entity stems from the need for explicit links between interaction data, context and analyses. This explicit context is crucial for an external researcher to interpret the data and perform his or her own analyses.
This definition seeks to capture the context of the data stemming from the course, so that a researcher can find, understand and connect this information whether or not he or she was involved in the original course. More details about a LETEC corpus and its structure: http://mulce.univ-fcomte.fr/metadata/LETECorpus-en.pdf
Authors: Thierry Chanier <thierry.chanier@univ-fcomte.fr>; Marie-Noelle Lamy <M.N.Lamy@open.ac.uk>; Christophe Reffay <Christophe.Reffay@univ-fcomte.fr>; Marie-Laure Betbeder <marie-laure.betbeder@univ-fcomte.fr>; Maud Ciekanski <maud.ciekanski@univ-fcomte.fr>
http://cogprints.org/id/eprint/6431 (deposited 2009-04-24)

The Latent Relation Mapping Engine: Algorithm and Experiments
Many AI researchers and cognitive scientists have argued that analogy is the core of cognition. The most influential work on computational modeling of analogy-making is Structure Mapping Theory (SMT) and its implementation in the Structure Mapping Engine (SME). A limitation of SME is its requirement for complex hand-coded representations. We introduce the Latent Relation Mapping Engine (LRME), which combines ideas from SME and Latent Relational Analysis (LRA) to remove the requirement for hand-coded representations. LRME builds analogical mappings between lists of words, using a large corpus of raw text to automatically discover the semantic relations among the words. We evaluate LRME on a set of twenty analogical mapping problems, ten based on scientific analogies and ten on common metaphors. LRME achieves human-level performance on the twenty problems; a variety of alternative approaches we compare it with are not able to reach the same level of performance.
Author: Peter D. Turney <peter.turney@nrc-cnrc.gc.ca>
http://cogprints.org/id/eprint/6305 (deposited 2009-01-05)

A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations
Recognizing analogies, synonyms, antonyms, and associations appear to be four distinct tasks, requiring distinct NLP algorithms. In the past, the four tasks have been treated independently, using a wide variety of algorithms. These four semantic classes, however, are a tiny sample of the full range of semantic phenomena, and we cannot afford to create ad hoc algorithms for each semantic phenomenon; we need to seek a unified approach. We propose to subsume a broad range of phenomena under analogies. To limit the scope of this paper, we restrict our attention to the subsumption of synonyms, antonyms, and associations. We introduce a supervised corpus-based machine learning algorithm for classifying analogous word pairs, and we show that it can solve multiple-choice SAT analogy questions, TOEFL synonym questions, ESL synonym-antonym questions, and similar-associated-both questions from cognitive psychology.
Author: Peter D. Turney <peter.turney@nrc-cnrc.gc.ca>
http://cogprints.org/id/eprint/6181 (deposited 2008-08-31)

A MDL-based Model of Gender Knowledge Acquisition
This paper presents an iterative model of
knowledge acquisition of gender information associated with word endings in French. Gender knowledge is represented as a set of rules containing exceptions. Our model takes noun-gender pairs as input and constantly maintains a list of rules and exceptions that is both coherent with the input data and minimal with respect to a minimum description length (MDL) criterion. The model was compared to human data at various ages and showed a good fit. We also compared the rules discovered by the model with the rules usually extracted by linguists, and found interesting discrepancies.
Authors: Harmony Marchal; Benoit Lemaire; Maryse Bianco; Philippe Dessus
http://cogprints.org/id/eprint/6177 (deposited 2008-08-30)

Boundary effects in a three-state modified voter model for languages
The standard three-state voter model is enlarged by including outside pressure favouring one of the three language choices and by adding some biased internal random noise. The Monte Carlo simulations are motivated by states whose population is divided into three groups with various affinities to each other. We show the crucial influence of the boundaries for moderate lattice sizes such as 500 x 500. By removing the fixed boundary on one side, we demonstrate that this can lead to the victory of a single choice; noise, in contrast, stabilizes the choices of all three populations. In addition, we compute the persistence probability, i.e. the number of sites that have never changed their opinion during the simulation, and we consider the case of "rigid-minded" decision makers.
Authors: Tarik Hadzibeganovic <tarik@edu.uni-graz.at>; Dietrich Stauffer <stauffer@thp.uni-koeln.de>; Christian Schulze
http://cogprints.org/id/eprint/5911 (deposited 2008-01-27)

Ontology and Formal Semantics - Integration Overdue
In this note we suggest that difficulties encountered in natural language semantics are, for the most part, due to the use of mere symbol-manipulation systems that are devoid of any content. In such systems there is hardly any link with our common-sense view of the world, and it is quite difficult to envision how one could formally account for the considerable amount of content that is often implicit, but almost never explicitly stated, in our everyday discourse.
The solution, in our opinion, is a compositional semantics grounded in an ontology that reflects our commonsense view of the world and the way we talk about it in ordinary language. In the compositional logic we envision there are ontological (or first-intension) concepts and logical (or second-intension) concepts, where the ontological concepts include not only Davidsonian events but other abstract objects as well (e.g. states, processes, properties, activities, attributes, etc.).
We demonstrate that in such a framework a number of challenges in the semantics of natural language (e.g. metonymy, intensionality, metaphor) can be properly and uniformly addressed.
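The flavour of such an ontologically grounded semantics can be illustrated with a toy typed ontology; the hierarchy, the coercion table and the treatment of metonymy below are invented for this sketch and are not taken from the paper:

```python
# Toy illustration: a strongly typed ontology can license "type coercion",
# one common treatment of metonymy ("read Shakespeare" = read his writings).
# The hierarchy and the coercion table are invented for this sketch.

ISA = {                     # child -> parent in a tiny ontology
    "novel": "readable", "author": "person", "person": "entity",
    "readable": "entity",
}
COERCE = {("author", "readable"): "writings-of"}  # licensed metonymies

def subsumes(general, specific):
    """True if `specific` is `general` or a descendant of it."""
    while specific is not None:
        if specific == general:
            return True
        specific = ISA.get(specific)
    return False

def check(verb_sig, arg_type):
    """Type-check a verb argument, trying metonymic coercion on failure."""
    if subsumes(verb_sig, arg_type):
        return f"ok: {arg_type} is a {verb_sig}"
    for (src, dst), name in COERCE.items():
        if subsumes(src, arg_type) and subsumes(verb_sig, dst):
            return f"coerced: {arg_type} -> {name}({arg_type})"
    return "type error"

print(check("readable", "novel"))    # direct fit
print(check("readable", "author"))   # metonymy: read the author's writings
```

The point of the sketch is only that once argument positions carry ontological types, phenomena such as metonymy can be handled by uniform rules over the type structure rather than by case-by-case stipulation.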
Author: Walid Saba <walid.saba@gmail.com>
http://cogprints.org/id/eprint/5876 (deposited 2007-12-19)

Empirical Evaluation of Four Tensor Decomposition Algorithms
Higher-order tensor decompositions are analogous to the familiar Singular Value Decomposition (SVD), but they transcend the limitations of matrices (second-order tensors). SVD is a powerful tool that has achieved impressive results in information retrieval, collaborative filtering, computational linguistics, computational vision, and other fields. However, SVD is limited to two-dimensional arrays of data (two modes), and many potential applications have three or more modes, which require higher-order tensor decompositions. This paper evaluates four algorithms for higher-order tensor decomposition: Higher-Order Singular Value Decomposition (HO-SVD), Higher-Order Orthogonal Iteration (HOOI), Slice Projection (SP), and Multislice Projection (MP). We measure the time (elapsed run time), space (RAM and disk requirements), and fit (tensor reconstruction accuracy) of the four algorithms under a variety of conditions. We find that standard implementations of HO-SVD and HOOI do not scale up to larger tensors, due to increasing RAM requirements. We recommend HOOI for tensors that are small enough for the available RAM, and MP for larger tensors.
Author: Peter D. Turney <peter.turney@nrc-cnrc.gc.ca>
http://cogprints.org/id/eprint/5841 (deposited 2007-11-22)

An Alternative Postulate to see Melody as "Language"
The paper proposes a way to see melodic features in music and songs in terms of "letters" constituting "words", while investigating whether they satisfy the Zipf-Mandelbrot law. Some interesting findings are reported, including possible conjectures for classifying melodic and musical artifacts along several aspects of culture. The paper ends with a discussion of further directions, both for musicology and for musical generative art.
Author: Hokky Situngkir
http://cogprints.org/id/eprint/5593 (deposited 2007-07-14)

Conjecture to Statistical Proximity with Tree of Language (?): Report on Few Austronesian Languages of Indonesian Ethnics
We continue earlier steps [3] showing the distinctions and proximities of languages over statistical facts. We construct a homology tree from the distance matrix obtained by transforming some statistical aspects of the empirical observations into binary sequences, in order to conform to the concepts of memetics [2]. The resulting visualizations show interesting facts and may motivate further steps towards a better understanding of languages and ethnicities.
Authors: Hokky Situngkir; Deni Khanafiah
http://cogprints.org/id/eprint/5563 (deposited 2007-05-28)

A Note on Ontology and Ordinary Language
We argue for a compositional semantics grounded in a strongly typed ontology that reflects our commonsense view of the world and the way we talk about it. Assuming such a structure, we show that the semantics of various natural language phenomena may become nearly trivial.
Author: Walid Saba
http://cogprints.org/id/eprint/5544 (deposited 2007-05-19)

Regimes in Babel are Confirmed: Report on Findings in Several Indonesian Ethnic Biblical Texts
The paper describes the presence of three statistical regimes in the Zipfian analysis of texts in quantitative linguistics: the Mandelbrot, original Zipf, and Cancho-Solé-Montemurro regimes. The analysis is carried out over nine languages with the same semantic content: the Bible in several Indonesian ethnic languages and in the national language, with the English Bible analysed for reference. The existence of the three regimes is confirmed, and the length of the texts also proves to be an important issue.
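The rank-frequency machinery underlying this line of work can be sketched as follows; this is a generic illustration of Zipfian analysis (the three regimes and the parameterizations studied in the paper are not modelled here):

```python
from collections import Counter
import math

def rank_frequency(text):
    """Return (rank, frequency) pairs for the words of `text`, most frequent first."""
    counts = Counter(text.lower().split())
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))

def zipf_slope(pairs):
    """Least-squares slope of log(freq) vs log(rank); near -1 for Zipfian text."""
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

pairs = rank_frequency("the cat and the dog and the bird saw the cat")
# "the" ranks first; the fitted log-log slope is negative for any real text
```

Studies of the kind reported here fit such curves separately over low, middle and high ranks, which is where the distinct statistical regimes become visible.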
We outline some further works regarding the quantitative analysis for parameterization used to analyze the three regimes and the task to have broad explanation, especially the microstructure of the language in human decision or linguistic effort – emerging the robustness of them.Hokky Situngkir2007-04-04Z2011-03-11T08:56:49Zhttp://cogprints.org/id/eprint/5481This item is in the repository with the URL: http://cogprints.org/id/eprint/54812007-04-04ZAn Observational Framework to the Zipfian Analysis among Different Languages: Studies to Indonesian Ethnic Biblical Texts
The paper introduces the used of Zipfian statistics to observe the human languages by using the same (meaning) corpus/corpora but different in grammatical and structural utterances. We used biblical texts since they contain corpuses that have been most widely and carefully translated into many languages. The idea is to reduce the possibility of noise came from the meaning of the texts in distinctive language. The result is that the robustness of the Zipfian law is observable and some statistical differences are discovered between English and widely used national and several ethnic languages in Indonesia. The paper ends by modestly propose further possible framework in interdisciplinary approaches to human language evolution.Hokky Situngkir2007-03-16Z2011-03-11T08:56:48Zhttp://cogprints.org/id/eprint/5455This item is in the repository with the URL: http://cogprints.org/id/eprint/54552007-03-16ZDesigning Domain Ontology: A Study in Lexical SemanticsPreparing a multi-purpose lexicon requires a systematic analysis of inter-conceptual relations. These relations are of two types, namely (i) syntactic and (ii) semantic, which can further be decomposed to capture the greater explanatory adequacy. But the exploration of the lexical structure becomes intricate because of the hidden dynamics of the context; since traditionally, language has been viewed as a totality of lexicon and computation system, and major emphasis has been given to the designing of the computational system, considering the designing of the lexicon internal domain ontology as a mere metaphysical game, when in reality it is a serious epistemic concern, because of having the capacity of licensing inferences. Therefore a lexical level representation should have enough scope to incorporate the contextual information.
Designing a domain ontology is important, since it tells us about the conceptual constellation within whose coherent whole the related terms are meaningful. Isolating a term from its constellation will result in the evaporation of its meaning. Furthermore, the ontology provides the basis upon which the entire linguistic structure rests. If so, how is it possible to construct a lexicon while divorcing it from ontological issues? At the same time, ontology by itself is not enough, because the higher-order typifications of those (grounded) concepts, and the interrelations among the types, ultimately yield super-ordinating levels containing the syntactic information pertinent to a symbol-manipulating system.
In this paper I show, on the basis of examples cited from English and Bengali, that the representation of a lexical structure should include both kinds of information, pertinent to closed-class as well as open-class semantics.
Samir Karmakar2007-07-28Z2011-03-11T08:56:55Zhttp://cogprints.org/id/eprint/5626This item is in the repository with the URL: http://cogprints.org/id/eprint/56262007-07-28ZFast & Confident Probabilistic CategorizationWe describe NRC's submission to the Anomaly Detection/Text Mining competition organised at the Text Mining Workshop 2007. This submission relies on a straightforward implementation of the probabilistic categoriser described in (Gaussier et al., ECIR'02). This categoriser is adapted to handle multiple labelling and a piecewise-linear confidence estimation layer is added to provide an estimate of the labelling confidence. This technique achieves a score of 1.689 on the test data.
Cyril Goutte2007-05-08Z2011-03-11T08:56:50Zhttp://cogprints.org/id/eprint/5535This item is in the repository with the URL: http://cogprints.org/id/eprint/55352007-05-08ZLanguage, logic and ontology: uncovering the
structure of commonsense knowledgeThe purpose of this paper is twofold: (i) we argue that the structure of commonsense knowledge must be discovered, rather than invented; and (ii) we argue that natural
language, which is the best known theory of our (shared) commonsense knowledge, should itself be used as a guide to discovering the structure of commonsense knowledge. In addition to suggesting a systematic method to the discovery of the structure of commonsense knowledge, the method we propose seems to also provide an explanation for a number of phenomena in natural language, such as metaphor, intensionality, and the semantics of nominal compounds. Admittedly, our ultimate goal is quite ambitious, and it is no less than the systematic ‘discovery’ of a well-typed
ontology of commonsense knowledge, and the subsequent formulation of the long-awaited goal of a meaning algebra.Walid Saba2007-07-28Z2011-03-11T08:56:56Zhttp://cogprints.org/id/eprint/5627This item is in the repository with the URL: http://cogprints.org/id/eprint/56272007-07-28ZStatistical Phrase-based Post-editingWe propose to use a statistical phrase-based machine translation system in a post-editing task: the system takes as input raw machine translation output (from a commercial rule-based MT system), and produces post-edited target-language text. We report on experiments that were performed on data collected in precisely such a setting: pairs of raw MT output and their manually post-edited versions. In our evaluation, the output of our automatic post-editing (APE) system is not only of better quality than the rule-based MT (both in terms of the BLEU and TER metrics), it is also better than the output of a state-of-the-art phrase-based MT system used in standalone translation mode. These results indicate that automatic post-editing constitutes a simple and efficient way of combining rule-based and statistical MT technologies.
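The post-editing evaluation above relies on the BLEU and TER metrics. TER counts word-level edits, including phrase shifts, against a reference translation; as a rough illustrative sketch of the idea (omitting TER's shift operation, which reduces it to a word error rate; the function name is ours, not from the paper):

```python
# Illustrative sketch: a simplified TER-style score, i.e. word-level
# edit distance without TER's shift operation (equivalent to WER).
def word_edit_rate(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    # Classic dynamic-programming edit distance over word sequences.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution/match
    return d[len(hyp)][len(ref)] / max(len(ref), 1)
```

Lower is better: a perfect match scores 0.0, and each missing, spurious, or substituted word adds 1/|reference|.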
Michel SimardCyril GouttePierre Isabelle2007-11-13T00:51:03Z2011-03-11T08:57:00Zhttp://cogprints.org/id/eprint/5817This item is in the repository with the URL: http://cogprints.org/id/eprint/58172007-11-13T00:51:03ZExperiments on predictability of word in context and information rate in natural languageBased on data from a large-scale experiment with human subjects, we conclude that the logarithm of the probability of guessing a word in context (unpredictability) depends linearly on the word length. This result holds both for poetry and prose, even though with prose, the subjects don't know the length of the omitted word. We hypothesize that this effect reflects a tendency of natural language to have an even information rate.Dmitrii Maninmanin@pobox.com2006-03-16Z2011-03-11T08:56:21Zhttp://cogprints.org/id/eprint/4764This item is in the repository with the URL: http://cogprints.org/id/eprint/47642006-03-16ZThe Missing Link between Morphemic Assemblies and Behavioral Responses: a Bayesian Information-Theoretical model of lexical processingWe present the Bayesian Information-Theoretical (BIT) model of lexical processing: a mathematical model illustrating a novel approach to the modelling of language processes. The model shows how a neurophysiological theory of lexical processing relying on Hebbian association and neural assemblies can directly account for a variety of effects previously observed in behavioural experiments. We develop two information-theoretical measures of the distribution of usages of a morpheme or word, and use them to predict responses in three visual lexical decision datasets investigating inflectional morphology and polysemy. Our model offers a neurophysiological basis for the effects of
morpho-semantic neighbourhoods. These results demonstrate how distributed patterns of activation naturally result in the emergence of symbolic structures. We conclude by arguing that the modelling framework exemplified here is
a powerful tool for integrating behavioural and neurophysiological results.Dr Fermin Moscoso del Prado MartinProf Aleksandar KosticDusica Filipovic-Djurdjevic2006-03-06Z2011-03-11T08:56:21Zhttp://cogprints.org/id/eprint/4754This item is in the repository with the URL: http://cogprints.org/id/eprint/47542006-03-06ZThe Missing Link between Morphemic Assemblies and Behavioral Responses: a Bayesian Information-Theoretical model of lexical processingWe present the Bayesian Information-Theoretical (BIT) model of lexical processing: a mathematical model illustrating a novel approach to the modelling of language processes. The model shows how a neurophysiological theory of lexical processing relying on Hebbian association and neural assemblies can directly account for a variety of effects previously observed in behavioral experiments. We develop two information-theoretical measures of the distribution of usages of a word or morpheme. These measures are calculated through unsupervised means from corpora. We show that our measures successfully predict responses in three visual lexical decision datasets investigating the processing of inflectional morphology in Serbian and English, and the effects of polysemy and homonymy in English. We discuss how our model provides a neurophysiological grounding for the facilitatory and inhibitory effects of different types of lexical neighborhoods. In addition, our results show how, under a model based on neural assemblies, distributed patterns of activation naturally result in the emergence of discrete symbol-like structures. Therefore, the BIT model offers a point of reconciliation in the debate between distributed connectionist and discrete localist models. Finally, we argue that the modelling framework exemplified by the BIT model is a powerful tool for integrating the different levels of the description of the human language
processing system.Fermin Moscoso del Prado MartinKostic AleksandarFilipovic-Djurdjevic Dusica2006-08-01Z2011-03-11T08:56:33Zhttp://cogprints.org/id/eprint/5039This item is in the repository with the URL: http://cogprints.org/id/eprint/50392006-08-01ZExpressing Implicit Semantic Relations without SupervisionWe present an unsupervised learning algorithm that mines large
text corpora for patterns that express implicit semantic relations.
For a given input word pair X:Y with some unspecified semantic
relations, the corresponding output list of patterns <P1,...,Pm>
is ranked according to how well each pattern Pi expresses the
relations between X and Y. For example, given X=ostrich and
Y=bird, the two highest ranking output patterns are "X is the
largest Y" and "Y such as the X". The output patterns are intended
to be useful for finding further pairs with the same relations, to
support the construction of lexicons, ontologies, and semantic
networks. The patterns are sorted by pertinence, where the pertinence
of a pattern Pi for a word pair X:Y is the expected relational
similarity between the given pair and typical pairs for Pi. The
algorithm is empirically evaluated on two tasks, solving
multiple-choice SAT word analogy questions and classifying semantic
relations in noun-modifier pairs. On both tasks, the algorithm
achieves state-of-the-art results, performing significantly better
than several alternative pattern ranking algorithms based on tf-idf.Peter D. Turney21752006-09-01Z2011-03-11T08:56:35Zhttp://cogprints.org/id/eprint/5098This item is in the repository with the URL: http://cogprints.org/id/eprint/50982006-09-01ZSimilarity of Semantic RelationsThere are at least two kinds of similarity. Relational similarity is
correspondence between relations, in contrast with attributional similarity,
which is correspondence between attributes. When two words have a high
degree of attributional similarity, we call them synonyms. When two pairs
of words have a high degree of relational similarity, we say that their
relations are analogous. For example, the word pair mason:stone is analogous
to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA),
a method for measuring relational similarity. LRA has potential applications in many
areas, including information extraction, word sense disambiguation,
and information retrieval. Recently the Vector Space Model (VSM) of information
retrieval has been adapted to measuring relational similarity,
achieving a score of 47% on a collection of 374 college-level multiple-choice
word analogy questions. In the VSM approach, the relation between a pair of words is
characterized by a vector of frequencies of predefined patterns in a large corpus.
LRA extends the VSM approach in three ways: (1) the patterns are derived automatically
from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency
data, and (3) automatically generated synonyms are used to explore variations of the
word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the
average human score of 57%. On the related problem of classifying semantic relations, LRA
achieves similar gains over the VSM. Peter D. Turney21752005-08-24Z2011-03-11T08:56:09Zhttp://cogprints.org/id/eprint/4518This item is in the repository with the URL: http://cogprints.org/id/eprint/45182005-08-24ZCorpus-based Learning of Analogies and Semantic RelationsWe present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the SAT college entrance exam. A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D"; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct; the average college-bound senior high school student answers about 57% correctly). We motivate this research by applying it to a difficult problem in natural language processing, determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for both verbal analogies and noun-modifier relations. Peter D. Turney2175Michael L. 
Littman2005-08-11Z2011-03-11T08:56:09Zhttp://cogprints.org/id/eprint/4501This item is in the repository with the URL: http://cogprints.org/id/eprint/45012005-08-11ZMeasuring Semantic Similarity by Latent Relational AnalysisThis paper introduces Latent Relational Analysis (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated by the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends the VSM approach in three ways: (1) patterns are derived automatically from the corpus, (2) Singular Value Decomposition is used to smooth the frequency data, and (3) synonyms are used to reformulate word pairs. This paper describes the LRA algorithm and experimentally compares LRA to VSM on two tasks, answering college-level multiple-choice word analogy questions and classifying semantic relations in noun-modifier expressions. LRA achieves state-of-the-art results, reaching human-level performance on the analogy questions and significantly exceeding VSM performance on both tasks.Peter D. Turney21752005-04-12Z2011-03-11T08:55:55Zhttp://cogprints.org/id/eprint/4204This item is in the repository with the URL: http://cogprints.org/id/eprint/42042005-04-12ZOn Parsing CHILDESResearch on child language acquisition would benefit from the availability of a large body of syntactically parsed utterances between parents and children. 
We consider the problem of generating such a ``treebank'' from the CHILDES corpus, which currently contains primarily orthographically transcribed speech tagged for lexical category.Aarre Laakso2004-12-11Z2011-03-11T08:55:45Zhttp://cogprints.org/id/eprint/3981This item is in the repository with the URL: http://cogprints.org/id/eprint/39812004-12-11ZHuman-Level Performance on Word Analogy Questions by Latent Relational AnalysisThis paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. 
LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus.Peter D. Turney21752005-01-10Z2011-03-11T08:55:49Zhttp://cogprints.org/id/eprint/4027This item is in the repository with the URL: http://cogprints.org/id/eprint/40272005-01-10ZCombining Independent Modules in Lexical Multiple-Choice ProblemsExisting statistical approaches to natural language problems are very
coarse approximations to the true complexity of language processing.
As such, no single technique will be best for all problem
instances. Many researchers are examining ensemble methods that
combine the output of multiple modules to
create more accurate solutions. This paper examines three merging
rules for combining probability distributions: the familiar mixture
rule, the logarithmic rule, and a novel product rule.
These rules were applied with state-of-the-art results to two
problems used to assess human mastery of lexical
semantics -- synonym questions and analogy questions. All three
merging rules result in ensembles that are more accurate than any of
their component modules. The differences among the three rules are not statistically
significant, but it is suggestive that the popular mixture rule
is not the best rule for either of the two problems.Peter D. Turney2175Michael L. LittmanJeffrey BighamVictor Shnayder2004-06-05Z2011-03-11T08:55:36Zhttp://cogprints.org/id/eprint/3657This item is in the repository with the URL: http://cogprints.org/id/eprint/36572004-06-05ZFrequency Value Grammar and Information TheoryI previously laid the groundwork for Frequency Value Grammar (FVG) in papers I submitted in the proceedings of the 4th International Conference on Cognitive Science (2003), Sydney Australia, and Corpus Linguistics Conference (2003), Lancaster, UK. FVG is a formal syntax theoretically based in large part on Information Theory principles. FVG relies on dynamic physical principles external to the corpus which shape and mould the corpus whereas generative grammar and other formal syntactic theories are based exclusively on patterns (fractals) found occurring within the well-formed portion of the corpus. However, FVG should not be confused with Probability Syntax, (PS), as described by Manning (2003). PS is a corpus based approach that will yield the probability distribution of possible syntax constructions over a fixed corpus. PS makes no distinction between well and ill formed sentence constructions and assumes everything found in the corpus is well formed. In contrast, FVG’s primary objective is to distinguish between well and ill formed sentence constructions and, in so doing, relies on corpus based parameters which determine sentence competency. In PS, a syntax of high probability will not necessarily yield a well formed sentence. However, in FVG, a syntax or sentence construction of high ‘frequency value’ will yield a well-formed sentence, at least, 95% of the time satisfying most empirical standards. Moreover, in FVG, a sentence construction of ‘high frequency value’ could very well be represented by an underlying syntactic construction of low probability as determined by PS. 
The characteristic ‘frequency values’ calculated in FVG are not measures of probability but rather are fundamentally determined values derived from exogenous principles which impact and determine corpus-based parameters serving as an index of sentence competency. The theoretical framework of FVG has broad applications beyond that of formal syntax and NLP. In this paper, I will demonstrate how FVG can be used as a model for improving the upper-bound calculation of the entropy of written English. Generally speaking, when a function word precedes an open-class word, the backward n-gram analysis will be homomorphic with the information source and will result in frequency values more representative of co-occurrences in the information source.
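The upper-bound claim above rests on a standard information-theoretic fact: the per-symbol cross-entropy of any model of a text upper-bounds the entropy rate of the source, so any model that better exploits regularities (such as function-word/open-class co-occurrence) tightens the bound. A minimal sketch of such an upper-bound estimate, using a smoothed character bigram model (our own illustration, not the FVG method itself):

```python
import math
from collections import Counter

# Illustrative sketch (not the paper's FVG method): score a text with a
# character bigram model trained on that same text, with add-one
# smoothing.  The result, in bits per character, is an upper bound on
# the entropy rate of the source that generated the text.
def bigram_cross_entropy(text: str) -> float:
    pairs = Counter(zip(text, text[1:]))   # counts of adjacent char pairs
    contexts = Counter(text[:-1])          # counts of left-hand contexts
    vocab = len(set(text))                 # alphabet size for smoothing
    total_bits = 0.0
    for (a, b), n in pairs.items():
        p = (n + 1) / (contexts[a] + vocab)  # add-one smoothed P(b | a)
        total_bits += n * -math.log2(p)
    return total_bits / max(len(text) - 1, 1)
```

A fully predictable text like `"aaaa"` scores 0 bits per character, while less predictable texts score higher; a sharper model would push the bound lower.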
Asa M Stepak2004-07-30Z2011-03-11T08:55:39Zhttp://cogprints.org/id/eprint/3732This item is in the repository with the URL: http://cogprints.org/id/eprint/37322004-07-30ZWord Sense Disambiguation by Web Mining for Word Co-occurrence ProbabilitiesThis paper describes the National Research Council (NRC)
Word Sense Disambiguation (WSD) system, as applied to the
English Lexical Sample (ELS) task in Senseval-3. The NRC system
approaches WSD as a classical supervised machine learning problem,
using familiar tools such as the Weka machine learning software
and Brill's rule-based part-of-speech tagger. Head words are
represented as feature vectors with several hundred features.
Approximately half of the features are syntactic and the other
half are semantic. The main novelty in the system is the method for
generating the semantic features, based on word co-occurrence
probabilities. The probabilities are estimated using
the Waterloo MultiText System with a corpus of about one terabyte of
unlabeled text, collected by a web crawler.Peter D. Turney2003-07-16Z2011-03-11T08:55:18Zhttp://cogprints.org/id/eprint/3054This item is in the repository with the URL: http://cogprints.org/id/eprint/30542003-07-16ZAnchoring of semiotic symbolsThis paper presents arguments for approaching the anchoring problem using {\em semiotic symbols}. Semiotic symbols are defined by a triadic relation between forms, meanings and referents, thus having an implicit relation to the real world.Anchors are formed between these three elements rather than between `traditional' symbols and sensory images. This allows an optimization between the form (i.e. the `traditional' symbol) and the referent. A robotic experiment based on adaptive language games illustrates how the anchoring of semiotic symbols can be achieved in a bottom-up fashion. The paper concludes that applying semiotic symbols is a potentially valuable approach toward anchoring.Paul Vogt2003-09-19Z2011-03-11T08:55:20Zhttp://cogprints.org/id/eprint/3163This item is in the repository with the URL: http://cogprints.org/id/eprint/31632003-09-19ZCombining independent modules to solve multiple-choice synonym and analogy problems
Existing statistical approaches to natural language problems are very
coarse approximations to the true complexity of language processing.
As such, no single technique will be best for all problem instances.
Many researchers are examining ensemble methods that combine the
output of successful, separately developed modules to create more
accurate solutions. This paper examines three merging rules for
combining probability distributions: the well known mixture rule, the
logarithmic rule, and a novel product rule. These rules were applied
with state-of-the-art results to two problems commonly used to assess
human mastery of lexical semantics -- synonym questions and analogy
questions. All three merging rules result in ensembles that are more
accurate than any of their component modules. The differences among the
three rules are not statistically significant, but it is suggestive
that the popular mixture rule is not the best rule for either of the
two problems.Peter TurneyMichael LittmanJeffrey BighamVictor Shnayder2003-07-16Z2011-03-11T08:55:19Zhttp://cogprints.org/id/eprint/3059This item is in the repository with the URL: http://cogprints.org/id/eprint/30592003-07-16ZGrounded lexicon formation without explicit reference transfer: who's talking to who?This paper presents a first investigation regarding lexicon grounding and evolution under an iterated learning regime without an explicit transfer of reference. In the original iterated learning framework, a population contains adult speakers and learning hearers. In this paper I investigate the effects of allowing both adults and learners to take up the role of speakers and hearers with varying probabilities. The results indicate that when adults and learners can be selected as speakers and hearers, their lexicons become more similar but at the cost of reduced success in communication.Paul Vogt2003-07-16Z2011-03-11T08:55:18Zhttp://cogprints.org/id/eprint/3053This item is in the repository with the URL: http://cogprints.org/id/eprint/30532003-07-16ZInvestigating social interaction strategies for bootstrapping lexicon developmentThis paper investigates how different modes of social interactions influence the bootstrapping and evolution of lexicons. This is done by comparing three language game models that differ in the type of social interactions they use. The simulations show that the language games which use either joint attention or corrective feedback as a source of contextual input are better capable of bootstrapping a lexicon than the game without such directed interactions. 
The simulation of the latter game, however, does show that it is possible to develop a lexicon without using directed input when the lexicon is transmitted from generation to generation.Paul VogtHans Coumans2003-07-16Z2011-03-11T08:55:18Zhttp://cogprints.org/id/eprint/3057This item is in the repository with the URL: http://cogprints.org/id/eprint/30572003-07-16ZIterated learning and grounding: from holistic to compositional languagesThis paper presents a new computational model for studying the origins and evolution of compositional languages grounded through the interaction between agents and their environment. The model is based on previous work on adaptive grounding of lexicons and the iterated learning model. Although the model is still in a developmental phase, the first results show that a compositional language can emerge in which the structure reflects regularities present in the population's environment.Paul Vogt2003-07-25Z2011-03-11T08:55:19Zhttp://cogprints.org/id/eprint/3084This item is in the repository with the URL: http://cogprints.org/id/eprint/30842003-07-25ZLearning Analogies and Semantic RelationsWe present an algorithm for learning from unlabeled text, based on the
Vector Space Model (VSM) of information retrieval, that can solve verbal
analogy questions of the kind found in the Scholastic Aptitude Test (SAT).
A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D";
for example, mason:stone::carpenter:wood. SAT analogy questions provide
a word pair, A:B, and the problem is to select the most analogous word
pair, C:D, from a set of five choices. The VSM algorithm correctly
answers 47% of a collection of 374 college-level analogy questions
(random guessing would yield 20% correct). We motivate this research by
relating it to work in cognitive science and linguistics, and by applying
it to a difficult problem in natural language processing, determining
semantic relations in noun-modifier pairs. The problem is to classify a
noun-modifier pair, such as "laser printer", according to the semantic
relation between the noun (printer) and the modifier (laser). We use a
supervised nearest-neighbour algorithm that assigns a class to a given
noun-modifier pair by finding the most analogous noun-modifier pair in
the training data. With 30 classes of semantic relations, on a collection
of 600 labeled noun-modifier pairs, the learning algorithm attains an F
value of 26.5% (random guessing: 3.3%). With 5 classes of semantic
relations, the F value is 43.2% (random: 20%). The performance is
state-of-the-art for these challenging problems.Peter TurneyMichael Littman2003-09-19Z2011-03-11T08:55:20Zhttp://cogprints.org/id/eprint/3164This item is in the repository with the URL: http://cogprints.org/id/eprint/31642003-09-19ZMeasuring praise and criticism: Inference of semantic orientation from associationThe evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., "honest", "intrepid") and negative semantic orientation indicates criticism (e.g., "disturbing", "superfluous"). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots). This paper introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, based on two different statistical measures of word association: pointwise mutual information (PMI) and latent semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is allowed to abstain from classifying mild words. Peter TurneyMichael Littman2003-04-15Z2011-03-11T08:55:15Zhttp://cogprints.org/id/eprint/2874This item is in the repository with the URL: http://cogprints.org/id/eprint/28742003-04-15ZA Proposed Mathematical Theory Explaining Word Order Typology In this paper I attempt to lay the groundwork for an algorithm that measures sentence competency.
Heretofore, competency of sentences was determined by interviewing speakers of the language. The data compiled forms the basis for grammatical rules that establish the generative grammar of a language. However, the generative grammar, once established, does not filter out all incompetent sentences. Chomsky has noted that there are many sentences that are grammatical but do not satisfy the notion of competency and, similarly, many non-grammatical constructions that do.
I propose that generative grammar constructions as well as formal theory frameworks such as Transformational Grammar, Minimalist Theory, and Government and Binding do not represent the most irreducible component of a language that determines sentence competency. I propose a Mathematical Theory governing word order typology that explains not only the established generative grammar rules of a language but also lays the groundwork for understanding sentence competency in terms of irreducible components that have not been accounted for in previous formal theories. I have done so by relying on a mathematical analysis of word frequency relationships, based upon large, representative corpuses, that represents a more basic component of sentence construction, overlooked by current text processing and artificial intelligence parsing systems and unaccounted for by the generative grammar rules of a language.
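The corpus analyses described above, like the Zipfian studies earlier in this list, start from a word-frequency tabulation over a large corpus. A minimal illustration of that first step (our own sketch, not Stepak's theory; Zipf's law predicts frequency roughly proportional to 1/rank):

```python
import re
from collections import Counter

# Minimal illustration: rank-frequency tabulation of a corpus, the raw
# material for Zipf-style and word-frequency analyses.
def rank_frequency(corpus: str):
    words = re.findall(r"[a-z']+", corpus.lower())
    counts = Counter(words)
    # (rank, word, frequency) triples, most frequent first.
    return [(r, w, n)
            for r, (w, n) in enumerate(counts.most_common(), start=1)]
```

On a real corpus, plotting log frequency against log rank for these triples is the usual check of how closely the Zipfian law holds.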
Asa Stepak2003-07-16Z2011-03-11T08:55:18Zhttp://cogprints.org/id/eprint/3058This item is in the repository with the URL: http://cogprints.org/id/eprint/30582003-07-16ZTHSim v3.2: The Talking Heads simulation toolThe field of language evolution and computation may benefit from using efficient and robust simulation tools that are based on widely exploited principles within the field. The tool presented in this paper is one that could fulfil such needs. The paper presents an overview of the tool -- THSim v3.2 -- and discusses some research questions that can be investigated with it.Paul Vogt2003-03-12Z2011-03-11T08:55:07Zhttp://cogprints.org/id/eprint/2658This item is in the repository with the URL: http://cogprints.org/id/eprint/26582003-03-12ZPhonemic Coding Might Result From
Sensory-Motor Coupling DynamicsHuman sound systems are invariably phonemically coded. Furthermore, phoneme inventories follow very particular tendencies. To explain these phenomena, three kinds of approaches have been proposed so far: ``Chomskyan''/cognitive innatism, morpho-perceptual innatism, and the more recent approach of ``language as a complex cultural system which adapts under the pressure of efficient communication''. The first two approaches are clearly unsatisfying, while the third, although much more convincing, makes many speculative assumptions and has not really answered the question of phonemic coding. We propose here a new hypothesis based on a low-level model of sensory-motor interactions. We show that certain very simple and non-language-specific neural devices allow a population of agents to build signalling systems without any functional pressure. Moreover, these systems are phonemically coded. Using a realistic vowel articulatory synthesizer, we show that the inventories of vowels
have striking similarities with human vowel systems.Pierre-Yves Oudeyer2003-07-16Z2011-03-11T08:55:18Zhttp://cogprints.org/id/eprint/3055This item is in the repository with the URL: http://cogprints.org/id/eprint/30552003-07-16ZThe physical symbol grounding problemThis paper presents an approach to solve the symbol grounding problem within the framework of embodied cognitive science. It will be argued that symbolic structures can be used within the paradigm of embodied cognitive science by adopting an alternative definition of a symbol. In this alternative definition, the symbol may be viewed as a structural coupling between an agent's sensorimotor activations and its environment. A robotic experiment is presented in which mobile robots develop a symbolic structure from scratch by engaging in a series of language games. In this experiment it is shown that robots can develop a symbolic structure with which they can communicate the names of a few objects with a remarkable degree of success. It is further shown that, although the referents may be interpreted differently on different occasions, the objects are usually named with only one form.Paul Vogt2002-01-16Z2011-03-11T08:54:52Zhttp://cogprints.org/id/eprint/2036This item is in the repository with the URL: http://cogprints.org/id/eprint/20362002-01-16ZThe adaptive advantage of symbolic theft over sensorimotor toil: Grounding language in perceptual categoriesUsing neural nets to simulate learning and the genetic algorithm to simulate evolution in a toy world of mushrooms and mushroom-foragers, we place two ways of acquiring categories into direct competition with one another: In (1) "sensorimotor toil,” new categories are acquired through real-time, feedback-corrected, trial and error experience in sorting them. In (2) "symbolic theft,” new categories are acquired by hearsay from propositions – boolean combinations of symbols describing them. In competition, symbolic theft always beats sensorimotor toil. 
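The toil-versus-theft contrast just described can be caricatured in a few lines; the hidden category, the cost accounting, and all names below are invented for illustration, not taken from the authors' simulation:

```python
import random

random.seed(0)

# A "mushroom" is three boolean features; it is edible iff
# features 0 and 1 are both present (the hidden category).
def edible(m):
    return m[0] and m[1]

def learn_by_toil(trials=200):
    """Trial-and-error: each sample costs one real-world interaction."""
    memory = {}
    for _ in range(trials):
        m = tuple(random.random() < 0.5 for _ in range(3))
        memory[m] = edible(m)
    return memory, trials

def learn_by_theft():
    """Hearsay: receive the boolean description directly, at unit cost."""
    return (lambda m: m[0] and m[1]), 1

toil_memory, toil_cost = learn_by_toil()
theft_rule, theft_cost = learn_by_theft()
print(toil_cost, theft_cost)  # theft acquires the same category far more cheaply
```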
We hypothesize that this is the basis of the adaptive advantage of language. Entry-level categories must still be learned by toil, however, to avoid an infinite regress (the “symbol grounding problem”). Changes in the internal representations of categories must take place during the course of learning by toil. These changes can be analyzed in terms of the compression of within-category similarities and the expansion of between-category differences. These allow regions of similarity space to be separated, bounded and named, and then the names can be combined and recombined to describe new categories, grounded recursively in the old ones. Such compression/expansion effects, called "categorical perception" (CP), have previously been reported with categories acquired by sensorimotor toil; we show that they can also arise from symbolic theft alone. The picture of natural language and its origins that emerges from this analysis is that of a powerful hybrid symbolic/sensorimotor capacity, infinitely superior to its purely sensorimotor precursors, but still grounded in and dependent on them. It can spare us from untold time and effort learning things the hard way, through direct experience, but it remains anchored in and translatable into the language of experience.Angelo CangelosiStevan Harnad2002-01-11Z2011-03-11T08:54:52Zhttp://cogprints.org/id/eprint/2016This item is in the repository with the URL: http://cogprints.org/id/eprint/20162002-01-11ZEvolution of communication and language using signals, symbols and wordsThis paper describes different types of models for the evolution of communication and language. It uses the distinction between signals, symbols, and words for the analysis of evolutionary models of language. In particular, it shows how evolutionary computation techniques, such as artificial life, can be used to study the emergence of syntax and symbols from simple communication signals.
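A minimal naming-game sketch of such signal-object association models; the agent count, the update rule (the hearer always adopts the speaker's signal), and the signal format are illustrative assumptions, not the paper's algorithm:

```python
import random

random.seed(1)
OBJECTS = ["food_a", "food_b"]  # hypothetical referents

class Agent:
    def __init__(self):
        self.lexicon = {}  # object -> preferred signal

    def name(self, obj):
        # Invent a random signal the first time an object must be named.
        if obj not in self.lexicon:
            self.lexicon[obj] = "s%03d" % random.randrange(1000)
        return self.lexicon[obj]

    def adopt(self, obj, signal):
        self.lexicon[obj] = signal  # align with the speaker

agents = [Agent() for _ in range(5)]
for _ in range(200):  # repeated language games
    speaker, hearer = random.sample(agents, 2)
    obj = random.choice(OBJECTS)
    hearer.adopt(obj, speaker.name(obj))

# Under these dynamics the population tends toward one shared signal per object.
print({obj: {a.lexicon.get(obj) for a in agents} for obj in OBJECTS})
```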
Initially, a computational model that evolves repertoires of isolated signals is presented. This study has simulated the emergence of signals for naming foods in a population of foragers. This type of model studies communication systems based on simple signal-object associations. Subsequently, models that study the emergence of grounded symbols are discussed in general, including a detailed description of a work on the evolution of simple syntactic rules. This model focuses on the emergence of symbol-symbol relationships in evolved languages. Finally, computational models of syntax acquisition and evolution are discussed. These different types of computational models provide an operational definition of the signal/symbol/word distinction. The simulation and analysis of these types of models will help to understand the role of symbols and symbol acquisition in the origin of language.Angelo Cangelosi2001-11-27Z2011-03-11T08:54:50Zhttp://cogprints.org/id/eprint/1926This item is in the repository with the URL: http://cogprints.org/id/eprint/19262001-11-27ZHumanoid Theory GroundingIn this paper we consider the importance of using a humanoid physical form for a certain proposed kind of robotics, that of theory grounding. Theory grounding involves grounding the theory skills and knowledge of an embodied artificially intelligent (AI) system by developing theory skills and knowledge from the bottom up. Theory grounding can potentially occur in a variety of domains, and the particular domain considered here is that of language. Language is taken to be another problem space in which a system can explore and discover solutions. We argue that because theory grounding necessitates robots experiencing domain information, certain behavioral-form aspects, such as abilities to socially smile, point, follow gaze, and generate manual gestures, are necessary for robots grounding a humanoid theory of language.Christopher G. PrinceEric J. 
Mislivec2003-07-16Z2011-03-11T08:55:18Zhttp://cogprints.org/id/eprint/3056This item is in the repository with the URL: http://cogprints.org/id/eprint/30562003-07-16ZBootstrapping grounded symbols by minimal autonomous robotsIn this paper an experiment is presented in which two mobile robots develop a shared lexicon of which the meanings are grounded in the real world. The robots start with neither a lexicon nor shared meanings and play language games in which they generate new meanings and negotiate words for these meanings. The experiment tries to find the minimal conditions under which verbal communication may begin to evolve. The robots are autonomous in terms of computing and cognition, but they are otherwise far simpler than most, if not all, animals. It is demonstrated that a lexicon nevertheless can be made to emerge even though there are strong limits on the size and stability of this lexicon.Paul Vogt2000-10-17Z2011-03-11T08:54:25Zhttp://cogprints.org/id/eprint/1033This item is in the repository with the URL: http://cogprints.org/id/eprint/10332000-10-17ZQuantitative Neural Network Model of the Tip-of-the-Tongue Phenomenon Based on Synthesized Memory-Psycholinguistic-Metacognitive ApproachA new three-stage computer artificial neural network model of the tip-of-the-tongue phenomenon is proposed. Each word node is built from several interconnected, learned auto-associative two-layer neural networks, each of which represents a separate word's semantic, lexical, or phonological components. The model synthesizes memory, psycholinguistic, and metamemory approaches, bridges speech errors and naming chronometry research traditions, and can explain quantitatively many tip-of-the-tongue effects.Petro M.
Gopych2000-03-01Z2011-03-11T08:54:04Zhttp://cogprints.org/id/eprint/554This item is in the repository with the URL: http://cogprints.org/id/eprint/5542000-03-01ZProspects for in-depth story understanding by computerWhile much research on the hard problem of in-depth story understanding by computer was performed starting in the 1970s, interest shifted in the 1990s to information extraction and word sense disambiguation. Now that a degree of success has been achieved on these easier problems, I propose it is time to return to in-depth story understanding. In this paper I examine the shift away from story understanding, discuss some of the major problems in building a story understanding system, present some possible solutions involving a set of interacting understanding agents, and provide pointers to useful tools and resources for building story understanding systems.Erik T. Mueller1999-08-20Z2011-03-11T08:53:44Zhttp://cogprints.org/id/eprint/219This item is in the repository with the URL: http://cogprints.org/id/eprint/2191999-08-20ZBook Review--Ronald Cole (editor-in-chief), Joseph Mariani, Hans Uszkoreit, Annie Zaenen, and Victor Zue, eds., Survey of the State of the Art in Human Language TechnologyThis is a review of Survey of the State of the Art in Human Language Technology, edited by Ronald Cole (editor-in-chief), Joseph Mariani, Hans Uszkoreit, Annie Zaenen, and Victor Zue, published by Cambridge University Press in 1997.Varol Akman1999-04-21Z2011-03-11T08:53:44Zhttp://cogprints.org/id/eprint/218This item is in the repository with the URL: http://cogprints.org/id/eprint/2181999-04-21ZCorrelates of linguistic rhythm in the speech signalSpoken languages have been classified by linguists according to their rhythmic properties, and psycholinguists have relied on this classification to account for infants capacity to discriminate languages. 
Although researchers have measured many speech signal properties, they have failed to identify reliable acoustic characteristics for language classes. This paper presents instrumental measurements based on a consonant/vowel segmentation for eight languages. The measurements suggest that intuitive rhythm types reflect specific phonological properties, which in turn are signaled by the acoustic/phonetic properties of speech. The data support the notion of rhythm classes and also allow the simulation of infant language discrimination, consistent with the hypothesis that newborns rely on a coarse segmentation of speech. A hypothesis is proposed regarding the role of rhythm perception in language acquisition.Franck RamusMarina NesporJacques Mehler2002-01-11Z2011-03-11T08:54:52Zhttp://cogprints.org/id/eprint/2022This item is in the repository with the URL: http://cogprints.org/id/eprint/20222002-01-11ZModeling the evolution of communication: From stimulus associations to grounded symbolic associationsThis paper describes a model for the evolution of communication systems using simple syntactic rules, such as word combinations. It also focuses on the distinction between simple word-object associations and symbolic relationships. The simulation method combines the use of neural networks and genetic algorithms. The behavioral task is influenced by Savage-Rumbaugh & Rumbaughs (1978) ape language experiments. The results show that languages that use combination of words (e.g. verb-object rule) can emerge by auto-organization and cultural transmission. Neural networks are tested to see if evolved languages are based on symbol acquisition. 
The implications of this model for Deacons (1997) hypothesis on the role of symbolic acquisition for the origin of language are discussed.Angelo Cangelosi2000-11-12Z2011-03-11T08:54:26Zhttp://cogprints.org/id/eprint/1092This item is in the repository with the URL: http://cogprints.org/id/eprint/10922000-11-12ZWorking with Constrained Systems: A Review of A. K. Joshi's IJCAI-97 Research Excellence Award Acceptance LectureThis is a brief review of Joshi's award acceptance lecture published in <I>AI Magazine</I>. This review appeared in the AI Watch column in <I>Computers and Society</I>, a quarterly magazine.Joseph S. Fulda1998-11-10Z2011-03-11T08:53:44Zhttp://cogprints.org/id/eprint/214This item is in the repository with the URL: http://cogprints.org/id/eprint/2141998-11-10ZAn Analysis of English Punctuation: The Special Case of CommaPunctuation has usually been ignored by researchers in computational linguistics over the years. Recently, it has been realized that a true understanding of written language will be impossible if punctuation marks are not taken into account. This paper contains the details of a computer-aided exercise to investigate English punctuation practice for the special case of comma (the most significant punctuation mark) in a parsed corpus. The study classifies the various ``structural'' uses of the comma according to the syntax-patterns in which a comma occurs. The corpus (Penn Treebank) consists of syntactically annotated sentences with no part-of-speech tag information about individual words.Murat BayraktarBilge SayVarol Akman1998-07-29Z2011-03-11T08:53:44Zhttp://cogprints.org/id/eprint/213This item is in the repository with the URL: http://cogprints.org/id/eprint/2131998-07-29ZChoice Factors in TranslationIn this article, grammatical forms in context are viewed as processual patterns of choice activity. 
A hierarchy of choice factors is presented, using the example of the present perfect forms in parallel translations from Russian into several languages. To ensure adequacy of comparison, the notions of grammatical contextual complex and universal grammatical integral are introduced and used as the required tertium comparationis. Particular attention is devoted to the interplay of universal and language-specific features in processes of grammatical choice in translation.Vyacheslav B. Kashkin1998-07-06Z2011-03-11T08:53:44Zhttp://cogprints.org/id/eprint/209This item is in the repository with the URL: http://cogprints.org/id/eprint/2091998-07-06ZDashes as Typographical Cues for the Information StructureWe take em-dash as our sample punctuation mark and examine its usage from a discourse perspective, using sentences from well-known corpora. We particularly comment on how dashes can give hints on information structure, focus, and anaphora. Throughout the paper Discourse Representation Theory is used as a framework.Bilge SayVarol Akman2006-08-06Z2011-03-11T08:56:33Zhttp://cogprints.org/id/eprint/5045This item is in the repository with the URL: http://cogprints.org/id/eprint/50452006-08-06ZDescription Theory, LTAGs and Underspecified SemanticsAn attractive way to model
the relation between an underspecified syntactic representation and its completions is to let the underspecified representation correspond to a logical description and the completions to the models of that description. This approach, which underlies the Description Theory of (Marcus et al. 1983), has been integrated in (Vijay-Shanker 1992) with a pure unification approach to Lexicalized Tree-Adjoining Grammars (Joshi et al. 1975, Schabes 1990). We generalize Description Theory by integrating semantic information, that is, we propose to tackle both syntactic and
semantic underspecification using descriptions.Reinhard MuskensEmiel Krahmer1998-06-24Z2011-03-11T08:53:44Zhttp://cogprints.org/id/eprint/205This item is in the repository with the URL: http://cogprints.org/id/eprint/2051998-06-24ZThe evolution of a lexicon and meaning in robotic agents through self-organizationThis paper discusses interdisciplinary experiments, combining robotics and evolutionary computational linguistics. The goal of the experiments is to investigate if robotic agents can originate a language, in particular a lexicon. In the experiments two robots engage in a series of so-called language games. Starting from the assumption that the robots know how to communicate and are able to detect some sensory information from the environment, the agents ground conceptual meaning and develop a lexicon. The experiments show that the robots are able to form a shared communication system. The paper investigates the influence of using non-linguistic information in the formation of the lexicon, which takes the form of pointing (1) to indicate the topic of the language game, and (2) to give feedback on the outcome of the game.Paul Vogt1998-11-12Z2011-03-11T08:53:44Zhttp://cogprints.org/id/eprint/216This item is in the repository with the URL: http://cogprints.org/id/eprint/2161998-11-12ZAn Information-Based Treatment of Punctuation in Discourse Representation TheoryPunctuation has so far attracted attention within the linguistics community mostly from a syntactic perspective. In this paper, we give a preliminary account of the information-based aspects of punctuation, drawing our points from assorted, naturally occurring sentences. We present our formal models of these sentences and the semantic contributions of punctuation marks. 
Our formalism is a simplified analogue of an extension--due to Nicholas Asher--of Discourse Representation Theory.Bilge SayVarol Akman1998-03-24Z2011-03-11T08:53:43Zhttp://cogprints.org/id/eprint/189This item is in the repository with the URL: http://cogprints.org/id/eprint/1891998-03-24ZThe interaction between numerals and nounsThis paper is a descriptive survey of the principal phenomena surrounding cardinal numerals in attribution to nouns, with some concentration on European languages, but within a world-wide perspective. The paper is focussed on describing the syntagmatic distribution and the internal structure of numerals. By contrast, the important topic of the paradigmatic context of numerals, that is how their structure and behavior related to those of quantifiers, determiners, adjectives, and nouns, does not receive systematic discussion here, although many relevant comments are made in passing. A further necessary limitation in scope is the exclusion of forms which are only marginally cardinal numerals, if at all, such as English both, dozen, fourscore, pair, triple and their counterparts in other languages.Jim Hurford1998-05-22Z2011-03-11T08:54:10Zhttp://cogprints.org/id/eprint/665This item is in the repository with the URL: http://cogprints.org/id/eprint/6651998-05-22ZModels of Speaking (To Their Amazement) Meet Speech-Synchronized GesturesThe chapters in this volume have generally accepted the argument that speech-gesture integration is basic to language use. But what explains the integration itself? I will attempt to make the case that it can be understood with the concept of a `growth point' or GP (McNeill & Duncan this volume) It is called a GP since it is a theoretical unit in which principles that explain mental growth -- differentiation, internalization, dialectic, and reorganization -- apply to realtime utterance generation by adults (and children). 
It is also called a GP since it is meant to be the initial form of a thinking-while-speaking unit out of which a dynamic process of organization emerges. The emergence unpacks the GP into a surface utterance and gesture that articulates its meaning implications.David McNeill1999-06-27Z2011-03-11T08:54:02Zhttp://cogprints.org/id/eprint/544This item is in the repository with the URL: http://cogprints.org/id/eprint/5441999-06-27ZThought as word dynamicsA Hebbian model for speech generation opens a number of paths. A cross-linguistic scheme of functional relationships (inspired by Aristotle) dispenses with distraction by the "parts of speech" distinctions, while bridging the gap between "contents" and "structure" words. A gradient model identifies emotional and rational dynamics and shows speech generation as a process where a speaker's dissatisfaction gets minimised.Paul Jorion1998-06-16Z2011-03-11T08:53:58Zhttp://cogprints.org/id/eprint/460This item is in the repository with the URL: http://cogprints.org/id/eprint/4601998-06-16ZThe Use of Situation Theory in Context ModelingAt the heart of natural language processing is the understanding of context dependent meanings. This paper presents a preliminary model of formal contexts based on situation theory. It also gives a worked-out example to show the use of contexts in lifting, i.e., how propositions holding in a particular context transform when they are moved to another context. This is useful in NLP applications where preserving meaning is a desideratum.Varol AkmanMehmet Surav1998-06-16Z2011-03-11T08:53:43Zhttp://cogprints.org/id/eprint/198This item is in the repository with the URL: http://cogprints.org/id/eprint/1981998-06-16ZCurrent Approaches to Punctuation in Computational LinguisticsSome recent studies in computational linguistics have aimed to take advantage of various cues presented by punctuation marks. 
This short survey is intended to summarise these research efforts and additionally, to outline a current perspective for the usage and functions of punctuation marks. We conclude by presenting an information-based framework for punctuation, influenced by treatments of several related phenomena in computational linguistics.Bilge SayVarol Akman2011-12-16T00:11:43Z2011-12-16T00:11:43Zhttp://cogprints.org/id/eprint/7709This item is in the repository with the URL: http://cogprints.org/id/eprint/77092011-12-16T00:11:43ZThe Many Functions of Discourse Particles: A Computational Model of Pragmatic InterpretationWe present a connectionist model for the interpretation of discourse
particles in real dialogues that is based on neuronal principles of categorization (categorical perception, prototype formation, contextual interpretation). It can be shown that discourse particles operate just like other morphological and lexical items with respect to interpretation processes. The description proposed locates discourse particles in an elaborate model of communication which incorporates many different aspects of the communicative situation. We therefore also attempt to explore the content of the category discourse particle. We present a detailed analysis of the meaning assignment problem and show that 80%–90% correctness for unseen discourse particles can be reached with the feature analysis provided. Furthermore, we show that ‘analogical transfer’ from one discourse particle to another is facilitated if prototypes are computed and used as the basis for generalization. We conclude that the interpretation processes which are a part of the human cognitive system are very similar with respect to different linguistic items. However, the analysis of discourse particles shows clearly that any explanatory theory of language
needs to incorporate a theory of communication processes.Gabriele Schelergscheler@gmail.comKerstin Fischer1998-06-16Z2011-03-11T08:53:58Zhttp://cogprints.org/id/eprint/462This item is in the repository with the URL: http://cogprints.org/id/eprint/4621998-06-16ZSituated Nonmonotonic Temporal Reasoning with BABY-SITAfter a review of situation theory and previous attempts at `computational' situation theory, we present a new programming environment, BABY-SIT, which is based on situation theory. We then demonstrate how problems requiring formal temporal reasoning can be solved in this framework. Specifically, the Yale Shooting Problem, which is commonly regarded as a canonical problem for nonmonotonic temporal reasoning, is implemented in BABY-SIT using Yoav Shoham's causal theories.Erkan TinVarol Akman2006-02-05Z2011-03-11T08:56:20Zhttp://cogprints.org/id/eprint/4715This item is in the repository with the URL: http://cogprints.org/id/eprint/47152006-02-05ZCombining Montague Semantics and Discourse RepresentationThis paper embeds the core part of Discourse Representation Theory in the classical theory of types plus a few simple axioms that allow the theory to express key facts about variables and assignments on the object level of the logic. It is shown how the embedding can be used to combine core analyses of natural language phenomena in Discourse Representation Theory with analyses that can be obtained in Montague Semantics.Reinhard Muskens1998-06-15Z2011-03-11T08:53:42Zhttp://cogprints.org/id/eprint/169This item is in the repository with the URL: http://cogprints.org/id/eprint/1691998-06-15ZThe Dilemma of Saussurean CommunicationA Saussurean communication system exists when an entire communicating population uses a single "language" that maps states unambiguously onto symbols and then back into the original states. 
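The defining round-trip property of a Saussurean system (states map unambiguously onto symbols and back) can be checked mechanically; the states and symbols below are invented for illustration:

```python
def is_saussurean(send, receive):
    """True iff encoding is unambiguous and decoding inverts it for every state."""
    unambiguous = len(set(send.values())) == len(send)   # no two states share a symbol
    round_trip = all(receive.get(send[s]) == s for s in send)
    return unambiguous and round_trip

send = {"food": "f", "danger": "d"}        # state -> symbol
receive = {"f": "food", "d": "danger"}     # symbol -> state
print(is_saussurean(send, receive))        # True
print(is_saussurean(send, {"f": "food"}))  # False: 'd' never decodes back
```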
This paper describes a number of simulations performed with a genetic algorithm to investigate the conditions necessary for such communication systems to evolve. The first simulation shows that Saussurean communication evolves in the simple case where direct selective pressure is placed on individuals to be both good transmitters and good receivers. The second simulation demonstrates that, in the more realistic case where selective pressure is only placed on doing well as a receiver, Saussurean communication fails to evolve. Two methods, inspired by research on the Prisoner's Dilemma, are used to attempt to solve this problem. The third simulation shows that, even in the absence of selective pressure on transmission, Saussurean communication can evolve if individuals interact multiple times with the same communication partner and are given the ability to respond differentially based on past interaction. In the fourth simulation, spatially organized populations are used, and it is shown that this allows Saussurean communication to evolve through kin selection.Michael Oliphant1998-07-06Z2011-03-11T08:53:44Zhttp://cogprints.org/id/eprint/208This item is in the repository with the URL: http://cogprints.org/id/eprint/2081998-07-06ZInformation-Based Aspects of PunctuationWe offer a preliminary account of the information-based aspects of punctuation marks. We give our initial treatment within the Discourse Representation Theory and its segmented version. 
We hypothesize that this work will be useful in classifying the informational contributions of punctuation marks and bringing them to bear on the semantic characterization of written discourse.Bilge SayVarol Akman1998-07-07Z2011-03-11T08:53:44Zhttp://cogprints.org/id/eprint/210This item is in the repository with the URL: http://cogprints.org/id/eprint/2101998-07-07ZAn Information-Based Treatment of PunctuationPunctuation marks have recently attracted attention within the linguistics community mostly from a syntactic perspective. In this paper, we aim to give a preliminary account of information-based aspects of punctuation marks, drawing our points from examples and links with related phenomena such as intonation. We give our initial treatment within the Discourse Representation Theory.Bilge SayVarol Akman1998-06-19Z2011-03-11T08:53:44Zhttp://cogprints.org/id/eprint/199This item is in the repository with the URL: http://cogprints.org/id/eprint/1991998-06-19ZInformation-Oriented Computation with BABY-SITWhile situation theory and situation semantics provide an appropriate framework for a realistic model-theoretic treatment of natural language, serious thinking on their `computational' aspects has only recently started. Existing proposals mainly offer a Prolog- or Lisp-like programming environment with varying degrees of divergence from the ontology of situation theory. In this paper, we introduce a computational medium (called BABY-SIT) based on situations. 
The primary motivation underlying BABY-SIT is to facilitate the development and testing of programs in domains ranging from linguistics to artificial intelligence in a unified framework built upon situation-theoretic constructs.Erkan TinVarol Akman1998-06-24Z2011-03-11T08:53:59Zhttp://cogprints.org/id/eprint/478This item is in the repository with the URL: http://cogprints.org/id/eprint/4781998-06-24ZLanguage polygenesis: A probabilistic modelMonogenesis of language is widely accepted, but the conventional argument seems to be mistaken; a simple probabilistic model shows that polygenesis is likely. Other prehistoric inventions are discussed, as are problems in tracing linguistic lineages. Language is a system of representations; within such a system, words can evoke complex and systematic responses. Along with its social functions, language is important to humans as a mental instrument. Indeed, the invention of language, that is, the accumulation of symbols to represent emotions, objects, and acts, may be the most important event in human evolution, because so many developments follow from it. For example, Edward Sapir speculated that some embryonic form of language must have been available to early man to help him fashion tools from stone (Sapir, 1921). Sophisticated biface stone tools date to early Homo erectus some 1.5 million years ago, suggesting a similar age for language. This paper considers whether the invention of language occurred at only one prehistoric site or at several sites. In other words, did language emerge by monogenesis or polygenesis? Early thinkers believed in monogenesis, against a background of divine creation. Perhaps the best known account is the biblical story of Adam giving names to plants and animals in the Garden of Eden. Similar legends are found among many peoples. Modern linguists too assume monogenesis, but on probabilistic grounds (see, for instance, Southworth and Daswani, 1974, p. 314).
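The monogenesis-versus-polygenesis question just posed can be made quantitative. Under a simple binomial model with n independent sites and per-site invention probability p, the chance of invention at two or more sites is 1 - (1-p)^n - n*p*(1-p)^(n-1); a sketch with purely illustrative numbers:

```python
def prob_at_least_two(p, n):
    """P(invention at >= 2 of n independent sites), simple binomial model."""
    none = (1 - p) ** n
    exactly_one = n * p * (1 - p) ** (n - 1)
    return 1 - none - exactly_one

# Fixating on two particular sites gives p*p, which is tiny...
print(prob_at_least_two(0.001, 2))     # = p**2, i.e. 1e-06 up to float rounding
# ...but over many candidate sites the probability is substantial (~0.96 here).
print(prob_at_least_two(0.001, 5000))
```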
The argument seems to be that the invention of language is an extremely unlikely event, because symbolization involves abstraction and requires synchronized insight by several individuals; therefore, the probability of occurrence at more than one site must be vanishingly small. We have found no explicit quantitative treatment of this question in the literature, but the underlying logic has to be the multiplication of probabilities. If p is small at one site, then p × p for two sites is smaller still, and so on. This reasoning is false, as we show here. The fallacy lies in the focus on two particular sites rather than consideration of all pairs of sites.David A. FreedmanWilliam Wang1998-06-17Z2011-03-11T08:53:58Zhttp://cogprints.org/id/eprint/464This item is in the repository with the URL: http://cogprints.org/id/eprint/4641998-06-17ZSteps toward Formalizing ContextThe importance of contextual reasoning is emphasized by various researchers in AI. (A partial list includes John McCarthy and his group, R. V. Guha, Yoav Shoham, Giuseppe Attardi and Maria Simi, and Fausto Giunchiglia and his group.) Here, we survey the problem of formalizing context and explore what is needed for an acceptable account of this abstract notion.Varol AkmanMehmet Surav2001-11-18Z2011-03-11T08:54:49Zhttp://cogprints.org/id/eprint/1897This item is in the repository with the URL: http://cogprints.org/id/eprint/18972001-11-18ZSubsymbolic Case-Role Analysis
of Sentences with Embedded ClausesA distributed neural network model called SPEC for processing sentences with recursive relative clauses is described. The model is based on separating the tasks of segmenting the input word sequence into clauses, forming the case-role representations, and keeping track of the recursive embeddings into different modules. The system needs to be trained only with the basic sentence constructs, and it generalizes not only to new instances of familiar relative clause structures, but to novel structures as well. SPEC exhibits plausible memory degradation as the depth of the center embeddings increases, its memory is primed by earlier constituents, and its performance is aided by semantic constraints between the constituents. The ability to process structure is largely due to a central executive network that monitors and controls the execution of the entire system. This way, in contrast to earlier subsymbolic systems, parsing is modeled as a controlled high-level process rather than one based on automatic reflex responses.
Risto Miikkulainen

This item is in the repository with the URL: http://cogprints.org/id/eprint/885 (deposited 2000-07-21)

A Timing Model for Fast French
Models of speech timing are of both fundamental and applied interest. At the fundamental level, the prediction of time periods occupied by syllables and segments is required for general models of speech prosody and segmental structure. At the applied level, complete models of timing are an essential component of any speech synthesis system.
Previous research has established that a large number of factors influence various levels of speech timing. Statistical analysis and modelling can identify order of importance and mutual influences between such factors. In the present study, a three-tiered model was created by a modified step-wise statistical procedure. It predicts the temporal structure of French, as produced by a single, highly fluent speaker at a fast speech rate (100 phonologically balanced sentences, hand-scored in the acoustic signal). The first tier models segmental influences due to phoneme type and contextual interactions between phoneme types. The second tier models syllable-level influences of lexical vs. grammatical status of the containing word, presence of schwa and the position within the word. The third tier models utterance-final lengthening.
The complete segmental-syllabic model correlated with the original corpus of 1204 syllables at an overall r = 0.846. Residuals were normally distributed. An examination of subsets of the data set revealed some variation in the closeness of fit of the model.
The results are considered to be useful for an initial timing model, particularly in a speech synthesis context. However, further research is required to extend the model to other speech rates and to examine inter-speaker variability in greater detail.
Eric Keller, Brigitte Zellner

This item is in the repository with the URL: http://cogprints.org/id/eprint/207 (deposited 1998-06-26)

Book Review -- Hans Kamp and Uwe Reyle, From Discourse to Logic: Introduction to Model-theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory
This is a review of From Discourse to Logic: Introduction to Model-theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory, by Hans Kamp and Uwe Reyle, published by Kluwer Academic Publishers in 1993.
Varol Akman

This item is in the repository with the URL: http://cogprints.org/id/eprint/331 (deposited 1998-06-19)

Situated Modeling of Epistemic Puzzles
Situation theory is a mathematical theory of meaning introduced by Jon Barwise and John Perry. It has evoked great theoretical interest and motivated the framework of a few `computational' systems. PROSIT is the pioneering work in this direction. Unfortunately, there is a lack of real-life applications on these systems and this study is a preliminary attempt to remedy this deficiency. Here, we solve a group of epistemic puzzles using the constructs provided by PROSIT.
Murat Ersan, Varol Akman

This item is in the repository with the URL: http://cogprints.org/id/eprint/467 (deposited 1998-06-19)

Towards Situation-Oriented Programming Languages
Recently, there have been some attempts towards developing programming languages based on situation theory. These languages employ situation-theoretic constructs with varying degrees of divergence from the ontology of the theory.
In this paper, we review three of these programming languages.
Erkan Tin, Varol Akman, Murat Ersan

This item is in the repository with the URL: http://cogprints.org/id/eprint/472 (deposited 1998-06-24)

Modeling Context with Situations
The issue of context arises in assorted areas of Artificial Intelligence. Although its importance is realized by various researchers, there is not much work towards a useful formalization. In this paper, we will present a preliminary model (based on Situation Theory) and give examples to show the use of context in various fields, and the advantages gained by the acceptance of our proposal.
Mehmet Surav, Varol Akman

This item is in the repository with the URL: http://cogprints.org/id/eprint/204 (deposited 1998-06-24)

Situations and Computation: An Overview of Recent Research
Serious thinking about the computational aspects of situation theory is just starting. There have been some recent proposals in this direction (viz. PROSIT and ASTL), with varying degrees of divergence from the ontology of the theory. We believe that a programming environment incorporating bona fide situation-theoretic constructs is needed and describe our very recent BABY-SIT implementation. A detailed critical account of PROSIT and ASTL is also offered in order to compare our system with these pioneering and influential frameworks.
Erkan Tin, Varol Akman

This item is in the repository with the URL: http://cogprints.org/id/eprint/200 (deposited 1998-06-19)

Computational Situation Theory
Situation theory has been developed over the last decade and various versions of the theory have been applied to a number of linguistic issues. However, not much work has been done in regard to its computational aspects.
In this paper, we review the existing approaches towards `computational situation theory' with considerable emphasis on our own research.
Erkan Tin, Varol Akman

This item is in the repository with the URL: http://cogprints.org/id/eprint/4708 (deposited 2006-01-21)

Categorial Grammar and Discourse Representation Theory
In this paper it is shown how simple texts that can be parsed
in a Lambek Categorial Grammar can also automatically be provided
with a semantics in the form of a Discourse Representation Structure
in the sense of Kamp [1981]. The assignment of meanings to texts
uses the Curry-Howard-Van Benthem correspondence.
Reinhard Muskens

This item is in the repository with the URL: http://cogprints.org/id/eprint/884 (deposited 2000-07-21)

Pauses and the temporal structure of speech
Natural-sounding speech synthesis requires close control over the temporal structure of the speech flow. This includes a full predictive scheme for the durational structure, in particular the prolongation of final syllables of lexemes, as well as for the pausal structure in the utterance. In this chapter, a description of the temporal structure and a summary of the numerous factors that modify it are presented. In the second part, predictive schemes for the temporal structure of speech ("performance structures") are introduced, and their potential for characterising the overall prosodic structure of speech is demonstrated.
Brigitte Zellner

This item is in the repository with the URL: http://cogprints.org/id/eprint/202 (deposited 1998-06-23)

Situated Processing of Pronominal Anaphora
We describe a novel approach to the analysis of pronominal anaphora in Turkish. A computational medium which is based on situation theory is used as our implementation tool. The task of resolving pronominal anaphora is demonstrated in this environment which employs situation-theoretic constructs for processing.
Erkan Tin, Varol Akman

This item is in the repository with the URL: http://cogprints.org/id/eprint/203 (deposited 1998-06-23)

BABY-SIT: A Computational Medium Based on Situations
While situation theory and situation semantics provide an appropriate framework for a realistic model-theoretic treatment of natural language, serious thinking on their `computational' aspects has just started.
Existing proposals mainly offer a Prolog- or Lisp-like programming environment with varying degrees of divergence from the ontology of situation theory. In this paper, we introduce a computational medium (called BABY-SIT) based on situations. The primary motivation underlying BABY-SIT is to facilitate the development and testing of programs in domains ranging from linguistics to artificial intelligence in a unified framework built upon situation-theoretic constructs.
Erkan Tin, Varol Akman

This item is in the repository with the URL: http://cogprints.org/id/eprint/1558 (deposited 2001-06-14)

A Contribution to Reference Semantics of Spatial Prepositions: The Visualization Problem and its Solution in VITRA
The cognitive function of mental images with respect to the referential aspect of language is examined and used in the listener model ANTLIMA of the natural language system SOCCER. An operational realization of the reference relation used to recognize instances of spatial concepts in the results of a vision system and also to visualize locative expressions is presented and compared to A. Herskovits' analysis of the semantics of spatial prepositions.
Jörg R.J. Schirra

This item is in the repository with the URL: http://cogprints.org/id/eprint/4704 (deposited 2006-01-21)

Anaphora and the Logic of Change
This paper shows how the dynamic interpretation of natural language introduced in work by Hans Kamp and Irene Heim can be modeled in classical type logic. This provides a synthesis between Richard Montague's theory of natural language semantics and the work by Kamp and Heim.
Reinhard Muskens

This item is in the repository with the URL: http://cogprints.org/id/eprint/1896 (deposited 2001-11-18)

Natural Language Processing with Modular Neural Networks and Distributed Lexicon
An approach to connectionist natural language processing is proposed, which is based on hierarchically organized modular Parallel Distributed Processing (PDP) networks and a central lexicon of distributed input/output representations. The modules communicate using these representations, which are global and publicly available in the system. The representations are developed automatically by all networks while they are learning their processing tasks. The resulting representations reflect the regularities in the subtasks, which facilitates robust processing in the face of noise and damage, supports improved generalization, and provides expectations about possible contexts. The lexicon can be extended by cloning new instances of the items, that is, by generating a number of items with known processing properties and distinct identities. This technique combinatorially increases the processing power of the system. The recurrent FGREP module, together with a central lexicon, is used as a basic building block in modeling higher-level natural language tasks. A single module is used to form case-role representations of sentences from word-by-word sequential natural language input. A hierarchical organization of four recurrent FGREP modules (the DISPAR system) is trained to produce fully expanded paraphrases of script-based stories, where unmentioned events and role fillers are inferred.
Risto Miikkulainen, Michael G. Dyer

This item is in the repository with the URL: http://cogprints.org/id/eprint/443 (deposited 1998-04-30)

Review of Rosenfield's "The Invention of Memory"
Evidence collected by Bartlett, Collingwood, James, Bransford, Jenkins, and Sacks argues against the memory-as-stored-structures hypothesis, the keystone of expert systems and cognitive modeling research.
William J. Clancey

This item is in the repository with the URL: http://cogprints.org/id/eprint/197 (deposited 1998-06-15)

Rethinking the language bottleneck: Why don't animals learn to communicate?
While most work on the evolution of language has been centered on the evolution of syntax, my focus in this paper is instead on more basic features that separate human communication from the systems of communication used by other animals. In particular, I argue that human language is the only existing system of learned arbitrary reference. While innate communication systems are, by definition, directly transmitted genetically, the transmission of a learned system must be indirect. Learners must acquire the system by being exposed to its use in the community. Although it is reasonable to assume that a learner has access to the utterances that are produced, it is less clear how accessible the meaning is that the utterance is intended to convey. This is particularly problematic if the system of communication is symbolic -- where form and meaning are linked in a purely conventional way. Given this, I propose that the ability to transmit a learned symbolic system of communication from one generation to the next represents a key milestone in the evolution of language.
Michael Oliphant

This item is in the repository with the URL: http://cogprints.org/id/eprint/196 (deposited 1998-06-15)

The learning barrier: Moving from innate to learned systems of communication
Human language is a unique ability. It sits apart from other systems of communication in two striking ways: it is syntactic, and it is learned. While most approaches to the evolution of language have focused on the evolution of syntax, this paper explores the computational issues that arise in shifting from a simple innate communication system to an equally simple one that is learned.
Associative network learning within an observational learning paradigm is used to explore the computational difficulties involved in establishing and maintaining a simple learned communication system. Because Hebbian learning is found to be sufficient for this task, it is proposed that the basic computational demands of learning are unlikely to account for the rarity of even simple learned communication systems. Instead, it is the problem of *observing* that is likely to be central -- in particular the problem of determining what meaning a signal is intended to convey.
Michael Oliphant
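The observational-learning setup in this last abstract can be illustrated with a minimal sketch: a learner watches community signal-meaning usage events (meanings are assumed fully observable here, which is exactly the assumption the abstract questions) and accumulates a Hebbian association matrix. The names, sizes, and setup below are illustrative assumptions, not Oliphant's actual implementation.

```python
# Hedged sketch: Hebbian observational learning of a simple signal-meaning
# system. N, perm, observe, and interpret are illustrative choices only.
import random

N = 5  # number of distinct signals and meanings (assumed)

# The community's system: signal s conveys meaning perm[s] (a fixed mapping).
random.seed(0)
perm = list(range(N))
random.shuffle(perm)

# Hebbian association matrix: w[s][m] accumulates signal-meaning co-occurrences.
w = [[0.0] * N for _ in range(N)]

def observe(signal, meaning):
    """Hebbian update: strengthen the association of the observed pair."""
    w[signal][meaning] += 1.0

def interpret(signal):
    """Comprehension: the meaning most strongly associated with the signal."""
    return max(range(N), key=lambda m: w[signal][m])

# Observational learning: the learner sees community usage events,
# with the intended meaning assumed to be visible alongside each signal.
for _ in range(200):
    s = random.randrange(N)
    observe(s, perm[s])

# After exposure, the learner reproduces the community's system.
learned = [interpret(s) for s in range(N)]
print(learned == perm)
```

Under this idealization simple Hebbian updates suffice, which matches the abstract's point: the hard part is not the learning rule but the observability of meanings, removed here by handing the learner `perm[s]` directly.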