--- abstract: "Based on data from a large-scale experiment with human subjects, we conclude that the logarithm of probability to guess a word in context (unpredictability) depends linearly on the word length. This result holds both for poetry and prose, even though with prose, the subjects don't know the length of the omitted word. We hypothesize that this effect reflects a tendency of natural language to have an even information rate." altloc: - http://www.jip.ru/2006/229-236-2006.pdf chapter: ~ commentary: ~ commref: ~ confdates: ~ conference: ~ confloc: ~ contact_email: ~ creators_id: - manin@pobox.com creators_name: - family: Manin given: Dmitrii honourific: '' lineage: '' date: 2006-12-26 date_type: completed datestamp: 2007-11-13 00:51:03 department: ~ dir: disk0/00/00/58/17 edit_lock_since: ~ edit_lock_until: ~ edit_lock_user: ~ editors_id: [] editors_name: [] eprint_status: archive eprintid: 5817 fileinfo: /style/images/fileicons/application_pdf.png;/5817/1/unpred_article_e.pdf full_text_status: public importid: ~ institution: ~ isbn: ~ ispublished: pub issn: ~ item_issues_comment: [] item_issues_count: 0 item_issues_description: [] item_issues_id: [] item_issues_reported_by: [] item_issues_resolved_by: [] item_issues_status: [] item_issues_timestamp: [] item_issues_type: [] keywords: 'Natural language, information theory, information rate, entropy, experiment, word guessing' lastmod: 2011-03-11 08:57:00 latitude: ~ longitude: ~ metadata_visibility: show note: Text is somewhat extended compared to the published version. number: 3 pagerange: 229-236 pubdom: FALSE publication: Journal of Information Processes publisher: Keldysh Institute of Applied Mathematics (KIAM) RAS refereed: TRUE referencetext: "\\bibitem{Shan51}{Shannon~C.E. Prediction and entropy of printed\r\n English. {\\it Bell System Technical Journal}, 1951, vol.~30, pp.~50--64.}\r\n\\bibitem{Shan48}{Shannon~C.E. A mathematical theory of communication. {\\it Bell System Technical Journal}, 1948, vol.~27, pp.~379--423.}\r\n\\bibitem{BurLick55}{Burton~N.G., Licklider~J.C.R. Long-range\r\n constraints in the statistical structure of printed English. {\\it\r\n American Journal of Psychology}, 1955, vol.~68, no.~4, pp.~650--653}\r\n\\bibitem{Fon}{F\\'onagy~I. Informationsgehalt von wort und laut in der\r\n dichtung. In: {\\it Poetics. Poetyka. Поэтика}. Warszawa:~Pa\\'nstwo\r\n Wydawnictwo Naukowe, 1961, pp.~591--605.}\r\n\\bibitem{Kolm65}{Kolmogorov~A. Three approaches to the quantitative\r\n definition of information. {\\it Problems Inform. Transmission},\r\n 1965, vol.~1, pp.~1--7.}\r\n\\bibitem{Yaglom2}{Yaglom~A.M. and Yaglom~I.M. {\\it Probability and\r\n information} Reidel, Dordrecht, 1983.}\r\n\\bibitem{CK78}{Cover~T.M., King~R.C. A convergent gambling estimate of\r\n the entropy of English. {\\it Information Theory, IEEE Transactions\r\n on}, 1978, vol.~24, no.~4, pp.~413--421.}\r\n\\bibitem{Moradi98}{Moradi~H., Roberts~J.A.,\r\n Grzymala-Busse~J.W. Entropy of English text: Experiments with humans\r\n and a machine learning system based on rough sets. {\\it Inf. Sci.},\r\n 1998, vol.~104, no.~1--2, pp.~31--47.}\r\n\\bibitem{Paisley66}{Paisley~W.J. The effects of authorship, topic\r\n structure, and time of composition on letter redundancy in English\r\n text. {\\it J. Verbal. Behav.}, 1966, vol.~5, pp.~28--34.}\r\n\\bibitem{BrownEtAl92}{Brown~P.F., Della~Pietra~V.J., Mercer~R.L.,\r\n Della~Pietra~S.A., Lai~J.C. An estimate of an upper bound for the\r\n entropy of English. {\\it Comput. 
Linguist.}, 1992, vol.~18, no.~1, pp.~31--40.}\r\n\\bibitem{Teahan96}{Teahan~W.J., Cleary~J.G. The entropy of English\r\n using PPM-based models. In: {\\it DCC '96: Proceedings of the\r\n Conference on Data Compression}, Washington: IEEE Computer Society, 1996, pp.~53--62.}\r\n\\bibitem{LM1}{Leibov~R.G., Manin~D.Yu. An attempt at experimental\r\n poetics [tentative title]. To be published in: {\\it Proc.\r\n Tartu Univ.} [in Russian], Tartu: Tartu University Press}\r\n\\bibitem{ChurchMercer93}{Church~K.W., Mercer~R.L. Introduction to the\r\n special issue on computational linguistics using large corpora. {\\it\r\n Comput. Linguist.}, 1993, vol.~19, no.~1, pp.~1--24.}\r\n\\bibitem{SG96}{T.Sch\\\"urmann and P.Grassberger. Entropy estimation of\r\n symbol sequences. {\\it Chaos}, 1996, vol.~6, no.~3, pp.~414--427.}\r\n\\bibitem{FreqDict}{Sharoff~S., The frequency dictionary for\r\n Russian. {\\it http://www.artint.ru/projects/frqlist/frqlist-en.asp}}\r\n\\bibitem{HockJoseph}{Hock~H.H., Joseph~B.D. Language History, Language\r\n Change, and Language Relationship. Berlin--New York: Mouton de Gruyter, 1996.}\r\n\\bibitem{GenzelCharniak}{Genzel \\& Charniak, 2002. {\\it Entropy rate constancy in text.}\r\nProc. 40th Annual Meeting of ACL, 199--206.}\r\n\\bibitem{Jaeger06}{Anonymous authors (paper under review), 2006. {\\it Speakers optimize\r\ninformation density through syntactic reduction.} To be published.}\r\n\\bibitem{AylettTurk}{Aylett M. and Turk A., 2004. {\\it The Smooth Signal\r\nRedundancy Hypothesis: A Functional Explanation for Relationships\r\nbetween Redundancy, Prosodic Prominence, and Duration in Spontaneous Speech.} Language and Speech, 47(1),\r\n31--56.}\r\n" relation_type: [] relation_uri: [] reportno: ~ rev_number: 29 series: ~ source: ~ status_changed: 2007-11-13 00:51:03 subjects: - comp-sci-lang - ling-comput succeeds: ~ suggestions: ~ sword_depositor: ~ sword_slug: ~ thesistype: ~ title: Experiments on predictability of word in context and information rate in natural language type: journalp userid: 7373 volume: 6