Stepak, Asa M (2004) Frequency Value Grammar and Information Theory. [Preprint]
This is the latest version of this eprint.
Abstract
I previously laid the groundwork for Frequency Value Grammar (FVG) in papers submitted to the proceedings of the 4th International Conference on Cognitive Science (2003), Sydney, Australia, and the Corpus Linguistics Conference (2003), Lancaster, UK. FVG is a formal syntax based in large part on Information Theory principles. FVG relies on dynamic physical principles external to the corpus which shape and mould the corpus, whereas generative grammar and other formal syntactic theories are based exclusively on patterns (fractals) found within the well-formed portion of the corpus. However, FVG should not be confused with Probability Syntax (PS), as described by Manning (2003). PS is a corpus-based approach that yields the probability distribution of possible syntactic constructions over a fixed corpus. PS makes no distinction between well-formed and ill-formed sentence constructions and assumes that everything found in the corpus is well formed. In contrast, FVG's primary objective is to distinguish between well-formed and ill-formed sentence constructions and, in so doing, it relies on corpus-based parameters which determine sentence competency. In PS, a syntax of high probability will not necessarily yield a well-formed sentence. In FVG, however, a syntax or sentence construction of high 'frequency value' will yield a well-formed sentence at least 95% of the time, which satisfies most empirical standards. Moreover, in FVG, a sentence construction of high 'frequency value' could very well be represented by an underlying syntactic construction of low probability as determined by PS. The characteristic 'frequency values' calculated in FVG are not measures of probability; rather, they are fundamentally determined values derived from exogenous principles which impact and determine the corpus-based parameters that serve as an index of sentence competency. The theoretical framework of FVG has broad applications beyond formal syntax and NLP.
In this paper, I will demonstrate how FVG can be used as a model for improving the calculation of the upper bound on the entropy of written English. Generally speaking, when a function word precedes an open-class word, the backward n-gram analysis will be homomorphic with the information source and will result in frequency values more representative of co-occurrences in the information source.
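To make the forward versus backward n-gram distinction concrete, the following is a minimal sketch, not the paper's method. It estimates a bigram conditional entropy in both directions from maximum-likelihood counts. The toy sentence, the function names, and the reading of "backward" as conditioning on the following word to predict the preceding one are all assumptions made for illustration; the sketch does not compute FVG frequency values.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """Return H(target | context) in bits from (context, target) pairs.

    Probabilities are plain maximum-likelihood estimates with no smoothing,
    so this only illustrates the quantity on a toy sample.
    """
    pair_counts = Counter(pairs)
    context_counts = Counter(context for context, _ in pairs)
    total = len(pairs)
    entropy = 0.0
    for (context, _target), n in pair_counts.items():
        p_joint = n / total                  # P(context, target)
        p_cond = n / context_counts[context]  # P(target | context)
        entropy -= p_joint * math.log2(p_cond)
    return entropy


def forward_pairs(tokens):
    """Forward bigrams: the context is the preceding word."""
    return list(zip(tokens, tokens[1:]))


def backward_pairs(tokens):
    """Backward bigrams (assumed reading): the context is the following word."""
    return [(nxt, prev) for prev, nxt in zip(tokens, tokens[1:])]


# Hypothetical toy corpus, used only to exercise the functions.
tokens = "the dog saw the cat and the cat saw a dog".split()
h_forward = conditional_entropy(forward_pairs(tokens))
h_backward = conditional_entropy(backward_pairs(tokens))
print(f"forward  H(next | previous) = {h_forward:.3f} bits")
print(f"backward H(previous | next) = {h_backward:.3f} bits")
```

On a real corpus the two directions generally give different estimates, and a smoothed model evaluated on held-out text would be needed before either figure could be treated as an upper bound on entropy.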
Item Type: | Preprint |
---|---|
Keywords: | Information theory, n-grams, natural language, entropy, probability syntax, well-formedness, frequency value, corpus, iconicity, formal syntax, cognitive science |
Subjects: | Neuroscience > Neurolinguistics; Computer Science > Statistical Models; Computer Science > Language; Linguistics > Computational Linguistics; Psychology > Psycholinguistics; Psychology > Cognitive Psychology; Linguistics > Syntax |
ID Code: | 3657 |
Deposited By: | Stepak, Asa M. |
Deposited On: | 05 Jun 2004 |
Last Modified: | 11 Mar 2011 08:55 |
Available Versions of this Item
- Frequency Value Grammar and Information Theory. (deposited 08 Jun 2004)
- Frequency Value Grammar and Information Theory. (deposited 05 Jun 2004) [Currently Displayed]