Berkowitz, Eric and Elkhadiri, Mohamed Reda and Sahouri, Tim and Abraham, Michel (2004) Intelligent Content Based Title and Author Name Extraction from Formatted Documents. [Conference Paper]
Full text available as: PDF (50Kb)
Abstract
This paper describes the development of algorithms for extracting the title and the names of the authors from documents available on the World Wide Web. The algorithms are designed not to rely on the specific stylistic dictates of any document formatting standard. Instead, they rely on a combination of overt and subtle cues that together form a generalized, common convention for placing this information in a document so that readers can extract it easily.
Item Type: | Conference Paper |
---|---|
Keywords: | Document Classification Indexing |
Subjects: | Computer Science > Language; Electronic Publishing > Archives |
ID Code: | 3663 |
Deposited By: | Berkowitz, Professor Eric |
Deposited On: | 05 Jun 2004 |
Last Modified: | 11 Mar 2011 08:55 |