TY - GEN
ID - cogprints3663
UR - http://cogprints.org/3663/
A1 - Berkowitz, Eric
A1 - Elkhadiri, Mohamed Reda
A1 - Sahouri, Tim
A1 - Abraham, Michel
Y1 - 2004///
N2 - This paper describes the development of algorithms for
extracting the title and the names of the authors from
documents available on the World Wide Web. The algorithms
are designed not to rely on the specific stylistic dictates of
any document formatting standard. Rather, they rely on a
combination of overt and subtle cues that form a generalized,
common standard for placing this information in a document
so that readers can extract it easily.
PB - Omnipress
KW - Document Classification Indexing
TI - Intelligent Content Based Title and Author Name Extraction from Formatted Documents
SP - 119
AV - public
EP - 124
ER -