Nadeau, David and Foster, George (2004) Real-Time Identification of Parallel Texts from Bilingual Newsfeed. [Conference Paper]
Full text available as: PDF (41 KB)
Abstract
Parallel texts are documents that present the same content in translation. This paper describes a simple method that can be deployed on a real-time news feed to create a continuously growing source of parallel texts in French and English. Our experiment was conducted on the Canada Newswire news feed. Given some of the feed's intrinsic properties, it was possible to deploy a relatively simple text-matching technique that relies on language-independent cognates such as numbers, capitalized words, punctuation and newline characters. On three weeks of press releases, our system correctly identified the vast majority of parallel press releases. It made only minor errors, on repeated news items.
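The matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's scoring function, so everything below is an assumption made for illustration: the token pattern, the digit-only normalization of numbers, the Dice-style overlap score, and the function names `cognate_profile` and `cognate_similarity` are hypothetical and not taken from the paper.

```python
import re
from collections import Counter

# Assumed token pattern (not from the paper): numbers, runs of letters,
# single punctuation marks, and newline characters.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|[^\W\d_]+|[^\w\s]|\n")


def cognate_profile(text: str) -> Counter:
    """Count the language-independent tokens found in a text."""
    profile = Counter()
    for tok in TOKEN_RE.findall(text):
        if tok[0].isdigit():
            # Assumption: drop separators so "1,500" and "1.500" compare equal.
            profile[re.sub(r"[.,]", "", tok)] += 1
        elif tok[0].isupper():
            # Capitalized words (often names) tend to survive translation.
            profile[tok] += 1
        elif not tok[0].isalpha():
            # Punctuation marks and newlines.
            profile[tok] += 1
        # Lowercase words are language dependent and are ignored.
    return profile


def cognate_similarity(a: str, b: str) -> float:
    """Dice-style overlap of two cognate profiles, between 0.0 and 1.0."""
    pa, pb = cognate_profile(a), cognate_profile(b)
    total = sum(pa.values()) + sum(pb.values())
    if total == 0:
        return 0.0
    shared = sum((pa & pb).values())  # multiset intersection
    return 2.0 * shared / total
```

In a setup like this, each incoming press release would be scored against recent releases in the other language, and the best-scoring pair above some threshold would be kept as a candidate parallel text. The threshold and the candidate window would have to be tuned on real data.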
Item Type: | Conference Paper |
---|---|
Keywords: | parallel corpus, machine translation |
Subjects: | Computer Science > Language |
ID Code: | 4397 |
Deposited By: | Nadeau, David |
Deposited On: | 19 Jun 2005 |
Last Modified: | 11 Mar 2011 08:56 |