---
abstract: 'Parallel texts are documents that present parallel translations. This paper describes a simple method that can be deployed on a real-time news feed to create an infinitely growing source of parallel texts in French and English. Our experiment was conducted on the Canada Newswire news feed. Given some of its intrinsic properties, it was possible to deploy relatively simple text-matching techniques that rely on language-independent cognates such as numbers, capitalized words, punctuation and newline characters. On three weeks of press releases, our system correctly identified the vast majority of parallel press releases. It committed only minor errors on repeated news items.'
altloc: []
chapter: ~
commentary: ~
commref: ~
confdates: 'August 30th, 2004'
conference: Computational Linguistics in the North-East (CLiNE 2004)
confloc: Montreal
contact_email: ~
creators_id:
- pythonner
- ''
creators_name:
- family: Nadeau
given: David
honourific: ''
lineage: ''
- family: Foster
given: George
honourific: ''
lineage: ''
date: 2004
date_type: published
datestamp: 2005-06-19
department: ~
dir: disk0/00/00/43/97
edit_lock_since: ~
edit_lock_until: ~
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 4397
fileinfo: /style/images/fileicons/application_pdf.png;/4397/1/NRC%2D48081.pdf
full_text_status: public
importid: ~
institution: ~
isbn: ~
ispublished: pub
issn: ~
item_issues_comment: []
item_issues_count: 0
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: 'parallel corpus, machine translation'
lastmod: 2011-03-11 08:56:05
latitude: ~
longitude: ~
metadata_visibility: show
note: ~
number: ~
pagerange: 21-28
pubdom: FALSE
publication: ~
publisher: ~
refereed: TRUE
referencetext: |-
Ma, Xiaoyi, and Liberman, Mark, Y., 1999, BITS: A Method for Bilingual Text Search over the Web, in Machine Translation Summit VII, September.
Resnik, Philip, and Smith, Noah, A., 2003, The Web as a Parallel Corpus, in Computational Linguistics, Special Issue on the Web as Corpus, 29 (3), pp. 349-380.
Simard, M., Foster, G. and Isabelle, P., 1992, Using Cognates to Align Sentences in Bilingual Corpora, in Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation, Montreal, pp. 67-81.
relation_type: []
relation_uri: []
reportno: ~
rev_number: 12
series: ~
source: ~
status_changed: 2007-09-12 16:59:28
subjects:
- comp-sci-lang
succeeds: ~
suggestions: ~
sword_depositor: ~
sword_slug: ~
thesistype: ~
title: Real-Time Identification of Parallel Texts from Bilingual Newsfeed
type: confpaper
userid: 5664
volume: ~