---
abstract: |-
This article presents an extension to algorithms for
creating wrappers for hypertext documents. The goal
is to diversify the granularity of the information
that can be captured, using natural language
processing techniques. A web page wrapper is a view
over the HTML nodes that contain a given, desired piece
of information. For example, in a newspaper headline,
a wrapper can tag the author's name, the date, or even
every reference to a place or to a company. We extended
a wrapper-creation algorithm so that it goes beyond the
limits of HTML nodes and extracts information from the
textual content those nodes contain. We apply this
technique to the automatic creation of lexicons (word
lists).
altloc:
- http://www.crtl.ca/cline05/cline05_papers/Nadeau.pdf
chapter: ~
commentary: ~
commref: ~
confdates: August 26th
conference: Computational Linguistics in the North-East
confloc: 'Gatineau, Québec, Canada'
contact_email: ~
creators_id:
- pythonner
creators_name:
- family: Nadeau
given: David
honourific: ''
lineage: ''
date: 2005
date_type: published
datestamp: 2005-11-12
department: ~
dir: disk0/00/00/46/04
edit_lock_since: ~
edit_lock_until: ~
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 4604
fileinfo: /style/images/fileicons/application_pdf.png;/4604/1/nadeau05surcouchage%2Dfinal.pdf
full_text_status: public
importid: ~
institution: ~
isbn: ~
ispublished: pub
issn: ~
item_issues_comment: []
item_issues_count: 0
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: 'Web page wrapper, information extraction'
lastmod: 2011-03-11 08:56:13
latitude: ~
longitude: ~
metadata_visibility: show
note: ~
number: ~
pagerange: ~
pubdom: FALSE
publication: ~
publisher: ~
refereed: TRUE
referencetext: |-
Ashish, N. and Knoblock, C., Wrapper Generation for Semi-structured Internet Sources, Workshop on Management of Semistructured Data, 1997.
Bikel, D. M., Miller, S., Schwartz, R., and Weischedel, R., Nymble: a high-performance learning name-finder, Proceedings of the Fifth Conference on Applied Natural Language Processing, 1997.
Califf, M. E. and Mooney, R. J., Bottom-up relational learning of pattern matching rules for information extraction, Journal of Machine Learning Research, vol. 4, 2003.
Cohen, W. and Fan, W., Learning Page-Independent Heuristics for Extracting Data from Web Pages, WWW-99, 1999.
Craven, M., DiPasquo, D., Freitag, D., McCallum, A., Mitchell, T., Nigam, K., and Slattery, S., Learning to construct knowledge bases from the world wide web, Artificial Intelligence 118:69-113, 2000.
Etzioni, O., Cafarella, M., Downey, D., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D. S., and Yates, A., Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison, American Association for Artificial Intelligence (AAAI), 2004.
Freitag, D. and Kushmerick, N., Boosted wrapper induction, Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-2000), 2000.
Hong, T. W. and Clark, K. L., Towards a Universal Web Wrapper, 17th International FLAIRS Conference, 2004.
Kushmerick, N., Wrapper Induction for Information Extraction, Ph.D. Dissertation, Department of Computer Science & Engineering, University of Washington, 1997.
Soderland, S., Learning Information Extraction Rules for Semi-Structured and Free Text, Machine Learning, 34(1-3), 1999.
relation_type: []
relation_uri: []
reportno: ~
rev_number: 12
series: ~
source: ~
status_changed: 2007-09-12 17:01:11
subjects:
- comp-sci-lang
succeeds: ~
suggestions: ~
sword_depositor: ~
sword_slug: ~
thesistype: ~
title: Création de surcouche de documents hypertextes et traitement du langage naturel
type: confpaper
userid: 5664
volume: ~