Ferrara, Emilio and Baumgartner, Robert (2011) Design of Automatically Adaptable Web Wrappers. [Conference Paper]
Full text available as: PDF (Published Version, 163Kb), available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Nowadays, the huge amount of information distributed through the Web motivates the study of techniques for extracting relevant data efficiently and reliably. Both academia and industry have developed several approaches to Web data extraction, for example using techniques from artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision in the information extracted from Web pages and, at the same time, must prove robust so as not to compromise the quality and reliability of the data. In this paper we focus on some experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for finding similarities between two different versions of a Web page, in order to handle modifications, avoid the failure of data extraction tasks, and ensure the reliability of the information extracted. Our purpose is to evaluate the performance, advantages, and drawbacks of our novel system for automatic wrapper adaptation.
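The abstract refers to algorithms for finding similarities between two versions of a Web page but does not specify them here. As a rough illustration only, the sketch below uses the classic Simple Tree Matching algorithm on toy DOM-like trees; this is a swapped-in textbook technique, not necessarily the authors' method. The `Node` class, the function names, and the normalization by average tree size are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal stand-in for a DOM element: a tag plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Return the size of the maximum order-preserving matching between two trees.

    Roots with different tags do not match at all; otherwise the children are
    aligned with a dynamic program similar to longest common subsequence.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # dp[i][j] = best matching between the first i children of a and the first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return dp[m][n] + 1  # +1 counts the matched roots


def size(node: Node) -> int:
    """Count the nodes in a tree."""
    return 1 + sum(size(child) for child in node.children)


def similarity(a: Node, b: Node) -> float:
    """Normalize the matching by the average tree size (an assumed choice), giving a value in [0, 1]."""
    return 2 * simple_tree_matching(a, b) / (size(a) + size(b))


# Two hypothetical versions of a page: the newer one gains a <span> inside the <div>.
old_page = Node("html", [Node("body", [
    Node("div", [Node("h1"), Node("p")]),
    Node("ul", [Node("li"), Node("li")]),
])])
new_page = Node("html", [Node("body", [
    Node("div", [Node("h1"), Node("p"), Node("span")]),
    Node("ul", [Node("li"), Node("li")]),
])])

print(simple_tree_matching(old_page, new_page), round(similarity(old_page, new_page), 3))
```

A score close to 1 suggests that the page structure is largely unchanged, so a wrapper could plausibly be re-anchored on the matched nodes; a low score suggests the page was substantially redesigned. Any threshold for deciding between the two would have to be chosen empirically.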
Item Type: | Conference Paper |
---|---|
Additional Information: | ISBN: 978-989-8425-40-9 |
Subjects: | Computer Science > Artificial Intelligence |
ID Code: | 7640 |
Deposited By: | Ferrara, Dr. Emilio |
Deposited On: | 01 Oct 2011 00:34 |
Last Modified: | 01 Oct 2011 00:34 |