This item is a Paper in the Developers track.
Full text not available from this repository.
Abstract
With massive growth in website traffic, extracting valuable information from clickstreams is a challenge, as existing tools struggle to scale to web-scale data. Apache Hadoop is a system for storing and processing massive amounts of data in parallel on clusters of commodity machines. With Hadoop and MapReduce, it becomes feasible to run ad hoc queries over massive datasets, opening up new possibilities for unearthing insights in web-scale data. This talk will consist of two parts. The first part will be a brief introduction to MapReduce and Hive, Hadoop's processing and data warehousing components, and will explain how these technologies are designed to handle big data. The second part will be a demo showing how Hadoop can be used in practice to mine web logs.

Notes: Jeff Hammerbacher is giving a keynote and was asked to have Cloudera also submit a more technical talk for this developer track. Tom White will be joining Jeff in Madrid. Christophe Bisciglia manages our conference schedule and does not need to be cited for this submission. Only Tom will be speaking for this talk.
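To make the web-log mining described in the abstract a little more concrete, here is a minimal, single-process sketch of the MapReduce model applied to counting page views per URL. It is not Hadoop's API and not the demo from the talk. It assumes log lines in the Apache Common Log Format, and the function names (`map_log_line`, `shuffle`, `reduce_counts`, `run_job`) are hypothetical, chosen only for illustration. On a real cluster, Hadoop runs the map and reduce phases in parallel across machines and performs the shuffle (grouping by key) itself.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_log_line(line: str) -> Iterator[tuple[str, int]]:
    """Map phase (hypothetical helper): emit (url, 1) for each well-formed request.

    Assumes Common Log Format, e.g.
    127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
    """
    try:
        request = line.split('"')[1]          # the quoted "METHOD URL PROTOCOL" part
        _method, url, _protocol = request.split()
    except (IndexError, ValueError):
        return                                # skip malformed lines
    yield url, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group all emitted values by key (Hadoop does this for you)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: sum the 1s emitted for a single URL."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce in one process and return views per URL."""
    mapped = chain.from_iterable(map_log_line(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))


LOG = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 512',
    '10.0.0.3 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 304 0',
    'garbage line',
]

if __name__ == "__main__":
    print(run_job(LOG))  # {'/about.html': 1, '/index.html': 2}
```

Hive's role is to let users express this kind of aggregation as a SQL-like `GROUP BY` query, which it compiles into MapReduce jobs, rather than writing the map and reduce functions by hand.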