creators_name: White, Tom
creators_name: Bisciglia, Christophe
type: conference_item
datestamp: 2009-04-21 08:17:44
lastmod: 2009-04-21 08:17:47
metadata_visibility: show
title: Web data processing with MapReduce and Hadoop
ispublished: pub
full_text_status: none
pres_type: paper
abstract: With massive growth in website traffic, extracting valuable information from clickstreams is a challenge, as existing tools struggle to scale to web-scale data. Apache Hadoop is a system for storing and processing massive amounts of data in parallel on clusters of commodity machines. With Hadoop and MapReduce it becomes feasible to run ad hoc queries over massive datasets, opening up new possibilities for unearthing insights in web-scale data. This talk will consist of two parts. The first part will be a brief introduction to MapReduce and Hive, Hadoop's processing and data warehousing components, and will explain how these technologies are designed to handle big data. The second part will be a demo showing how Hadoop can be used in practice to mine web logs. Notes: Jeff Hammerbacher is giving a keynote, and was asked to have Cloudera also submit a more technical talk for this developer track. Tom White will be joining Jeff in Madrid. Christophe Bisciglia manages our conference schedule and does not need to be cited for this submission. Only Tom will be speaking for this talk.
date: 2009-04
event_title: 18th International World Wide Web Conference
event_location: Madrid, Spain
event_dates: April 20th-24th, 2009
event_type: conference
refereed: TRUE
citation: White, Tom <http://www2009.eprints.org/view/author/White=3ATom=3A=3A.html> and Bisciglia, Christophe <http://www2009.eprints.org/view/author/Bisciglia=3AChristophe=3A=3A.html> (2009) Web data processing with MapReduce and Hadoop. In: 18th International World Wide Web Conference, April 20th-24th, 2009, Madrid, Spain.