title: Web data processing with MapReduce and Hadoop
creator: White, Tom
creator: Bisciglia, Christophe
description: With massive growth in website traffic, extracting valuable information from clickstreams is a challenge, as existing tools struggle to scale with web-scale data. Apache Hadoop is a system for storing and processing massive amounts of data in parallel on clusters of commodity machines. With Hadoop and MapReduce it becomes feasible to run ad hoc queries over massive datasets, opening up new possibilities for unearthing insights in web-scale data. This talk will consist of two parts. The first part will be a brief introduction to MapReduce and Hive, Hadoop's processing and data warehousing components, and will explain how these technologies are designed to handle big data. The second part will be a demo showing how Hadoop can be used in practice to mine web logs. Notes: Jeff Hammerbacher is giving a keynote, and was asked to have Cloudera also submit a more technical talk for this developer track. Tom White will be joining Jeff in Madrid. Christophe Bisciglia manages our conference schedule and does not need to be cited for this submission. Only Tom will be speaking for this talk.
date: 2009-04
type: Conference or Workshop Item
type: PeerReviewed
identifier: White, Tom and Bisciglia, Christophe (2009) Web data processing with MapReduce and Hadoop. In: 18th International World Wide Web Conference, April 20th-24th, 2009, Madrid, Spain.
relation: http://www2009.eprints.org/220/