creators_name: White, Tom
creators_name: Bisciglia, Christophe
type: conference_item
datestamp: 2009-04-21 08:17:44
lastmod: 2009-04-21 08:17:47
metadata_visibility: show
title: Web data processing with MapReduce and Hadoop
ispublished: pub
full_text_status: none
pres_type: paper
abstract: With massive growth in website traffic, extracting valuable information from clickstreams is a challenge as existing tools struggle to scale with web scale data. Apache Hadoop is a system for storing and processing massive amounts of data in parallel on clusters of commodity machines. With Hadoop and MapReduce it becomes feasible to make ad hoc queries over the massive datasets, opening up new possibilities for unearthing insights in web scale data. This talk will consist of two parts. The first part will be a brief introduction to MapReduce and Hive, Hadoop's processing and data warehousing components, and will explain how these technologies are designed to handle big data. The second part will be a demo, showing how Hadoop can be used in practice to mine web logs.
Notes: Jeff Hammerbacher is giving a keynote, and was asked to have Cloudera also submit a more technical talk for this developer track. Tom White will be joining Jeff in Madrid. Christophe Bisciglia manages our conference schedule, and does not need to be cited for this submission. Only Tom will be speaking for this talk.
date: 2009-04
event_title: 18th International World Wide Web Conference
event_location: Madrid, Spain
event_dates: April 20th-24th, 2009
event_type: conference
refereed: TRUE
citation: White, Tom and Bisciglia, Christophe (2009) Web data processing with MapReduce and Hadoop. In: 18th International World Wide Web Conference, April 20th-24th, 2009, Madrid, Spain.