Johanna Walker – Web Science MOOC

What are Web Observatories?

Johanna Walker — Wed, 26 Feb 2014 08:07:04 +0000

“Do you know what a web observatory is?” demanded our lecturer. “Er – a place where you watch what’s happening on the Web?” volunteered a classmate. Basically, although we’d heard of ‘Web observatory methods’ while preparing for our dissertations, it became clear that few of us had thought through what these might entail beyond a vague idea of sitting around checking out other people’s Facebook activity. In fact, it’s a ‘global data resource for the advancement of economic and social prosperity.’

In the words of that same lecturer, “To keep pace with the Web’s growing scale and scope, Web Science research demands the development of new theories, the availability and interpretation of relevant data, effective and scalable multilevel analytical methods, and considerable computational infrastructure.” So the Observatory is an online, mixed-methods, interdisciplinary environment for collaboration and sharing focusing on data about the Web. It provides tools and methodologies to examine data and activity. And to make it more complicated, it’s also a web of observatories, with 15 different Observatories co-ordinated by the Web Science Trust at Southampton.

The types of questions that concern Web Observatories range from inward-looking interrogation, such as what the taxonomy of a Web Observatory is, to outward-facing investigation, such as how the Web itself can be used as a tool to study ‘real world’ events.

One such web methodology is ‘Living Analytics’, based on the analysis of real-time interaction between people online. One use of this looks at collaborative filtering – a way of creating personalised recommendations – and suggests that this outperforms matrix factorisation techniques on several dimensions, including accuracy. A little bit more ambitious than sitting around on Facebook, then.

The post What are Web Observatories? appeared first on Web Science MOOC.

Crowd Sourcing

Johanna Walker — Mon, 17 Feb 2014 09:31:20 +0000

Crowd (or citizen) journalism emerged as an effective source of news during the Arab Spring, and during the London riots of 2011 became understood as a trustworthy one.

Crowdfunding came of age with product successes such as the Pebble watch, and with the Securities and Exchange Commission relaxing its rules on equity crowdfunding.

Crowdmapping is also big, especially in post-conflict and developing areas with poor public infrastructure – after the Haiti earthquake, with no maps of Port-au-Prince available, Haitians and international workers pieced together a crowdsourced map to aid the recovery.

Commercial crowdsourcing, or crowdworking, however, has a less noble and exciting image. For some, there is something slightly dubious about it. People being paid for ‘likes’. The market price for design and writing services being driven down to below the breadline. Amazon’s crowdworking platform, Mechanical Turk, is even named after a colossal fraud. So – is this the truth?

Firstly, crowdworking actually describes a very broad range of activities. A BBC reporter has written a thoroughly entertaining account of an attempt to experience a variety of different kinds of crowdwork. It’s definitely worth reading to get an idea of the sheer breadth of forms of work. Essentially, though, the concept is an on-demand workforce. Some platforms, such as Fiverr, are eclectic marketplaces where sellers advertise their service and price. (These range from the vague ‘I will illustrate anything’ to the generic ‘I will write one search engine optimised article’ to the more specific ‘I will create 2x vintage retro logo badges in 24h’.) Others, such as Mechanical Turk, are microwork platforms. MTurk focuses on ‘Human Intelligence Tasks’ (HITs). These are ‘micro’ tasks, which generally take less than a minute and need to be performed thousands of times. It is particularly popular for database completion tasks, such as attaching photographer credits to photos, or finding a company URL. It uses duplication of tasks to ensure accuracy. For example, when translating a word, the same word will be shown to several different workers, and the most commonly suggested translation will be used.
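To make the duplication idea concrete, here is a rough sketch of how the ‘most commonly suggested translation’ might be picked from several workers’ answers. This is only an illustration: the function name `aggregate_answers` and the sample answers are made up for this post, and it is not how MTurk itself is implemented.

```python
from collections import Counter


def aggregate_answers(answers):
    """Return the most common answer and the share of workers who gave it.

    Hypothetical helper for illustration only, not part of any MTurk API.
    """
    if not answers:
        raise ValueError("need at least one worker answer")
    # Normalise lightly so that "Chat" and "chat " count as the same suggestion.
    normalised = [a.strip().lower() for a in answers]
    winner, votes = Counter(normalised).most_common(1)[0]
    return winner, votes / len(normalised)


# Made-up answers from five workers asked to translate the same word.
worker_answers = ["chat", "Chat", "gato", "chat ", "chien"]
best, agreement = aggregate_answers(worker_answers)
print(best, agreement)
```

The agreement share is a handy extra: a requester could, for instance, send a word back out for more answers when agreement is low rather than trusting a narrow majority.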

MTurk runs a qualification system, whereby HIT requesters can specify that only workers with a certain number of previous HITs under their belt and a certain approval rating can perform the task. One of the most interesting uses of MTurk is for academic research. It has many applications – assessing responses to visual stimuli and running experiments in behavioural economics and psychological motivation, to name but two. In the world of web research, it can also be used to recruit people to a study, such as the Microsoft Research study on depression and Twitter users that sourced a ground truth dataset of 69,000 people through MTurk. Would this even have been possible another way? Perhaps there is a virtuous side to this maligned form of crowdsourcing after all.
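The qualification check described above boils down to a simple filter over the worker pool. The sketch below assumes two thresholds (a minimum number of completed HITs and a minimum approval rate); the `Worker` class, the field names and the numbers are all invented for illustration and are not taken from the real MTurk API.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical record; field names are assumptions, not MTurk's own.
    worker_id: str
    completed_hits: int
    approval_rate: float  # fraction of past HITs approved, 0.0 to 1.0


def eligible_workers(workers, min_hits, min_approval):
    """Keep only the workers who meet both of the requester's thresholds."""
    return [
        w for w in workers
        if w.completed_hits >= min_hits and w.approval_rate >= min_approval
    ]


# Made-up worker pool.
pool = [
    Worker("w1", 5200, 0.99),
    Worker("w2", 40, 1.00),    # perfect record, but too few HITs
    Worker("w3", 1500, 0.91),  # experienced, but approval rate too low
]
eligible = eligible_workers(pool, min_hits=1000, min_approval=0.95)
print([w.worker_id for w in eligible])
```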

Firehose

Johanna Walker — Mon, 17 Feb 2014 09:30:50 +0000

It’s not a stream of water; it’s a stream of data. If you think of your Twitter feed as a (more or less) gently flowing brooklet of data, the massive totality of Twitter data at any given moment is akin to the water brutally gushing out of a firehose. This firehose streams data – historical and/or real-time tweets – to partners who have a commercial or intellectual use for it.

Who would have such a use? Academics, for one. A number of recent papers presented at the Web Science Conference have examined the efficiency and accuracy of tweets in spreading information. The University of Southampton has created a searchable ‘Tweepository’ of archived data.

It’s not just Twitter that has a firehose; almost any social networking site does (including those that might not spring to mind as social networks, such as Tumblr and WordPress). Which brings us to our second group of consumers – commercial organisations. Yandex, a Russian search engine, hopes to improve the efficacy and accuracy of its search results by adding Facebook posts to them. Klout examines individuals’ social networking and ranks them on various dimensions, assessing who has the greatest networking ‘clout’. It’s essentially a market research company, and these individuals will often find themselves recipients of freebies from marketing departments.

So, if you’re feeling like you have an excellent idea for an app, can you just wander along and ask Twitter for access to their firehose? Unless you’re Sergei or Bill, not usually. However, you can purchase certain data sets from another group of users, firehose resellers. These provide added value services, such as parsing data from several firehoses together and adding enrichment metadata such as language and geolocation.

So there is a large and growing number of uses. How large? In the first half of 2013 Twitter made $32 million licensing data. Here are just a few examples in action.

Introducing Web Science “Rough Guides”

Johanna Walker — Mon, 17 Feb 2014 09:30:16 +0000

The World Wide Web is a pretty big thing. Just how big no-one is exactly sure, but in 2008 Google hit the milestone of indexing 1 trillion pages. That’s a lot of information. Add to that all the metadata, all the ways of commercialising information, all the data analysis, and all the ways we think about these things and it becomes clear that knowledge of and about the Web is simply extraordinarily vast.

Therefore, it’s not really surprising that a number of people on the MOOC have told us, “I thought I was an experienced IT person but I’d never heard of X, or Y or Z”. We’ve also heard the same from some of the people on the Web Science MSc. So, we’ve started a series of blog posts which are simply Rough Guides to some of these interesting web concepts, models, theories or products. Hopefully they will be a thought-provoking, stimulating, entertaining and slightly random but essentially enlightening adjunct to the course.

If you have suggestions for subjects we should cover in the Rough Guides – or you would like to write one yourself – please let us know in the comments.
