The post New Web Science Projects appeared first on Web Science MOOC.
The Web Science MOOC features the current research projects involving a number of staff and students at the University of Southampton. The next generation of work in this rapidly developing area is now coming onto the radar.
Last month we held “Web Science Research Week” at the University of Southampton. A number of projects involving staff, students and external partners were kickstarted on the Monday, and then the progress made was reported in a series of presentations to a large audience at the Royal Society on the final day.
Chris Phethean has written up the National Archives project report
Gareth Beeston has reviewed the Personal Data project
My review of the final day’s presentations
More examples will follow in a later post!
The post Looking forward to week 5: digital economy appeared first on Web Science MOOC.
Week 5 has come around very quickly, and we are all very much looking forward to the discussions. Do feel free to comment and interact with us and other learners within the FutureLearn platform, via Twitter using #FLwebsci, on this blog, or in the G+ community. We will all benefit greatly from the thoughts and examples that are shared.
The Digital Economy is a vast topic in its own right, and we make no claim to cover all its bases here in just a few hours! We have focused the week on specific areas of research that are currently in progress by staff and students at the University of Southampton:
Ian Brown takes a critical look beneath the current hype around “big data”
Reuben Binns examines the evolving issues around privacy and the sharing of “personal data”
Lisa Harris focuses on “professional data” and what it means for recruitment and new business development
Chris Phethean appraises the nature of the value that social media creates for businesses.
Lorraine Warren ends the week by reflecting on how the digital age has presented new opportunities for venture creation.
Drumroll….!
The post What are Web Observatories? appeared first on Web Science MOOC.
In the words of that same lecturer, “To keep pace with the Web’s growing scale and scope, Web Science research demands the development of new theories, the availability and interpretation of relevant data, effective and scalable multilevel analytical methods, and considerable computational infrastructure.” So the Observatory is an online, mixed-methods, interdisciplinary environment for collaboration and sharing focusing on data about the Web. It provides tools and methodologies to examine data and activity. And to make it more complicated, it’s also a web of observatories, with 15 different Observatories co-ordinated by the Web Science Trust at Southampton.
The type of questions that concern Web Observatories range from inward-looking interrogation, such as what is the taxonomy of a Web Observatory, to outward-facing investigation, such as how can the Web itself be used as a tool to study ‘real world’ events.
One such web methodology is ‘Living Analytics’, based on the analysis of real time interaction between people online. One use of this looks at collaborative filtering – a way of creating personalised recommendations – and suggests that this outperforms matrix factorisation techniques on several dimensions, including accuracy. A little bit more ambitious than sitting around on Facebook, then.
The post Open Hypermedia and the Web appeared first on Web Science MOOC.
Tim Berners-Lee, the main architect of the World Wide Web (W3), developed the system while working for CERN, the European Organisation for Nuclear Research, in the late 1980s. W3 was developed to overcome difficulties with managing information exchange via the Internet. At the time, finding data on the Internet required pre-existing knowledge gained through various time-consuming methods: the use of specialised clients, mailing lists, newsgroups, hard copies of link lists, and word of mouth.
At CERN, a large number of physicists and other staff needed to share large amounts of data and had begun to employ the Internet to do this. Although the Internet was acknowledged as a valuable means of sharing data, towards the end of the 1980s the need to develop simpler, more reliable methods encouraged the creation of new protocols using distributed hypermedia as a model.
Developments in Open Hypermedia Systems (OHSs) had gained pace throughout the 80s; a number of stand-alone systems had been prototyped and early attempts at a standardised vocabulary had been made [1]. OHSs provide two key features: a separation of link databases (‘linkbases’) from documents, and hypermedia functions enabled for third party applications with potential accessibility within heterogeneous environments.
Two key systems were at the heart of pioneering approaches to hypermedia: Hyper-G, developed by a team at the Technical University of Graz, Austria [1], and Microcosm, originating at the University of Southampton [5]. Like W3, they were launched in 1990, but within 10 years both had been outpaced by W3’s overwhelming popularity. Ease of use, the management of link integrity and content reference, and the ‘openness’ of the underlying technology were contributing factors to W3’s success. However, both Hyper-G’s and Microcosm’s approaches to linking media continue to have relevance for the future development of the Web.
In 1988 a group of hypertext developers met at the Dexter Inn, New Hampshire to create a terminology for interchangeable and interoperable hypertext standards. About 10 different contemporary hypertext systems were analysed and commonalities between them were described. Essentially each of the systems provided “the ability to create, manipulate, and/or examine a network of information-containing nodes interconnected by relational links.”[6]
The Dexter Model did not attempt to specify implementation protocols, but provided a vital reference model for future developments of hypertext and hypermedia. The Model identified a ‘component’ as a single presentation field which contained the basic content of a hypertext network: text, graphics, images, and/or animation. Each component was assigned a ‘Unique Identifier’ (UID), and ‘links’ that interconnected components were resolved to one or many UIDs to provide ‘link integrity’.
By the mid-80s Berners-Lee saw the potential for extending the principle of computer-based information management across the CERN network in order to provide access to project documentation and make explicit the ‘hidden’ skills of personnel as well as the ‘true’ organisational structure. He proposed that this system should meet a number of requirements: remote access across networks, heterogeneity, and the ability to add ‘private links’ and annotations to documents. Berners-Lee’s key insights were that “Information systems start small and grow”, and that the system must be sufficiently flexible to “allow existing systems to be linked together without requiring any central control or coordination”.
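To make the idea of components, UIDs and links a little more concrete, here is a minimal Python sketch. It is only loosely modelled on the description above and is not the Dexter Model itself: the class and method names (`Hypertext`, `add_component`, `add_link`, `follow`) are invented for illustration, and real Dexter links are richer than a source and a list of targets.

```python
from dataclasses import dataclass


@dataclass
class Component:
    uid: int       # the component's Unique Identifier
    content: str   # stands in for text, graphics, images or animation


class Hypertext:
    """Hypothetical store of components and the links between their UIDs."""

    def __init__(self):
        self._next_uid = 1
        self.components = {}   # uid -> Component
        self.links = []        # (source_uid, [target_uid, ...]) pairs

    def add_component(self, content):
        uid = self._next_uid
        self._next_uid += 1
        self.components[uid] = Component(uid, content)
        return uid

    def add_link(self, source_uid, target_uids):
        # A crude form of link integrity: every end of a link must
        # resolve to a component that actually exists.
        for uid in [source_uid, *target_uids]:
            if uid not in self.components:
                raise KeyError(f"no component with UID {uid}")
        self.links.append((source_uid, list(target_uids)))

    def follow(self, source_uid):
        """Return the content of every component linked from source_uid."""
        return [
            self.components[target].content
            for source, targets in self.links
            if source == source_uid
            for target in targets
        ]


ht = Hypertext()
intro = ht.add_component("Introduction")
figure = ht.add_component("Figure 1")
notes = ht.add_component("Notes")
ht.add_link(intro, [figure, notes])   # one link resolved to many UIDs
print(ht.follow(intro))
```

Because links refer to UIDs rather than to positions inside content, a link to a component that does not exist is rejected when it is created rather than discovered later by a reader.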
His proposal also stressed the different interests of “academic hypertext research” and the practical requirements of his employer. He recognised that many CERN employees were using “primitive terminals” and were not concerned with the niceties of “advanced window styles” and interface design [2].
Towards the end of 1990, work was completed on the first iteration of W3, which included a new Hypertext Markup Language (HTML), an ‘httpd’ server, and the Web’s first browser, which included an editor function as well as a viewer. The underlying protocols were made freely available and within a few years the technology had been used and adapted by a wide variety of Internet enthusiasts who helped to spread W3 technology to wider audiences.
Aimed at providing solutions to perceived problems in contemporary hypermedia systems, Microcosm was launched as an “open model for hypermedia with dynamic linking” [5] in January 1990. The Microcosm team identified that existing hypermedia systems, although useful in closed settings, did not communicate with other applications, used proprietary document formats, were not easily authored, and as they were distributed on read-only media, did not allow users to add links and annotations.
While Microcosm used read-only media (CD-ROMs and laser-discs) to host components within an authored environment, it separated these ‘data objects’ from linkbases housed on remote servers. This local area network-based system allowed all users, authors and readers, to add advanced, n-ary (multi-directional) links to multiple generic objects. Microcosm was also able to process a range of documents, and its modular structure enabled it to offer a degree of interoperability with W3 browsers [7].
While recognising the significance of W3, the Microcosm team identified some weaknesses, especially in the manner in which HTML managed links. Rather than storing links separately, W3 embedded links in documents, which resulted in the inability to annotate or edit web documents, and in ‘dangling’ or missing links when documents were deleted or URLs changed. In addition, HTML was limited in how links could be made: there were a small number of allowable tags, and only single-ended, unidirectional links could be authored. To counter these link integrity issues, the Microcosm team developed the Distributed Link Service (DLS), which enabled the integration of linkbase technology into a W3 environment [3].
Using the DLS, W3 servers could access linkbases, enabling users to author generic as well as specific links. Generic link authoring allows users to create links that connect any mention of a phrase within a set of documents, and allows bi-directional links within documents.
Hyper-G offered a number of solutions to the linking issues identified by others working in hypermedia systems development. In a similar manner to Microcosm, Hyper-G stored links in link databases. This allowed users to attach their own links to read-only documents and to make multiple links to documents or to anchors within text or any other media object. Users could readily see what objects were linked to, and links could be followed backwards so users could see “what links to what”. Unlike Microcosm, the system used an advanced probabilistic flood (‘P-Flood’) algorithm, which managed updates to remote documents and linkbases. This ensured link integrity and consistency, essentially informing links when documents had been deleted or changed.
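As a rough illustration of generic linking, the Python sketch below keeps a small linkbase apart from the document and adds a link to every mention of a listed phrase when the text is served. It is not the DLS implementation: the function name `apply_generic_links`, the example phrases and the example.org URLs are all assumptions made for the sake of the example.

```python
import re

# Hypothetical linkbase, stored separately from any document:
# phrase -> destination of the generic link.
LINKBASE = {
    "hypermedia": "https://example.org/hypermedia",
    "Microcosm": "https://example.org/microcosm",
}


def apply_generic_links(text, linkbase):
    """Return a copy of text with each mention of a linkbase phrase
    wrapped in an HTML anchor. The stored document is never modified."""
    if not linkbase:
        return text
    # Try longer phrases first so they win over shorter ones that overlap.
    phrases = sorted(linkbase, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b"
    )

    def to_anchor(match):
        phrase = match.group(1)
        return f'<a href="{linkbase[phrase]}">{phrase}</a>'

    return pattern.sub(to_anchor, text)


page = "Microcosm was an open hypermedia system."
print(apply_generic_links(page, LINKBASE))
```

The point of the design is that one entry in the linkbase produces a link wherever the phrase occurs, in any document, and that a reader can add entries without write access to the documents themselves.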
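The benefit of keeping links in a database can be sketched in a few lines of Python. This is not the P-Flood algorithm and not Hyper-G code: it is a single, local link database with invented names (`LinkDatabase`, `links_to`, `delete_document`) that shows how backward links and clean-up on deletion become straightforward once links live outside documents.

```python
from collections import defaultdict


class LinkDatabase:
    """Hypothetical link store that indexes links in both directions."""

    def __init__(self):
        self.outgoing = defaultdict(set)   # source -> targets
        self.incoming = defaultdict(set)   # target -> sources

    def add_link(self, source, target):
        self.outgoing[source].add(target)
        self.incoming[target].add(source)

    def links_to(self, document):
        """Answer 'what links to what': every source pointing at document."""
        return sorted(self.incoming.get(document, ()))

    def delete_document(self, document):
        """Remove a document and all links touching it. Returns the sources
        whose links were removed, so they can be told about the change."""
        affected = sorted(self.incoming.pop(document, set()))
        for source in affected:
            self.outgoing[source].discard(document)
        for target in self.outgoing.pop(document, set()):
            self.incoming[target].discard(document)
        return affected


db = LinkDatabase()
db.add_link("a.html", "c.html")
db.add_link("b.html", "c.html")
db.add_link("c.html", "d.html")
print(db.links_to("c.html"))         # documents that link to c.html
print(db.delete_document("c.html"))  # documents to notify; no dangling links remain
```

In a distributed setting the hard part is propagating such updates between servers, which is the problem P-Flood addressed and which this sketch deliberately leaves out.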
Like W3, Hyper-G was a client-server system with its own protocol (HG-CSP) and markup language (HTF). Hyper-G browsers integrated with Internet services W3, WAIS and Gopher, supported a range of objects (text, images, audio, video and 3D environments) and integrated authoring functionality with support for collaboration.
Hyper-G was a highly advanced system that successfully applied key hypermedia principles to managing data on the Internet. As web usability expert Jakob Nielsen asserted, it offered “some sorely needed structure for the Wild Web” [8].
Despite acknowledged limitations, W3 retained its position as the de facto means of traversing the Internet, and continued to grow and spread its influence. The reasons for this are relatively straightforward.
W3 was free and relatively easy to use; anyone with a computer, a modem and a phone line could set up their own servers, build web sites and start publishing on the Internet without having to pay fees or enter into contractual relationships.
Although W3 was limited in terms of hypermedia capability, its shortcomings were not serious enough to prevent users taking advantage of its data sharing and simple linking functions. Dangling links could be ignored, as search engines allowed users to find other resources, and improved browsers allowed users to keep track of their browsing history, and backtrack through visited pages.
In contrast, Microcosm and Hyper-G were developed, in their early stages at least, as local systems. This enabled them to employ superior technology to manage complex linking operations much more effectively than W3. However, this focus led to systems that were significantly more complex to manage than W3, and presented difficulties for scaling up to the wider Internet. In addition it was not clear which parts, if any, were free for use. Both systems promoted commercial versions early in their development which had the unintended effect of stifling adoption beyond an initial core group of users.
W3 has developed into a sophisticated system that provides many of the functions of an open hypermedia system that were lacking in its early stages of development. Attempts to integrate hypermedia systems with W3 [3],[4],[9] and find solutions to linking and data storage issues influenced the development of the open standard Extensible Markup Language (XML) and the XPath, XPointer and XLink syntaxes. While HTML describes documents and the links between them, XML contains descriptive data that add to or replace the content of web documents. XPath, XPointer and XLink describe addressable elements, arbitrary ranges, and connections between anchors within XML documents respectively.
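For a flavour of how XPath addresses elements inside an XML document, here is a short Python sketch using the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath (and neither XPointer nor XLink). The `<notes>` document, its element names and its attributes are made up for the example.

```python
import xml.etree.ElementTree as ET

# A made-up XML document: the markup describes the data, not its appearance.
XML_TEXT = """
<notes>
  <note id="n1" topic="hypermedia"><title>Linkbases</title></note>
  <note id="n2" topic="web"><title>Browsers</title></note>
  <note id="n3" topic="hypermedia"><title>Anchors</title></note>
</notes>
"""

root = ET.fromstring(XML_TEXT)

# Path expression: every <title> inside a <note> directly under the root.
all_titles = [title.text for title in root.findall("./note/title")]

# Attribute predicate: only the notes whose topic is 'hypermedia'.
hypermedia_titles = [
    note.findtext("title")
    for note in root.findall("./note[@topic='hypermedia']")
]

# Addressing a single element by its id attribute.
second_note = root.find("./note[@id='n2']")

print(all_titles)
print(hypermedia_titles)
print(second_note.findtext("title"))
```

The path expressions select parts of a document by structure and attribute values, which is the kind of addressing that embedded HTML links alone could not express.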
XML may be combined with Resource Description Framework (RDF) and Web Ontology Language (OWL) protocols to store descriptive data that produce web content in more useful ways than with simple HTML. These protocols allow web content to be machine-readable, allowing applications to interrogate data and automate many web activities that have previously only been executable by human readers. These protocols are seen as precursors for the ‘Semantic Web’, a new development of W3 that links data points with multi-directional relationships rather than uni-directional links to documents [10].
[1] Keith Andrews, Frank Kappe, and Hermann Maurer. The Hyper-G Network Information System. In J. UCS The Journal of Universal Computer Science, pages 206–220. Springer, 1996.
[2] Tim Berners-Lee. Information Management: A Proposal. CERN, 1989.
[3] Les A Carr, David C DeRoure, Wendy Hall, and Gary J Hill. The Distributed Link Service: A Tool for Publishers, Authors and Readers. 1995.
[4] Hugh Davis, Andy Lewis, and Antoine Rizk. OHP: A Draft Proposal for a Standard Open Hypermedia Protocol (Levels 0 and 1: Revision 1.2, 13th March 1996). In 2nd Workshop on Open Hypermedia Systems, Washington, 1996.
[5] Andrew M Fountain, Wendy Hall, Ian Heath, and Hugh C Davis. Microcosm: An Open Model for Hypermedia with Dynamic Linking. In ECHT, pages 298–311, 1990.
[6] Frank Halasz, Mayer Schwartz, Kaj Grønbæk, and Randall H Trigg. The Dexter Hypertext Reference Model. Communications of the ACM, 37(2):30–39, 1994.
[7] Wendy Hall, Hugh Davis, and Gerard Hutchings. Rethinking Hypermedia: the Microcosm Approach, Volume 67. Kluwer Academic Publishers Dordrecht, 1996.
[8] Hermann Maurer. Hyperwave – The Next Generation Web Solution, Institute for Information Processing and Computer Supported Media, Graz University of Technology, [Online: http://www.iicm.tugraz.at/hgbook Accessed 5 December 2013].
[9] Dave E Millard, Luc Moreau, Hugh C Davis, and Siegfried Reich. FOHM: A Fundamental Open Hypertext Model for Investigating Interoperability Between Hypertext Domains. In Proceedings of the Eleventh ACM on Hypertext and Hypermedia, pages 93–102. ACM, 2000.
[10] Nigel Shadbolt, Wendy Hall, and Tim Berners-Lee. The Semantic Web Revisited. Intelligent Systems, IEEE, 21(3):96–101, 2006.
Originally published on Tim O Riordan’s blog and reproduced with permission.
The post Crowd Sourcing appeared first on Web Science MOOC.
Crowdfunding came of age with product successes such as the Pebble watch, and with the Securities and Exchange Commission relaxing rules on equity crowdfunding.
Crowdmapping is also big, especially in post-conflict and developing areas with poor public infrastructure – after the Haiti earthquake, with no maps of Port-au-Prince available, Haitians and international workers pieced together a crowdsourced map to aid the recovery.
Commercial crowdsourcing, or crowdworking, however, has a less noble and exciting image. For some, there is something slightly dubious about it. People being paid for ‘likes’. The market price for design and writing services being driven down to below the breadline. Amazon’s crowdworking platform, Mechanical Turk, is even named after a colossal fraud. So – is this the truth?
Firstly, crowdworking actually describes a very broad range of activities. A BBC reporter has written a thoroughly entertaining account of an attempt to experience a variety of different kinds of crowdwork. It’s definitely worth reading to get an idea of the sheer breadth of forms of work. Essentially, though, the concept is an on-demand work force. Some approaches, such as Fiverr, are an eclectic marketplace, where sellers advertise their service and price. (These range from the vague, ‘I will illustrate anything’ to the generic ‘I will write one search engine optimised article’ to the more specific ‘I will create 2x vintage retro logo badges in 24h’.) Others, such as Mechanical Turk, are microwork platforms. MTurk focuses on ‘Human Intelligence Tasks’ (HITs). These are ‘micro’ tasks, which generally take less than a minute and need to be performed thousands of times. It is particularly popular for database completion tasks, such as attaching photographer credits to photos, or finding a company URL. It uses duplication of tasks to ensure accuracy: for example, when translating a word, the same word will be shown to several different workers, and the most commonly suggested translation will be used.
MTurk runs a qualification system, whereby HIT requesters can specify that only workers with a certain number of previous HITs under their belt and a certain approval rating can perform the task. One of the most interesting uses of MTurk is for academic research. It has many applications – assessing responses to visual stimuli and experiments into behavioural economics and psychological motivations, to name but two. In the world of web research, it can also be used to recruit people to a study, such as the Microsoft Research study on depression and Twitter users that sourced a ground truth dataset of 69,000 people through MTurk. Would this even have been possible another way? Perhaps there is a virtuous side to this maligned form of crowdsourcing after all.
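The duplication idea can be sketched in a few lines of Python. This is an illustrative majority vote, not MTurk's own aggregation logic: the function name `majority_answer` and the choice to normalise answers by trimming and lower-casing are assumptions made for the example.

```python
from collections import Counter


def majority_answer(answers):
    """Pick the most common answer among several workers' responses.

    Returns the winning answer and the share of workers who gave it.
    If there is a tie, the answer that was submitted first wins.
    """
    if not answers:
        raise ValueError("at least one answer is needed")
    # Treat 'Chat ' and 'chat' as the same response.
    counts = Counter(answer.strip().lower() for answer in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Three hypothetical workers translate the same word.
answer, agreement = majority_answer(["chat", "Chat ", "chien"])
print(answer, agreement)
```

The agreement share is worth keeping alongside the answer: a requester might accept a result only above some threshold and send low-agreement items out to more workers.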
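A qualification check of this kind amounts to a simple filter. The Python sketch below is not the Mechanical Turk API: the `Worker` fields, the function `qualified_workers` and the thresholds are all hypothetical, chosen only to show the shape of the rule.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: str
    completed_hits: int     # number of HITs previously completed
    approval_rate: float    # share of past work approved, from 0.0 to 1.0


def qualified_workers(workers, min_hits, min_approval):
    """Keep only the workers who meet both of the requester's thresholds."""
    return [
        worker
        for worker in workers
        if worker.completed_hits >= min_hits
        and worker.approval_rate >= min_approval
    ]


pool = [
    Worker("w1", completed_hits=1200, approval_rate=0.98),
    Worker("w2", completed_hits=40, approval_rate=0.99),
    Worker("w3", completed_hits=5000, approval_rate=0.90),
]

# Hypothetical requirement: at least 1,000 HITs and 95% approval.
eligible = qualified_workers(pool, min_hits=1000, min_approval=0.95)
print([worker.worker_id for worker in eligible])
```

Requiring both experience and approval narrows the pool, which is the trade-off a requester makes between speed and the quality of responses.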
The post Firehose appeared first on Web Science MOOC.
Who would have such a use? Academics, for one. A number of recent papers presented at the Web Science Conference have examined the efficiency and accuracy of tweets in spreading information. The University of Southampton has created a searchable ‘Tweepository’ of archived data.
It’s not just Twitter that has a firehose; it’s almost any social networking site (including those that might not spring to mind as social networks, such as Tumblr and WordPress). Which brings us to our second group of consumers – commercial organisations. Yandex, a Russian search engine, hopes to improve the efficacy and accuracy of its search results by adding Facebook posts to its results. Klout examines individuals’ social networking and ranks them on various dimensions, assessing who has the greatest networking ‘clout’. It’s essentially a market research company, and these individuals will often find themselves recipients of freebies from marketing departments.
So, if you’re feeling like you have an excellent idea for an app, can you just wander along and ask Twitter for access to their firehose? Unless you’re Sergei or Bill, not usually. However, you can purchase certain data sets from another group of users, firehose resellers. These provide added value services, such as parsing data from several firehoses together and adding enrichment metadata such as language and geolocation.
So there are a large and growing number of uses. How large? In the first half of 2013 Twitter made $32 million licensing data. Here are just a few examples in action.
The post Introducing Web Science “Rough Guides” appeared first on Web Science MOOC.
Therefore, it’s not really surprising that a number of people on the MOOC have told us, “I thought I was an experienced IT person but I’d never heard of X, or Y or Z”. We’ve also heard the same from some of the people on the Web Science MSc. So, we’ve started a series of blog posts which are simply Rough Guides to some of these interesting web concepts, models, theories or products. Hopefully they will be a thought-provoking, stimulating, entertaining and slightly random but essentially enlightening adjunct to the course.
If you have suggestions for subjects we should cover in the Rough Guides – or you would like to write one yourself – please let us know in the comments.
The post Web Science Seminar at University College London appeared first on Web Science MOOC.
The World Wide Web has proven to be an invaluable collaboration tool in the physical, biological, and social sciences. The unique aspects of the Web have led to the definition of its own branch of science. “Web Science” has become a recognised research area and numerous universities have added its study to their curriculum.
In this talk Bebo will provide an evaluation of the state of Web Science and discuss some of the diverse and fascinating results that Web Science researchers have discovered.
Refreshments provided. All welcome!
The post Welcome to the Web Science MOOC! appeared first on Web Science MOOC.
And follow us on Twitter where we share useful links and contribute to ongoing conversations.
There are videos of Web Science events and other relevant activities on our YouTube channel.
If you look back through this blog you can check out some of the discussions and reflections that took place on the first Web Science MOOC – you can also connect your own blog if you would like your own #FLwebsci posts to be aggregated into this one.
We hope you enjoy the MOOC!
The post Just one week to go… appeared first on Web Science MOOC.
You can sign up via FutureLearn