Trust and Privacy Management in Cloud Platforms
Scenario Name: Trust and Privacy Management in Cloud Platforms
Scenario Authors: Jose Manuel Gómez-Pérez, Intelligent Software Components (iSOCO) S.A.
Scenario Summary: Cloud services are already present in our everyday life, so we focus on one of the most popular ones: Gmail. The scenario is inspired by an editorial by Frank Sullivan, "I Wandered Lonely as a Cloud", Computing in Science and Engineering, vol. 10, no. 3, May/June 2008. Frank, a happy Gmail user who lives in Washington, receives a message from a friend at Dartmouth. In the email, Frank’s friend announces that he plans to fly to Washington for an afternoon appointment and suggests dropping by Frank’s place for a chat followed by lunch before the other meeting. Frank uses Gmail to send a reply to his friend, and several advertisements appear in the form of Google sponsored links. The advertisements reflect that the mail exchange concerns a flight to Washington DC from Dartmouth, that the trip has something to do with computers, and that lunch has been planned in advance.
Obviously, Frank’s mail data have been scanned and processed. This activity is probably compliant with the contract Frank accepted when he created his Gmail account, but what prevents those same data from, e.g., being redistributed to other services in the cloud? Perhaps Google is liable for any misuse of the data, and advertising is allowed by the contract, but how can Frank be sure that his privacy is not being violated in general by the cloud services he uses? In the cloud, virtualization hinders the transparency Frank would need to inspect what is being done with his data: to be assured that the data are used only in the context he expects, and that they are shared only with the parties, at the times, and for the purposes he intended.
Frank continues using his Gmail account, but he keeps receiving personalized advertisements on different aspects of his daily life, e.g. shopping, sports events he likes to attend, job offers, and treatment offers for very specific health problems he suffers from, which deeply irritates him. So he decides to close that account and open a new one somewhere else. Then Frank realizes he has two problems. First, how can he preserve the part of his digital identity he is still interested in, so that e.g. email from friends or colleagues can reach him even after the old account is removed? Second, he does not want all his email data left somewhere for the owners of the service to do who knows what with them. Luckily for Frank, the law obliges email services in the cloud to destroy all copies of Frank’s mail data upon contract termination. But this is a very hard problem: it requires that the location of all data is properly identified and kept traceable.
Furthermore, what about replies or cc’s sent to other account holders, whether on Gmail or on other mail and non-mail services in the cloud? Frank’s email data will not be deleted from those repositories, and it will still be possible to scan, analyze, and use the data for different purposes. Additionally, which parts of Frank’s emails are sensitive for his privacy, and at what granularity should their provenance be kept?
Users:
- Individuals (Frank), companies (especially SMEs), and governmental bodies, who use the services provided by cloud platforms.
- Lawyers and policy makers of the companies providing such services.
Requirements for Provenance: When users put their data in the hands of third parties in the cloud, one relevant aspect is the attribution of whatever is done with those data. It is important that the owner of the data, Frank (or is Google the real owner?), is provided with the necessary means to identify what has been done with his data, e.g. whether they have been used for profiling, advertising, etc. Such means must also identify the entities involved in the processes which have manipulated the data. According to EU Directive 95/46/EC, individuals have the right to know and understand the processes their data go through, with the purpose of making systems like cloud platforms accountable. In the example, provenance information should be attached to the actions executed by cloud services on Frank’s data, so that it becomes possible to reason about the accountability of such services with respect to the contract he accepted. Additionally, proof that such attribution is trustworthy should be provided, e.g. by digitally signing the provenance logs.
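As an illustration of the last point, the following Python sketch shows one possible way to make a provenance log tamper-evident. It is not taken from the scenario: the record fields (actor, action, data_id, purpose), the function names, and the key are all hypothetical. For brevity it uses a keyed hash (HMAC) from the standard library in place of a true digital signature; a real deployment would more likely use an asymmetric signature so that Frank could verify the log without holding the service's key.

```python
import hashlib
import hmac
import json

# Hypothetical key held by the cloud service (assumption for this sketch).
SERVICE_KEY = b"demo-key-not-for-production"


def _canonical(record: dict) -> bytes:
    """Serialize a record deterministically so its tag is reproducible."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def append_record(log: list, actor: str, action: str, data_id: str, purpose: str) -> dict:
    """Append a record whose tag also covers the previous record's tag."""
    prev_sig = log[-1]["signature"] if log else ""
    body = {
        "actor": actor,
        "action": action,
        "data_id": data_id,
        "purpose": purpose,
        "prev": prev_sig,  # chaining makes deletion or reordering detectable
    }
    signature = hmac.new(SERVICE_KEY, _canonical(body), hashlib.sha256).hexdigest()
    entry = dict(body, signature=signature)
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Recompute every tag and check that the chain of 'prev' links is intact."""
    prev_sig = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "signature"}
        expected = hmac.new(SERVICE_KEY, _canonical(body), hashlib.sha256).hexdigest()
        if body["prev"] != prev_sig:
            return False
        if not hmac.compare_digest(expected, entry["signature"]):
            return False
        prev_sig = entry["signature"]
    return True
```

Appending two records and calling verify_log returns True; changing the purpose field of an earlier record afterwards makes verification fail, which is the property an auditor would rely on.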
Such provenance records would allow cloud companies like Google or Amazon to demonstrate that the processes they did or did not carry out on the data of individuals like Frank comply with the contract, which would increase trustworthiness and stimulate take-up by other companies (typically SMEs) and public administrations. Such auditing functionality must take into account the sheer size of the provenance graphs to be represented and the type of user at whom the justification is aimed. In Frank’s case, each email exchange, forward, cc, processing step, or redistribution of email data to other Google services like Calendar, Docs, Scholar, etc. (and potentially to other cloud companies) would typically result in a new provenance artifact, such as an email thread or a user profile produced by statistical analysis of email repositories. Each artifact is associated with several sources and entities: email senders and recipients, the services manipulating the data, and the companies benefiting from the resulting user profiles, e.g. for advertising.
Multiplied by the vast number of users of cloud services like Gmail, this can result in enormous provenance graphs whose analysis, whether automatic or manual, is extremely complex and impractical. Adaptive abstractions, e.g. overlays for analysis of the provenance information, are therefore required. They should allow contextual analysis at different levels of detail, exposing only the detail necessary for the task at hand and adapting to the skills and expertise of the user doing the analysis. For example, Frank wants to know whether the treatment of his email data has complied with Gmail’s contract, but he is not knowledgeable about Google’s infrastructure. The justification Gmail gives for its use of Frank’s email data should take this into account and shield Frank from infrastructure-level detail, while giving him enough provenance information to constitute a convincing proof of appropriate use.
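The idea of an abstraction overlay can be sketched in a few lines of Python. The sketch below is only an assumption about how such an overlay might work: the node names, the two abstraction levels ("service" and "organization"), and the function abstract_view are invented for illustration and do not describe any real provider's infrastructure. Each fine-grained node is mapped to a coarser name, and edges that become internal to a single coarse node are hidden.

```python
# Mapping from fine-grained nodes to coarser names, per abstraction level.
# All names and levels are hypothetical.
NODE_LEVELS = {
    "vm-17/mail-indexer": {"service": "MailService", "organization": "MailProvider"},
    "vm-42/ad-matcher": {"service": "AdService", "organization": "MailProvider"},
    "vm-08/profile-builder": {"service": "AdService", "organization": "MailProvider"},
    "partner-api/campaigns": {"service": "CampaignTool", "organization": "Advertiser"},
}

# Fine-grained provenance edges: (source, relation, target).
EDGES = [
    ("vm-17/mail-indexer", "sent_data_to", "vm-42/ad-matcher"),
    ("vm-42/ad-matcher", "sent_data_to", "vm-08/profile-builder"),
    ("vm-08/profile-builder", "sent_data_to", "partner-api/campaigns"),
]


def abstract_view(edges, node_levels, level):
    """Collapse the graph to the given level, dropping edges that become internal."""
    view = set()
    for source, relation, target in edges:
        coarse_source = node_levels[source][level]
        coarse_target = node_levels[target][level]
        if coarse_source != coarse_target:  # hide detail inside one coarse node
            view.add((coarse_source, relation, coarse_target))
    return sorted(view)
```

With this sample data, abstract_view(EDGES, NODE_LEVELS, "organization") reduces three infrastructure-level edges to the single edge ("MailProvider", "sent_data_to", "Advertiser"), which is closer to the level of detail a user like Frank needs, while the "service" level keeps two edges for a more technical auditor.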
Provenance Questions:
- What is being done with my data once submitted to the cloud?
- What proof of compliance can I get that my data are not abused? How does this match the contract subscribed with the cloud service?
- Who is responsible for misuse of my data? What actors were involved and with what roles?
- What legal framework applies to the cloud services manipulating my data? What countries have jurisdiction over them?
- To what extent can I trust that my data are being properly used? How can this be quantified?
- In what processes and contexts have my data been involved? How do such processes affect my privacy?
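To make the questions above more concrete, the following Python sketch shows how some of them could be phrased as queries over provenance records. Everything in it is an assumption made for illustration: the record fields, the sample records, the set of purposes allowed by the contract, and the simple compliance ratio are hypothetical and are not prescribed by the scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    data_id: str
    actor: str
    role: str
    action: str
    purpose: str
    jurisdiction: str


# Hypothetical sample records and contract terms.
RECORDS = [
    ProvenanceRecord("mail-001", "mail-service", "processor", "scan", "advertising", "US"),
    ProvenanceRecord("mail-001", "ad-partner", "recipient", "redistribute", "profiling", "IE"),
    ProvenanceRecord("mail-002", "mail-service", "processor", "store", "delivery", "US"),
]
ALLOWED_PURPOSES = {"delivery", "advertising"}


def history(records, data_id):
    """What has been done with this data item, and by whom?"""
    return [r for r in records if r.data_id == data_id]


def violations(records, data_id, allowed):
    """Which recorded uses fall outside the purposes allowed by the contract?"""
    return [r for r in history(records, data_id) if r.purpose not in allowed]


def jurisdictions(records, data_id):
    """Which jurisdictions were involved in handling this data item?"""
    return sorted({r.jurisdiction for r in history(records, data_id)})


def compliance_ratio(records, data_id, allowed):
    """One naive way to quantify trust: the share of recorded uses that are allowed."""
    past = history(records, data_id)
    if not past:
        return 1.0  # nothing recorded, so nothing known to be non-compliant
    return 1 - len(violations(records, data_id, allowed)) / len(past)
```

For the sample data, violations(RECORDS, "mail-001", ALLOWED_PURPOSES) returns the single record in which the hypothetical ad-partner redistributes the mail for profiling, and that record's actor and role fields name who would be responsible.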
Technologies Used: Web technologies, virtualization platforms and protocols, web services, service composition and execution, databases, distributed information repositories, data mining, digital signatures and certificates.
Background: The preeminence of the Internet as a disruptive platform has given rise to new business models such as Google-like advertising models and especially Software as a Service (SaaS). In the light of this trend, IT managers have started to servitize their products to provide added value and to productize their services so that they can be delivered more efficiently and at lower cost. The provisioning of resources (either data or services) over the Web appears as the silver bullet for delivering IT services, minimizing costs while maximizing the potential market.
In this respect, expanding cloud computing models are driven by an expectation of seamless connectivity of resources to provide anytime, anywhere access to resources and services. However, uptake of cloud solutions can only happen if trust in, and privacy of, the information and data owned by users (individuals, companies, or government bodies) are guaranteed by cloud platforms in the form of contract policies whose fulfilment can be automatically verified. Furthermore, the cloud removes the physical notion of national boundaries for the services manipulating user data and, as a consequence, it also removes the boundaries for their exposure to abuse (Suri, N., Clarke, J. The Borderless Trust Element. INCO-TRUST Workshop, New York, May 2010). Both resources and cyber abuse thus become unconstrained by physical boundaries, which dilutes responsibility for such abuse and hampers the identification and enforcement of the appropriate legal framework for its prosecution.
Cloud technologies are by nature heterogeneous and change at a fast pace. Therefore, trust and privacy management must be done at the level of the data itself, addressing the entire data chain from data acquisition through data dissemination and data storage to data usage. This makes cloud platforms an interesting scenario for the analysis and evaluation of data provenance.
--
JoseManuel - 15 May 2010