Flexible Data Access in a Cloud based on Freshness Requirements

Authors
Laura Cristiana Voicu, Heiko Schuldt, Yuri Breitbart, Hans-Jörg Schek
Type
In Proceedings
Date
2010/7
Appears in
Proceedings of the 3rd International Conference on Cloud Computing (IEEE CLOUD 2010)
Location
Miami, FL, USA
Publisher
IEEE
Abstract
Data clouds are emerging environments in which commercial providers manage large volumes of data with individual quality of service (QoS) guarantees per customer. These guarantees mainly include keeping several replicas of each data item in different distributed data centers for availability. However, because maintaining several updateable replicas per data object is very costly, cloud providers tend to offer only a limited number of synchronously updated replicas (i.e., replicas that are always up to date), together with several read-only replicas that are updated lazily and thus might hold stale data. QoS agreements may also include the maintenance of dedicated archives (copies of data that are frozen at some point in time). Stale data allow cloud providers to offer a variety of read operations with different semantics, e.g., read the most recent data, read data not older than or not younger than some timestamp t, read data produced between t1 and t2, or read data exactly as of t. These read operations can be supported by a read-only site using a stale replica. In this paper we present our approach to cloud data management, based on a recent protocol for data grids. We discuss in detail how the refresh of individual replicas is provided in a completely distributed way. Finally, we present the results of a performance evaluation in a data cloud setting.
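The read semantics listed in the abstract can be illustrated with a minimal Python sketch. It is not the protocol from the paper. The `Replica` class, its `timestamp` field (assumed to record the time up to which a replica has been refreshed), the predicate helpers, and the rule of preferring read-only sites over updateable ones are all hypothetical names and assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Replica:
    site: str                # hypothetical site name
    timestamp: int           # assumed: time up to which this copy is refreshed
    updatable: bool = False  # True for synchronously updated replicas


# Freshness predicates, one per read semantics named in the abstract.
def not_older_than(t: int) -> Callable[[int], bool]:
    return lambda ts: ts >= t


def not_younger_than(t: int) -> Callable[[int], bool]:
    return lambda ts: ts <= t


def between(t1: int, t2: int) -> Callable[[int], bool]:
    return lambda ts: t1 <= ts <= t2


def exactly(t: int) -> Callable[[int], bool]:
    return lambda ts: ts == t


def most_recent(replicas: Iterable[Replica]) -> Replica:
    """Return the replica with the newest data."""
    return max(replicas, key=lambda r: r.timestamp)


def select(replicas: Iterable[Replica],
           accepts: Callable[[int], bool]) -> Optional[Replica]:
    """Pick a replica whose timestamp satisfies the freshness predicate.

    Assumption of this sketch: read-only sites are preferred so that
    updateable replicas are not loaded with reads; ties go to the fresher copy.
    Returns None when no replica qualifies.
    """
    candidates = [r for r in replicas if accepts(r.timestamp)]
    return max(candidates,
               key=lambda r: (not r.updatable, r.timestamp),
               default=None)


# Illustrative data only: one up-to-date replica, two stale ones, one archive.
REPLICAS = [
    Replica("update-site", 100, updatable=True),
    Replica("read-site-1", 80),
    Replica("read-site-2", 60),
    Replica("archive", 40),
]
```

With this sample data, `select(REPLICAS, not_older_than(70))` returns the replica at `read-site-1`: the updateable replica also qualifies, but the sketch routes the read to a sufficiently fresh read-only site. A request that no replica can satisfy, such as `exactly(55)`, yields `None`; a real system would instead have to refresh a replica or reject the read.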