Cloud Data Management

Cloud computing has become a very attractive paradigm for outsourcing entire applications. Since data management is an integral part of the application landscape, data management in the Cloud is gaining increasing attention. Work at the databases and information systems group addresses important data management issues in Cloud environments. In particular, this includes:

  1. Flexible, policy-based modular data management in the Cloud (PolarDBMS): In recent years, Cloud computing has attracted a large variety of applications that are deployed entirely on the resources of Cloud providers. As data management is an essential part of these applications, Cloud providers have to deal with many different data management requirements, depending on the characteristics and guarantees these applications are supposed to provide. The objective of a Cloud provider is to support these diverse requirements cost-effectively with a basic set of customizable modules and protocols that can be combined (dynamically); for application providers, in turn, it is essential that the needs of their applications are met in a cost-optimized manner. In our work, we are developing PolarDBMS, a flexible and dynamically adaptable system for managing data in the Cloud. PolarDBMS derives policies from application and service objectives. Based on these policies, it automatically deploys the most efficient and cost-optimized set of modules and protocols and monitors their compliance. If necessary, PolarDBMS can dynamically exchange modules and/or adapt their customization.

  2. Cost-based replicated data management in the Cloud: Distributed and replicated data management in the Cloud is governed by the CAP theorem. As a consequence of this theorem, many Cloud providers sacrifice data consistency for a high degree of availability. However, with the pay-as-you-go cost model of the Cloud, where the use of each resource is charged at a very fine-grained level, the costs of an application become an additional parameter for optimization. These include infrastructure costs (i.e., for the resources needed to guarantee a certain degree of data consistency), but also inconsistency costs (i.e., costs incurred when only a relaxed level of consistency is provided). In this work, we reconsider existing protocols for distributed data management and develop new protocols for cost-aware, dynamic, and adaptable replicated data management in a Cloud environment.

  3. Data Archiving in the Cloud (Archiving-as-a-Service, AaaS): With the advent of data Clouds that offer nearly unlimited storage capacity at low cost, the well-established update-in-place paradigm for data management is increasingly being replaced by a multi-version approach. Especially in a Cloud environment with several geographically distributed data centers acting as replica sites, this makes it possible to keep old versions of data and thus to provide a rich set of read operations with different semantics (e.g., read the most recent version, read a version not older than a given time, read data as of a given time, etc.). Combining multi-version data management, replication, and partitioning makes it possible to redundantly store several or even all versions of data items without significantly impacting any single site. However, to prevent single sites in such partially replicated data Clouds from being overloaded by archive queries that access old versions, query optimization has to jointly consider version selection and load balancing (site selection). In our work, we develop novel cost-aware index approaches (called ARCTIC) for version and site selection covering a broad range of query types over both fresh and archive data.
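The idea behind project 1, deriving a cost-optimized set of modules from application policies, can be illustrated with a small sketch. This is not the PolarDBMS implementation; the module names, capabilities, and costs below are invented for illustration, and the exhaustive search stands in for whatever optimization PolarDBMS actually performs:

```python
# Hypothetical sketch of policy-based module selection in the spirit of
# PolarDBMS. Module names, capabilities, and costs are invented.
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Module:
    name: str
    provides: frozenset  # capabilities offered, e.g. {"strong_consistency"}
    cost: float          # assumed monetary cost per hour

CATALOG = [
    Module("sync-replication",  frozenset({"strong_consistency"}),   5.0),
    Module("async-replication", frozenset({"eventual_consistency"}), 2.0),
    Module("snapshot-backup",   frozenset({"durability"}),           1.0),
    Module("geo-failover",      frozenset({"high_availability"}),    3.0),
]

def select_modules(required, catalog=CATALOG):
    """Return the cheapest module combination covering all required capabilities."""
    best, best_cost = None, float("inf")
    for r in range(1, len(catalog) + 1):
        for combo in combinations(catalog, r):
            provided = frozenset().union(*(m.provides for m in combo))
            cost = sum(m.cost for m in combo)
            if required <= provided and cost < best_cost:
                best, best_cost = combo, cost
    return best, best_cost

# Policies derived from service objectives map to required capabilities:
modules, cost = select_modules({"strong_consistency", "durability"})
print([m.name for m in modules], cost)  # ['sync-replication', 'snapshot-backup'] 6.0
```

Exchanging modules at runtime, as PolarDBMS envisions, would then amount to re-running the selection whenever the derived policies change and swapping in the new combination.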
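The cost trade-off at the heart of project 2 can be made concrete: the total cost of an application is the sum of infrastructure costs and expected inconsistency costs, and the cheapest consistency level depends on how expensive stale reads are. The consistency levels, prices, and staleness probabilities below are invented for illustration, not measurements from any real protocol:

```python
# Illustrative cost model: total cost = infrastructure costs plus expected
# inconsistency costs. All figures are invented.

# Assumed per-hour infrastructure cost of guaranteeing each consistency level:
infrastructure_cost = {"strong": 10.0, "bounded": 6.0, "eventual": 3.0}

# Assumed probability that a read observes stale data at each level:
stale_read_probability = {"strong": 0.0, "bounded": 0.02, "eventual": 0.10}

def total_cost(level, reads_per_hour, penalty_per_stale_read):
    """Infrastructure cost plus the expected penalty for serving stale reads."""
    inconsistency = (reads_per_hour * stale_read_probability[level]
                     * penalty_per_stale_read)
    return infrastructure_cost[level] + inconsistency

def cheapest_level(reads_per_hour, penalty_per_stale_read):
    """Pick the consistency level that minimizes total cost."""
    return min(infrastructure_cost,
               key=lambda lvl: total_cost(lvl, reads_per_hour, penalty_per_stale_read))

# A low staleness penalty favors weak consistency, a high penalty strong:
print(cheapest_level(1000, 0.001))  # eventual (3.1 vs. 6.02 vs. 10.0)
print(cheapest_level(1000, 0.5))    # strong (10.0 vs. 16.0 vs. 53.0)
```

A cost-aware replication protocol in this spirit would re-evaluate such a model dynamically as workload and prices change, rather than fixing one point in the CAP trade-off up front.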
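The joint version and site selection of project 3 can also be sketched: among the replica sites that hold the requested version, pick the least loaded one, so that archive queries do not pile up on the few sites keeping old versions. This is only a toy illustration of the problem, not the ARCTIC index itself; the site layout, version ranges, and load figures are invented:

```python
# Hedged sketch of joint version and site selection in a partially
# replicated, multi-version data Cloud. All data below is invented.
sites = [
    # (site id, oldest version held, newest version held, current load)
    ("dc-eu",   80, 100, 0.7),
    ("dc-us",   95, 100, 0.2),
    ("dc-asia",  1, 100, 0.9),  # holds the full archive, but heavily loaded
]

def pick_site(requested_version):
    """Among sites holding the requested version, return the least loaded one."""
    candidates = [(load, site) for site, oldest, newest, load in sites
                  if oldest <= requested_version <= newest]
    if not candidates:
        return None  # version no longer held anywhere
    return min(candidates)[1]

print(pick_site(100))  # dc-us: fresh data is everywhere, so load decides
print(pick_site(50))   # dc-asia: only the archive site holds this old version
```

The example shows why version selection and load balancing must be considered together: a query such as "read a version not older than X" may be answerable from several versions, and choosing a slightly different version can open up a much less loaded site.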


Research Projects

Thesis Projects