ClouDMan: Cost-based Data Management in Cloud Environments (Finished)

In recent years, Clouds have become increasingly attractive environments for deploying different types of applications. The main reason for this popularity is the 'pay-as-you-go' cost model of the Cloud, combined with its almost unlimited scalability and high availability. From the perspective of organizations or companies using the Cloud, the pay-as-you-go cost model allows them to pay only for the resources actually used. The traditional problems of over-provisioning (i.e., when the IT resources, usually complete compute centers, were designed for a much higher expected load than what was actually faced, which led to additional, unnecessary costs for the organization or company) and under-provisioning (i.e., when customers had to be turned away due to a lack of IT resources) fortunately belong to the past. Cloud environments are highly elastic, which means that they provide a vast amount of resources that Cloud customers can use on very short notice, thus guaranteeing that the underlying IT environment adapts and dynamically scales to the actual needs. Elastic behavior, almost unlimited scalability, and in particular high availability have strong consequences for data management in the Cloud. A high degree of availability is provided by geographically replicating data inside a Cloud, i.e., by using resources at different sites of a Cloud provider. This, in turn, necessitates distributed transactions to guarantee that the data remain consistent.

While distributed transaction management and replication management have been the subject of intensive research over the past decades, the Cloud comes with a new dimension that makes it necessary to reconsider current approaches, algorithms, and protocols: the cost dimension. As a consequence of the pay-as-you-go cost model of the Cloud, each resource and its usage comes with a price tag, usually at a very fine-grained level. Users of the Cloud have to pay, for instance, for each megabyte of storage used, for each CPU cycle, for incoming and outgoing megabytes of data traffic, and even for each message placed in a queue hosted by a Cloud provider. Even worse, these prices not only differ between Cloud providers; they may also differ (significantly) between different data centers of the same Cloud provider.
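To make the cost dimension concrete, the following minimal sketch compares the bill for one fixed workload at two data centers of the same provider. All names (`PriceList`, `Usage`, `monthly_cost`, `dc-a`, `dc-b`) and all prices are invented for illustration; they are assumptions and are not taken from any real Cloud provider or from the ClouDMan project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceList:
    """Hypothetical per-unit prices of one data center (arbitrary currency)."""
    storage_per_gb: float        # per GB stored per month
    cpu_per_hour: float          # per CPU hour
    egress_per_gb: float         # per GB of outgoing traffic
    per_million_messages: float  # per million queue messages


@dataclass(frozen=True)
class Usage:
    """Resources consumed by an application during one month."""
    storage_gb: float
    cpu_hours: float
    egress_gb: float
    queue_messages: int


def monthly_cost(prices: PriceList, usage: Usage) -> float:
    """Sum the fine-grained charges for every resource that was used."""
    return (
        prices.storage_per_gb * usage.storage_gb
        + prices.cpu_per_hour * usage.cpu_hours
        + prices.egress_per_gb * usage.egress_gb
        + prices.per_million_messages * usage.queue_messages / 1_000_000
    )


# Invented price lists for two data centers of the same provider.
PRICES = {
    "dc-a": PriceList(0.10, 0.05, 0.09, 0.40),
    "dc-b": PriceList(0.12, 0.04, 0.12, 0.50),
}

usage = Usage(storage_gb=500, cpu_hours=720, egress_gb=200,
              queue_messages=10_000_000)

costs = {dc: monthly_cost(p, usage) for dc, p in PRICES.items()}
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

With these assumed numbers the same workload is cheaper at `dc-a`, although `dc-b` offers the lower CPU price; which site is cheapest therefore depends on the workload mix.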

Hence, the joint consideration of (i) data consistency, (ii) performance, and (iii) cost opens new areas for research in distributed data management and new possibilities for optimizing existing protocols.

The objective of the ClouDMan project is to investigate new approaches to Cost-based Data Partitioning and to Policy-based Data Management. The former aspect, Cost-based Data Partitioning, takes into account that different sites of a Cloud provider come with different pricing schemes. Therefore, optimizing replicated data management with regard to consistency, performance, and cost needs to seamlessly consider data placement, in addition to the number of replicas and the protocol for propagating updates to replicated data. The second aspect, Policy-based Data Management, takes into account that many applications come with dedicated requirements and restrictions on data placement, performance, cost, or consistency, such as 'data may not be stored outside the country of its origin', 'data management has to be provided as cheaply as possible', and/or '1-copy serializability has to be provided'. The goal is to automatically select, on the basis of the specified policies, the protocol best suited to meeting the requirements and constraints for replicated data management in the Cloud.
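The policy-based selection described above can be illustrated with a small sketch. It is not the project's actual algorithm: the candidate configurations, their cost and latency estimates, and the names `Configuration`, `Policy`, `satisfies`, and `select` are all hypothetical. The sketch only shows the general idea of filtering candidates by hard constraints (placement, consistency, performance) and then choosing the cheapest remaining one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Configuration:
    """A hypothetical candidate: replica placement plus update protocol."""
    name: str
    countries: FrozenSet[str]     # countries that host at least one replica
    one_copy_serializable: bool   # whether the protocol provides 1SR
    est_cost: float               # assumed monthly cost estimate
    est_latency_ms: float         # assumed average update latency


@dataclass(frozen=True)
class Policy:
    """Hard constraints; cost is minimized among feasible candidates."""
    allowed_countries: Optional[FrozenSet[str]] = None
    require_1sr: bool = False
    max_latency_ms: Optional[float] = None


def satisfies(cfg: Configuration, policy: Policy) -> bool:
    """Check every constraint that the policy actually specifies."""
    if (policy.allowed_countries is not None
            and not cfg.countries <= policy.allowed_countries):
        return False
    if policy.require_1sr and not cfg.one_copy_serializable:
        return False
    if (policy.max_latency_ms is not None
            and cfg.est_latency_ms > policy.max_latency_ms):
        return False
    return True


def select(candidates: List[Configuration],
           policy: Policy) -> Optional[Configuration]:
    """Return the cheapest feasible configuration, or None if none exists."""
    feasible = [c for c in candidates if satisfies(c, policy)]
    return min(feasible, key=lambda c: c.est_cost) if feasible else None


# Invented candidates; all figures are placeholders.
CANDIDATES = [
    Configuration("eager-ch-2", frozenset({"CH"}), True, 120.0, 40.0),
    Configuration("quorum-ch-de", frozenset({"CH", "DE"}), True, 90.0, 60.0),
    Configuration("lazy-ch-1", frozenset({"CH"}), False, 50.0, 10.0),
]

strict = Policy(allowed_countries=frozenset({"CH"}), require_1sr=True)
relaxed = Policy(allowed_countries=frozenset({"CH"}))
impossible = Policy(allowed_countries=frozenset({"US"}))

chosen_strict = select(CANDIDATES, strict)
chosen_relaxed = select(CANDIDATES, relaxed)
chosen_impossible = select(CANDIDATES, impossible)
print(chosen_strict, chosen_relaxed, chosen_impossible)
```

In this sketch, the policy 'data may not be stored outside the country of its origin' rules out the cheaper configuration that replicates to a second country, and additionally requiring 1-copy serializability rules out the cheapest single-country candidate. A real system would also have to derive the cost and latency estimates from the workload and the providers' pricing rather than take them as given.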

Start / End Dates

01.11.2013 - 30.04.2015

Funding Agencies

Swiss National Science Foundation (SNF)

Funding

86'080.- CHF

Staff

Research Topics

Cloud Data Management

Publications

2017

Ilir Fetai, Alexander Stiemer, Heiko Schuldt
QuAD: A Quorum Protocol for Adaptive Data Management in the Cloud
Proceedings of the 2017 IEEE International Conference on Big Data (Big Data 2017), Boston, MA, USA 2017/12

2016

Alexander Stiemer, Ilir Fetai and Heiko Schuldt
Analyzing the Performance of Data Replication and Data Partitioning in the Cloud: the Beowulf Approach
Proceedings of the 4th International Workshop on Scalable Cloud Data Management (SCDM 2016) - co-located with IEEE Big Data 2016, Washington, D.C., USA 2016/12
Ilir Fetai
Cost- and Workload-based Data Management in the Cloud
PhD Thesis, Department of Mathematics and Computer Science, University of Basel 2016/9
Filip-Martin Brinkmann, Ilir Fetai, Heiko Schuldt
SLA-basierte Konfiguration eines modularen Datenbanksystems für die Cloud
D. Fasel, A. Meier (Hrsg.): Big Data: Grundlagen, Systeme und Nutzungspotenziale (Edition HMD), 2016/7

2015

Alexander Stiemer, Ilir Fetai and Heiko Schuldt
Comparison of Eager and Quorum-based Replication in a Cloud Environment
Proceedings of the 2015 IEEE International Conference on Big Data (Big Data 2015), Santa Clara, CA, USA 2015/10
Ilir Fetai, Damian Murezzan and Heiko Schuldt
Workload-Driven Adaptive Data Partitioning and Distribution – The Cumulus Approach
Proceedings of the 2015 IEEE International Conference on Big Data (Big Data 2015), Santa Clara, CA, USA 2015/10

2014

Ilir Fetai, Filip-Martin Brinkmann and Heiko Schuldt
PolarDBMS: Towards a Cost-Effective and Policy-Based Data Management in the Cloud
Proceedings of the 6th International Workshop on Cloud Data Management (CloudDB 2014), Chicago, IL, USA 2014/3