Advance Reservation in Workflows (PhD Thesis, finished)
As Service-Oriented Architectures (SOAs) become widely deployed in a variety of domains, e.g., in e-Commerce or e-Science, the focus is shifting from mere deployment and integration issues to non-functional aspects such as Quality of Service (QoS), for example the ability to reserve storage, network or computational resources in advance. A SOA separates functions into distinct units (services), which can be distributed over a network and can be combined and reused to create larger-scale applications (workflows). A widespread language for defining such workflows in WSDL/SOAP-based SOAs is the Business Process Execution Language (BPEL). So-called scientific workflows are generally distinguished from business workflows by two criteria: they involve vast amounts of data and computationally expensive processing steps.

While QoS may be a requirement for some applications, e.g., when results are needed in real-time or generally "as fast as possible", any kind of service (or process) execution can benefit from the predictability that QoS contracts (or Service Level Agreements, SLAs) provide. Assuming that such an agreement covers, for example, computational power, i.e., CPU time or shares, service consumers can weigh cost against speed of execution according to their individual requirements. Service providers, in turn, may be able to achieve an (economically) optimal resource usage by careful negotiation.

Suppose that a user wants to run a scientific workflow taking advantage of QoS criteria, where SLAs with the service providers are established by Advance Reservations (ARs). While this task is still relatively easy for individual service invocations scheduled to start at a given point in time, using ARs in a composed service workflow, which consists of a partially ordered set of service invocations, requires answering the following two questions:

For how long should a reservation for a particular service be made?
Since the service implementation is on the provider side, it is generally the service provider that has to make this information available. Note that the provider also has to take measures to enforce this prediction, such as controlling CPU usage.

When should a reservation for a particular service start?
In a workflow setting, individual service calls will usually depend on the execution or output of previous operations, so anticipating the start time in turn reduces to answering the previous question.

Our objective is to create a system, called DWARFS (Distributed Workflow engine with Advance Reservation Functionality Support), that can support QoS at workflow level. This involves research topics at several abstraction layers, namely:

Enforcement of SLAs at the provider side, i.e., development of components that allow service providers to control the resources required for the execution of services, so that individual service invocations are actually in line with committed guarantees.

Models and strategies for providing QoS and establishing SLAs at workflow level: from a customer perspective, the workflow execution engine is a service (and agreement) provider, while in essence it is acting as a proxy that itself has to take the customer role for negotiating agreements with the providers of the target services. This poses challenging questions including, but not limited to, the (semantic) evaluation of the agreement terms and matchmaking of the possible providers, re-negotiation strategies especially for failure handling, possibly redundant reservation strategies for extremely important processes, etc.

Distributed workflow orchestration: whereas any workflow execution is by definition decentralized in the sense that the operations take place at various independent providers, our goal is to also distribute the orchestration engine itself.
To name just a few benefits, a decentralized system helps avoid the bottlenecks, hot spots and single points of failure that a centralized execution engine could create. In addition, especially in scientific workflows where large data volumes are transported during the orchestration, overall performance may also benefit from having the execution engines in proximity to the target services. Challenges in this area mostly relate to state and data management. One example is ensuring that the distributed engines behave coherently (as a centralized engine would) with regard to individual processes' states. Another, in conjunction with advance reservations, is the "clever" placement of the data needed during process execution, so as to make optimal use of the infrastructure and achieve the best possible execution performance.
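
To make the two reservation questions raised earlier more concrete (how long a reservation should last, and when it should start), the following minimal Python sketch derives reservation windows for a partially ordered set of service invocations. It is an illustration under simplifying assumptions, not the DWARFS implementation: the function name plan_reservations, the service names and the duration values are all hypothetical, durations are assumed to be predictions supplied by the providers, and data transfer times, safety margins and negotiation are ignored.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_reservations(durations, depends_on, workflow_start=0):
    """Return {service: (start, end)} reservation windows.

    durations:  provider-predicted execution time per service
                (hypothetical values, e.g. in seconds).
    depends_on: maps a service to the set of services whose
                output it needs (the partial order of the workflow).
    """
    graph = {service: depends_on.get(service, set()) for service in durations}
    windows = {}
    # static_order() yields every service after all of its predecessors.
    for service in TopologicalSorter(graph).static_order():
        # A reservation can start once the latest predecessor has ended.
        start = max(
            (windows[pred][1] for pred in graph[service]),
            default=workflow_start,
        )
        windows[service] = (start, start + durations[service])
    return windows


# Hypothetical four-step scientific workflow.
durations = {"fetch": 10, "align": 30, "annotate": 20, "merge": 5}
depends_on = {
    "align": {"fetch"},
    "annotate": {"fetch"},
    "merge": {"align", "annotate"},
}
windows = plan_reservations(durations, depends_on)
for service, (start, end) in sorted(windows.items(), key=lambda kv: kv[1]):
    print(f"{service}: reserve from t={start} to t={end}")
```

In this sketch the start time of each reservation follows entirely from the predicted durations of the preceding invocations, which mirrors the observation that the "when" question reduces to the "how long" question. A real engine would presumably add slack to each window and re-plan when a provider rejects or renegotiates a reservation.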