News

Delos DLMS Prototype

Delos Conference: eHealthDL
Management of and Access to Virtual Electronic Health Records PDF

Delos Conference: DataStream in DL
Integration of Reliable Sensor Data Stream Management into Digital Libraries PDF

Guidelines for OSIRIS/ISIS integration planned in JPA3
The OSIRIS Process Support Middleware and the ISIS Process-Based Digital Library Application PDF
Digital Library Architecture
Objectives of our Workpackage

Citizens of the future should be able, through better-designed digital libraries, to gain access to a myriad of forms of knowledge from anywhere and at any time, in an efficient and user-friendly fashion. For this to happen, those digital libraries will need to arrive at a common infrastructure that is highly scalable, customizable, configurable and adaptive. From a technical viewpoint, this infrastructure has to support state-of-the-art and promising innovative models, techniques and frameworks for developing and evaluating digital libraries. To this end, various activities and developments have to be seamlessly integrated into a coherent whole to develop such a generic and modular digital library infrastructure. This includes the following architectural approaches, processes, and functions.

Architectural Approaches

Peer-to-Peer Architectures: Information providers within digital libraries are highly autonomous. Data and documents cannot be integrated into a single source. Hence, mechanisms to retain this autonomy and to loosely couple information providers are needed. This also includes the user clients so as to facilitate some collaborative data sharing among them (e.g., for annotations and recommendations about DL contents). Peer-to-peer (P2P) architectures allow for such loosely coupled integration. Different aspects of peer-to-peer systems (e.g. indexes, and P2P application platforms) have to be combined and integrated into an infrastructure for digital libraries.

Grid Architectures: Certain services within digital libraries are complex and computationally intensive (e.g., calculation of certain features of multimedia documents to support content-based similarity search). Grid computing architectures allow for sophisticated load balancing strategies within a cluster of components. Following the idea of a service grid, and the handling of the control of shared resources, similar concepts have to be integrated into an infrastructure for digital libraries.

Service-oriented Architectures: When access to data and documents is provided by dedicated services, appropriate mechanisms to describe the semantics and usage of such services have to be put in place. In the context of web services, service descriptions written in service description languages are stored in service registries. These elements have to be integrated as building blocks into a digital library. Moreover, common service interfaces have to be defined based on existing standards to facilitate service composition.
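
The role of a service registry can be illustrated with a minimal in-memory sketch. It is not part of the workpackage results: the class names ServiceDescription and ServiceRegistry, the operation names and the example.org-style endpoints are all hypothetical, and a real deployment would rely on a networked registry and a standard service description language.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical, simplified stand-in for a service description document."""
    name: str
    endpoint: str
    operations: tuple


@dataclass
class ServiceRegistry:
    """In-memory registry used for illustration only."""
    _services: dict = field(default_factory=dict)

    def publish(self, description: ServiceDescription) -> None:
        # Register (or overwrite) a description under its service name.
        self._services[description.name] = description

    def find_by_operation(self, operation: str) -> list:
        # Return every service advertising the operation, sorted by name.
        return sorted(
            (d for d in self._services.values() if operation in d.operations),
            key=lambda d: d.name,
        )


registry = ServiceRegistry()
registry.publish(ServiceDescription(
    "image-features", "http://node-a.example/features", ("extract",)))
registry.publish(ServiceDescription(
    "text-index", "http://node-b.example/index", ("insert", "search")))
registry.publish(ServiceDescription(
    "image-index", "http://node-c.example/index", ("insert", "search")))

# Discover all providers of a common "search" interface for later composition.
search_services = [d.name for d in registry.find_by_operation("search")]
```

Because both index services advertise the same operation name, a client can discover and compose them without knowing the providers in advance, which is the purpose of defining common service interfaces.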

Processes

Workflow Management: Applications within digital libraries must consider the autonomy and distribution of information providers. Hence, accessing information means combining existing services into mega-applications, i.e. workflow processes. The same is true for applications aiming at managing and controlling the consistency of a digital library. Different aspects of workflow management have to be integrated, such as self-configuration and flexibility, both at the application and at the systems level, as well as high availability and scalability.

Publish/Subscribe Techniques, Evolution: Services within a digital library have to be made available to the public and have to be accessed by service repositories. Publish/subscribe techniques are a means to make information within digital libraries available and to refresh derived and replicated information sources. Digital libraries are long-lasting institutions, so they have to anticipate changes in the software as well as in the schemata, the ontologies and similar data. The infrastructure for digital libraries therefore has to provide mechanisms to distribute and co-ordinate updates to these components, and to manage the software configuration in such a dynamic environment.
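
How publish/subscribe can refresh a derived information source may be sketched as follows. This is an illustrative toy under stated assumptions, not a component of the planned infrastructure: the Broker class, the topic name "document.updated" and the event fields doc_id and version are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process publish/subscribe broker (illustration only)."""

    def __init__(self):
        # Maps a topic name to the list of callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Deliver the event to every subscriber; return the delivery count.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


broker = Broker()
replica_versions = {}  # derived source: latest known version per document


def refresh_replica(event):
    # Keep the derived source in step with the publisher.
    replica_versions[event["doc_id"]] = event["version"]


broker.subscribe("document.updated", refresh_replica)
delivered = broker.publish("document.updated", {"doc_id": "doc-1", "version": 2})
```

The publisher does not need to know which replicas or indexes exist. New derived sources only have to subscribe, which keeps information providers loosely coupled.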

Replication and Freshness of Data: In order to increase the efficiency in accessing information within a digital library, information will be replicated at several places. There will also be duplicates due to independent upload of the same publication at different nodes. However, when changes occur (new data is provided or information is updated), this has to be reflected in all replicas, leading to a trade-off between update costs and freshness of data. Sophisticated mechanisms to trade, in an application-specific way, update costs for the freshness of data need to be provided by the digital library infrastructure.
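
The trade-off between update costs and freshness can be made concrete with a small sketch. It assumes a hypothetical Replica class whose max_staleness parameter is the application-specific bound: a larger bound means fewer refreshes (lower update cost) but potentially older answers. Time is passed in explicitly so that the example is deterministic.

```python
class Replica:
    """Replica that refreshes from its primary only when too stale."""

    def __init__(self, primary, max_staleness):
        self.primary = primary              # authoritative data (a dict here)
        self.max_staleness = max_staleness  # tolerated age, in time units
        self.copy = dict(primary)           # local replicated copy
        self.synced_at = 0.0                # time of the last refresh
        self.refresh_count = 0              # proxy for the update cost paid

    def read(self, key, now):
        # Pay the update cost only if the copy is older than the bound.
        if now - self.synced_at > self.max_staleness:
            self.copy = dict(self.primary)
            self.synced_at = now
            self.refresh_count += 1
        return self.copy.get(key)


primary = {"doc-1": "v1"}
replica = Replica(primary, max_staleness=10.0)
primary["doc-1"] = "v2"                          # update at the primary

stale_value = replica.read("doc-1", now=5.0)     # within the bound: old value
fresh_value = replica.read("doc-1", now=12.0)    # bound exceeded: refreshed
```

Here the first read returns "v1" without any refresh, while the second read triggers exactly one refresh and returns "v2".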

Mobile Information Components: The combination of wireless and wired connectivity in a pervasive computing environment with increasingly small and powerful mobile devices, such as laptops, personal digital assistants, handheld PCs, and smart phones, enables a wide range of new digital library applications. This additional flexibility has to be supported by the underlying digital library middleware. In particular, profiling and proxy management have to be integrated into the infrastructure. Additionally, mobile devices will require sophisticated visualisation techniques to present digital publications adequately on limited displays.

Functions

Data and documents within a digital library are made available by dedicated services. These services allow for the definition of building blocks, which are tailored to the type of data and documents and implement, for instance, appropriate index structures.

XML Storage and Access: Effective and efficient access methods for documents in XML stores will provide the basis for mediation within P2P information architectures. The emerging language for annotation of digital content is XML. Its power to annotate any document challenges the techniques to store, index and access them. Progress is required in areas such as IR techniques over XML sources, shredding and selective indexing for fast retrieval, clustering, replication management, and transformation aimed at improved transport and platform specific delivery.

Multimedia Access: A key task of a digital library is the maintenance and retrieval of documents of various types and for different search scenarios. Due to the distributed nature, services from different nodes must be able to interact with each other. To this end, well-designed service interfaces are required to ease integration of different providers, e.g., of feature extraction algorithms, indexing services, and retrieval engines. Note: techniques to extract features, and maintain and index such features are covered by other clusters.

Digital Rights Management: Publishers will be reluctant to provide content if no enforceable digital rights policy is in place. This includes support for business models such as pay-per-view or subscriptions. Peer-to-Peer (P2P) architectures have a notoriously bad reputation in this respect from their application in the music industry context, and this situation needs to be overcome to facilitate business development. Replication within digital libraries adds to the complexity of this aspect.

Security and Certification: The use of Digital Rights Management in autonomous environments such as digital libraries based on peer-to-peer architectures requires the authentication and authorization of users, authors, content providers, and reviewers as well as digital signatures for documents, to ensure the consistency, quality, and reliability of digital libraries.

Finally, the results of the different activities of the architecture cluster need to be integrated and evaluated in concrete settings. To this end, demonstrator systems and building blocks have to be combined. The cluster will check all the solutions developed in a sample application domain that is currently being analysed by some of the cluster members and that appears to be well suited to evaluating different digital library platforms and infrastructures.

Medical Information Systems: E-Health Warehouses can be considered as special digital libraries. Patient-related information is available from different distributed and autonomous information providers. In order to provide electronic patient records, this information must be integrated at application level. Several issues addressed in the joint research activities are important: appropriate service interfaces of building blocks, workflow management for application development, P2P infrastructures, replication and freshness, security and certification.

Digital Library Architecture Integration Activities

The integration activities in the Digital Library Architectures cluster will address network and basic services architectures that allow integrated access to distributed digital libraries. Thus, the following objectives are addressed:

• Development of surveys that collect the most significant contributions and promises in Digital Libraries Architectures

• Development of prototype software modules and components for web services, multiple service composition and management, and wireless connectivity

• Test of the solutions on a prototype ongoing application.

In order to achieve these goals, the following research activities will be supported by the cluster:

Surveying the State of the Art: Fundamental architectural problems in digital libraries cover the adoption of new networking architectures, definition and adoption of new standards, integration of system components into a cohesive and consistent digital library application and workflow process, integration with new emerging transmission media. The cluster will survey these subjects, identifying emerging solutions, technologies and promising scientific results. In particular the cluster will produce surveys on:

• Replication and Freshness of Data: Mechanisms for digital libraries.

• Security and Certification: Mechanisms for digital libraries

• Network Architectures including Peer to Peer and Grid architectures (differences, common features, synthesis, application scenarios)

• Collection level descriptions to enhance information discovery and assist with information management within service registries.

Development of Peer-to-Peer and Grid Architectures: Peer-to-peer (P2P) architectures allow for loosely coupled integration. Different aspects of peer-to-peer systems (indexes, P2P application platforms, etc.) have to be combined and integrated into an infrastructure for digital libraries. On the other hand, grid computing architectures allow for sophisticated load balancing strategies within a cluster of components. Following the idea of a service grid, and the handling of the control over shared resources, similar concepts have to be integrated into an infrastructure for digital libraries. The cluster will develop a demonstrator system that implements those relevant aspects that have been identified as important for the effective exchange of digital library contents.

Development of Service-oriented Architectures and Workflow Management Facilities: The cluster will develop a feasibility study on common protocols for generic service models that provides appropriate descriptions of the available services. Web services are to be integrated as building blocks into digital libraries both to provide access to individual services and to define common services. For applications that utilize digital libraries of autonomous information providers or applications that manage and control the consistency of a digital library, existing services are to be integrated into workflow processes. Both at the application and at the systems level, different aspects of workflow management like self-configuration and flexibility, high availability and scalability, must be included. The cluster will develop a prototype system that demonstrates the feasibility of these solutions in the context of distributed digital libraries.

Development of Mobile Information Components:

The cluster will design and develop a demonstrator that embeds innovative solutions to the most compelling requirements for the access to digital libraries using a combination of wired and wireless connectivity. The demonstrator will embed a specific middleware that will adapt the content to the limitations of handheld devices, for the specific context of digital libraries.

Development of a Medical Information System that exploits Innovative Solutions: The cluster will check all the subjects of investigation referred to in the previous items within the framework of a medical application, currently ongoing at UMIT, and using real data from clinical applications.

First 18 Months Programme (JPA1)

Information architectures for digital libraries are now evolving in parallel with architectures for peer-to-peer (P2P) systems, Grid-enabled environments and institutional repositories. In some cases, developments have proceeded in a fragmented way, perhaps on a local basis (within an organization), in particular domains and sectors (within museums, libraries and archives) and within disciplinary boundaries (in bio-medicine or the performing arts). Synergies between initiatives are becoming apparent both at the technical level and in terms of the broad operating principles being adopted by the parties involved. For example, many of these developing architectures are predominantly service-oriented; they are adopting emerging Web Services standards and are becoming increasingly user-focused in their presentation. A main contribution of this workpackage will be to facilitate the development and integration of building blocks for digital libraries. This requires both the identification and specification of service interfaces and the definition of a generic digital library architecture that is highly customizable to individual application domains and national requirements. In the first 18 months we will concentrate on the following directions:

• New approaches to the architecture for an “intelligent management” of digital libraries

• Enabling the coordinated development of information architectures by an adoption of a set of common standards and protocols

• Managing information dynamics and mobility

New Approaches to the “Intelligent Management” of Digital Libraries

We can distinguish three different approaches to DL architectures. The first is the (Web) service architecture (SA), which includes the related standards for describing, finding and invoking services. The SA leads to a new way of building distributed information systems, specifically DLs. Second, many applications do not need complete answers or fully fresh data; this has given rise to new distributed data management concepts subsumed under Peer-to-Peer (P2P) data management. In parallel to these approaches, a third direction, Grid computing, has evolved, and the associated Open Grid Services Architecture (OGSA) is leading to the development of increasingly complex computer systems. These Grid architectures show high potential for the management, discovery, and load-balanced use of distributed digital library services, together with application-specific security aspects. Grid-enabled environments with distributed processing capabilities, the utilization of remote resources and dynamic online experimentation have encouraged the consideration of new approaches to information system management.

It is necessary to study these approaches in detail and to evaluate their advantages and disadvantages using adequate benchmarks. Such benchmarks do not exist at present; rather, the fields are quite separate. This important step must be carried out during the first 18 months and will enable us to develop new intelligent approaches for a future DL architecture. The principles of autonomic computing, applied at the level of digital library applications and services, even extend the Grid architecture as a possible solution to these challenges. This strand investigates the applicability of this approach to digital libraries through a mix of assessment and dissemination. Links may be possible to proposals following the Grid computing theme identified for the second call. The summary of outputs consists of:

• An evaluation of P2P, Grid and Service architectures to identify the benefits of each architecture for digital library applications. The evaluation includes the joint development of a benchmark and executions of the benchmark on selected different implementations of different architectures.

• Starting a joint demonstrator development integrating the benefits of the service architecture, Grid technology and P2P data management into digital library infrastructures.

Enabling the Co-ordinated Development of Information Architectures

This track describes the mechanisms, processes and tools that will be required to ensure the co-ordinated development of large-scale DL architectures. It is based on the principle of working towards common models, frameworks and platforms which span national boundaries, sectors and communities and which will be widely adopted and implemented. It sets out to build on significant existing work such as major national and international DL infrastructure initiatives (e.g. the UK JISC Information Environment), key standards developments (e.g. the Open Archives Initiative), and emerging service models (e.g. Web Services). The outputs are designed to jointly facilitate the continued and sustainable development of a range of common models and frameworks.

These services will be based on established and emerging standards and protocols. The Web Services model is currently being promoted as a common foundation for e-service development; however, a feasibility study is proposed to scope and assess the validity and durability of this approach and others across a range of networked environments, sectors and disciplines. Proof-of-concept demonstrators and pilot services are already being developed to illustrate the potential of a set of shared infrastructure services. Examples include a service / collection description registry (e.g. JISC IESR), metadata schema registries (e.g. CORES, MEG), a cross-search broker service (e.g. Xgrain – EDINA), a resolver service (e.g. ZBLSA – EDINA), an ontology server (link to SEMKOS IP), an authentication and authorization service (ATHENS), and a platform for the management of large-scale information spaces such as ISIS/OSIRIS (ETHZ).

The summary of outputs includes:

• A comparison and feasibility study on the adoption of a set of common standards & protocols.

• Starting a joint development of infrastructures as demonstrator systems following selected standards & protocols

Managing Information Dynamics and Mobility

The combination of wireless and wired connectivity in a pervasive computing environment with increasingly small and powerful mobile devices, such as laptops, personal digital assistants, handheld PCs, and smart phones, enables a wide range of new digital library applications. We see an ever-increasing number of information providers and data sources, ranging from traditional databases and large document collections, through information sources contained in web pages, down to information systems in mobile devices and embedded information in mobile “smart” objects. This leads to considerably greater dynamics of information and, as a consequence, to the need for the infrastructure to keep track of dynamic information and mobility: information changes, is replicated, or is derived. New information providers and services appear at any time. Clients connect and disconnect at any time. After reconnecting, they want relevant, refreshed information. The infrastructure envisaged here must be much more sophisticated than state-of-the-art middleware in that it automatically performs, among many other tasks, the following:

• It starts maintenance processes in order to keep replications and search engines consistent in case of new information or changes in the information.

• It keeps track of mobile clients, their location and their context and propagates relevant information after re-connections.

• In view of many concurrent processes the infrastructure executes them in a decentralized, peer-to-peer fashion and avoids central components as much as possible in order to be scalable and reliable.

This direction exhibits a close relationship to the WP on Information Access and Visualization, since it must provide the basic services for accessing and managing digital libraries via various mobile and non-mobile devices.

Summary of outputs:

• A quantitative evaluation of different concepts for synchronization and connection management and a synthesis of various approaches for the infrastructure of the dynamic digital library with mobile components

• Definition and implementation of basic services needed for supporting information dynamics and mobility

• A workshop on mobile information components for e-health monitoring as application of DL in Medicine

Second 18 Months Programme (JPA2)

The overall goal of this workpackage is to analyze, develop, and integrate architectures for digital libraries.

Strategic Objectives

During the next 18 months (January 2005 – June 2006) the primary goals of the first 18 months of the project will continue to be actively pursued, with the last two topics being addressed in more detail during the first six months of 2005.

• Developing new approaches to the architecture for an “intelligent management” of digital libraries

• Enabling the coordinated development of information architectures by an adoption of a set of common standards and protocols

• Managing information dynamics and mobility

Workpackage Activities and Integration

General Digital Library Architecture issues will continue to be considered in JPA2, with the main focus on the development of a DL reference model. At the same time, two new tasks in JPA2 will concentrate more on digital library architectures for special purposes. Here the main focus will be on Digital Library Architecture for e-Health applications. One of the goals of WP1 will therefore be the investigation and provision of a dependable platform. Stream data play an important role in e-Health applications, so the platform must also support stream processing based on the integration of stream operators and web services. In addition to the tasks already described in JPA1, the following JPA2 tasks will be pursued.

T1.4 A Reference Model for Digital Library Management Systems.

The objective of this Task is to introduce a reference model for Digital Library Management Systems (DLMS), i.e. a formal and conceptual framework describing the characteristics of this particular kind of information system. Task activities will address the following issues:

• Evaluation and survey on architectural frameworks for digital libraries.

• Current digital library systems: user requirements vs provided functionality.

• User requirements.

• Survey on Current Digital Library Systems and Gap Analysis.

• Definition of a Reference Model for DLMSs.

T1.5 Design, Implementation and Evaluation of Multimedia Annotations for Users’ Collaboration.

The main goal of this Task is the implementation and evaluation of an annotation digital library service, based on the design guidelines and on the partners' existing tools. In particular, task activities will address the following issues: Design and implementation of the annotation digital library service. The most important requirements that the annotation service should fulfil are the following:

• Support of nested annotations, i.e. not only documents or document parts can be annotated, but also other annotations can be annotated.

• Possibility to reference each annotation by a handle (e.g. the Uniform Resource Identifier (URI) could be one of the schemes to be supported).

• Faithfully represent the sign (e.g., textual, graphical, referential or a combination of these) and the meaning of an annotation, so that different annotation types can be supported.

• Support different scopes of annotations (e.g. private, public, shared).

Definition of a set of APIs to allow the access to this service from different digital libraries. The overall goal of this task will be accomplished by integrating also contributions from WP4 and WP7.
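
The four requirements listed above (nesting, handles, sign and meaning, scopes) can be captured in a small data model. The following is a hedged sketch only and does not describe the service developed in this Task: the names Annotation, AnnotationStore, Scope and thread, the shared_with field, and the urn:example handles are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    PRIVATE = "private"
    SHARED = "shared"
    PUBLIC = "public"


@dataclass(frozen=True)
class Annotation:
    handle: str    # URI-style handle of this annotation
    target: str    # handle of a document, document part, or other annotation
    sign: str      # e.g. "textual", "graphical", "referential"
    meaning: str   # e.g. "comment", "reply", "see-also"
    body: str
    scope: Scope
    owner: str
    shared_with: frozenset = frozenset()  # only used for Scope.SHARED


class AnnotationStore:
    def __init__(self):
        self._by_handle = {}

    def add(self, annotation):
        if annotation.handle in self._by_handle:
            raise ValueError("duplicate handle: " + annotation.handle)
        self._by_handle[annotation.handle] = annotation

    def _visible(self, a, viewer):
        if a.scope is Scope.PUBLIC or a.owner == viewer:
            return True
        return a.scope is Scope.SHARED and viewer in a.shared_with

    def thread(self, target, viewer, _seen=None):
        """Depth-first list of handles annotating target, nested ones included."""
        seen = set() if _seen is None else _seen
        result = []
        for a in self._by_handle.values():
            if a.target == target and a.handle not in seen and self._visible(a, viewer):
                seen.add(a.handle)  # guards against cyclic targets
                result.append(a.handle)
                result.extend(self.thread(a.handle, viewer, seen))
        return result


store = AnnotationStore()
store.add(Annotation("urn:example:ann:1", "urn:example:doc:42", "textual",
                     "comment", "Check the table.", Scope.PUBLIC, "alice"))
store.add(Annotation("urn:example:ann:2", "urn:example:ann:1", "textual",
                     "reply", "Agreed.", Scope.PRIVATE, "bob"))
store.add(Annotation("urn:example:ann:3", "urn:example:ann:1", "referential",
                     "see-also", "urn:example:doc:7", Scope.SHARED, "alice",
                     frozenset({"carol"})))

bob_view = store.thread("urn:example:doc:42", viewer="bob")
carol_view = store.thread("urn:example:doc:42", viewer="carol")
```

Since an annotation's target is just another handle, annotating an annotation needs no special case. In this sketch bob sees the public annotation and his own private reply, while carol sees the public annotation and the one shared with her.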

T1.6 Management of and Access to Virtual Electronic Health Records.

This Task is the logical continuation of T1.1 of JPA1. It evaluates the architecture and extends it with aspects of electronic health records, which represent an important application field for digital libraries. The realization of these goals requires an infrastructure that is highly dependable and reliable. Moreover, the infrastructure has to allow for transparent access to distributed data, and has to efficiently schedule access to computationally intensive services by applying sophisticated load balancing strategies using Grid technology. Common standards, as investigated in Task 2 of JPA1, are equally important. Task activities will address the following issues:

• Identification of the basic building blocks to access distributed artifacts and to intelligently search within a set of these artifacts.

• Provision of a dependable platform that supports the integration of these building blocks into processes (e.g., based on the ETH/UMIT hyper database prototype system OSIRIS – Open Service Infrastructure for Reliable and Integrated process Support), thereby realizing a virtual electronic patient record.

T1.7 Integration of Data Stream Management into an eHealth Digital Library.

This task is a continuation of Tasks 1.1 and 1.3 of JPA1, considering socially relevant applications of future digital libraries to e-health and health monitoring. Continuous data streams generated by (wearable) sensors have to be processed online in order to detect critical situations. In addition to the stream operators, traditional discrete (web) services, i.e., services that do not operate on continuous input data, also have to be integrated. Task activities will address the following issues:

• Survey on use cases from tele-monitoring applications containing the combination of stream operators and web services.

• Specification and implementation of join and search operators for data streams.
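
A join operator for data streams can be sketched as a sliding-window join over two time-ordered sensor streams. This is a minimal illustration, not the operator specified in this Task: the function name window_join, the sample readings, and the alert thresholds are hypothetical, and the thresholds are not clinical guidance.

```python
import heapq
from collections import deque


def window_join(left, right, window):
    """Yield (left_item, right_item) pairs whose timestamps differ by at
    most `window`. Items are (timestamp, value) tuples; both inputs must
    be ordered by timestamp."""
    left_buf, right_buf = deque(), deque()
    # Merge both streams into one time-ordered sequence, tagged by side.
    merged = heapq.merge(
        ((t, 0, v) for t, v in left),
        ((t, 1, v) for t, v in right),
    )
    for t, side, v in merged:
        own, other = (left_buf, right_buf) if side == 0 else (right_buf, left_buf)
        # Evict items of the other stream that can no longer match.
        while other and t - other[0][0] > window:
            other.popleft()
        for other_item in other:
            yield ((t, v), other_item) if side == 0 else (other_item, (t, v))
        own.append((t, v))


heart_rate = [(0, 72), (5, 75), (10, 140)]   # (seconds, beats per minute)
oxygen = [(1, 98), (9, 91)]                  # (seconds, SpO2 in percent)

pairs = list(window_join(heart_rate, oxygen, window=2))

# A discrete follow-up step, e.g. a call to a notification web service,
# would be triggered only for joined readings that look critical.
alerts = [(hr, ox) for hr, ox in pairs if hr[1] > 120 and ox[1] < 92]
```

Each matching pair is emitted exactly once, when the later of the two readings arrives. The final filter marks the point where a traditional discrete service would be invoked on the output of the stream operator.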

Expected Results

Reference Architecture: The reference model that will result from Task T1.4 described above will contribute strongly to improving the quality of DLMSs, since it will specify their expected features and properties. It will also lay the foundations for establishing what has been achieved so far, where we want to move in the future, what we should do, and how we can evaluate priorities and measure advances.

Design, Implementation and Evaluation of Multimedia Annotations for Users’ Collaboration: Implementation of an annotation digital library service, based on the design guidelines and on the partners' existing tools, and definition of a set of APIs to allow access to this service from different digital libraries. Integration of this service into the DAFFODIL and BRICKS digital library systems.

Management of and Access to Virtual Electronic Health Records: Implementation of sample building blocks and processes in combination with the specialized applications of HITT/TILAK.

Integration of Data Stream Management: Prototype implementation of an infrastructure for workflow processes that include stream processing, supporting the integration of stream operators and web services.


Task Reports

Task Report T1.4 Reference Model, doc

Task Report T1.5 Annotations, doc

Task Report T1.6 Health Record, pdf

Task Report T1.7 Data Streams, pdf

Task Report T1.8 DelosDLMS, pdf

last update 17.2.2006