WISSKI - Scientific communication infrastructure
Project duration: 2009 – 2011/12
Project sponsorship: Deutsche Forschungsgemeinschaft (DFG)
Initiated by the Germanisches Nationalmuseum (GNM), Nuremberg, the Zoologisches Forschungsmuseum Alexander Koenig (ZFMK), Bonn, and the Lehrstuhl Informatik 8 (Department of Computer Science) for artificial intelligence at the Friedrich-Alexander University Erlangen-Nürnberg, the project aims to develop a web-based system that supports scientific communication about our cultural heritage and its documentation. Internally, the system uses the CIDOC-CRM, an ISO standard, for data management and analysis. The aim is to demonstrate how ontology-supported knowledge processing can be implemented and what advantages it offers over previous approaches to the exchange and integration of data.
The starting point for the project was a question raised at both the GNM and the ZFMK: how can today’s extensive data collections, assembled in scientific projects and managed in highly specialized databases, be preserved and kept usable for future projects?
Object-based research projects conducted in museums generate comprehensive collections of primary information that form the basis for further research. These collections, presented in the form of catalogues or corpora, are generally so extensive that, for cost reasons, only a selection of the gathered information can be considered for publication. Because concepts for digital reuse are lacking, a considerable amount of information is lost when the project ends, or at least becomes extremely difficult to access.
Until a few years ago, the question of how to preserve information had an easy answer: the paper records of a project were archived and made available there for further research. With the continuing progress of information and communication technology (ICT), both the situation and the expectations have changed. For one thing, archiving digital information faces serious difficulties, since digital formats age quickly and may no longer be readable after a few years. Moreover, the task is not only archival preservation but also the updating and provision of data via ICT, and thus the lasting embedding of the data into active research.
The WissKI project attempts to work out generally applicable solutions for this desideratum. It envisions internet-based, professionally moderated information portals that are run by the respective research group and whose content is derived from museum objects. These portals may serve as cooperative tools for producing information products or publications for different projects. Among the first examples of such cooperative information portals are the well-known Wiki platforms (Wikipedia/Wikimedia/GuttenPlag).
The success of these portals is determined by several factors. In the area of online encyclopedias, the success of the Wiki concept rests on the one hand on the “democratic” editing process (the participation concept) and on the other hand, to a large degree, on the simple mechanism of compiling and representing knowledge. For WissKI, the question is how such concepts can be transferred to a cooperatively managed scientific work platform and which special scientific requirements apply.
So as not to reinvent the wheel, the project builds on DRUPAL, an existing modular content management system. Within the DRUPAL framework, new modules will be developed that support the cooperative generation of museum data and the publication of scientific corpora on the internet, in such a way that this information can be kept current through the collaboration of the scientific community. At the same time, this information forms the primary scientific foundation for a discussion and publication platform which in due course will facilitate the establishment of decentralized museum competence centers.
In general, the Wikipedia concept of web content publishing and knowledge linking will be supported and enhanced by classic, current and new content technologies (in-depth content analysis and semantic annotation of content).
The enhancements are:
- Construction of digital rights and moderation components
- Securing the identity of the author
- Securing the authenticity of the information
- Making contributions citable
- Digital Preservation, NESTOR
- In-Depth analysis via CIDOC-CRM (ISO 21127)
- Sustainability of information and input (reuse) through complete embedding of diverse project data
- Retrieval of digital texts and images for a future virtual European or national digital library
- Content-related and technical implementation of international standards, such as OWL, RDF, DC and OAI-PMH
Because WissKI is a web-based system, Semantic Web technologies are available for its implementation. To make the CIDOC-CRM usable, it had to be implemented in the Web Ontology Language (OWL), developed by the World Wide Web Consortium (W3C); this was realized as the Erlangen CRM/OWL (ECRM). Ontologies in computer science are based on formal logic and are used to formally define, categorize and describe knowledge and to draw inferences from it. The Erlangen CRM uses the OWL dialect OWL DL, which corresponds to the description logic SHOIN(D). The syntax of OWL is readable by machines as well as by people (with appropriate knowledge). The formally defined semantics allow concepts to be modelled with unambiguous meanings. Restricting itself to SHOIN(D) gives OWL DL high expressivity while keeping reasoning complete and decidable. In contrast to many other logic-based systems, OWL does not assume that the information at hand is complete (Open World Assumption), i.e. facts that are not known are not treated as false. Since knowledge about cultural heritage is always incomplete, this characteristic of OWL is of great advantage.
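The practical effect of the Open World Assumption can be illustrated with a minimal Python sketch. It does not show how an OWL reasoner works internally; the statements and the names `KNOWN_TRUE`, `KNOWN_FALSE`, `ask_closed_world` and `ask_open_world` are hypothetical and serve only to contrast the two ways of answering a question about a fact that has not been recorded.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements about a museum object (illustrative names only).
KNOWN_TRUE: Set[Triple] = {("painting_1", "created_by", "artist_1")}
# Statements explicitly known to be false (a simplification for this sketch).
KNOWN_FALSE: Set[Triple] = {("painting_1", "created_by", "artist_2")}


def ask_closed_world(statement: Triple) -> bool:
    """Closed world: whatever is not recorded as true counts as false."""
    return statement in KNOWN_TRUE


def ask_open_world(statement: Triple) -> Optional[bool]:
    """Open world: True, False, or None when nothing is known."""
    if statement in KNOWN_TRUE:
        return True
    if statement in KNOWN_FALSE:
        return False
    return None  # unknown, neither asserted nor refuted


question = ("painting_1", "restored_by", "workshop_1")
print(ask_closed_world(question))  # False
print(ask_open_world(question))    # None
```

Under the closed-world reading, the missing restoration record would be reported as "not restored"; under the open-world reading it simply remains unknown, which is the more appropriate answer for incomplete cultural heritage data.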
Ontologies are distinguished by their extensibility. One method of extension is the formation of sub-concepts or sub-properties, which inherit all characteristics of their parents and may at the same time be extended by further characteristics.
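The inheritance mechanism described here can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Concept` class and the property names `has_note` and `issued_by` are hypothetical and are not taken from the CIDOC-CRM or from the WissKI code.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Concept:
    """A concept with an optional parent and its own characteristics."""
    name: str
    parent: Optional["Concept"] = None
    own_properties: Set[str] = field(default_factory=set)

    def all_properties(self) -> Set[str]:
        """Own characteristics plus everything inherited from the parents."""
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.own_properties

    def is_a(self, other: "Concept") -> bool:
        """True if this concept is `other` or one of its sub-concepts."""
        node: Optional[Concept] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Hypothetical example: a sub-concept adds one characteristic of its own.
identifier = Concept("Identifier", own_properties={"has_note"})
inventory_number = Concept("InventoryNumber", parent=identifier,
                           own_properties={"issued_by"})

print(sorted(inventory_number.all_properties()))  # ['has_note', 'issued_by']
print(inventory_number.is_a(identifier))          # True
```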
WissKI envisions a three-level ontology model that makes it possible to extend the CIDOC-CRM where necessary. On the first, superordinate level, a reference ontology is created to which the concepts of the lower levels refer. On the second level, a sub-ontology is introduced that makes the first, systematically relevant extensions, for example primitive data types not otherwise specified.
The third level contains so-called application ontologies, which specify the concepts and characteristics needed for a particular application (a specific content-driven project). If, for example, the system is meant to serve documentation in the museum, then the concept of an inventory number is introduced as a sub-concept of the CIDOC-CRM concept E42 Identifier and supplied with the appropriate restrictions. Any number of application ontologies may be established, and the system automatically adapts the data collection tools to each of them, so that high flexibility and granularity are achieved. Data acquisition is supported by the integration of authority files (standard data). The instances of these ontologies represent concrete data, which are stored together as so-called triples (subject-predicate-object statements) in a triple store (a specialized kind of database). Because the CIDOC-CRM serves as the common reference ontology, the data stored in the triple store form a single, homogeneous data inventory despite the diverse orientations of the application ontologies. This inventory can be validated automatically and conclusions can be drawn from it, i.e. the information content is linked. Through this three-level modelling, SciCI (WissKI) supports a transdisciplinary approach, which differs from an interdisciplinary approach in that not only is information exchanged, but the methodological concepts of the different disciplines are also made explicit.
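How data from different application ontologies can form one inventory through a shared reference concept can be sketched as follows. This is a simplified illustration in plain Python, not the triple store or reasoner used by WissKI; the prefixes `ecrm:`, `museum:` and `zoo:`, the concept `zoo:SpecimenNumber` and the helper functions are assumptions made for the example.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# A tiny in-memory "triple store" (all identifiers are hypothetical).
store: Set[Triple] = {
    # Application concepts declared as sub-concepts of the reference ontology
    ("museum:InventoryNumber", SUBCLASS_OF, "ecrm:E42_Identifier"),
    ("zoo:SpecimenNumber", SUBCLASS_OF, "ecrm:E42_Identifier"),
    # Concrete data recorded by two different applications
    ("museum:id/1", TYPE, "museum:InventoryNumber"),
    ("zoo:id/7", TYPE, "zoo:SpecimenNumber"),
}


def superclasses(concept: str, triples: Set[Triple]) -> Set[str]:
    """Return the concept itself plus all of its transitive super-concepts."""
    result = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in result:
                result.add(o)
                frontier.append(o)
    return result


def instances_of(concept: str, triples: Set[Triple]) -> Set[str]:
    """All subjects whose type is the concept or one of its sub-concepts."""
    return {s for s, p, o in triples
            if p == TYPE and concept in superclasses(o, triples)}


# One query against the reference concept finds data from both applications.
print(sorted(instances_of("ecrm:E42_Identifier", store)))
# ['museum:id/1', 'zoo:id/7']
```

The query never mentions the application-specific concepts; it is the sub-concept declarations that tie both data sets back to the reference ontology.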
Naturally, the system offers interfaces that facilitate the use of the data in the Semantic Web. Through these interfaces, loss-free data exchange is possible with external systems that use the CRM as their reference ontology.
Dr. Siegfried Krause (project management)
Georg Hohmann M. A.
Mag. rer. nat. Gerald Hiebel, scientific personnel