NFDI4Ing Mission Statement
From the abstract, https://doi.org/10.5281/zenodo.4015201:
“NFDI4Ing brings together the engineering communities and fosters the management of engineering research data. The consortium represents engineers from all walks of the profession. It offers a unique method-oriented and user-centred approach in order to make engineering research data FAIR – findable, accessible, interoperable, and re-usable.”
NFDI4Ing is funded by the DFG, project number 442146713.
The Mission:
NFDI4Ing builds bridges:
- To foster quality-controlled measures for research data management in the engineering sciences by providing methodologies and services
- For the long-term preservation of research data in repositories, so that future generations of researchers can also benefit from them
- For the semantic interoperability of metadata through terminology services
- To NFDI through coordinated identity infrastructures and service specifications, and
- NFDI4Ing altogether
- To bring together the engineering communities
- For the training and education of future scientists, fostering cultural change in the engineering sciences
NFDI4Ing is Guided by the Following Principles:
- Openness:
- All outcomes of NFDI4Ing are open to the public
- All engineering and related scientific disciplines are invited to participate and to contribute
- Open source and open licenses for research data management methodologies and processes
- Open standards and open formats
- Open science and open sharing of research (meta)data, whenever possible
- Community-drivenness:
- NFDI4Ing complies with the requirements of the engineering disciplines
- FAIR principles, including long-term sustainability
- Reliability:
- All NFDI4Ing services are expected to be production-ready
- Practicability and feasibility:
- At all scales of organizational structures (from single scientists up to international organizations)
- Balancing simplicity against complexity
- Harmonization:
- Across data standards, policies, technologies, infrastructure, and scientific disciplines
- Connectivity and technical scalability of NFDI4Ing services with related RDM services and engineering software
- Wide range of data objects: integration and support of various types of data, software, and other research artefacts
Technical Interoperability
Based on the New European Interoperability Framework, https://dx.doi.org/10.2799/78681
Technical interoperability is commonly defined as the “ability of different information technology systems and software applications to communicate and exchange data”. This definition may also be complemented with the “ability to accept data from each other and perform a given task in an appropriate and satisfactory manner without the need for extra operator intervention”. That is, the definition is sometimes extended with an aspect focused on fully automating such data exchange.
In the context of this document, we are referring not only to the exchange of data (across scientific experiments, organisations or even communities), but also of other research artefacts that are commonly used in research (software, workflows, protocols, hardware designs, etc.). According to the European Interoperability Framework (EIF), technical interoperability covers “the applications and infrastructures linking systems and services, including interface specifications, interconnection services, data integration services, data presentation and exchange, and secure communication protocols”.
Some examples of technical interoperability aspects, and of models for describing data and metadata that have been used in the state of the art, are listed below (a minimal Linked Data sketch follows the list):
- Linked Data and Research Object models (W3C RDF, OWL)
- Microservice architecture (e.g. REST, SOAP)
- Scientific workflows
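To make the first item above concrete, here is a minimal sketch of a Linked Data description in Python, assuming the rdflib library is available (pip install rdflib); the dataset URI and all property values are hypothetical placeholders, not NFDI4Ing resources.

```python
# A minimal Linked Data sketch using rdflib; all URIs and values are
# hypothetical placeholders chosen for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

# Hypothetical dataset identifier; in practice this would be a resolvable PID.
dataset = URIRef("https://example.org/dataset/wind-tunnel-run-42")

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Wind tunnel measurement run 42")))
g.add((dataset, DCTERMS.license,
       URIRef("https://creativecommons.org/licenses/by/4.0/")))

# Serialise as Turtle so other RDF-aware systems can consume the description.
print(g.serialize(format="turtle"))
```

Because the description relies on shared vocabularies (DCAT, Dublin Core), any RDF-aware system can consume it without bespoke parsing, which is the essence of this interoperability model.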
Problems and Needs
At the level of technical interoperability, several recurring problems should be considered:
- When trying to work with infrastructures or services across communities, authentication and authorisation often need to be performed separately for each community/service. Even though there are technical means and industry standards (e.g., SAML, OpenID) to overcome this, authentication often involves transferring personal information between identity provider and service provider, and authorisation is very hard to harmonise based on centrally maintained user attributes. While authentication services can be hosted by organizations in a federated environment, authentication infrastructures should also consider individual researchers without a home organization.
- Research data may be made available in multiple general-purpose formats (CSV, Excel, database dumps, JSON, XML, HDF5, etc.) or community-based models (Darwin Core, FITS, NetCDF, shapefiles, openDRIVE, openSCENARIO), which are usually hard to align when reusing datasets across communities (see the sketch after this list). In the case of general-purpose formats, semantic interoperability problems also appear because of the lack of agreement on attribute names or column headers, the absence of headers or adequate documentation, etc.
- Coarse-grained or fine-grained research data from other communities may be difficult to find, given the lack of knowledge about how to query their repositories.
- Multiple service providers for different types of PIDs exist (ORCID, ROR, DOI, Handle, ARK) and their usage varies depending on the community and implementations in services.
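The format-heterogeneity sketch referenced above: a minimal example assuming pandas (and PyTables for the HDF5 reader); all file names, keys, and column mappings are hypothetical. The point is that each format needs its own reader, and column alignment is manual guesswork without shared conventions.

```python
# Minimal sketch of reading the "same" data from heterogeneous formats.
# File names, the HDF5 key, and the column mapping are hypothetical.
import pandas as pd

# General-purpose tabular formats: each needs a dedicated reader.
df_csv = pd.read_csv("measurements.csv")            # plain text, headers optional
df_json = pd.read_json("measurements.json")         # nesting possible
df_hdf = pd.read_hdf("measurements.h5", "run42")    # binary; requires PyTables

# Without agreed column names, alignment is manual and error-prone:
df_csv = df_csv.rename(columns={"T": "temperature_K"})  # hypothetical mapping
```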
As a result of this analysis, the following needs can be identified at the level of technical interoperability:
- There is a need for support for the process of authenticating to and obtaining the rights to use the services offered by NFDI4Ing in a way that is as unobtrusive as possible [Reference: NFDI Task Force Tools “AAI”] and that is independent of any single community.
- There is a need for NFDI4Ing to provide a trust (and sustainability) framework across scientific communities, collaborations and infrastructures. For the user this means that what works today will work tomorrow, only better [Reference: NFDI Section Infrastructure Commons].
- There is a need for simpler tools that allow dealing seamlessly with data available in multiple generic or community-based formats.
- When searching for research data (or other research objects) that may be reusable across communities, such data may need to be discovered at different levels of granularity, from high-level/coarse-grained descriptions down to fine-grained details.
- There is a need for a common and well-understood PID policy across communities (a minimal PID resolution sketch follows this list).
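The PID resolution sketch referenced above: DOI resolvers support content negotiation, so a client can request machine-readable metadata instead of the human-oriented landing page. This sketch assumes the requests library and network access; the DOI used is the NFDI4Ing abstract cited at the top of this document.

```python
# Minimal sketch of PID resolution via content negotiation at doi.org.
import requests

doi = "10.5281/zenodo.4015201"  # the NFDI4Ing abstract cited above

# Asking for DataCite JSON returns machine-readable metadata rather than
# a redirect to the landing page.
resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.datacite.datacite+json"},
    timeout=30,
)
resp.raise_for_status()
metadata = resp.json()
print(metadata.get("titles"))
```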
Recommendations
Some of the recommendations that can be made in this respect are:
- Use open specifications, where available, to ensure technical interoperability when establishing NFDI4Ing services.
- Define a common security and privacy framework and establish processes for NFDI4Ing services to ensure secure and trustworthy data exchange between all involved parties.
- Use an AAI process for NFDI4Ing that is common across communities, easy to implement by resource providers and easy to understand by users, e.g. DFN-AAI.
- Create Service-Level Agreements for all NFDI4Ing resource providers that are easy to understand by users from different communities.
- Enable easy access to data sources available in different formats, whether generic or community-based, as well as to tools enabling the use of those data, in order to overcome their heterogeneity and allow integrating data across communities.
- Make coarse-grained and fine-grained dataset (and other research object) search tools available. Consider a range of general-purpose and domain-specific/specialised search tools, exploiting general-purpose and domain-specific metadata (a minimal discovery sketch follows this list).
- Create a clear NFDI4Ing PID policy, accommodating appropriate PID usage and recognising that established practices are at different levels of maturity for different resources and that new PID types may emerge.
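The discovery sketch referenced above: a minimal example of coarse-grained dataset search against the public DataCite REST API, assuming the requests library and network access; the query string is a hypothetical example, not an NFDI4Ing service.

```python
# Minimal sketch of coarse-grained dataset discovery via the DataCite REST API.
import requests

resp = requests.get(
    "https://api.datacite.org/dois",
    params={"query": "wind tunnel",          # hypothetical search term
            "resource-type-id": "dataset",   # restrict to datasets
            "page[size]": 5},
    timeout=30,
)
resp.raise_for_status()

# The response follows the JSON:API convention: records under "data",
# with descriptive fields under "attributes".
for item in resp.json()["data"]:
    attrs = item["attributes"]
    title = (attrs.get("titles") or [{}])[0].get("title")
    print(attrs["doi"], "-", title)
```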
Semantic Interoperability
Semantic interoperability can be defined as “the ability of computer systems to transmit data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data federation between information systems”. (FAIRsFAIR deliverable D2.1 Report on FAIR requirements for persistence and interoperability 2019. https://zenodo.org/record/3557381)
That is, semantic interoperability is achieved when the information transferred has, in its communicated form, all of the meaning required for the receiving system to interpret it correctly, even when the algorithms used by the receiving system are unknown to the sending system. Syntactic interoperability (which is commonly associated with technical interoperability) is sometimes identified as a prerequisite to semantic interoperability. It ensures that the precise format and meaning of exchanged data and information is preserved and understood throughout exchanges between parties, in other words ‘what is sent is what is understood’.
Semantic interoperability is established by semantic artefacts (ontologies, thesauri) shared across the communities, which allow homogenising the interpretation and treatment of the exchanged data and of all its associated resources.
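A minimal sketch of such a shared semantic artefact, assuming rdflib; all URIs and labels are hypothetical. The point is that two systems exchanging data annotated with the same concept URI agree on its meaning without sharing any internal logic.

```python
# Minimal SKOS sketch of a shared concept; the namespace and labels are
# hypothetical placeholders, not an actual NFDI4Ing terminology service.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

EX = Namespace("https://terms.example.org/engineering/")

g = Graph()
g.bind("skos", SKOS)

concept = EX["dynamic-viscosity"]
g.add((concept, SKOS.prefLabel, Literal("dynamic viscosity", lang="en")))
g.add((concept, SKOS.altLabel, Literal("absolute viscosity", lang="en")))
g.add((concept, SKOS.definition, Literal(
    "Measure of a fluid's resistance to shear flow.", lang="en")))

# Two datasets from different communities can now both reference this
# concept URI and be merged without guessing what "viscosity" meant.
print(g.serialize(format="turtle"))
```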
Problems and Needs
At the level of semantic interoperability, several recurring problems should be considered:
- There is a generalised lack of common, explicit definitions of the terms used by user communities. This is especially problematic when trying to share resources across communities.
- Not only are term definitions usually lacking, but so are common semantic artefacts across communities (e.g. general ontologies that can be shared). Where such artefacts do exist, they may not be sufficiently well documented.
- The previous problem is exacerbated by the fact that there is a generalised lack of common reference repositories or registries of semantic artefacts (e.g. ontology catalogues). Only some communities are actively maintaining such resources (e.g. Schema.org).
- Data collections are usually poorly documented in terms of the metadata made available for them. Besides, there is no common metadata schema across communities, which results in different ones being used in different communities (e.g. DCAT, DDI, DataCite, RDA Metadata Standards Catalog, FAIRsharing).
- Depending on the discipline, there is a lack or over-abundance of metadata models that allow the description, functional preservation and ultimately re-use of the data stored.
- In some communities, there is a lack of expertise and skills related to semantics, which negatively affects the availability and use of common definitions, semantic artefacts, reference repositories, etc. This aspect is sometimes known as the “human interoperability” problem.
As a result of this analysis, these are some of the needs that can be identified at the level of semantic interoperability:
- Need for principled approaches and tools for ontology and metadata schema creation, maintenance, governance and use. Different communities are using different tools and representation models for their semantic artefacts. It is not uncommon to see UML models being used as standardised models for such representation, although they sometimes lack the formality needed to describe terms and their relationships.
- Need for harmonisation across disciplines or types of data. It should be possible for a user of one community to add metadata to existing items (data and semantic artefacts) according to their own research discipline's practices (e.g. a social scientist can add DDI-based metadata to a dataset coming from an environmental scientist). Researchers should likewise be able to transform metadata (or data) from one discipline's format/annotations to another's (a minimal crosswalk sketch follows this list).
- Need for federated access over existing research data repositories (both inside a discipline and across disciplines). This raises the question of how to support discovery of data on the basis of a high-level description, and possibly also of finer details such as concepts related to observations and variables.
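The crosswalk sketch referenced above: a deliberately simplified, hypothetical mapping from DataCite-style metadata keys to DCAT/Dublin Core-style keys, illustrating the kind of transformation that cross-discipline harmonisation requires. The field selection is illustrative only and far from a complete crosswalk.

```python
# Minimal, hypothetical metadata crosswalk; field names follow DataCite and
# DCAT conventions, but the mapping is a simplified illustration.
def datacite_to_dcat(dc: dict) -> dict:
    """Map a (simplified) DataCite-like record to (simplified) DCAT-style keys."""
    return {
        "dct:title": (dc.get("titles") or [{}])[0].get("title"),
        "dct:identifier": dc.get("doi"),
        "dct:issued": dc.get("publicationYear"),
        "dcat:keyword": [s.get("subject") for s in dc.get("subjects", [])],
    }

record = {  # hypothetical DataCite-like input
    "doi": "10.1234/example",
    "titles": [{"title": "Example dataset"}],
    "publicationYear": 2021,
    "subjects": [{"subject": "fluid dynamics"}],
}
print(datacite_to_dcat(record))
```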
Recommendations
Some of the recommendations that can be made in this respect are:
- Generate clear and precise definitions for the concepts used, as well as their metadata and data schemas. Make them publicly available, referenced by a persistent identifier and shared in NFDI4Ing. Use a shared classification for research disciplines (e.g. DFG’s subject area classification).
- Ensure that every semantic artefact that is being maintained in NFDI4Ing has sufficient associated documentation, with clear examples of usage and conceptual diagrams.
- Make semantic artefacts FAIR with open format and license.
- Use a common repository of semantic artefacts, and a governance framework for such a repository.
- Allow extensibility options for disciplinary metadata typical of particular research communities, so that users/researchers can add annotations according to the established practices of their communities.
- Enable the recording of sufficient provenance information on annotations, together with versioning support.
- Use a simple vocabulary to allow discovery over existing federated research data and metadata (e.g. an extension of DCAT-AP, DDI 4 Core, or the DataCite core schema).
- Besides data, also consider other types of resources used in science, such as software, methods, scientific workflows, laboratory protocols, hardware designs, etc.
- Create clear protocols and building blocks for the federation/harvesting of semantic artefact catalogues (a minimal harvesting sketch follows).
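The harvesting sketch referenced above: a minimal example of pulling records from a catalogue over OAI-PMH, a widely used metadata harvesting protocol; it assumes the requests library and network access, and the endpoint URL is a hypothetical placeholder.

```python
# Minimal sketch of harvesting catalogue records over OAI-PMH.
# The endpoint URL is hypothetical; any OAI-PMH 2.0 provider would do.
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 XML namespace
endpoint = "https://repository.example.org/oai"  # hypothetical endpoint

resp = requests.get(
    endpoint,
    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    timeout=30,
)
resp.raise_for_status()

# Print the identifier of each harvested record.
root = ET.fromstring(resp.content)
for record in root.iter(f"{OAI}record"):
    header = record.find(f"{OAI}header")
    if header is not None:
        print(header.findtext(f"{OAI}identifier"))
```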