FSD participates in three tasks within two work packages. These tasks aim to ensure the interoperability of metadata and data, develop common multilingual vocabularies, and promote trust in and quality assurance of research data.
One challenge in creating common services is the diversity and varying practices of different fields of science. This can be seen in the third work package of SSHOC in a task entitled Data and Metadata Interoperability Hub, where we chart interoperability issues and solutions across SSHOC member organisations. FSD leads the project, and we published our first report in July.
The goal of the report was to identify the interoperability problems that SSHOC member organisations face with research data and metadata. We also looked for metadata and data standards and formats that could be recommended to all organisations.
We interviewed 16 people from six research infrastructures and four fields: social sciences, language sciences, arts and humanities, and heritage sciences. The interview findings were supplemented with desk research on the number of records and data formats on the websites of repositories.
This was not the first time interoperability was assessed. The research infrastructures participating in SSHOC have each developed common practices and standards, and FSD, for example, has been active in the development of metadata practices in CESSDA. However, this time there were more organisations from different research infrastructures involved.
Different data but similar problems
When interviewing the informants, it quickly became clear that the data formats used in the fields vary a great deal. In social sciences, the data are often in the form of a data matrix or text, while language sciences use a lot of text and voice data. Images are common in the humanities, and in heritage sciences, a dataset can consist of objects or 3D models, among other things.
Despite the differences in data types, the interoperability problems were similar in all participating organisations. The most common were the need for conversions caused by proprietary file formats, loss of information in conversions, and problems with format versions.
There is also a great deal of variety between and within the fields in terms of metadata standards. For instance, the DDI standard often used in social sciences is usually not extensive enough to describe data in the humanities and hardly suitable at all to record the metadata of heritage objects. The metadata needs of individual organisations also vary significantly.
The most common metadata interoperability problems had to do with differing interpretations of metadata concepts, incompatibility of older standards with newer ones, and loss of information when converting from a rich metadata format to a less rich one.
All in all, fewer interoperability problems were reported than we expected. One reason for this might be that organisations have reacted to the problems by developing their practices around them. On the other hand, we also observed differences between the organisations in their maturity levels in terms of how well they take interoperability into consideration.
One size does not fit all
The report confirmed what was noticed in previous interoperability projects: there is no single metadata standard or data format suitable for all fields and situations. A common metadata standard, for example, has to be very bare-bones to be suitable for all fields.
To ensure findability and interoperability, the F and I of the FAIR principles, we ended up recommending Dublin Core and a slightly modified DataCite as common metadata standards for all members. We further recommend that, at a minimum, other standards used by the organisations can be converted into one of these.
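As a rough illustration of what converting a richer record into Dublin Core can involve, here is a minimal Python sketch. It is not taken from the report: the source field names, the `CROSSWALK` table and the `to_dublin_core` function are all hypothetical, and only the target names (title, creator, subject and so on) come from the Dublin Core element set. The sketch also lists the fields that have no counterpart in the simpler standard, which is the kind of information loss the interviewees described.

```python
# Hypothetical crosswalk from a richer, social-science-style record to
# Dublin Core. The source field names on the left are made up for this
# example; the targets on the right are Dublin Core element names.
CROSSWALK = {
    "study_title": "title",
    "principal_investigator": "creator",
    "keywords": "subject",
    "abstract": "description",
    "collection_date": "date",
    "persistent_id": "identifier",
    "data_language": "language",
}


def to_dublin_core(record, crosswalk=CROSSWALK):
    """Map a source record to Dublin Core and report unmapped fields.

    Returns a tuple (dc_record, lost_fields). Dublin Core elements are
    repeatable, so every value is stored in a list.
    """
    dc_record = {}
    lost_fields = []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            # No equivalent element: this information is lost in conversion.
            lost_fields.append(field)
        else:
            dc_record.setdefault(target, []).append(value)
    return dc_record, sorted(lost_fields)


# Illustrative record, not real data.
record = {
    "study_title": "Example survey",
    "principal_investigator": "Doe, Jane",
    "keywords": "interoperability",
    "collection_date": "2019",
    "sampling_procedure": "Simple random sample",
    "weighting": "Post-stratification weights",
}

dc_record, lost_fields = to_dublin_core(record)
print(dc_record)
print("Not representable in the target standard:", lost_fields)
```

Reporting the unmapped fields rather than dropping them silently is one way to make the cost of a conversion visible and to document it.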
Because the fields and organisations have different needs, we also made separate recommendations for metadata standards for each community and data formats by data type. The recommendations are based on the most used standards and formats in the communities.
Whichever standard and format organisations use, we recommend documenting it transparently. Many interoperability problems can be avoided with thorough documentation.
Common solutions through cooperation
The first report of the third work package shows that developing common services is not always straightforward. Different communities not only have their own needs but also their established practices. However, the purpose of the project is not to fit everyone into the same mould but to create tools and services beneficial for everyone.
Cooperating and gaining perspective on the processes and challenges of other communities and organisations is also useful because the problems encountered are often similar, and someone might have already found a solution that works for everyone. The next objective of the Data and Metadata Interoperability Hub is to chart solutions to metadata and data interoperability problems.
Further information:
» Development Manager Mari Kleemola
» SSHOC D3.1 Report on SSHOC (meta)data interoperability problems
» SSHOC project website
Henri Ala-Lahti
Information Services Specialist
firstname.lastname [at] tuni.fi
This blog entry is also available in Finnish:
SSHOC-hanke selvitti (meta)datan yhteensopivuusongelmia Tietoarkiston johdolla.